Statistics and Machine Learning in Mass Spectrometry-Based Metabolomics Analysis.
Imputation
Integrative analysis
Mass spectrometry
Metabolomics
Normalization
Statistical and machine learning
Journal
Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969
Publication information
Publication date:
2023
History:
entrez: 17 March 2023
pubmed: 18 March 2023
medline: 22 March 2023
Status:
ppublish
Abstract
In this chapter, we review recent statistical and machine learning methods for missing value imputation, normalization, and downstream analysis in mass spectrometry-based metabolomics studies, illustrated with example datasets. Methods for recovering missing peaks include simple imputation by zero or by the limit of detection, regression-based or distribution-based imputation, and prediction by random forest. Batch effects can be removed by data-driven, internal standard-based, or quality control sample-based normalization. We also summarize the types of statistical analysis that relate metabolomics data to clinical outcomes, such as inference on metabolic biomarkers, clustering of metabolomic profiles, construction of metabolite modules, and integrative analysis with the transcriptome.
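The two simplest steps named in the abstract, simple imputation and data-driven normalization, can be sketched as follows. This is a minimal illustration and not the chapter's implementation: the function names `impute_simple` and `median_normalize`, the toy matrix, the use of half the minimum observed intensity as a stand-in for the limit of detection, and the choice of median scaling as the data-driven normalization are all assumptions made for the example.

```python
import math
from statistics import median


def impute_simple(matrix, method="lod"):
    """Fill missing values (NaN) in a samples x metabolites intensity matrix.

    method="zero": replace missing values with 0.
    method="lod":  replace missing values with a limit-of-detection proxy,
                   assumed here to be half of each metabolite's minimum
                   observed intensity (a common convention, not the only one).
    Each metabolite must have at least one observed value for "lod".
    """
    n_cols = len(matrix[0])
    if method == "zero":
        fill = [0.0] * n_cols
    elif method == "lod":
        fill = []
        for j in range(n_cols):
            observed = [row[j] for row in matrix if not math.isnan(row[j])]
            fill.append(min(observed) / 2.0)
    else:
        raise ValueError(f"unknown method: {method}")
    return [
        [fill[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in matrix
    ]


def median_normalize(matrix):
    """Scale each sample (row) so that all samples share the same median.

    Expects a complete matrix (no NaN), e.g. the output of impute_simple.
    """
    sample_medians = [median(row) for row in matrix]
    target = median(sample_medians)
    return [
        [value * target / m for value in row]
        for row, m in zip(matrix, sample_medians)
    ]


# Toy data: 3 samples (rows) x 3 metabolites (columns), NaN marks a missing peak.
X = [
    [10.0, math.nan, 4.0],
    [20.0, 6.0, math.nan],
    [math.nan, 3.0, 8.0],
]

X_zero = impute_simple(X, method="zero")
X_lod = impute_simple(X, method="lod")
X_norm = median_normalize(X_lod)
```

Zero imputation tends to distort the intensity distribution more than a detection-limit proxy does, which is one reason the regression-based, distribution-based, and random forest approaches mentioned in the abstract are often preferred in practice.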
Identifiers
pubmed: 36929081
doi: 10.1007/978-1-0716-2986-4_12
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
247-269
Copyright information
© 2023. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.