Statistics and Machine Learning in Mass Spectrometry-Based Metabolomics Analysis.

Keywords: Imputation; Integrative analysis; Mass spectrometry; Metabolomics; Normalization; Statistical and machine learning

Journal

Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969

Publication information

Publication date:
2023
History:
entrez: 2023-03-17
pubmed: 2023-03-18
medline: 2023-03-22
Status: ppublish

Abstract

In this chapter, we review cutting-edge statistical and machine learning methods for missing value imputation, normalization, and downstream analysis in mass spectrometry-based metabolomics studies, illustrated with example datasets. Missing peak recovery methods include simple imputation by zero or the limit of detection, regression-based or distribution-based imputation, and prediction by random forest. Batch effects can be removed by data-driven, internal standard-based, or quality control sample-based normalization. We also summarize different types of statistical analysis for metabolomics and clinical outcomes, such as inference on metabolic biomarkers, clustering of metabolomic profiles, metabolite module building, and integrative analysis with the transcriptome.
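
As a rough illustration of the simplest steps named in the abstract, the sketch below applies a simple missing value imputation followed by a data-driven normalization. It is not the chapter's implementation: the function names `impute_half_min` and `median_normalize`, the toy intensity matrix, and the use of half of each metabolite's observed minimum as a stand-in for the limit of detection are all assumptions made for this example.

```python
import numpy as np

def impute_half_min(X):
    """Fill missing values (NaN) in each metabolite column with half of that
    column's observed minimum, a common stand-in for the limit of detection.
    Assumes every column has at least one observed value."""
    X = np.asarray(X, dtype=float).copy()
    fill = np.nanmin(X, axis=0) / 2.0      # one fill value per metabolite
    rows, cols = np.where(np.isnan(X))     # positions of missing peaks
    X[rows, cols] = fill[cols]
    return X

def median_normalize(X):
    """Data-driven normalization: rescale each sample (row) so that its median
    intensity equals the median of all sample medians."""
    X = np.asarray(X, dtype=float)
    sample_median = np.median(X, axis=1, keepdims=True)
    target = np.median(sample_median)
    return X * (target / sample_median)

# Hypothetical toy data: rows are samples, columns are metabolites.
intensities = np.array([
    [100.0, 20.0, np.nan],
    [200.0, 40.0, 8.0],
    [150.0, np.nan, 6.0],
])

imputed = impute_half_min(intensities)
normalized = median_normalize(imputed)
```

Imputation runs before normalization here because the median scaling needs complete rows. The regression-based, distribution-based, and random forest methods mentioned in the abstract would replace `impute_half_min`, and the internal standard-based or quality control sample-based approaches would replace `median_normalize`.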

Identifiers

pubmed: 36929081
doi: 10.1007/978-1-0716-2986-4_12

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

247-269

Copyright information

© 2023. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.

Authors

Sili Fan (S)

Graduate Group of Biostatistics, University of California, Davis, CA, USA.

Christopher M Wilson (CM)

Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA.

Brooke L Fridley (BL)

Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.

Qian Li (Q)

Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA. Qian.Li@stjude.org.
