Statistics and Machine Learning in Mass Spectrometry-Based Metabolomics Analysis.

MeSH terms: Cluster Analysis; Mass Spectrometry / methods; Metabolomics / methods; Quality Control

Keywords: Imputation; Integrative analysis; Mass spectrometry; Metabolomics; Normalization; Statistical and machine learning

Journal

Methods in molecular biology (Clifton, N.J.)

ISSN: 1940-6029

Abbreviated title: Methods Mol Biol

Country: United States

NLM ID: 9214969

Publication information

Publication date:
2023

History:

entrez: 17 Mar 2023

pubmed: 18 Mar 2023

medline: 22 Mar 2023

Status: ppublish

Abstract

In this chapter, we review current statistical and machine learning methods for missing value imputation, normalization, and downstream analysis in mass spectrometry-based metabolomics studies, illustrated with example datasets. Missing peaks can be recovered by simple imputation with zero or the limit of detection, by regression-based or distribution-based imputation, or by random forest prediction. Batch effects can be removed by data-driven, internal standard-based, or quality control sample-based normalization. We also summarize different types of statistical analysis for metabolomics and clinical outcomes, including inference on metabolic biomarkers, clustering of metabolomic profiles, metabolite module building, and integrative analysis with the transcriptome.
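As a rough illustration of two steps named in the abstract, the sketch below imputes missing peaks with a limit-of-detection value and then applies a simple quality control sample-based batch correction. It is not taken from the chapter: the function names `impute_lod` and `qc_median_normalize`, the use of half the observed column minimum as a stand-in limit of detection, and the per-batch QC-median scaling are all assumptions chosen for brevity.

```python
import numpy as np


def impute_lod(X, lod=None):
    """Fill missing values (NaN) per metabolite with a limit-of-detection value.

    X is a samples x metabolites array. If `lod` is None, half of each
    column's observed minimum is used as a stand-in LOD (an assumption of
    this sketch, not a value prescribed by the source).
    """
    X = np.asarray(X, dtype=float).copy()
    if lod is None:
        lod = np.nanmin(X, axis=0) / 2.0
    lod = np.broadcast_to(lod, (X.shape[1],))
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = lod[cols]
    return X


def qc_median_normalize(X, batch, is_qc):
    """Rescale each metabolite within each batch using that batch's QC samples.

    Every batch is divided by the median of its own QC samples and multiplied
    by the median over all QC samples, so QC medians agree across batches.
    Assumes each batch contains at least one QC sample with nonzero intensity.
    """
    X = np.asarray(X, dtype=float).copy()
    batch = np.asarray(batch)
    is_qc = np.asarray(is_qc, dtype=bool)
    global_qc_median = np.median(X[is_qc], axis=0)
    for b in np.unique(batch):
        in_batch = batch == b
        batch_qc_median = np.median(X[in_batch & is_qc], axis=0)
        X[in_batch] = X[in_batch] / batch_qc_median * global_qc_median
    return X


# Toy intensity matrix: 4 samples x 2 metabolites, two batches, one QC per batch.
intensities = np.array([
    [10.0, 4.0],
    [12.0, np.nan],
    [20.0, 8.0],
    [24.0, 8.0],
])
batches = np.array([1, 1, 2, 2])
qc_flags = np.array([True, False, True, False])

imputed = impute_lod(intensities)
normalized = qc_median_normalize(imputed, batches, qc_flags)
print(normalized)
```

Imputation is run before normalization here only so that the QC medians are computed on a complete matrix; the order, like the choice of a median-based scale factor, is a design choice of this sketch rather than a recommendation from the source.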

Identifiers

DOI: 10.1007/978-1-0716-2986-4_12 PMID: 36929081

Publication types

Journal Article

Languages

eng

Pagination

247-269

Authors

Sili Fan (S)

Christopher M Wilson (CM)

Brooke L Fridley (BL)

Qian Li (Q)
