Machine learning approaches in microbiome research: challenges and best practices.
AutoML
colorectal cancer
feature selection
machine learning methods
microbiome data analysis
model selection
predictive modeling
preprocessing
Journal
Frontiers in microbiology
ISSN: 1664-302X
Abbreviated title: Front Microbiol
Country: Switzerland
NLM ID: 101548977
Publication information
Publication date: 2023
History:
received: 2023-07-19
accepted: 2023-09-04
medline: 2023-10-09
pubmed: 2023-10-09
entrez: 2023-10-09
Status: epublish
Abstract
Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation, and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We demonstrate that using compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm in combination with random forest modeling provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression, coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots, can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
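As a rough illustration of two ideas named in the abstract, the sketch below applies a centered log-ratio (CLR) transform, one common compositional transformation, to toy taxon counts and then computes ICE curves for a logistic model. It is not the authors' pipeline: the counts, the pseudocount of 0.5, the weights, the bias, and the function names are all hypothetical values chosen for the example, and the record does not say which transformation or software the study used.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def predict_proba(features, weights, bias):
    """Probability of the positive class under a logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def ice_curves(samples, feature_idx, grid, weights, bias):
    """One ICE curve per sample: sweep a single feature over `grid`
    while holding that sample's other features fixed."""
    curves = []
    for sample in samples:
        curve = []
        for value in grid:
            modified = list(sample)
            modified[feature_idx] = value
            curve.append(predict_proba(modified, weights, bias))
        curves.append(curve)
    return curves

# Toy data: 3 samples x 4 taxa (hypothetical counts).
raw_counts = [[120, 30, 0, 50], [10, 200, 5, 85], [60, 60, 60, 20]]
transformed = [clr(row) for row in raw_counts]

# Hypothetical coefficients; in practice they would come from a fitted model.
weights = [0.8, -0.5, 0.3, 0.0]
bias = -0.1

# Sweep the first (CLR-transformed) taxon. Note that changing one CLR
# coordinate alone ignores the sum-to-zero constraint; this is a simplification.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = ice_curves(transformed, 0, grid, weights, bias)

for i, curve in enumerate(curves):
    print(f"sample {i}:", [round(p, 3) for p in curve])
```

Each printed row is one sample's predicted probability as the chosen taxon varies; plotting the rows as lines gives an ICE plot, and differences between the lines reflect the other features of each sample.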
Identifiers
pubmed: 37808286
doi: 10.3389/fmicb.2023.1261889
pmc: PMC10556866
Publication types
Journal Article
Review
Languages
eng
Pagination
1261889
Copyright information
Copyright © 2023 Papoutsoglou, Tarazona, Lopes, Klammsteiner, Ibrahimi, Eckenberger, Novielli, Tonda, Simeon, Shigdel, Béreux, Vitali, Tangaro, Lahti, Temko, Claesson and Berland.
Conflict of interest statement
GP was directly affiliated with JADBio—Gnosis DA, S.A., which offers the JADBio service commercially. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.