Identifying the Best Approximating Model in Bayesian Phylogenetics: Bayes Factors, Cross-Validation or wAIC?


Journal

Systematic biology
ISSN: 1076-836X
Abbreviated title: Syst Biol
Country: England
NLM ID: 9302532

Publication information

Publication date:
17 Jun 2023
History:
received: 25 Apr 2022
revised: 20 Jan 2023
accepted: 17 Feb 2023
medline: 19 Jun 2023
pubmed: 23 Feb 2023
entrez: 22 Feb 2023
Status: ppublish

Abstract

There is still no consensus as to how to select models in Bayesian phylogenetics, and more generally in applied Bayesian statistics. Bayes factors are often presented as the method of choice, yet other approaches have been proposed, such as cross-validation or information criteria. Each of these paradigms raises specific computational challenges, but they also differ in their statistical meaning, being motivated by different objectives: either testing hypotheses or finding the best-approximating model. These alternative goals entail different compromises, and as a result, Bayes factors, cross-validation, and information criteria may be valid for addressing different questions. Here, the question of Bayesian model selection is revisited, with a focus on the problem of finding the best-approximating model. Several model selection approaches were re-implemented, numerically assessed and compared: Bayes factors, cross-validation (CV) in its different forms (k-fold or leave-one-out), and the widely applicable information criterion (wAIC), which is asymptotically equivalent to leave-one-out cross-validation (LOO-CV). Using a combination of analytical results and empirical and simulation analyses, it is shown that Bayes factors are unduly conservative. In contrast, CV represents a more adequate formalism for selecting the model returning the best approximation of the data-generating process and the most accurate estimates of the parameters of interest. Among alternative CV schemes, LOO-CV and its asymptotic equivalent, the wAIC, stand out as the best choices, conceptually and computationally, given that both can be simultaneously computed based on standard Markov chain Monte Carlo runs under the posterior distribution. [Bayes factor; cross-validation; marginal likelihood; model comparison; wAIC.]
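The abstract's remark that LOO-CV and the wAIC can both be obtained from a single posterior MCMC run can be illustrated with a minimal sketch. It assumes the site-wise log-likelihoods log p(y_i | theta_s) have been stored for each posterior draw s, and it uses the textbook estimators (the variance-based wAIC penalty and the importance-sampling, harmonic-mean form of LOO-CV). The function names, the input layout, and the choice to report scores on the log predictive density scale are assumptions made for illustration; this is not the article's own implementation.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic_and_loo(loglik):
    """Estimate wAIC and LOO-CV scores from one posterior sample.

    loglik[s][i] is log p(y_i | theta_s) for posterior draw s and site i
    (hypothetical layout; at least two draws are required).
    """
    n_draws = len(loglik)
    n_sites = len(loglik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # wAIC penalty (effective number of parameters)
    elpd_loo = 0.0
    for i in range(n_sites):
        col = [loglik[s][i] for s in range(n_draws)]
        # log of the posterior mean of the site likelihood
        lppd += log_mean_exp(col)
        # sample variance of the site log-likelihood across draws
        mean = sum(col) / n_draws
        p_waic += sum((v - mean) ** 2 for v in col) / (n_draws - 1)
        # importance-sampling LOO: p(y_i | y_-i) ~ 1 / mean_s[1 / p(y_i | theta_s)]
        elpd_loo += -log_mean_exp([-v for v in col])
    return {
        "lppd": lppd,
        "p_waic": p_waic,
        "elpd_waic": lppd - p_waic,
        "elpd_loo": elpd_loo,
    }
```

Under these assumptions, the model with the higher `elpd_waic` or `elpd_loo` would be preferred. The plain harmonic-mean form of the LOO estimator can have high variance when a few draws dominate, so in practice a stabilized variant may be needed.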

Identifiers

pubmed: 36810802
pii: 7050004
doi: 10.1093/sysbio/syad004
pmc: PMC10276628

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

616-638

Grants

Agency: Agence Nationale de la Recherche
ID: ANR-22-CE02-0027
Agency: high-performance computing
Agency: Centre Informatique National de l'Enseignement Supérieur
ID: A0040310449
Agency: Grand Équipement National de Calcul Intensif
Agency: Pôle Rhône-Alpes de Bioinformatique, Laboratoire de Biométrie et Biologie Évolutive

Copyright information

© The Author(s) 2023. Published by Oxford University Press on behalf of the Society of Systematic Biologists.

Authors

Nicolas Lartillot

Université de Lyon, Université Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Villeurbanne, France.
