Ancestral state reconstruction with large numbers of sequences and edge-length estimation.
Ancestral state reconstruction
Consistency
Empirical Bayes estimator
Evolution
Maximum likelihood estimator
Phylogenetics
Journal
Journal of mathematical biology
ISSN: 1432-1416
Abbreviated title: J Math Biol
Country: Germany
NLM ID: 7502105
Publication information
Publication date:
21 02 2022
History:
received: 31 03 2021
accepted: 03 01 2022
revised: 18 10 2021
entrez: 21 2 2022
pubmed: 22 2 2022
medline: 1 4 2022
Status:
epublish
Abstract
Likelihood-based methods are widely considered the best approaches for reconstructing ancestral states. Although much effort has been made to study properties of these methods, previous works often assume that both the tree topology and edge lengths are known. In some scenarios the tree topology might be reasonably well known for the taxa under study. When sequence length is much smaller than the number of species, however, edge lengths are not likely to be accurately estimated. We study the consistency of the maximum likelihood and empirical Bayes estimators of the ancestral state of discrete traits in such settings under a star tree. We prove that the likelihood-based reconstruction is consistent under symmetric models but can be inconsistent under non-symmetric models. We show, however, that a simple consistent estimator for the ancestral states is available under non-symmetric models. The results illustrate that likelihood methods can unexpectedly have undesirable properties as the number of sequences considered gets very large. Broader implications of the results are discussed.
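The contrast between symmetric and non-symmetric models can be made concrete with a small toy calculation. The sketch below is an illustrative assumption, not the authors' model or estimator: it assumes a two-state F81-type substitution model with stationary frequencies pi, a star tree observed at a single site, a root drawn from pi, and every edge length maximized independently (a profile likelihood). Under these assumptions a leaf that matches the candidate root contributes at most 1 (at edge length 0), while a mismatching leaf contributes at most pi[x] (approached as its edge length grows without bound). All function names (`transition_prob`, `profile_log_likelihood`, `ml_root`, `asymptotic_threshold`, `simulate_star_tree`) are hypothetical.

```python
import math
import random

# Hypothetical toy model (an assumption, not the paper's exact setup):
# two states {0, 1}, F81-type rates with stationary frequencies pi,
# a star tree with one observed site, and a separate length t_i per edge.

def transition_prob(i, j, t, pi):
    """P(state j after time t | state i) under the two-state F81-type model."""
    decay = math.exp(-t)
    return decay * (1.0 if i == j else 0.0) + pi[j] * (1.0 - decay)

def profile_log_likelihood(root, leaves, pi):
    """log sup over edge lengths of pi[root] * prod_i P(leaf_i | root, t_i).

    A matching leaf is maximized at t_i = 0 (probability 1, adds log 1 = 0);
    a mismatching leaf approaches pi[leaf] as t_i grows without bound.
    """
    ll = math.log(pi[root])
    for x in leaves:
        if x != root:
            ll += math.log(pi[x])
    return ll

def ml_root(leaves, pi):
    """Root state with the largest profile likelihood (ties go to state 0)."""
    return max((0, 1), key=lambda r: profile_log_likelihood(r, leaves, pi))

def asymptotic_threshold(pi):
    """Leaf frequency of state 0 above which ml_root picks 0 as n grows."""
    a, b = math.log(1.0 / pi[0]), math.log(1.0 / pi[1])
    return b / (a + b)

def simulate_star_tree(n, root, t, pi, rng):
    """Draw n leaf states, each at distance t from the given root state."""
    p_same = transition_prob(root, root, t, pi)
    return [root if rng.random() < p_same else 1 - root for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(1)
    for pi in [(0.5, 0.5), (0.9, 0.1)]:
        leaves = simulate_star_tree(10_000, root=0, t=1.0, pi=pi, rng=rng)
        freq0 = leaves.count(0) / len(leaves)
        print(pi, "freq of 0:", round(freq0, 3),
              "threshold:", round(asymptotic_threshold(pi), 3),
              "ML root:", ml_root(leaves, pi))
```

With true root 0 and t = 1, the symmetric case leaves about 68% of leaves in state 0, above its threshold of 0.5, so the toy estimator reduces to majority rule and recovers the root. With pi = (0.9, 0.1), about 94% of leaves are in state 0, yet this stays below the threshold of roughly 0.956, so the toy estimator settles on state 1 however many leaves are added. This only mirrors, in a deliberately simplified setting, the behaviour described in the abstract; it does not reproduce the paper's analysis or its consistent alternative estimator.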
Identifiers
pubmed: 35188616
doi: 10.1007/s00285-022-01715-5
pii: 10.1007/s00285-022-01715-5
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
21
Copyright information
© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.