Universal probabilistic programming offers a powerful approach to statistical phylogenetics.
Journal
Communications biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179
Publication information
Publication date: 24 Feb 2021
History:
received: 4 Jul 2020
accepted: 21 Jan 2021
entrez: 25 Feb 2021
pubmed: 26 Feb 2021
medline: 11 Aug 2021
Status:
epublish
Abstract
Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here, we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.
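The abstract's point that SMC yields marginal-likelihood estimates usable for Bayes factors can be illustrated with a minimal sketch. This is not the authors' PPL-generated algorithm: it assumes a toy pure-birth (Yule) model in which the waiting time to the next speciation with k lineages is exponential with rate k·λ, a Gamma prior on λ, and made-up waiting times. The names `smc_log_evidence`, `exact_log_evidence`, `WAITS` and `LINEAGES` are hypothetical. The toy model is chosen because its marginal likelihood has a closed form, so the particle estimate can be checked.

```python
import math
import random

# Hypothetical toy data: waiting times between speciation events and the
# number of lineages alive during each interval (assumed, not from the paper).
WAITS = [0.5, 0.3, 0.2, 0.25]
LINEAGES = [2, 3, 4, 5]


def smc_log_evidence(waits, lineages, shape, rate, n_particles=20000, seed=1):
    """Estimate the log marginal likelihood with a simple particle method.

    Each particle holds one draw of the speciation rate from the
    Gamma(shape, rate) prior. Observations are processed one at a time:
    particles are weighted by the likelihood of the next waiting time,
    the log of the mean weight is accumulated, and particles are
    resampled (multinomial resampling at every step).
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    particles = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_particles)]
    log_z = 0.0
    for t, k in zip(waits, lineages):
        # Log density of an Exponential(k * lam) waiting time t.
        log_w = [math.log(k * lam) - k * lam * t for lam in particles]
        m = max(log_w)
        w = [math.exp(x - m) for x in log_w]
        # Log of the average incremental weight, computed stably.
        log_z += m + math.log(sum(w) / n_particles)
        particles = rng.choices(particles, weights=w, k=n_particles)
    return log_z


def exact_log_evidence(waits, lineages, shape, rate):
    """Closed-form log marginal likelihood for the same toy model."""
    n = len(waits)
    s = sum(k * t for t, k in zip(waits, lineages))
    return (sum(math.log(k) for k in lineages)
            + shape * math.log(rate) - math.lgamma(shape)
            + math.lgamma(shape + n) - (shape + n) * math.log(rate + s))


if __name__ == "__main__":
    # Two competing hypotheses that differ only in the prior on the rate.
    log_z_a = smc_log_evidence(WAITS, LINEAGES, shape=2.0, rate=2.0)
    log_z_b = smc_log_evidence(WAITS, LINEAGES, shape=2.0, rate=5.0)
    print("SMC log evidence, model A:", log_z_a)
    print("exact log evidence, model A:", exact_log_evidence(WAITS, LINEAGES, 2.0, 2.0))
    # A log Bayes factor is the difference of two log evidences.
    print("estimated log Bayes factor (A vs B):", log_z_a - log_z_b)
```

Because the rate is a static parameter and no rejuvenation move is applied, this sketch behaves like importance sampling with resampling; it is adequate for a handful of observations but would degrade on realistic trees, which is where the alignment and delayed-sampling techniques cited in the record become relevant.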
Identifiers
pubmed: 33627766
doi: 10.1038/s42003-021-01753-7
pii: 10.1038/s42003-021-01753-7
pmc: PMC7904853
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
244
Comments and corrections
Type: ErratumIn