Universal probabilistic programming offers a powerful approach to statistical phylogenetics.


Journal

Communications Biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179

Publication information

Publication date:
24 02 2021
History:
received: 04 07 2020
accepted: 21 01 2021
entrez: 25 2 2021
pubmed: 26 2 2021
medline: 11 8 2021
Status: epublish

Abstract

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here, we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.
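
The abstract notes that SMC supports estimation of the Bayes factors used in model testing. The following is a minimal sketch of that idea, not the authors' implementation and not output of any PPL. It assumes a toy linear-Gaussian random-walk model in place of a birth-death model, because the exact marginal likelihood of that toy model is available from a Kalman filter for comparison. The function names (`smc_log_evidence`, `kalman_log_evidence`), the parameters `q` and `r`, and the observations are all hypothetical and chosen only for illustration.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def smc_log_evidence(ys, q, r, n_particles, rng):
    """Bootstrap particle filter estimate of log p(ys) for the toy model
    x_1 ~ N(0, 1), x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_z = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate each particle through the random-walk transition.
            sd = math.sqrt(q)
            particles = [x + rng.gauss(0.0, sd) for x in particles]
        # Weight particles by the likelihood of the current observation.
        log_w = [log_normal_pdf(y, x, r) for x in particles]
        top = max(log_w)  # subtract the maximum for numerical stability
        w = [math.exp(lw - top) for lw in log_w]
        # The mean unnormalized weight estimates p(y_t | y_1..y_{t-1}).
        log_z += top + math.log(sum(w) / n_particles)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return log_z


def kalman_log_evidence(ys, q, r):
    """Exact log p(ys) for the same toy model, used as a reference."""
    mean, var = 0.0, 1.0
    log_z = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            var += q  # predict step of the random walk
        s = var + r  # predictive variance of y_t
        log_z += log_normal_pdf(y, mean, s)
        k = var / s  # Kalman gain
        mean += k * (y - mean)
        var *= 1.0 - k
    return log_z


if __name__ == "__main__":
    rng = random.Random(0)
    ys = [0.3, -0.1, 0.8, 1.2, 0.9]  # made-up observations
    # Two candidate models that differ only in the transition variance q.
    log_z_a = smc_log_evidence(ys, q=0.5, r=0.2, n_particles=5000, rng=rng)
    log_z_b = smc_log_evidence(ys, q=0.01, r=0.2, n_particles=5000, rng=rng)
    exact = kalman_log_evidence(ys, 0.5, 0.2) - kalman_log_evidence(ys, 0.01, 0.2)
    print("log Bayes factor (A vs B), SMC estimate:", log_z_a - log_z_b)
    print("log Bayes factor (A vs B), exact:", exact)
```

The log Bayes factor is the difference of the two log marginal likelihood estimates, so a single SMC run per model yields both posterior samples and the quantity needed for model comparison. For models without a closed-form reference, the estimate would typically be checked by repeating runs and increasing the particle count.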

Identifiers

pubmed: 33627766
doi: 10.1038/s42003-021-01753-7
pii: 10.1038/s42003-021-01753-7
pmc: PMC7904853

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

244

Comments and corrections

Type: ErratumIn

Authors

Fredrik Ronquist (F)

Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden. fredrik.ronquist@nrm.se.

Jan Kudlicka (J)

Department of Information Technology, Uppsala University, Uppsala, Sweden.

Viktor Senderov (V)

Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden.

Johannes Borgström (J)

Department of Information Technology, Uppsala University, Uppsala, Sweden.

Nicolas Lartillot (N)

Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard Lyon 1, Villeurbanne, France.

Daniel Lundén (D)

Department of Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.

Lawrence Murray (L)

Uber AI, San Francisco, CA, USA.

Thomas B Schön (TB)

Department of Information Technology, Uppsala University, Uppsala, Sweden.

David Broman (D)

Department of Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.
