Random-effects substitution models for phylogenetics via scalable gradient approximations.

Bayesian inference; Hamiltonian Monte Carlo; Phylogeography

Journal

Systematic biology
ISSN: 1076-836X
Abbreviated title: Syst Biol
Country: England
NLM ID: 9302532

Publication information

Publication date:
07 May 2024
History:
received: 24 Mar 2023
medline: 07 May 2024
pubmed: 07 May 2024
entrez: 07 May 2024
Status: aheadofprint

Abstract

Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.
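
The idea of layering random effects on a base substitution model can be illustrated with a small sketch. The abstract does not state the parameterization, so the form used here is an assumption: each off-diagonal rate of a base HKY rate matrix is multiplied by exp(eps_ij), so that eps_ij = 0 recovers the base model. The function names, the state order (A, C, G, T), and the numerical values are all hypothetical and chosen only for illustration; this is not the authors' implementation, and it covers neither the gradient approximation nor the Hamiltonian Monte Carlo sampler.

```python
import math


def hky_rates(kappa, pi):
    """Unnormalized HKY off-diagonal rates over states A, C, G, T.

    q_ij = pi_j * kappa for transitions (A<->G, C<->T), pi_j otherwise.
    The diagonal is left at zero here and filled in later.
    """
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = pi[j] * (kappa if (i, j) in transitions else 1.0)
    return q


def add_random_effects(base, eps):
    """Scale each off-diagonal base rate by exp(eps[i][j]) (assumed form),
    then set the diagonal so that every row sums to zero."""
    n = len(base)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = base[i][j] * math.exp(eps[i][j])
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 when the sum is taken
    return q


def max_detailed_balance_gap(q, pi):
    """Largest |pi_i * q_ij - pi_j * q_ji| over all pairs of states.

    The gap is zero when q satisfies detailed balance with respect to pi.
    """
    n = len(pi)
    return max(
        abs(pi[i] * q[i][j] - pi[j] * q[j][i])
        for i in range(n)
        for j in range(n)
        if i != j
    )


# Hypothetical values for illustration only.
pi = [0.3, 0.2, 0.2, 0.3]
base = hky_rates(kappa=2.0, pi=pi)

eps = [[0.0] * 4 for _ in range(4)]
eps[1][3] = 0.7  # elevate C->T relative to the base model, but not T->C

q_base = add_random_effects(base, [[0.0] * 4 for _ in range(4)])
q_re = add_random_effects(base, eps)

print("gap, base HKY:", max_detailed_balance_gap(q_base, pi))
print("gap, with random effect:", max_detailed_balance_gap(q_re, pi))
```

With all random effects at zero, the matrix satisfies detailed balance with respect to the base frequencies up to floating-point rounding. A single asymmetric effect breaks it, which is the kind of departure from a reversible base model that the abstract describes. A model with one free effect per off-diagonal rate also has many more parameters than HKY itself, which is the scaling concern the abstract raises.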

Identifiers

pubmed: 38712512
pii: 7665881
doi: 10.1093/sysbio/syae019

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2024. Published by Oxford University Press on behalf of the Society of Systematic Biologists.

Authors

Andrew F Magee (AF)

Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA, USA.

Andrew J Holbrook (AJ)

Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA, USA.

Jonathan E Pekar (JE)

Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA.
Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA.

Itzue W Caviedes-Solis (IW)

Department of Biology, Swarthmore College, Swarthmore, PA, USA.

Fredrick A Matsen IV (FA)

Howard Hughes Medical Institute, Seattle, Washington, USA.
Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
Department of Genome Sciences, University of Washington, Seattle, Washington, USA.
Department of Statistics, University of Washington, Seattle, Washington, USA.

Guy Baele (G)

Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium.

Joel O Wertheim (JO)

Department of Medicine, University of California San Diego, La Jolla, CA, USA.

Xiang Ji (X)

Department of Mathematics, Tulane University, New Orleans, LA, USA.

Philippe Lemey (P)

Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium.

Marc A Suchard (MA)

Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA, USA.
Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California Los Angeles, Los Angeles, CA, USA.
Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California Los Angeles, Los Angeles, CA, USA.

MeSH classifications