Gradients Do Grow on Trees: A Linear-Time O(N)-Dimensional Gradient for Statistical Phylogenetics.

Bayesian inference; linear-time gradient algorithm; maximum likelihood; random-effects molecular clock model

Journal

Molecular biology and evolution
ISSN: 1537-1719
Abbreviated title: Mol Biol Evol
Country: United States
NLM ID: 8501455

Publication information

Publication date:
1 October 2020
History:
pubmed: 28 May 2020
medline: 16 April 2021
entrez: 28 May 2020
Status: ppublish

Abstract

Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N^2) operations, where N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithm significantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.

Identifiers

pubmed: 32458974
pii: 5847600
doi: 10.1093/molbev/msaa130
pmc: PMC7530611

Publication types

Evaluation Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

3047-3060

Grants

Agency: NIAID NIH HHS
ID: U19 AI135995
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI107034
Country: United States
Agency: NIAID NIH HHS
ID: K25 AI153816
Country: United States
Agency: Wellcome Trust
ID: 206298/Z/17/Z
Country: United Kingdom
Agency: Wellcome Trust
Country: United Kingdom

Copyright information

© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Xiang Ji (X)

Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA.
Department of Mathematics, School of Science & Engineering, Tulane University, New Orleans, LA.

Zhenyu Zhang (Z)

Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA.

Andrew Holbrook (A)

Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA.

Akihiko Nishimura (A)

Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD.

Guy Baele (G)

Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium.

Andrew Rambaut (A)

Institute of Evolutionary Biology, Centre for Immunology, Infection and Evolution, University of Edinburgh, Edinburgh, United Kingdom.

Philippe Lemey (P)

Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium.

Marc A Suchard (MA)

Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA.
Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA.
Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA.
