CherryML: scalable maximum likelihood estimation of phylogenetic models.


Journal

Nature methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604

Publication information

Publication date:
08 2023
History:
received: 21 09 2022
accepted: 18 05 2023
medline: 9 8 2023
pubmed: 30 6 2023
entrez: 29 6 2023
Status: ppublish

Abstract

Phylogenetic models of molecular evolution are central to numerous biological applications spanning diverse timescales, from hundreds of millions of years involving orthologous proteins to just tens of days relating to single cells within an organism. A fundamental problem in these applications is estimating model parameters, for which maximum likelihood estimation is typically employed. Unfortunately, maximum likelihood estimation is a computationally expensive task, in some cases prohibitively so. To address this challenge, we here introduce CherryML, a broadly applicable method that achieves several orders of magnitude speedup by using a quantized composite likelihood over cherries in the trees. The massive speedup offered by our method should enable researchers to consider more complex and biologically realistic models than previously possible. Here we demonstrate CherryML's utility by applying it to estimate a general 400 × 400 rate matrix for residue-residue coevolution at contact sites in three-dimensional protein structures; we estimate that using current state-of-the-art methods such as the expectation-maximization algorithm for the same task would take >100,000 times longer.
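The abstract gives no formulas, so the following is only a minimal sketch of what a quantized composite likelihood over cherries could look like, not the authors' implementation. It assumes each cherry (a pair of sibling leaves) contributes one term pi[x] * exp(Q t)[x][y] under a stationary, time-reversible model, and that times are snapped to a small grid in log space so that one matrix exponential is computed per grid point rather than per cherry. All function names, the grid, the toy two-state rate matrix, and the pure-Python matrix exponential are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def quantize_time(t, grid):
    """Snap a positive time to the nearest grid point, measured in log space."""
    return min(grid, key=lambda g: abs(math.log(g) - math.log(t)))


def count_cherries(cherries, grid):
    """Collapse (state_x, state_y, time) triples into counts per quantized time."""
    counts = Counter()
    for x, y, t in cherries:
        counts[(quantize_time(t, grid), x, y)] += 1
    return counts


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=30):
    """Approximate exp(q * t) with a truncated Taylor series plus squaring."""
    n = len(q)
    norm = t * max(sum(abs(v) for v in row) for row in q)
    squarings = max(0, math.ceil(math.log2(norm))) + 1 if norm > 0 else 0
    scale = t / 2 ** squarings
    a = [[v * scale for v in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[r + s for r, s in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def composite_log_likelihood(q, pi, counts):
    """Sum count * log(pi[x] * P_t[x][y]) over quantized times (assumed form)."""
    transition = {}  # one matrix exponential per distinct grid time
    ll = 0.0
    for (t, x, y), c in counts.items():
        if t not in transition:
            transition[t] = mat_exp(q, t)
        ll += c * math.log(pi[x] * transition[t][x][y])
    return ll


# Toy example: symmetric two-state model, three cherries, three grid points.
q = [[-1.0, 1.0], [1.0, -1.0]]
pi = [0.5, 0.5]
grid = [0.1, 1.0, 10.0]
cherries = [(0, 0, 0.9), (0, 1, 1.1), (0, 0, 1.2)]
counts = count_cherries(cherries, grid)
ll = composite_log_likelihood(q, pi, counts)
```

In this toy setup all three cherries fall into the same time bucket, so the likelihood needs a single matrix exponential regardless of how many cherries are observed. An estimator would then maximize this function over the entries of the rate matrix, a step the sketch leaves out.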

Identifiers

pubmed: 37386188
doi: 10.1038/s41592-023-01917-9
pii: 10.1038/s41592-023-01917-9
pmc: PMC10644697
mid: NIHMS1937734

Chemical substances

Proteins 0

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

1232-1236

Grants

Agency: NIGMS NIH HHS
ID: R35 GM134922
Country: United States

Copyright information

© 2023. The Author(s), under exclusive licence to Springer Nature America, Inc.

Authors

Sebastian Prillo (S)

Computer Science Division, University of California, Berkeley, CA, USA.

Yun Deng (Y)

Graduate Group in Computational Biology, University of California, Berkeley, CA, USA.

Pierre Boyeau (P)

Computer Science Division, University of California, Berkeley, CA, USA.

Xingyu Li (X)

Computer Science Division, University of California, Berkeley, CA, USA.

Po-Yen Chen (PY)

Computer Science Division, University of California, Berkeley, CA, USA.

Yun S Song (YS)

Computer Science Division, University of California, Berkeley, CA, USA. yss@berkeley.edu.
Department of Statistics, University of California, Berkeley, CA, USA. yss@berkeley.edu.
