CherryML: scalable maximum likelihood estimation of phylogenetic models.
Journal
Nature methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604
Publication information
Publication date:
08 2023
History:
received: 2022-09-21
accepted: 2023-05-18
medline: 2023-08-09
pubmed: 2023-06-30
entrez: 2023-06-29
Status:
ppublish
Abstract
Phylogenetic models of molecular evolution are central to numerous biological applications spanning diverse timescales, from hundreds of millions of years involving orthologous proteins to just tens of days relating to single cells within an organism. A fundamental problem in these applications is estimating model parameters, for which maximum likelihood estimation is typically employed. Unfortunately, maximum likelihood estimation is a computationally expensive task, in some cases prohibitively so. To address this challenge, we here introduce CherryML, a broadly applicable method that achieves several orders of magnitude speedup by using a quantized composite likelihood over cherries in the trees. The massive speedup offered by our method should enable researchers to consider more complex and biologically realistic models than previously possible. Here we demonstrate CherryML's utility by applying it to estimate a general 400 × 400 rate matrix for residue-residue coevolution at contact sites in three-dimensional protein structures; we estimate that using current state-of-the-art methods such as the expectation-maximization algorithm for the same task would take >100,000 times longer.
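The idea of a quantized composite likelihood over cherries can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the CherryML code: the function names, the geometric time grid and its parameters, and the toy two-state symmetric model (chosen only because its transition probabilities have a closed form, in place of a general rate matrix). Each cherry is represented as a pair of leaf states plus the path length between the two leaves.

```python
import math
from collections import Counter

def quantize_time(t, t_min=1e-3, ratio=1.1, n_points=100):
    """Snap a positive time t to the nearest point (in log space) of a
    geometric grid t_min * ratio**k, k = 0..n_points-1 (assumed grid)."""
    k = round(math.log(t / t_min) / math.log(ratio))
    k = max(0, min(n_points - 1, k))
    return t_min * ratio ** k

def count_cherry_transitions(cherries):
    """Count (quantized time, state x, state y) triples over all cherries.
    Each cherry is (x, y, t), with t the path length between the two leaves."""
    counts = Counter()
    for x, y, t in cherries:
        counts[(quantize_time(t), x, y)] += 1
    return counts

def transition_prob(x, y, t, rate):
    """Closed-form exp(tQ)[x, y] for the toy two-state symmetric model
    Q = [[-rate, rate], [rate, -rate]] (a stand-in for a general rate matrix)."""
    decay = math.exp(-2.0 * rate * t)
    return 0.5 + 0.5 * decay if x == y else 0.5 - 0.5 * decay

def composite_log_likelihood(counts, rate):
    """Sum of log transition probabilities over cherries, grouped by bucket.
    The cost scales with the number of distinct buckets, not the number of cherries."""
    return sum(n * math.log(transition_prob(x, y, t, rate))
               for (t, x, y), n in counts.items())

# Hypothetical data: four cherries with binary leaf states and similar distances.
cherries = [(0, 0, 0.10), (0, 0, 0.101), (1, 1, 0.099), (0, 1, 0.10)]
counts = count_cherry_transitions(cherries)

# A crude grid search over the single rate parameter stands in for an optimizer.
best_rate = max((r / 100 for r in range(1, 500)),
                key=lambda r: composite_log_likelihood(counts, r))
```

Because the four distances fall into the same grid bucket, the counts collapse to three entries, which is the source of the speedup in this kind of scheme: the likelihood is evaluated once per distinct (time, state pair) rather than once per cherry.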
Identifiers
pubmed: 37386188
doi: 10.1038/s41592-023-01917-9
pii: 10.1038/s41592-023-01917-9
pmc: PMC10644697
mid: NIHMS1937734
Chemical substances
Proteins
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
1232-1236
Grants
Agency: NIGMS NIH HHS
ID: R35 GM134922
Country: United States
Agency: NIGMS NIH HHS
ID: R35-GM134922
Country: United States
Copyright information
© 2023. The Author(s), under exclusive licence to Springer Nature America, Inc.