Scalable Empirical Mixture Models That Account for Across-Site Compositional Heterogeneity.

Keywords: empirical distribution mixture models; empirical profile mixture models; long-branch attraction; microsporidia; phylogenetics

Journal

Molecular biology and evolution
ISSN: 1537-1719
Abbreviated title: Mol Biol Evol
Country: United States
NLM ID: 8501455

Publication information

Publication date:
16 Dec 2020
History:
pubmed: 3 Sep 2020
medline: 12 May 2021
entrez: 3 Sep 2020
Status: ppublish

Abstract

Biochemical demands constrain the range of amino acids acceptable at specific sites, resulting in across-site compositional heterogeneity of the amino acid replacement process. Phylogenetic models that disregard this heterogeneity are prone to systematic errors, which can lead to severe long-branch attraction artifacts. State-of-the-art models accounting for across-site compositional heterogeneity include the CAT model, which is computationally expensive, and empirical distribution mixture models estimated via maximum likelihood (C10-C60 models). Here, we present EDCluster, a new, scalable method for finding empirical distribution mixture models that relies on a simple cluster analysis. The cluster analysis utilizes specific coordinate transformations that allow the detection of specialized amino acid distributions either from curated databases or from the alignment at hand. We apply EDCluster to the HOGENOM and HSSP databases in order to provide universal distribution mixture (UDM) models comprising up to 4,096 components. Detailed analyses of the UDM models demonstrate the removal of various long-branch attraction artifacts and improved performance compared with the C10-C60 models. Ready-to-use implementations of the UDM models are provided for three established software packages (IQ-TREE, Phylobayes, and RevBayes).
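
The general idea of deriving mixture components by clustering per-site amino acid distributions can be illustrated with a minimal sketch. The abstract does not name the coordinate transformation or the clustering algorithm, so the centered log-ratio transform and plain k-means used here are assumptions, and all function names are hypothetical. This is not the EDCluster implementation, only a toy illustration of the workflow: estimate one distribution per site, transform it, cluster, and report each cluster's weight and mean distribution.

```python
import math
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def site_distributions(alignment, pseudocount=1.0):
    """Estimate one amino acid distribution per alignment column.

    Gaps and ambiguous characters are ignored; the pseudocount keeps
    every frequency positive so that logarithms are defined.
    """
    dists = []
    for column in zip(*alignment):
        counts = [pseudocount + column.count(aa) for aa in AMINO_ACIDS]
        total = sum(counts)
        dists.append([c / total for c in counts])
    return dists


def clr(dist):
    """Centered log-ratio transform (an assumed choice of coordinates)."""
    logs = [math.log(p) for p in dist]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means (assumed clustering step); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean_vector(members)
    return assign(points, centers)


def mixture_components(dists, labels):
    """Return (weight, mean distribution) for every non-empty cluster."""
    components = []
    for j in sorted(set(labels)):
        members = [d for d, lab in zip(dists, labels) if lab == j]
        components.append((len(members) / len(dists), mean_vector(members)))
    return components


def edm_sketch(alignment, k):
    """Toy pipeline: site distributions -> transform -> cluster -> components."""
    dists = site_distributions(alignment)
    labels = kmeans([clr(d) for d in dists], k)
    return mixture_components(dists, labels)


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = ["AAKKL", "AAKRL", "AGKKI", "AAKKL"]
for weight, profile in edm_sketch(toy_alignment, k=2):
    print(round(weight, 2), [round(p, 2) for p in profile])
```

Clustering happens in the transformed coordinates, while the reported profiles are averages of the untransformed distributions, so each profile remains a valid probability vector. Whether the published method averages in the same way is not stated in the abstract.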

Identifiers

pubmed: 32877529
pii: 5900673
doi: 10.1093/molbev/msaa145
pmc: PMC7743758

Publication types

Evaluation Study; Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

3616-3631

Copyright information

© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Authors

Dominik Schrempf

Department of Biological Physics, Eötvös University, Budapest, Hungary.

Nicolas Lartillot

Laboratoire de Biométrie et Biologie Evolutive UMR 5558, CNRS, Université de Lyon, Villeurbanne, France.

Gergely Szöllősi

Department of Biological Physics, Eötvös University, Budapest, Hungary.
ELTE-MTA "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary.
Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany, Hungary.
