Compositionally Constrained Sites Drive Long-Branch Attraction.


Journal

Systematic biology
ISSN: 1076-836X
Abbreviated title: Syst Biol
Country: England
NLM ID: 9302532

Publication information

Publication date:
07 08 2023
History:
received: 03 03 2022
revised: 01 03 2023
accepted: 16 03 2023
medline: 8 8 2023
pubmed: 23 3 2023
entrez: 22 3 2023
Status: ppublish

Abstract

Accurate phylogenies are fundamental to our understanding of the pattern and process of evolution. Yet, phylogenies at deep evolutionary timescales, with correspondingly long branches, have been fraught with controversy resulting from conflicting estimates from models with varying complexity and goodness of fit. Analyses of historical as well as current empirical datasets, such as alignments including Microsporidia, Nematoda, or Platyhelminthes, have demonstrated that inadequate modeling of across-site compositional heterogeneity, which is the result of biochemical constraints that lead to varying patterns of accepted amino acids along sequences, can lead to erroneous topologies that are strongly supported. Unfortunately, models that adequately account for across-site compositional heterogeneity remain computationally challenging or intractable for an increasing fraction of contemporary datasets. Here, we introduce "compositional constraint analysis," a method to investigate the effect of site-specific constraints on amino acid composition on phylogenetic inference. We show that more constrained sites with lower diversity and less constrained sites with higher diversity exhibit ostensibly conflicting signals under models ignoring across-site compositional heterogeneity that lead to long-branch attraction artifacts and demonstrate that more complex models accounting for across-site compositional heterogeneity can ameliorate this bias. We present CAT-posterior mean site frequencies (PMSF), a pipeline for diagnosing and resolving phylogenetic bias resulting from inadequate modeling of across-site compositional heterogeneity based on the CAT model. CAT-PMSF is robust against long-branch attraction in all alignments we have examined. We suggest using CAT-PMSF when convergence of the CAT model cannot be assured. We find evidence that compositionally constrained sites are driving long-branch attraction in two metazoan datasets and recover evidence for Porifera as the sister group to all other animals.
[Animal phylogeny; cross-site heterogeneity; long-branch attraction; phylogenomics.]
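
The idea of separating more constrained (low-diversity) sites from less constrained (high-diversity) sites can be illustrated with a minimal sketch. This is not the authors' implementation: the diversity measure used here (effective number of amino acids, computed as the exponential of the Shannon entropy of the observed column), the function names, and the threshold value are all assumptions chosen for illustration. The abstract does not specify how site diversity is measured or how sites are binned.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def effective_states(column):
    """Effective number of amino acids in one alignment column.

    Assumption for illustration: diversity = exp(Shannon entropy) of the
    observed residues; gaps and ambiguity codes are ignored.
    """
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column holds only gaps or ambiguous characters
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


def split_by_constraint(alignment, threshold):
    """Split site indices into (more constrained, less constrained).

    `alignment` maps taxon name -> aligned sequence; `threshold` is a
    hypothetical cutoff on the effective number of amino acids.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    constrained, relaxed = [], []
    for i in range(length):
        column = "".join(s[i] for s in seqs)
        if effective_states(column) <= threshold:
            constrained.append(i)
        else:
            relaxed.append(i)
    return constrained, relaxed


def subalignment(alignment, sites):
    """Extract the given columns as a new alignment."""
    return {taxon: "".join(seq[i] for i in sites) for taxon, seq in alignment.items()}


# Toy alignment with one invariant site and two variable sites.
alignment = {"a": "AAK", "b": "AGR", "c": "ALE", "d": "A-D"}
constrained, relaxed = split_by_constraint(alignment, threshold=2.0)
```

Each sub-alignment could then be analyzed separately under a site-homogeneous and a site-heterogeneous model to see whether the two bins support conflicting topologies, which is the kind of comparison the abstract describes.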

Identifiers

pubmed: 36946562
pii: 7083631
doi: 10.1093/sysbio/syad013
pmc: PMC10405358

Databases

Dryad
10.5061/dryad.g79cnp5rh

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

767-780

Copyright information

© The Author(s) 2023. Published by Oxford University Press on behalf of the Society of Systematic Biologists.

Authors

Lénárd L Szánthó (LL)

Department of Biological Physics, Eötvös University, Budapest, Hungary.
ELTE-MTA "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary.
Institute of Evolution, Centre for Ecological Research, Budapest, Hungary.

Nicolas Lartillot (N)

Laboratoire de Biométrie et Biologie Evolutive UMR 5558, CNRS, Université de Lyon, Villeurbanne, France.

Gergely J Szöllősi (GJ)

Department of Biological Physics, Eötvös University, Budapest, Hungary.
ELTE-MTA "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary.
Institute of Evolution, Centre for Ecological Research, Budapest, Hungary.

Dominik Schrempf (D)

Department of Biological Physics, Eötvös University, Budapest, Hungary.
