Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees.

Keywords: ASTRAL; ILS; gene tree estimation error; phylogenomics; summary methods

Journal

Molecular biology and evolution
ISSN: 1537-1719
Abbreviated title: Mol Biol Evol
Country: United States
NLM ID: 8501455

Publication information

Publication date:
05 12 2022
History:
pubmed: 7 10 2022
medline: 17 12 2022
entrez: 6 10 2022
Status: ppublish

Abstract

Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
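The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the input records, the function names, and the specific weight form (branch support multiplied by an exponential decay in the summed terminal branch length) are assumptions chosen only to show how down-weighting low-support or long-branch quartets can change which quartet topology scores highest.

```python
import math
from collections import defaultdict

# Hypothetical input: one record per gene tree for the taxa {A, B, C, D}.
# Each record is (quartet topology, branch support in [0, 1],
# summed terminal branch length). All values are made up for illustration.
gene_quartets = [
    ("AB|CD", 0.95, 0.2),
    ("AB|CD", 0.90, 0.3),
    ("AC|BD", 0.30, 1.5),
    ("AC|BD", 0.25, 2.0),
    ("AC|BD", 0.20, 1.8),
]


def quartet_weight(support, terminal_length, use_support=True, use_length=True):
    """Assumed weight: support times exp(-terminal length), each part optional."""
    weight = 1.0
    if use_support:
        weight *= support
    if use_length:
        weight *= math.exp(-terminal_length)
    return weight


def score_topologies(records, **weight_options):
    """Sum the weights of all gene-tree quartets supporting each topology."""
    scores = defaultdict(float)
    for topology, support, terminal_length in records:
        scores[topology] += quartet_weight(support, terminal_length, **weight_options)
    return dict(scores)


# Unweighted scoring: every gene tree counts as 1, as in a plain quartet count.
unweighted = score_topologies(gene_quartets, use_support=False, use_length=False)
# Weighted scoring: poorly supported, long-branch quartets count for less.
weighted = score_topologies(gene_quartets)

best_unweighted = max(unweighted, key=unweighted.get)
best_weighted = max(weighted, key=weighted.get)
print("unweighted:", unweighted, "->", best_unweighted)
print("weighted:  ", weighted, "->", best_weighted)
```

In this toy input, the plain count favors the topology seen in three noisy gene trees, while the weighted score favors the topology seen in two well-supported ones. No threshold for contracting branches is needed in either case.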

Identifiers

pubmed: 36201617
pii: 6750035
doi: 10.1093/molbev/msac215
pmc: PMC9750496

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Grants

Agency: NIGMS NIH HHS
ID: R35 GM142725
Country: United States

Copyright information

© The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.

Authors

Chao Zhang

Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA.

Siavash Mirarab

Department of Electrical and Computer Engineering, UC San Diego, La Jolla, CA, USA.
