DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition.
Journal
Systematic biology
ISSN: 1076-836X
Abbreviated title: Syst Biol
Country: England
NLM ID: 9302532
Publication information
Publication date:
19 04 2022
History:
received: 24 05 2021
revised: 18 08 2021
accepted: 23 08 2021
pubmed: 28 8 2021
medline: 21 4 2022
entrez: 27 8 2021
Status:
ppublish
Abstract
Species tree inference from gene family trees is a significant problem in computational biology. However, gene tree heterogeneity, which can be caused by several factors including gene duplication and loss, makes the estimation of species trees very challenging. While there have been several species tree estimation methods introduced in recent years to specifically address gene tree heterogeneity due to gene duplication and loss (such as DupTree, FastMulRFS, ASTRAL-Pro, and SpeciesRax), many incur high cost in terms of both running time and memory. We introduce a new approach, DISCO, that decomposes the multi-copy gene family trees into many single copy trees, which allows for methods previously designed for species tree inference in a single copy gene tree context to be used. We prove that using DISCO with ASTRAL (i.e., ASTRAL-DISCO) is statistically consistent under the GDL model, provided that ASTRAL-Pro correctly roots and tags each gene family tree. We evaluate DISCO paired with different methods for estimating species trees from single copy genes (e.g., ASTRAL, ASTRID, and IQ-TREE) under a wide range of model conditions, and establish that high accuracy can be obtained even when ASTRAL-Pro is not able to correctly root and tag the gene family trees. We also compare results using MI, an alternative decomposition strategy from Yang Y. and Smith S.A. (2014), and find that DISCO provides better accuracy, most likely as a result of covering more of the gene family tree leafset in the output decomposition. [Concatenation analysis; gene duplication and loss; species tree inference; summary method.]
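The decomposition idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact splitting rule, so the one used here is an assumption: at every node tagged as a duplication, keep the child subtree with the most leaves and detach the others as separate trees. The `Node` class, the `dup` flag, and the `decompose` and `newick` helpers are hypothetical names introduced for illustration, not the DISCO software's interface. The sketch also assumes the tree is already rooted and tagged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    species: Optional[str] = None   # species label, set on leaves only
    dup: bool = False               # True if the node is tagged as a duplication
    children: List["Node"] = field(default_factory=list)


def leaves(n: Node) -> List[str]:
    """Species labels under n, left to right."""
    if not n.children:
        return [n.species]
    return [s for c in n.children for s in leaves(c)]


def newick(n: Node) -> str:
    """Topology-only Newick string (no branch lengths, no trailing ';')."""
    if not n.children:
        return n.species
    return "(" + ",".join(newick(c) for c in n.children) + ")"


def decompose(root: Node) -> List[Node]:
    """Split a rooted, tagged multi-copy tree into single-copy trees.

    Assumed rule (not taken from the abstract): at each duplication node,
    keep the child with the most leaves and detach the remaining children.
    """
    detached: List[Node] = []

    def visit(n: Node) -> Node:
        if not n.children:
            return n
        kids = [visit(c) for c in n.children]   # children are processed first
        if n.dup:
            kids.sort(key=lambda k: len(leaves(k)), reverse=True)
            detached.extend(kids[1:])           # smaller copies become new trees
            return kids[0]                      # the duplication node is dropped
        return Node(children=kids)

    kept = visit(root)
    return [kept] + detached


# Toy gene family tree: a duplication at the root, speciations below it.
leaf = lambda s: Node(species=s)
left = Node(children=[leaf("A"), leaf("B")])
right = Node(children=[leaf("A"), Node(children=[leaf("B"), leaf("C")])])
trees = decompose(Node(dup=True, children=[left, right]))
print([newick(t) for t in trees])   # ['(A,(B,C))', '(A,B)']
```

If the tagging is correct, each output tree has at most one leaf per species, so it can be passed to a method designed for single copy gene trees.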
Identifiers
pubmed: 34450658
pii: 6358739
doi: 10.1093/sysbio/syab070
pmc: PMC9016570
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
610-629
Comments and corrections
Type: ErratumIn
Copyright information
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
References
Syst Biol. 2022 Feb 10;71(2):367-381
pubmed: 34245291
Mol Biol Evol. 1987 Jul;4(4):406-25
pubmed: 3447015
Mol Phylogenet Evol. 2019 Jan;130:286-296
pubmed: 30393186
Evolution. 1983 Jan;37(1):203-217
pubmed: 28568026
Mol Biol Evol. 2020 Sep 1;37(9):2763-2774
pubmed: 32502238
Mol Biol Evol. 2016 Aug;33(8):2117-34
pubmed: 27189539
Mol Biol Evol. 2016 Jul;33(7):1654-68
pubmed: 27189547
Bioinformatics. 2015 Feb 1;31(3):432-3
pubmed: 25273112
Bioinformatics. 2008 Jul 1;24(13):1540-1
pubmed: 18474508
Syst Biol. 2015 Mar;64(2):325-39
pubmed: 25540456
Syst Biol. 2011 Oct;60(5):661-7
pubmed: 21447481
Genome Res. 2013 Feb;23(2):323-30
pubmed: 23132911
Mol Biol Evol. 2009 Aug;26(8):1879-88
pubmed: 19423664
Bioinformatics. 2014 Dec 1;30(23):3317-24
pubmed: 25104814
SoftwareX. 2020 Jan-Jun;11:
pubmed: 35903557
Syst Biol. 2016 Mar;65(2):334-44
pubmed: 26526427
Nature. 2019 Oct;574(7780):679-685
pubmed: 31645766
IEEE/ACM Trans Comput Biol Bioinform. 2018 Jan-Feb;15(1):337-342
pubmed: 28113601
Bioinformatics. 2017 Mar 1;33(5):631-639
pubmed: 27663499
Bioinformatics. 2014 Sep 1;30(17):i541-8
pubmed: 25161245
Mol Biol Evol. 2020 Dec 16;37(12):3672-3683
pubmed: 32658973
Genome Res. 2012 Apr;22(4):755-65
pubmed: 22271778
BMC Bioinformatics. 2013 Nov 19;14:330
pubmed: 24252138
Syst Biol. 2014 Jan 1;63(1):66-82
pubmed: 23988674
Mol Biol Evol. 2015 Oct;32(10):2798-800
pubmed: 26130081
Syst Biol. 2018 Mar 01;67(2):285-303
pubmed: 29029338
Bioinformatics. 2014 May 1;30(9):1312-3
pubmed: 24451623
Nucleic Acids Res. 2012 Jan;40(Database issue):D136-43
pubmed: 22139910
Trends Genet. 2021 Feb;37(2):174-187
pubmed: 32921510
Genetics. 1989 Aug;122(4):957-66
pubmed: 2759432
Syst Biol. 2011 Mar;60(2):126-37
pubmed: 21088009
Syst Biol. 2007 Feb;56(1):17-24
pubmed: 17366134
BMC Evol Biol. 2010 Oct 11;10:302
pubmed: 20937096
Syst Biol. 2016 May;65(3):366-80
pubmed: 25164915
Bioinformatics. 2010 Nov 15;26(22):2910-1
pubmed: 20861028
Trends Genet. 2000 May;16(5):227-31
pubmed: 10782117
Bioinformatics. 2021 May 28;:
pubmed: 34048529
Syst Biol. 2016 May;65(3):397-416
pubmed: 25281847
Bioinformatics. 2020 Jul 1;36(Suppl_1):i57-i65
pubmed: 32657396
BMC Genomics. 2015;16 Suppl 10:S3
pubmed: 26449326
Mol Biol Evol. 2015 Jan;32(1):268-74
pubmed: 25371430
Nature. 2009 Jun 4;459(7247):657-62
pubmed: 19465905
Mol Biol Evol. 2022 Feb 3;39(2):
pubmed: 35021210
BMC Genomics. 2018 May 8;19(Suppl 5):286
pubmed: 29745854
Proc Biol Sci. 2009 Dec 22;276(1677):4261-70
pubmed: 19759036
PLoS One. 2010 Mar 10;5(3):e9490
pubmed: 20224823
J Comput Biol. 2021 May;28(5):452-468
pubmed: 33325781
Mol Biol Evol. 2020 Nov 1;37(11):3292-3307
pubmed: 32886770
Pac Symp Biocomput. 2013;:250-61
pubmed: 23424130
Mol Biol Evol. 2010 Mar;27(3):570-80
pubmed: 19906793
Methods Mol Biol. 2019;1910:149-175
pubmed: 31278664
Syst Biol. 2015 Jul;64(4):663-76
pubmed: 25813358
Bioinformatics. 2001;17 Suppl 1:S190-8
pubmed: 11473009
Mol Biol Evol. 2014 Nov;31(11):3081-92
pubmed: 25158799
Evol Bioinform Online. 2013 Oct 29;9:429-35
pubmed: 24250218
BMC Bioinformatics. 2010 Nov 23;11:574
pubmed: 21092314