Unblended disjoint tree merging using GTM improves species tree estimation.


Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
16 Apr 2020
History:
entrez: 18 Apr 2020
pubmed: 18 Apr 2020
medline: 14 Jan 2021
Status: epublish

Abstract

Phylogeny estimation is an important part of much biological research, but large-scale tree estimation is infeasible using standard methods due to computational issues. Recently, an approach to large-scale phylogeny has been proposed that divides a set of species into disjoint subsets, computes trees on the subsets, and then merges the trees together using a computed matrix of pairwise distances between the species. The novel component of these approaches is the last step: Disjoint Tree Merger (DTM) methods. We present GTM (Guide Tree Merger), a polynomial time DTM method that adds edges to connect the subset trees, so as to provably minimize the topological distance to a computed guide tree. Thus, GTM performs unblended mergers, unlike the previous DTM methods. Yet, despite the potential limitation, our study shows that GTM has excellent accuracy, generally matching or improving on two previous DTMs, and is much faster than both. The proposed GTM approach to the DTM problem is a useful new tool for large-scale phylogenomic analysis, and shows the surprising potential for unblended DTM methods.
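
The "topological distance to a computed guide tree" can be illustrated with a small sketch. The abstract does not name the metric, so the Robinson-Foulds (RF) distance used here is an assumption, and the nested-tuple tree encoding and the names `leaves`, `bipartitions`, and `rf_distance` are hypothetical helpers for illustration only. This is not GTM's merging algorithm; it only shows how a candidate merged tree could be scored against a guide tree.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    result: FrozenSet[str] = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always gets the same encoding.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below: FrozenSet[str] = frozenset()
        for child in node:
            below |= visit(child)
        side = all_leaves - below if ref in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(a: Tree, b: Tree) -> int:
    """Number of bipartitions present in exactly one of the two trees."""
    if leaves(a) != leaves(b):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(a) ^ bipartitions(b))


guide = (("A", "B"), ("C", ("D", "E")))
same_topology = ("A", ("B", ("C", ("D", "E"))))   # same unrooted tree, drawn differently
other_topology = (("A", "C"), ("B", ("D", "E")))

candidates = [same_topology, other_topology]
best = min(candidates, key=lambda t: rf_distance(guide, t))
```

Here `best` is the candidate whose bipartitions agree most closely with the guide tree. Choosing among a hand-written list of candidates is a brute-force stand-in; the abstract states that GTM finds the optimal way to connect the subset trees in polynomial time.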

Identifiers

pubmed: 32299343
doi: 10.1186/s12864-020-6605-1
pii: 10.1186/s12864-020-6605-1
pmc: PMC7161100

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

235

Authors

Vladimir Smirnov (V)

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, 61801, IL, US.

Tandy Warnow (T)

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, 61801, IL, US. warnow@illinois.edu.
