PhyloFisher: A phylogenomic package for resolving eukaryotic relationships.
Journal
PLoS biology
ISSN: 1545-7885
Abbreviated title: PLoS Biol
Country: United States
NLM ID: 101183755
Publication information
Publication date:
08 2021
History:
received: 2020-08-21
accepted: 2021-07-15
entrez: 2021-08-06
pubmed: 2021-08-07
medline: 2021-11-19
Status:
epublish
Abstract
Phylogenomic analysis of hundreds of protein-coding genes aimed at resolving phylogenetic relationships is now common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher (https://github.com/TheBrownLab/PhyloFisher), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that perform the diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database for addressing phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic "single-copy orthogroup" datasets constructed using widely accepted methods such as all vs. all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments, leading to a more accurate final dataset.
Identifiers
pubmed: 34358228
doi: 10.1371/journal.pbio.3001365
pii: PBIOLOGY-D-20-02379
pmc: PMC8345874
Publication types
Evaluation Study
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
e3001365
Conflict of interest statement
The authors have declared that no competing interests exist.