PhyloFisher: A phylogenomic package for resolving eukaryotic relationships.


Journal

PLoS biology
ISSN: 1545-7885
Abbreviated title: PLoS Biol
Country: United States
NLM ID: 101183755

Publication information

Publication date:
08 2021
History:
received: 21 08 2020
accepted: 15 07 2021
entrez: 6 8 2021
pubmed: 7 8 2021
medline: 19 11 2021
Status: epublish

Abstract

Phylogenomic analyses of hundreds of protein-coding genes aimed at resolving phylogenetic relationships are now common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher (https://github.com/TheBrownLab/PhyloFisher), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that perform the diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database to address phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic "single-copy orthogroup" datasets constructed using widely accepted methods such as all vs. all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments, leading to a more accurate final dataset.

Identifiers

pubmed: 34358228
doi: 10.1371/journal.pbio.3001365
pii: PBIOLOGY-D-20-02379
pmc: PMC8345874

Publication types

Evaluation Study; Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

e3001365

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Alexander K Tice (AK)

Department of Biological Sciences, Mississippi State University, Mississippi State, Mississippi, United States of America.
Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America.

David Žihala (D)

Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic.

Tomáš Pánek (T)

Department of Biological Sciences, Mississippi State University, Mississippi State, Mississippi, United States of America.
Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic.

Robert E Jones (RE)

Department of Biological Sciences, Mississippi State University, Mississippi State, Mississippi, United States of America.
Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America.

Eric D Salomaki (ED)

Institute of Parasitology, Biology Centre Czech Academy of Sciences, České Budějovice, Czech Republic.

Serafim Nenarokov (S)

Institute of Parasitology, Biology Centre Czech Academy of Sciences, České Budějovice, Czech Republic.

Fabien Burki (F)

Department of Organismal Biology, Uppsala University, Uppsala, Sweden.
Science for Life Laboratory, Uppsala University, Uppsala, Sweden.

Marek Eliáš (M)

Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic.

Laura Eme (L)

Unité d'Ecologie, Systématique et Evolution, CNRS, Université Paris-Saclay, Paris, France.

Andrew J Roger (AJ)

Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada.

Antonis Rokas (A)

Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America.

Xing-Xing Shen (XX)

State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China.

Jürgen F H Strassert (JFH)

Department of Organismal Biology, Uppsala University, Uppsala, Sweden.
Leibniz Institute of Freshwater Ecology and Inland Fisheries, Ecosystem Research, Berlin, Germany.

Martin Kolísko (M)

Institute of Parasitology, Biology Centre Czech Academy of Sciences, České Budějovice, Czech Republic.
Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic.

Matthew W Brown (MW)

Department of Biological Sciences, Mississippi State University, Mississippi State, Mississippi, United States of America.
Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America.
