OrthoPhyl - Streamlining large scale, orthology-based phylogenomic studies of bacteria at broad evolutionary scales.
Assembly
Brucella
Maximum Likelihood
Phylogenetics
Rickettsiales
Journal
G3 (Bethesda, Md.)
ISSN: 2160-1836
Abbreviated title: G3 (Bethesda)
Country: England
NLM ID: 101566598
Publication information
Publication date: 05 Jun 2024
History:
received: 03 Apr 2024
revised: 15 May 2024
accepted: 29 May 2024
medline: 06 Jun 2024
pubmed: 06 Jun 2024
entrez: 05 Jun 2024
Status:
aheadofprint
Abstract
There are a staggering number of publicly available bacterial genome sequences (at the time of writing, 2.0 million assemblies in NCBI's GenBank alone), and the deposition rate continues to increase. This wealth of data calls for phylogenetic analyses that place these sequences within an evolutionary context. A phylogenetic placement not only aids in taxonomic classification but also informs studies of the evolution of novel phenotypes, targets of selection, and horizontal gene transfer. Building trees from multi-gene codon alignments is a laborious task that requires bioinformatic expertise, rigorous curation of orthologs, and heavy computation. Compounding the problem is the lack of tools that can streamline these processes for building trees from large-scale genomic data. Here we present OrthoPhyl, which takes bacterial genome assemblies and reconstructs trees from whole-genome codon alignments. The pipeline can analyze an arbitrarily large number of input genomes (>1200 tested here) by identifying a diversity-spanning subset of assemblies and using these genomes to build gene models, which are then used to infer orthologs in the full dataset. To illustrate the versatility of OrthoPhyl, we present three use cases: E. coli/Shigella, Brucella/Ochrobactrum, and the order Rickettsiales. We compare trees generated with OrthoPhyl to trees generated with kSNP3 and GToTree, as well as to published trees built with alternative methods. We show that OrthoPhyl trees are consistent with those from other methods while incorporating more data, accommodating greater numbers of input genomes, and offering more flexibility of analysis.
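The abstract describes choosing a diversity-spanning subset of assemblies before building gene models. The record does not say how OrthoPhyl makes that choice, so the following is only a minimal sketch of one common way to pick such a subset: a greedy farthest-point (max-min) heuristic over a pairwise genome distance matrix. The function name `select_diverse_subset`, the use of a precomputed distance matrix, and the choice of distance measure are all hypothetical assumptions for illustration, not OrthoPhyl's documented procedure.

```python
from typing import List, Sequence


def select_diverse_subset(dist: Sequence[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Hypothetical helper: greedily pick k genome indices that spread across diversity.

    ASSUMPTION: `dist` is a symmetric pairwise distance matrix between genomes
    (any genome distance could be used); this is an illustrative heuristic,
    not OrthoPhyl's actual subset-selection method.
    """
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of genomes")
    chosen = [start]
    chosen_set = {start}
    # min_dist[i] = distance from genome i to its nearest already-chosen genome
    min_dist = list(dist[start])
    while len(chosen) < k:
        # Take the unchosen genome that is farthest from everything chosen so far
        nxt = max((i for i in range(n) if i not in chosen_set), key=lambda i: min_dist[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        # Update nearest-chosen distances with the newly added genome
        for i in range(n):
            min_dist[i] = min(min_dist[i], dist[nxt][i])
    return chosen


# Toy example: genomes 0/1 and 2/3 form two tight clusters
example = [
    [0.00, 0.01, 0.20, 0.21],
    [0.01, 0.00, 0.19, 0.20],
    [0.20, 0.19, 0.00, 0.02],
    [0.21, 0.20, 0.02, 0.00],
]
print(select_diverse_subset(example, 2))
```

With two tight clusters, the first two picks come from different clusters, which is the property a diversity-spanning subset needs; a real pipeline would also have to decide how large the subset should be, which this sketch leaves to the caller.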
Identifiers
pubmed: 38839049
pii: 7688438
doi: 10.1093/g3journal/jkae119
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2024. Published by Oxford University Press on behalf of The Genetics Society of America.