OrthoPhyl - Streamlining large scale, orthology-based phylogenomic studies of bacteria at broad evolutionary scales.

Keywords: Assembly; Brucella; Maximum Likelihood; Phylogenetics; Rickettsiales

Journal

G3 (Bethesda, Md.)
ISSN: 2160-1836
Abbreviated title: G3 (Bethesda)
Country: England
NLM ID: 101566598

Publication information

Publication date:
05 Jun 2024
History:
received: 03 Apr 2024
revised: 15 May 2024
accepted: 29 May 2024
medline: 06 Jun 2024
pubmed: 06 Jun 2024
entrez: 05 Jun 2024
Status: aheadofprint

Abstract

There are a staggering number of publicly available bacterial genome sequences (at the time of writing, 2.0 million assemblies in NCBI's GenBank alone), and the deposition rate continues to increase. This wealth of data calls for phylogenetic analyses that place these sequences within an evolutionary context. A phylogenetic placement not only aids taxonomic classification but also informs studies of the evolution of novel phenotypes, targets of selection, and horizontal gene transfer. Building trees from multi-gene codon alignments is a laborious task that requires bioinformatic expertise, rigorous curation of orthologs, and heavy computation. Compounding the problem is the lack of tools that streamline these steps for building trees from large-scale genomic data. Here we present OrthoPhyl, which takes bacterial genome assemblies and reconstructs trees from whole genome codon alignments. The pipeline can process an arbitrarily large number of input genomes (>1200 tested here) by identifying a diversity-spanning subset of assemblies and using these genomes to build gene models that are then used to infer orthologs across the full dataset. To illustrate the versatility of OrthoPhyl, we present three use cases: E. coli/Shigella, Brucella/Ochrobactrum, and the order Rickettsiales. We compare OrthoPhyl trees with trees generated by kSNP3 and GToTree, as well as with published trees inferred using alternative methods. We show that OrthoPhyl trees are consistent with those from other methods while incorporating more data, accommodating larger numbers of input genomes, and allowing more flexible analyses.
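The abstract does not say how the diversity-spanning subset is chosen. One common way to pick a small set of genomes that covers the diversity of a large collection is greedy farthest-point (max-min) selection over a pairwise distance matrix. The sketch below illustrates that generic technique only; it assumes precomputed pairwise genome distances (for example, from a k-mer sketching tool), and the function name `farthest_point_subset` and the example matrix are hypothetical, not part of OrthoPhyl.

```python
from typing import Sequence


def farthest_point_subset(dist: Sequence[Sequence[float]], k: int, start: int = 0) -> list[int]:
    """Greedy max-min selection of k genome indices from a symmetric distance matrix.

    Hypothetical illustration of a diversity-spanning subset; not OrthoPhyl's code.
    """
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of genomes")
    chosen = [start]
    chosen_set = {start}
    # min_dist[i] = distance from genome i to its nearest already-chosen genome
    min_dist = list(dist[start])
    while len(chosen) < k:
        # Pick the unchosen genome that is farthest from every genome chosen so far
        nxt = max((i for i in range(n) if i not in chosen_set), key=lambda i: min_dist[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        # Update nearest-chosen distances with the newly added genome
        for i in range(n):
            if dist[nxt][i] < min_dist[i]:
                min_dist[i] = dist[nxt][i]
    return chosen


# Hypothetical distances: genomes 0/1 and 2/3 form tight pairs; genome 4 is an outlier.
example_dist = [
    [0.00, 0.01, 0.30, 0.31, 0.50],
    [0.01, 0.00, 0.29, 0.30, 0.49],
    [0.30, 0.29, 0.00, 0.02, 0.45],
    [0.31, 0.30, 0.02, 0.00, 0.46],
    [0.50, 0.49, 0.45, 0.46, 0.00],
]

if __name__ == "__main__":
    print(farthest_point_subset(example_dist, 3))  # [0, 4, 3]
```

With k = 3 the selection takes one genome from each tight pair plus the outlier, which is the property a diversity-spanning subset needs: gene models built from these few genomes should still represent the clades present in the full set.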

Identifiers

pubmed: 38839049
pii: 7688438
doi: 10.1093/g3journal/jkae119

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2024. Published by Oxford University Press on behalf of The Genetics Society of America.

Authors

Earl A Middlebrook (EA)

Genomics and Bioanalytics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.

Robab Katani (R)

Huck Institutes of Life Sciences, Pennsylvania State University, University Park, PA 16802, USA.

Jeanne M Fair (JM)

Genomics and Bioanalytics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.

MeSH classifications