VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution.


Journal

Microbiome
ISSN: 2049-2618
Abbreviated title: Microbiome
Country: England
NLM ID: 101615147

Publication information

Publication date:
8 November 2023
History:
received: 13 June 2023
accepted: 18 October 2023
medline: 9 November 2023
pubmed: 8 November 2023
entrez: 8 November 2023
Status: epublish

Abstract

Phylogenomic analysis has become an inseparable part of studies of bacterial diversity and evolution, and many different sets of bacterial core genes have been collated and used for phylogenomic tree reconstruction. However, these genes have been selected based on their presence and single-copy ratios across bacterial genomes, leaving the genes' 'phylogenetic fidelity' unexamined. From 30,522 complete genomes covering 11,262 species, we examined 148 bacterial core genes that have previously been used for phylogenomic analysis. In addition to gene presence and single-copy ratios, we evaluated each gene's phylogenetic fidelity by comparing its phylogeny with the corresponding 16S rRNA gene tree. Out of the 148 bacterial genes, 20 validated bacterial core genes (VBCG) were selected as the core gene set with the highest bacterial phylogenetic fidelity. Compared to the larger gene set, the 20-gene core set resulted in more species having all genes present and fewer species with missing data, thereby enhancing the accuracy of phylogenomic analysis. Using Escherichia coli strains as examples of prominent bacterial foodborne pathogens, we demonstrated that the 20 VBCG produced phylogenies with higher fidelity and resolution at the species and strain levels, whereas the 16S rRNA gene tree alone could not. The 20 validated core gene set improves the fidelity and speed of phylogenomic analysis. Among other uses, this tool improves our ability to explore the evolution, typing, and tracking of bacterial strains, such as human pathogens. We have developed a Python pipeline and a desktop graphical app (available on GitHub) for users to perform phylogenomic analysis with high fidelity and resolution. Video Abstract.
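The abstract names two screening criteria: a gene's presence and single-copy ratios across genomes, and the agreement of its gene tree with the 16S rRNA gene tree. The Python sketch that follows illustrates both under stated assumptions; it is not code from the VBCG pipeline. The per-genome hit lists, the exact definitions of the two ratios (here taken over all genomes), and the hypothetical `fidelity` score (one minus a normalized Robinson-Foulds distance between rooted trees written as nested tuples and restricted to shared taxa) are illustrative choices, and the paper's own fidelity metric may be defined differently.

```python
from collections import Counter
from itertools import chain


def presence_and_single_copy(hits_per_genome, gene):
    """Hypothetical ratios: fraction of all genomes with >=1 copy, and with exactly 1 copy.

    hits_per_genome maps genome ID -> list of marker-gene IDs found in it
    (one entry per hit, so a repeated ID means multiple copies).
    """
    n = len(hits_per_genome)
    counts = [Counter(hits)[gene] for hits in hits_per_genome.values()]
    present = sum(1 for c in counts if c >= 1)
    single = sum(1 for c in counts if c == 1)
    return present / n, single / n


def leaves(tree):
    """Leaf names of a rooted tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(c) for c in tree))


def prune(tree, keep):
    """Drop leaves not in `keep` and collapse nodes left with a single child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(c, keep) for c in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def clusters(tree):
    """Non-trivial clusters (leaf sets under internal nodes, root excluded)."""
    found = set()

    def walk(t):
        if isinstance(t, str):
            return frozenset([t])
        s = frozenset().union(*(walk(c) for c in t))
        found.add(s)
        return s

    root = walk(tree)
    found.discard(root)
    return {c for c in found if len(c) > 1}


def fidelity(gene_tree, ref_tree):
    """Hypothetical score: 1 - normalized rooted Robinson-Foulds distance on shared taxa."""
    shared = leaves(gene_tree) & leaves(ref_tree)
    if len(shared) < 3:
        raise ValueError("need at least 3 shared taxa to compare topologies")
    a = clusters(prune(gene_tree, shared))
    b = clusters(prune(ref_tree, shared))
    if not a and not b:
        return 1.0
    # Clusters found in only one of the two trees count as disagreements.
    return 1 - len(a ^ b) / (len(a) + len(b))
```

For example, against a reference tree `((("A", "B"), "C"), ("D", "E"))`, a gene tree with the same topology in a different child order scores 1.0, while `((("A", "C"), "B"), ("D", "E"))`, which swaps B and C, shares two of its three clusters and scores about 0.67. Taxa absent from either tree are pruned before comparison, so genes missing from some genomes are not penalized by this score; in the sketch, that is left to the presence ratio.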


Identifiers

pubmed: 37936197
doi: 10.1186/s40168-023-01705-9
pii: 10.1186/s40168-023-01705-9
pmc: PMC10631056

Chemical substances

RNA, Ribosomal, 16S 0

Publication types

Video-Audio Media; Journal Article; Research Support, U.S. Gov't, P.H.S.

Language

English

Citation subsets

IM

Pagination

247

Grants

Agency: FDA HHS
ID: U19 FD005322
Country: United States

Copyright information

© 2023. The Author(s).


Authors

Renmao Tian

Institute for Food Safety and Health, Illinois Institute of Technology, Bedford Park, IL, 60501, USA.

Behzad Imanian

Institute for Food Safety and Health, Illinois Institute of Technology, Bedford Park, IL, 60501, USA. bimanian@iit.edu.
Food Science and Nutrition Department, Illinois Institute of Technology, Chicago, IL, 60616, USA. bimanian@iit.edu.
