A revisit to universal single-copy genes in bacterial genomes.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
25 08 2022
History:
received: 28 06 2022
accepted: 18 08 2022
entrez: 25 8 2022
pubmed: 26 8 2022
medline: 30 8 2022
Status: epublish

Abstract

Universal single-copy genes (USCGs) are widely used for species classification and taxonomic profiling. Despite many studies on USCGs, our understanding of USCGs in bacterial genomes may be out of date, particularly regarding how much USCG sets differ between studies, how well a set of USCGs can distinguish two bacterial species, and whether USCGs can separate different strains of a bacterial species. To fill this gap, we studied USCGs in the most up-to-date complete bacterial genomes. We showed that different USCG sets differ substantially even though they come from highly similar functional categories. We also found that although USCGs occur once in almost all bacterial genomes, each USCG occurs multiple times in certain genomes. We demonstrated that USCGs are reliable markers for distinguishing different species, but they cannot distinguish different strains of most bacterial species. Our study sheds new light on the usage and limitations of USCGs, which will facilitate their application in evolutionary, phylogenomic, and metagenomic studies.
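
The observation that a USCG occurs once in almost all genomes, yet several times in some, can be illustrated with a small copy-number check. The following is a minimal sketch and not the authors' pipeline: the function name `copy_number_summary`, the genome labels, and the gene labels are all hypothetical, and the annotations are made-up toy data.

```python
from collections import Counter


def copy_number_summary(genome_to_genes, gene):
    """Summarize how often `gene` is single-copy across genomes.

    genome_to_genes maps a genome label to the list of gene labels
    annotated in that genome (repeats mean multiple copies).
    Returns (fraction of genomes with exactly one copy,
             sorted list of genomes with more than one copy).
    """
    counts = {
        genome: Counter(genes)[gene]
        for genome, genes in genome_to_genes.items()
    }
    single = sum(1 for c in counts.values() if c == 1)
    multi = sorted(genome for genome, c in counts.items() if c > 1)
    return single / len(counts), multi


# Hypothetical toy annotations, invented for illustration only.
genomes = {
    "genome_A": ["uscg_1", "uscg_2", "uscg_3"],
    "genome_B": ["uscg_1", "uscg_2", "uscg_2", "uscg_3"],  # uscg_2 duplicated
    "genome_C": ["uscg_1", "uscg_2", "uscg_3"],
    "genome_D": ["uscg_1", "uscg_3"],                      # uscg_2 absent
}

fraction, multi = copy_number_summary(genomes, "uscg_2")
print(fraction, multi)
```

In this toy input, `uscg_2` is single-copy in two of four genomes and duplicated in `genome_B`, while `uscg_1` is single-copy everywhere. On real data the gene lists would come from an annotation or homology search, which is outside the scope of this sketch.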

Identifiers

pubmed: 36008577
doi: 10.1038/s41598-022-18762-z
pii: 10.1038/s41598-022-18762-z
pmc: PMC9411617

Publication types

Journal Article
Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

14550

Copyright information

© 2022. The Author(s).

Authors

Saidi Wang (S)

Department of Computer Science, University of Central Florida, Orlando, FL, USA.

Minerva Ventolero (M)

Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL, USA.

Haiyan Hu (H)

Department of Computer Science, University of Central Florida, Orlando, FL, USA. haihu@cs.ucf.edu.
Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL, USA. haihu@cs.ucf.edu.

Xiaoman Li (X)

Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL, USA. xiaoman@mail.ucf.edu.
