A revisit to universal single-copy genes in bacterial genomes.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
25 08 2022
History:
received: 28 06 2022
accepted: 18 08 2022
entrez: 25 08 2022
pubmed: 26 08 2022
medline: 30 08 2022
Status:
epublish
Abstract
Universal single-copy genes (USCGs) are widely used for species classification and taxonomic profiling. Despite many studies on USCGs, our understanding of them in bacterial genomes may be out of date. Open questions include how much USCG sets differ between studies, how well a set of USCGs can distinguish two bacterial species, and whether USCGs can separate different strains of a bacterial species. To address these questions, we studied USCGs in the most up-to-date complete bacterial genomes. We showed that different USCG sets differ considerably even though they come from highly similar functional categories. We also found that although each USCG occurs once in almost all bacterial genomes, each one occurs multiple times in certain genomes. We demonstrated that USCGs are reliable markers for distinguishing different species, but they cannot distinguish different strains of most bacterial species. Our study sheds new light on the usage and limitations of USCGs, which will facilitate their application in evolutionary, phylogenomic, and metagenomic studies.
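The copy-number observation in the abstract (a gene present once in almost all genomes, yet duplicated in some) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `copy_number_summary`, the input layout (one `(genome_id, gene_id)` pair per detected gene copy), and the toy genome and gene names are all assumptions made for illustration only.

```python
from collections import Counter


def copy_number_summary(hits, genomes):
    """Summarise per-gene copy numbers across genomes.

    hits    -- iterable of (genome_id, gene_id) pairs, one pair per detected copy
               (hypothetical input format, e.g. parsed from annotation output)
    genomes -- iterable of all genome ids surveyed, including those with no hits
    """
    genomes = list(genomes)
    counts = Counter(hits)  # (genome_id, gene_id) -> number of copies
    genes = sorted({gene for _, gene in counts})
    summary = {}
    for gene in genes:
        # Counter returns 0 for genomes in which the gene was not detected.
        per_genome = [counts[(g, gene)] for g in genomes]
        summary[gene] = {
            "single_copy_fraction": sum(c == 1 for c in per_genome) / len(genomes),
            "multi_copy_genomes": [g for g, c in zip(genomes, per_genome) if c > 1],
            "missing_genomes": [g for g, c in zip(genomes, per_genome) if c == 0],
        }
    return summary


# Toy input: three hypothetical genomes and two hypothetical marker genes.
toy_hits = [
    ("genomeA", "geneX"), ("genomeB", "geneX"), ("genomeC", "geneX"),
    ("genomeC", "geneX"),  # a second copy of geneX in genomeC
    ("genomeA", "geneY"), ("genomeB", "geneY"),
]
toy_summary = copy_number_summary(toy_hits, ["genomeA", "genomeB", "genomeC"])
```

In this toy data, `geneX` is duplicated in `genomeC` and `geneY` is absent from it, so neither gene is strictly single-copy across all three genomes. A real analysis would also have to decide how to handle fragmented or misannotated genes, which this sketch ignores.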
Identifiers
pubmed: 36008577
doi: 10.1038/s41598-022-18762-z
pii: 10.1038/s41598-022-18762-z
pmc: PMC9411617
Publication types
Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
14550
Copyright information
© 2022. The Author(s).