Systematic prediction of functionally linked genes in bacterial and archaeal genomes.


Journal

Nature protocols
ISSN: 1750-2799
Abbreviated title: Nat Protoc
Country: England
NLM ID: 101284307

Publication information

Publication date:
October 2019
History:
received: 6 December 2018
accepted: 13 June 2019
pubmed: 15 September 2019
medline: 22 November 2019
entrez: 15 September 2019
Status: ppublish

Abstract

Functionally linked genes in bacterial and archaeal genomes are often organized into operons. However, the composition and architecture of operons are highly variable and frequently differ even among closely related genomes. Therefore, to efficiently extract reliable functional predictions for uncharacterized genes from comparative analyses of the rapidly growing genomic databases, dedicated computational approaches are required. We developed a protocol to systematically and automatically identify genes that are likely to be functionally associated with a 'bait' gene or locus by using relevance metrics. Given a set of bait loci and a genomic database defined by the user, this protocol compares the genomic neighborhoods of the baits to identify genes that are likely to be functionally linked to the baits by calculating the abundance of a given gene within and outside the bait neighborhoods and the distance to the bait. We exemplify the performance of the protocol with three test cases, namely, genes linked to CRISPR-Cas systems using the 'CRISPRicity' metric, genes associated with archaeal proviruses and genes linked to Argonaute genes in halobacteria. The protocol can be run by users with basic computational skills. The computational cost depends on the sizes of the genomic dataset and the list of reference loci and can vary from one CPU-hour to hundreds of hours on a supercomputer.
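The abstract describes scoring each gene by its abundance within and outside the bait neighborhoods and by its distance to the bait. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes each contig is an ordered list of gene-family labels, that a neighborhood is a fixed window of genes on either side of a bait, and that distance is counted in genes. The function name `relevance_scores`, the `window` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def relevance_scores(genomes, baits, window=10):
    """Score gene families by how often they occur near bait genes.

    genomes: dict mapping contig id -> ordered list of gene-family labels
             (an assumed, simplified representation of a genomic database).
    baits:   list of (contig id, gene index) positions of bait genes.
    window:  number of genes on each side of a bait treated as its
             neighborhood (an assumption of this sketch).
    """
    # Abundance of every family across the whole database.
    total = defaultdict(int)
    for families in genomes.values():
        for fam in families:
            total[fam] += 1

    bait_positions = set(baits)

    # For each gene inside any neighborhood, keep the distance to the
    # nearest bait so that overlapping neighborhoods are not double counted.
    nearest = {}
    for contig, bait_idx in baits:
        families = genomes[contig]
        lo = max(0, bait_idx - window)
        hi = min(len(families), bait_idx + window + 1)
        for i in range(lo, hi):
            if (contig, i) in bait_positions:
                continue  # the baits themselves are not scored
            d = abs(i - bait_idx)
            key = (contig, i)
            if key not in nearest or d < nearest[key]:
                nearest[key] = d

    # Group the in-neighborhood distances by gene family.
    distances = defaultdict(list)
    for (contig, i), d in nearest.items():
        distances[genomes[contig][i]].append(d)

    return {
        fam: {
            "inside": len(ds),                  # occurrences near a bait
            "total": total[fam],                # occurrences anywhere
            "fraction": len(ds) / total[fam],   # share found near a bait
            "mean_distance": mean(ds),          # average distance in genes
        }
        for fam, ds in distances.items()
    }


# Toy data: three contigs, two of which carry a bait gene.
genomes = {
    "c1": ["A", "B", "bait", "C", "D", "E", "A"],
    "c2": ["A", "bait", "B", "F"],
    "c3": ["A", "C", "F", "F"],
}
baits = [("c1", 2), ("c2", 1)]
scores = relevance_scores(genomes, baits, window=2)

for fam in sorted(scores, key=lambda f: -scores[f]["fraction"]):
    print(fam, scores[fam])
```

In this toy run, a family that occurs only next to baits (such as `B`) receives a fraction of 1.0, while a family that is common elsewhere (such as `F`) scores lower. A real analysis would also need protein clustering, a significance threshold, and correction for database redundancy, none of which this sketch attempts.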

Identifiers

pubmed: 31520072
doi: 10.1038/s41596-019-0211-1
pii: 10.1038/s41596-019-0211-1
pmc: PMC6938587
mid: NIHMS1063598

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

3013-3031

Grants

Agency: NIGMS NIH HHS
ID: R01 GM104071
Country: United States

Authors

Sergey A Shmakov (SA)

National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA.
Skolkovo Institute of Science and Technology, Skolkovo, Russia.

Guilhem Faure (G)

National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA.
Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Kira S Makarova (KS)

National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA.

Yuri I Wolf (YI)

National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA.

Konstantin V Severinov (KV)

Skolkovo Institute of Science and Technology, Skolkovo, Russia.
Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.
Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, Russia.

Eugene V Koonin (EV)

National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA. koonin@ncbi.nlm.nih.gov.
