A unified catalog of 204,938 reference genomes from the human gut microbiome.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
January 2021
History:
received: 2019-09-18
accepted: 2020-05-31
pubmed: 2020-07-22
medline: 2021-02-12
entrez: 2020-07-22
Status: ppublish

Abstract

Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.

Identifiers

pubmed: 32690973
doi: 10.1038/s41587-020-0603-3
pii: 10.1038/s41587-020-0603-3
pmc: PMC7801254

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

105-114

Grants

Agency: Biotechnology and Biological Sciences Research Council
ID: BB/N018354/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/R015228/1
Country: United Kingdom

Authors

Alexandre Almeida (A)

European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK. aalmeida@ebi.ac.uk.
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK. aalmeida@ebi.ac.uk.

Stephen Nayfach (S)

US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA.
Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Miguel Boland (M)

European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.

Francesco Strozzi (F)

Enterome Bioscience, Paris, France.

Martin Beracochea (M)

European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.

Zhou Jason Shi (ZJ)

Gladstone Institutes, San Francisco, CA, USA.
Chan Zuckerberg Biohub, San Francisco, CA, USA.

Katherine S Pollard (KS)

Gladstone Institutes, San Francisco, CA, USA.
Chan Zuckerberg Biohub, San Francisco, CA, USA.
Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.
Institute for Computational Health Sciences, University of California San Francisco, San Francisco, CA, USA.
Quantitative Biology Institute, University of California San Francisco, San Francisco, CA, USA.
Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA.

Ekaterina Sakharova (E)

European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.

Donovan H Parks (DH)

Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia.

Philip Hugenholtz (P)

Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia.

Nicola Segata (N)

CIBIO Department, University of Trento, Trento, Italy.

Nikos C Kyrpides (NC)

US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA.
Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Robert D Finn (RD)

European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK. rdf@ebi.ac.uk.
