A unified catalog of 204,938 reference genomes from the human gut microbiome.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date: January 2021
History:
received: 18 September 2019
accepted: 31 May 2020
pubmed: 22 July 2020
medline: 12 February 2021
entrez: 22 July 2020
Status: ppublish
Abstract
Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.
Identifiers
pubmed: 32690973
doi: 10.1038/s41587-020-0603-3
pii: 10.1038/s41587-020-0603-3
pmc: PMC7801254
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
105-114
Grants
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/N018354/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/R015228/1
Country: United Kingdom
References
Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
pubmed: 23023125
doi: 10.1038/nature11450
Feng, Q. et al. Gut microbiome development along the colorectal adenoma–carcinoma sequence. Nat. Commun. 6, 6528 (2015).
pubmed: 25758642
doi: 10.1038/ncomms7528
Thomas, A. M. & Segata, N. Multiple levels of the unknown in microbiome research. BMC Biol. 17, 48 (2019).
pubmed: 31189463
pmcid: 6560723
doi: 10.1186/s12915-019-0667-z
Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
doi: 10.1038/nature11234
Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834–841 (2014).
pubmed: 24997786
doi: 10.1038/nbt.2942
Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010).
pubmed: 20203603
pmcid: 3779803
doi: 10.1038/nature08821
Nayfach, S., Fischbach, M. A. & Pollard, K. S. MetaQuery: a web server for rapid annotation and quantitative analysis of specific genes in the human gut microbiome. Bioinformatics 31, 3368–3370 (2015).
pubmed: 26104745
pmcid: 4595903
doi: 10.1093/bioinformatics/btv382
Wu, H. et al. Metformin alters the gut microbiome of individuals with treatment-naive type 2 diabetes, contributing to the therapeutic effects of the drug. Nat. Med. 23, 850–858 (2017).
pubmed: 28530702
doi: 10.1038/nm.4345
Liu, R. et al. Gut microbiome and serum metabolome alterations in obesity and after weight-loss intervention. Nat. Med. 23, 859–868 (2017).
pubmed: 28628112
doi: 10.1038/nm.4358
Armour, C. R., Nayfach, S., Pollard, K. S. & Sharpton, T. J. A metagenomic meta-analysis reveals functional signatures of health and disease in the human gut microbiome. mSystems 4, e00332-18 (2019).
pubmed: 31098399
pmcid: 6517693
doi: 10.1128/mSystems.00332-18
Browne, H. P. et al. Culturing of ‘unculturable’ human microbiota reveals novel taxa and extensive sporulation. Nature 533, 543–546 (2016).
pubmed: 27144353
pmcid: 4890681
doi: 10.1038/nature17645
Lagier, J.-C. et al. Culture of previously uncultured members of the human gut microbiota by culturomics. Nat. Microbiol. 1, 16203 (2016).
pubmed: 27819657
doi: 10.1038/nmicrobiol.2016.203
Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
pubmed: 28894102
doi: 10.1038/s41564-017-0012-7
Stewart, R. D. et al. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37, 953–961 (2019).
pubmed: 31375809
pmcid: 6785717
doi: 10.1038/s41587-019-0202-3
Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219 (2016).
pubmed: 27774985
pmcid: 5079060
doi: 10.1038/ncomms13219
Nayfach, S., Shi, Z. J., Seshadri, R., Pollard, K. S. & Kyrpides, N. New insights from uncultivated genomes of the global human gut microbiome. Nature 568, 505–510 (2019).
pubmed: 30867587
pmcid: 6784871
doi: 10.1038/s41586-019-1058-x
Chen, L.-X., Anantharaman, K., Shaiber, A., Eren, A. M. & Banfield, J. F. Accurate and complete genomes from metagenomes. Genome Res. 30, 315–333 (2020).
pubmed: 32188701
pmcid: 7111523
doi: 10.1101/gr.258640.119
Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).
pubmed: 30745586
pmcid: 6784870
doi: 10.1038/s41586-019-0965-1
Forster, S. C. et al. A human gut bacterial genome and culture collection for improved metagenomic analyses. Nat. Biotechnol. 37, 186–192 (2019).
pubmed: 30718869
pmcid: 6785715
doi: 10.1038/s41587-018-0009-7
Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662 (2019).
pubmed: 30661755
pmcid: 6349461
doi: 10.1016/j.cell.2019.01.001
Zou, Y. et al. 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nat. Biotechnol. 37, 179–185 (2019).
pubmed: 30718868
pmcid: 6784896
doi: 10.1038/s41587-018-0008-8
Kitts, P. A. et al. Assembly: a resource for assembled genomes at NCBI. Nucleic Acids Res. 44, D73–D80 (2016).
pubmed: 26578580
doi: 10.1093/nar/gkv1226
Wattam, A. R. et al. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res. 45, D535–D542 (2017).
pubmed: 27899627
doi: 10.1093/nar/gkw1017
Chen, I.-M. A. et al. IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Res. 47, D666–D677 (2019).
pubmed: 30289528
doi: 10.1093/nar/gky901
Human Microbiome Jumpstart Reference Strains Consortium. A catalog of reference genomes from the human microbiome. Science 328, 994–999 (2010).
doi: 10.1126/science.1183605
Ondov, B. D. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 17, 132 (2016).
Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).
Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics https://doi.org/10.1093/bioinformatics/btz848 (2019).
Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018).
pubmed: 30148503
doi: 10.1038/nbt.4229
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
pubmed: 28298430
pmcid: 5411777
doi: 10.1101/gr.213959.116
Kang, D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
pubmed: 31388474
pmcid: 6662567
doi: 10.7717/peerj.7359
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
pubmed: 25609793
Wu, Y.-W. W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2015).
pubmed: 26515820
doi: 10.1093/bioinformatics/btv638
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
pubmed: 25218180
doi: 10.1038/nmeth.3103
Sieber, C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. 3, 836–843 (2018).
pubmed: 29807988
pmcid: 6786971
doi: 10.1038/s41564-018-0171-1
Rosero, J. A. et al. Reclassification of Eubacterium rectale (Hauduroy et al. 1937) Prévot 1938 in a new genus Agathobacter gen. nov. as Agathobacter rectalis comb. nov., and description of Agathobacter ruminis sp. nov., isolated from the rumen contents of sheep and cows. Int. J. Syst. Evol. Microbiol. 66, 768–773 (2016).
pubmed: 26619944
doi: 10.1099/ijsem.0.000788
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
pubmed: 31779668
pmcid: 6883579
doi: 10.1186/s13059-019-1891-0
Hildebrand, F. et al. Antibiotics-induced monodominance of a novel gut bacterial order. Gut 68, 1781–1790 (2019).
pubmed: 30658995
doi: 10.1136/gutjnl-2018-317715
Di Rienzi, S. C. et al. The human gut and groundwater harbor non-photosynthetic bacteria belonging to a new candidate phylum sibling to Cyanobacteria. eLife 2, e01102 (2013).
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
pubmed: 30418610
doi: 10.1093/nar/gky1085
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
pubmed: 24451626
pmcid: 3998142
doi: 10.1093/bioinformatics/btu031
Galperin, M. Y., Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43, D261–D269 (2015).
pubmed: 25428365
doi: 10.1093/nar/gku1223
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017).
pubmed: 27899662
doi: 10.1093/nar/gkw1092
Schloissnig, S. et al. Genomic variation landscape of the human gut microbiome. Nature 493, 45–50 (2013).
pubmed: 23222524
doi: 10.1038/nature11711
Mitchell, A. L. et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 48, D570–D578 (2020).
pubmed: 31696235
Bradley, P., den Bakker, H. C., Rocha, E. P. C., McVean, G. & Iqbal, Z. Ultrafast search of all deposited bacterial and viral genomic data. Nat. Biotechnol. 37, 152–159 (2019).
pubmed: 30718882
pmcid: 6420049
doi: 10.1038/s41587-018-0010-1
Amid, C. et al. The European Nucleotide Archive in 2019. Nucleic Acids Res. 48, D70–D76 (2019).
pmcid: 7145635
Poyet, M. et al. A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research. Nat. Med. 25, 1442–1452 (2019).
pubmed: 31477907
doi: 10.1038/s41591-019-0559-3
Xu, Y. & Zhao, F. Single-cell metagenomics: challenges and applications. Protein Cell 9, 501–510 (2018).
pubmed: 29696589
pmcid: 5960468
doi: 10.1007/s13238-018-0544-5
Noyes, N. R. et al. Enrichment allows identification of diverse, rare elements in metagenomic resistome–virulome sequencing. Microbiome 5, 142 (2017).
pubmed: 29041965
pmcid: 5645900
doi: 10.1186/s40168-017-0361-8
Mukherjee, S. et al. Genomes OnLine Database (GOLD) v.7: updates and new features. Nucleic Acids Res. 47, D649–D659 (2019).
pubmed: 30357420
doi: 10.1093/nar/gky977
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
pubmed: 25977477
pmcid: 4484387
doi: 10.1101/gr.186072.114
Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
pubmed: 19307242
pmcid: 2732312
doi: 10.1093/bioinformatics/btp157
Kalvari, I. et al. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res. 46, D335–D342 (2018).
pubmed: 29112718
doi: 10.1093/nar/gkx1038
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
pubmed: 9023104
pmcid: 146525
doi: 10.1093/nar/25.5.955
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
pubmed: 22388286
pmcid: 3322381
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
pubmed: 19505943
pmcid: 2723002
doi: 10.1093/bioinformatics/btp352
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
pubmed: 28742071
pmcid: 5702732
doi: 10.1038/ismej.2017.126
Müllner, D. Fastcluster: fast hierarchical, agglomerative clustering routines for R and Python. J. Stat. Softw. 53, 1–18 (2013).
doi: 10.18637/jss.v053.i09
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
pubmed: 14759262
pmcid: 395750
doi: 10.1186/gb-2004-5-2-r12
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
pubmed: 20080505
pmcid: 2828108
doi: 10.1093/bioinformatics/btp698
Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).
doi: 10.7717/peerj-cs.104
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
pubmed: 25402007
doi: 10.1038/nmeth.3176
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
pubmed: 25371430
doi: 10.1093/molbev/msu300
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 47, W256–W259 (2019).
pubmed: 30931475
pmcid: 6602468
doi: 10.1093/nar/gkz239
Turner, I., Garimella, K. V., Iqbal, Z. & McVean, G. Integrating long-range connectivity information into de Bruijn graphs. Bioinformatics 34, 2556–2565 (2018).
pubmed: 29554215
pmcid: 6061703
doi: 10.1093/bioinformatics/bty157
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
pubmed: 24642063
doi: 10.1093/bioinformatics/btu153
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
pubmed: 20211023
pmcid: 2848648
doi: 10.1186/1471-2105-11-119
Page, A. J. et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 3691–3693 (2015).
pubmed: 26198102
pmcid: 4817141
doi: 10.1093/bioinformatics/btv421
Steinegger, M. & Söding, J. Clustering huge protein sequence sets in linear time. Nat. Commun. 9, 2542 (2018).
pubmed: 29959318
pmcid: 6026198
doi: 10.1038/s41467-018-04964-5
Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
pubmed: 28460117
pmcid: 5850834
doi: 10.1093/molbev/msx148
Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M. & Henrissat, B. The Carbohydrate-Active Enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490–D495 (2014).
pubmed: 24270786
doi: 10.1093/nar/gkt1178
Torchiano, M. Effsize—a package for efficient effect size computation. Zenodo https://doi.org/10.5281/ZENODO.1480624 (2016).
Gloor, G. B., Wu, J. R., Pawlowsky-Glahn, V. & Egozcue, J. J. It’s all relative: analyzing microbiome data as compositions. Ann. Epidemiol. 26, 322–329 (2016).
pubmed: 27143475
doi: 10.1016/j.annepidem.2016.03.003