Human reference gut microbiome catalog including newly assembled genomes from under-represented Asian metagenomes.


Journal

Genome medicine
ISSN: 1756-994X
Abbreviated title: Genome Med
Country: England
NLM ID: 101475844

Publication information

Publication date:
27 Aug 2021
History:
received: 09 Dec 2020
accepted: 05 Aug 2021
entrez: 27 Aug 2021
pubmed: 28 Aug 2021
medline: 17 Feb 2022
Status: epublish

Structured abstract

BACKGROUND
Metagenome sampling bias for geographical location and lifestyle is partially responsible for the incomplete catalog of reference genomes of gut microbial species. Thus, genome assembly from currently under-represented populations may effectively expand the reference gut microbiome and improve taxonomic and functional profiling.
METHODS
We assembled genomes using public whole-metagenomic shotgun sequencing (WMS) data for 110 and 645 fecal samples from India and Japan, respectively. In addition, we assembled genomes from newly generated WMS data for 90 fecal samples collected from Korea. Expecting that genome assembly for low-abundance species may require much deeper sequencing than is usually employed, we performed ultra-deep WMS (> 30 Gbp or > 100 million read pairs) for the fecal samples from Korea. We consequently assembled 29,082 prokaryotic genomes from 845 fecal metagenomes from the three under-represented Asian countries and combined them with the Unified Human Gastrointestinal Genome (UHGG) to generate an expanded catalog, the Human Reference Gut Microbiome (HRGM).
RESULTS
HRGM contains 232,098 non-redundant genomes for 5,414 representative prokaryotic species, including 780 that are novel, > 103 million unique proteins, and > 274 million single-nucleotide variants. This represents an increase of more than 10% over the UHGG. The 780 new species were enriched for the Bacteroidaceae family, including species associated with high-fiber and seaweed-rich diets. Single-nucleotide variant density was positively associated with the speciation rate of gut commensals. We found that ultra-deep sequencing facilitated the assembly of genomes for low-abundance taxa, and that deep sequencing (e.g., > 20 million read pairs) may be needed for the profiling of low-abundance taxa. Importantly, the HRGM significantly improved the taxonomic and functional classification of sequencing reads from fecal samples. Finally, analysis of human self-antigen homologs in the HRGM species genomes suggested that bacterial taxa with high cross-reactivity potential may contribute more to the pathogenesis of gut microbiome-associated diseases than those with low cross-reactivity potential by promoting inflammatory conditions.
CONCLUSIONS
By including gut metagenomes from the previously under-represented Asian countries of Korea, India, and Japan, we developed a substantially expanded microbiome catalog, HRGM. Information on the microbial genomes and coding genes is publicly available (www.mbiomenet.org/HRGM/). HRGM will facilitate the identification and functional analysis of disease-associated gut microbiota.
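
As a quick consistency check on the figures quoted in the abstract, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and treating the two ultra-deep thresholds (> 30 Gbp and > 100 million read pairs) as equivalent is an assumption. Under that assumption each read pair carries about 300 bases, which would be consistent with, for example, 2 x 150 bp paired-end reads; the abstract does not state the read length.

```python
# Figures quoted in the abstract (nothing here comes from outside the record).
SAMPLES_PER_COUNTRY = {"India": 110, "Japan": 645, "Korea": 90}
REPORTED_METAGENOMES = 845
TOTAL_SPECIES = 5414
NOVEL_SPECIES = 780

# Ultra-deep sequencing thresholds quoted for the Korean samples.
ULTRA_DEEP_BASES = 30_000_000_000    # > 30 Gbp
ULTRA_DEEP_READ_PAIRS = 100_000_000  # > 100 million read pairs


def total_samples(samples):
    """Sum the per-country fecal sample counts."""
    return sum(samples.values())


def bases_per_read_pair(total_bases, read_pairs):
    """Bases per read pair implied if both thresholds describe the same depth (assumption)."""
    return total_bases / read_pairs


def novel_share(novel, total):
    """Fraction of representative species in the catalog that are novel."""
    return novel / total


def relative_increase(novel, total):
    """Novel species relative to the species that were not novel (total - novel)."""
    return novel / (total - novel)


if __name__ == "__main__":
    print("Total samples:", total_samples(SAMPLES_PER_COUNTRY))
    print("Bases per read pair:", bases_per_read_pair(ULTRA_DEEP_BASES, ULTRA_DEEP_READ_PAIRS))
    print("Novel share of species: {:.1%}".format(novel_share(NOVEL_SPECIES, TOTAL_SPECIES)))
    print("Increase over non-novel species: {:.1%}".format(relative_increase(NOVEL_SPECIES, TOTAL_SPECIES)))
```

The per-country sample counts add up to the 845 metagenomes reported, and the 780 novel species are roughly 14% of the 5,414 representative species, which is in line with the abstract's statement of a more than 10% increase. The abstract does not say which quantity that percentage refers to, so the species-level ratio here is only one possible reading.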

Identifiers

pubmed: 34446072
doi: 10.1186/s13073-021-00950-7
pii: 10.1186/s13073-021-00950-7
pmc: PMC8394144

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

134

Copyright information

© 2021. The Author(s).

Authors

Chan Yeong Kim (CY)

Department of Biotechnology, College of Life Science & Biotechnology, Yonsei University, Seoul, 03722, Korea.

Muyoung Lee (M)

Department of Biotechnology, College of Life Science & Biotechnology, Yonsei University, Seoul, 03722, Korea.

Sunmo Yang (S)

Department of Biotechnology, College of Life Science & Biotechnology, Yonsei University, Seoul, 03722, Korea.

Kyungnam Kim (K)

Department of Laboratory Medicine, Research Institute of Bacterial Resistance, Yonsei University College of Medicine, Seoul, 03722, Korea.

Dongeun Yong (D)

Department of Laboratory Medicine, Research Institute of Bacterial Resistance, Yonsei University College of Medicine, Seoul, 03722, Korea.

Hye Ryun Kim (HR)

Division of Medical Oncology, Department of Internal Medicine, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, 03722, Korea.

Insuk Lee (I)

Department of Biotechnology, College of Life Science & Biotechnology, Yonsei University, Seoul, 03722, Korea. insuklee@yonsei.ac.kr.
