Multicenter evaluation of gut microbiome profiling by next-generation sequencing reveals major biases in partial-length metabarcoding approach.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
18 Dec 2023
History:
received: 27 Feb 2023
accepted: 27 Oct 2023
medline: 20 Dec 2023
pubmed: 20 Dec 2023
entrez: 19 Dec 2023
Status:
epublish
Abstract
Next-generation sequencing workflows, using either metabarcoding or metagenomic approaches, have contributed massively to expanding knowledge of the human gut microbiota, but methodological bias compromises reproducibility across studies. Although these biases have been quantified individually in several comparative analyses, none has measured inter-laboratory reproducibility using similar DNA material. Here, we designed a multicenter study involving seven participating laboratories dedicated to partial-length (P1 to P5) or full-length (P6) metabarcoding, or to metagenomic profiling (MGP), using DNA from a mock microbial community or extracted from 10 fecal samples collected at two time points from five donors. Fecal material was collected, and the DNA was extracted according to the IHMS protocols. The mock and isolated DNA were then provided to the participating laboratories for sequencing. Following sequencing analysis according to the laboratories' routine pipelines, relative taxonomic-count tables defined at the genus level were provided and analyzed. Large variations in alpha-diversity between laboratories, uncorrelated with sequencing depth, were detected among the profiles. Half of the genera identified by P1 were unique to this partner, and two-thirds of the genera identified by MGP were not detected by P3. Analysis of beta-diversity revealed lower inter-individual variance than inter-laboratory variance. The taxonomic profiles of P5 and P6 were more similar to those of MGP than were those obtained by P1, P2, P3, and P4. Reanalysis of the raw sequences obtained by partial-length metabarcoding profiling, using a single bioinformatic pipeline, harmonized the description of the bacterial profiles, which became more similar to each other, except for P3, and closer to the profiles obtained by MGP. This study highlights the major impact of the bioinformatics pipeline, and primarily of the database used for taxonomic annotation. Laboratories need to benchmark and optimize their bioinformatic pipelines using standards to monitor their effectiveness in accurately detecting taxa present in gut microbiota.
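The comparisons summarized in the abstract (alpha-diversity per profile, beta-diversity between profiles, and the share of genera unique to one laboratory) can be illustrated with a minimal sketch. The abstract does not state which indices the authors used, so the Shannon index and Bray-Curtis dissimilarity below are assumptions, as are the function names and the toy genus-level profiles, which are hypothetical and not taken from the study.

```python
import math


def shannon(profile):
    """Shannon index (natural log) of a genus -> relative abundance mapping."""
    total = sum(profile.values())
    return -sum(
        (v / total) * math.log(v / total) for v in profile.values() if v > 0
    )


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two genus-level profiles."""
    genera = set(a) | set(b)
    num = sum(abs(a.get(g, 0.0) - b.get(g, 0.0)) for g in genera)
    den = sum(a.values()) + sum(b.values())
    return num / den


def unique_fraction(target, others):
    """Fraction of genera detected in `target` and in none of `others`."""
    detected = {g for g, v in target.items() if v > 0}
    elsewhere = {g for p in others for g, v in p.items() if v > 0}
    return len(detected - elsewhere) / len(detected)


# Hypothetical profiles of one sample as reported by two laboratories.
lab_a = {"Bacteroides": 0.5, "Faecalibacterium": 0.3, "Blautia": 0.2}
lab_b = {"Bacteroides": 0.5, "Faecalibacterium": 0.5}

alpha_a = shannon(lab_a)                   # alpha-diversity of profile A
alpha_b = shannon(lab_b)                   # alpha-diversity of profile B
beta_ab = bray_curtis(lab_a, lab_b)        # dissimilarity between A and B
uniq_a = unique_fraction(lab_a, [lab_b])   # genera reported only by A
```

In this toy case the two profiles describe the same sample, so any non-zero dissimilarity or unique genus reflects the laboratory workflow rather than the donor, which is the kind of effect the study set out to measure.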
Identifiers
pubmed: 38114587
doi: 10.1038/s41598-023-46062-7
pii: 10.1038/s41598-023-46062-7
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
22593
Grants
Agency: European Research Council
ID: 2017-AdG No. 788191
Country: International
Copyright information
© 2023. The Author(s).
References
Vincent, A. T., Derome, N., Boyle, B., Culley, A. I. & Charette, S. J. Next-generation sequencing (NGS) in the microbiological world: How to make the most of your money. J. Microbiol. Methods 138, 60–71 (2017).
pubmed: 26995332
doi: 10.1016/j.mimet.2016.02.016
Nearing, J. T., Comeau, A. M. & Langille, M. G. I. Identifying biases and their potential solutions in human microbiome studies. Microbiome https://doi.org/10.1186/s40168-021-01059-0 (2021).
doi: 10.1186/s40168-021-01059-0
pubmed: 34006335
pmcid: 8132403
Penington, J. S. et al. Influence of fecal collection conditions and 16S rRNA gene sequencing at two centers on human gut microbiota analysis. Sci. Rep. 8, 4386 (2018).
pubmed: 29531234
pmcid: 5847573
doi: 10.1038/s41598-018-22491-7
Ilett, E. E. et al. Gut microbiome comparability of fresh-frozen versus stabilized-frozen samples from hospitalized patients using 16S rRNA gene and shotgun metagenomic sequencing. Sci. Rep. 9, 13351 (2019).
pubmed: 31527823
pmcid: 6746779
doi: 10.1038/s41598-019-49956-7
Salter, S. J. et al. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 12, 87 (2014).
pubmed: 25387460
pmcid: 4228153
doi: 10.1186/s12915-014-0087-z
Costea, P. I. et al. Towards standards for human fecal sample processing in metagenomic studies. Nat. Biotechnol. 35, 1069–1076 (2017).
pubmed: 28967887
doi: 10.1038/nbt.3960
Lim, M. Y., Song, E.-J., Kim, S. H., Lee, J. & Nam, Y.-D. Comparison of DNA extraction methods for human gut microbial community profiling. Syst. Appl. Microbiol. 41, 151–157 (2018).
pubmed: 29305057
doi: 10.1016/j.syapm.2017.11.008
Sze, M. A. & Schloss, P. D. The impact of DNA polymerase and number of rounds of amplification in PCR on 16S rRNA gene sequence data. mSphere https://doi.org/10.1128/mSphere.00163-19 (2019).
doi: 10.1128/mSphere.00163-19
pubmed: 31118299
pmcid: 6531881
Jones, M. B. et al. Library preparation methodology can influence genomic and functional predictions in human microbiome research. Proc. Natl. Acad. Sci. U.S.A. 112, 14024–14029 (2015).
pubmed: 26512100
pmcid: 4653211
doi: 10.1073/pnas.1519288112
Schirmer, M. et al. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucl. Acids Res. 43, e37 (2015).
pubmed: 25586220
pmcid: 4381044
doi: 10.1093/nar/gku1341
Thorsen, J. et al. Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies. Microbiome. 4, 62 (2016).
pubmed: 27884206
pmcid: 5123278
doi: 10.1186/s40168-016-0208-8
Hillmann, B. et al. Evaluating the information content of shallow shotgun metagenomics. mSystems https://doi.org/10.1128/mSystems.00069-18 (2018).
doi: 10.1128/mSystems.00069-18
pubmed: 30443602
pmcid: 6234283
Whon, T. W. et al. The effects of sequencing platforms on phylogenetic resolution in 16 S rRNA gene profiling of human feces. Sci. Data. 5, 180068 (2018).
pubmed: 29688220
pmcid: 5914283
doi: 10.1038/sdata.2018.68
Marizzoni, M. et al. Comparison of bioinformatics pipelines and operating systems for the analyses of 16S rRNA gene amplicon sequences in human fecal samples. Front. Microbiol. 11, 1262 (2020).
pubmed: 32636817
pmcid: 7318847
doi: 10.3389/fmicb.2020.01262
Weiss, S. et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 5, 27 (2017).
pubmed: 28253908
pmcid: 5335496
doi: 10.1186/s40168-017-0237-y
Lynch, M. D. J. & Neufeld, J. D. Ecology and exploration of the rare biosphere. Nat. Rev. Microbiol. 13, 217–229 (2015).
pubmed: 25730701
doi: 10.1038/nrmicro3400
Abellan-Schneyder, I. et al. Primer, pipelines, parameters: Issues in 16S rRNA gene sequencing. mSphere https://doi.org/10.1128/mSphere.01202-20 (2021).
doi: 10.1128/mSphere.01202-20
pubmed: 33627512
pmcid: 8544895
Wei, Z.-G. et al. Comparison of methods for picking the operational taxonomic units from amplicon sequences. Front. Microbiol. 12, 644012 (2021).
pubmed: 33841367
pmcid: 8024490
doi: 10.3389/fmicb.2021.644012
Nearing, J. T. et al. Microbiome differential abundance methods produce different results across 38 datasets. Nat. Commun. https://doi.org/10.1038/s41467-022-28034-z (2022).
doi: 10.1038/s41467-022-28034-z
pubmed: 35115546
pmcid: 8813933
Caruso, V., Song, X., Asquith, M. & Karstens, L. Performance of microbiome sequence inference methods in environments with varying biomass. mSystems https://doi.org/10.1128/mSystems.00163-18 (2019).
doi: 10.1128/mSystems.00163-18
pubmed: 30801029
pmcid: 6381225
Acinas, S. G. et al. Fine-scale phylogenetic architecture of a complex bacterial community. Nature. 430, 551–554 (2004).
pubmed: 15282603
doi: 10.1038/nature02649
Větrovský, T. & Baldrian, P. The variability of the 16S rRNA gene in bacterial genomes and its consequences for bacterial community analyses. PLoS ONE 8, e57923 (2013).
pubmed: 23460914
pmcid: 3583900
doi: 10.1371/journal.pone.0057923
Jeong, J. et al. The effect of taxonomic classification by full-length 16S rRNA sequencing with a synthetic long-read technology. Sci. Rep. 11, 1727 (2021).
pubmed: 33462291
pmcid: 7814050
doi: 10.1038/s41598-020-80826-9
Hassler, H. B. et al. Phylogenies of the 16S rRNA gene and its hypervariable regions lack concordance with core genome phylogenies. Microbiome https://doi.org/10.1186/s40168-022-01295-y (2022).
doi: 10.1186/s40168-022-01295-y
pubmed: 35799218
pmcid: 9264627
Pereira-Marques, J. et al. Impact of host DNA and sequencing depth on the taxonomic resolution of whole metagenome sequencing for microbiome analysis. Front. Microbiol. 10, 1277 (2019).
pubmed: 31244801
pmcid: 6581681
doi: 10.3389/fmicb.2019.01277
Gweon, H. S. et al. The impact of sequencing depth on the inferred taxonomic composition and AMR gene content of metagenomic samples. Environ. Microbiome https://doi.org/10.1186/s40793-019-0347-1 (2019).
doi: 10.1186/s40793-019-0347-1
pubmed: 33902704
pmcid: 8204541
Laudadio, I. et al. Quantitative assessment of shotgun metagenomics and 16S rDNA amplicon sequencing in the study of human gut microbiome. OMICS 22, 248–254 (2018).
pubmed: 29652573
doi: 10.1089/omi.2018.0013
Park, S.-Y., Ufondu, A., Lee, K. & Jayaraman, A. Emerging computational tools and models for studying gut microbiota composition and function. Curr. Opin. Biotechnol. 66, 301–311 (2020).
pubmed: 33248408
pmcid: 7744364
doi: 10.1016/j.copbio.2020.10.005
Jovel, J. et al. Characterization of the gut microbiome using 16S or shotgun metagenomics. Front. Microbiol. 7, 459 (2016).
pubmed: 27148170
pmcid: 4837688
doi: 10.3389/fmicb.2016.00459
Mitra, S. et al. Analysis of the intestinal microbiota using SOLiD 16S rRNA gene sequencing and SOLiD shotgun sequencing. BMC Genomics. 14(Suppl 5), S16 (2013).
pubmed: 24564472
pmcid: 3852202
doi: 10.1186/1471-2164-14-S5-S16
Rausch, P. et al. Comparative analysis of amplicon and metagenomic sequencing methods reveals key features in the evolution of animal metaorganisms. Microbiome 7, 133 (2019).
pubmed: 31521200
pmcid: 6744666
doi: 10.1186/s40168-019-0743-1
Biegert, G., Karpinets, T., Wu, X., Alam, M.B.E., Sims, T.T., Yoshida-Court, K., et al. Diversity and composition of gut microbiome of cervical cancer patients by 16S rRNA and whole-metagenome sequencing (2020).
Vogtmann, E. et al. Colorectal cancer and the human gut microbiome: Reproducibility with whole-genome shotgun sequencing. PLoS ONE. 11, e0155362 (2016).
pubmed: 27171425
pmcid: 4865240
doi: 10.1371/journal.pone.0155362
Ranjan, R., Rani, A., Metwally, A., McGee, H. S. & Perkins, D. L. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing. Biochem. Biophys. Res. Commun. 469, 967–977 (2016).
pubmed: 26718401
doi: 10.1016/j.bbrc.2015.12.083
Clooney, A. G. et al. Comparing apples and oranges? Next generation sequencing and its impact on microbiome analysis. PLoS ONE 11, e0148028 (2016).
pubmed: 26849217
pmcid: 4746063
doi: 10.1371/journal.pone.0148028
Han, D. et al. Multicenter assessment of microbial community profiling using 16S rRNA gene sequencing and shotgun metagenomic sequencing. J Adv Res. 26, 111–121 (2020).
pubmed: 33133687
pmcid: 7584675
doi: 10.1016/j.jare.2020.07.010
Criscuolo, A. & Brisse, S. AlienTrimmer: A tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing reads. Genomics 102, 500–506 (2013).
pubmed: 23912058
doi: 10.1016/j.ygeno.2013.07.011
Wen, C. et al. Quantitative metagenomics reveals unique gut microbiome biomarkers in ankylosing spondylitis. Genome Biol. 18, 142 (2017).
pubmed: 28750650
pmcid: 5530561
doi: 10.1186/s13059-017-1271-6
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
pubmed: 22388286
pmcid: 3322381
doi: 10.1038/nmeth.1923
Cotillard, A. et al. Dietary intervention impact on gut microbial gene richness. Nature 500, 585–588 (2013).
pubmed: 23985875
doi: 10.1038/nature12480
Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).
pubmed: 23985870
doi: 10.1038/nature12506
Plaza Oñate, F. et al. MSPminer: Abundance-based reconstitution of microbial pan-genomes from shotgun metagenomic data. Bioinformatics 35, 1544–1552 (2019).
pubmed: 30252023
doi: 10.1093/bioinformatics/bty830
Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018).
pubmed: 30148503
doi: 10.1038/nbt.4229
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
pubmed: 31779668
pmcid: 6883579
doi: 10.1186/s13059-019-1891-0
Schloss, P. D. et al. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009).
pubmed: 19801464
pmcid: 2786419
doi: 10.1128/AEM.01541-09
Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336 (2010).
pubmed: 20383131
pmcid: 3156573
doi: 10.1038/nmeth.f.303
Escudié, F. et al. FROGS: Find, rapidly, OTUs with galaxy solution. Bioinformatics 34, 1287–1294 (2018).
pubmed: 29228191
doi: 10.1093/bioinformatics/btx791
Callahan, B. J. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
pubmed: 27214047
pmcid: 4927377
doi: 10.1038/nmeth.3869
Westcott, S. L. & Schloss, P. D. OptiClust, an improved method for assigning amplicon-based sequence data to operational taxonomic units. mSphere https://doi.org/10.1128/mSphereDirect.00073-17 (2017).
doi: 10.1128/mSphereDirect.00073-17
pubmed: 28289728
pmcid: 5343174
Mahé, F., Rognes, T., Quince, C., de Vargas, C. & Dunthorn, M. Swarm: Robust and fast clustering method for amplicon-based studies. PeerJ. 2, e593 (2014).
pubmed: 25276506
pmcid: 4178461
doi: 10.7717/peerj.593
Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
pubmed: 20709691
doi: 10.1093/bioinformatics/btq461
Magoč, T. & Salzberg, S. L. FLASH: Fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963 (2011).
pubmed: 21903629
pmcid: 3198573
doi: 10.1093/bioinformatics/btr507
Maidak, B. L. et al. The RDP (Ribosomal Database Project) continues. Nucl. Acids Res. 28, 173–174 (2000).
pubmed: 10592216
pmcid: 102428
doi: 10.1093/nar/28.1.173
DeSantis, T. Z. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006).
pubmed: 16820507
pmcid: 1489311
doi: 10.1128/AEM.03006-05
Camacho, C. et al. BLAST+: Architecture and applications. BMC Bioinform. 10, 421 (2009).
doi: 10.1186/1471-2105-10-421
Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261–5267 (2007).
pubmed: 17586664
pmcid: 1950982
doi: 10.1128/AEM.00062-07
Quast, C. et al. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucl. Acids Res. 41, D590–D596 (2013).
pubmed: 23193283
doi: 10.1093/nar/gks1219
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucl. Acids Res. 44, D733–D745 (2016).
pubmed: 26553804
doi: 10.1093/nar/gkv1189
Blin, K. ncbi-genome-download: Zenodo (2023).
Schoch, C. L. et al. NCBI Taxonomy: A comprehensive update on curation, resources and tools. Database (Oxford) https://doi.org/10.1093/database/baaa062 (2020).
Seemann, T. barrnap 0.9: Rapid ribosomal RNA prediction (2013). https://github.com/tseemann/barrnap .
Li, W. & Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
pubmed: 16731699
doi: 10.1093/bioinformatics/btl158
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
pubmed: 23060610
pmcid: 3516142
doi: 10.1093/bioinformatics/bts565
Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucl. Acids Res. 47, D23–D28 (2019).
pubmed: 30395293
doi: 10.1093/nar/gky1069
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10 (2011).
doi: 10.14806/ej.17.1.200
Bankevich, A. et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
pubmed: 22506599
pmcid: 3342519
doi: 10.1089/cmb.2012.0021
Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: A fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).
pubmed: 24142950
doi: 10.1093/bioinformatics/btt593
Edgar, R. C., Haas, B. J., Clemente, J. C., Quince, C. & Knight, R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27, 2194–2200 (2011).
pubmed: 21700674
pmcid: 3150044
doi: 10.1093/bioinformatics/btr381
Cole, J. R. et al. Ribosomal Database Project: Data and tools for high throughput rRNA analysis. Nucl. Acids Res. 42, D633–D642 (2014).
pubmed: 24288368
doi: 10.1093/nar/gkt1244
Dereeper, A. et al. Phylogeny.fr: Robust phylogenetic analysis for the non-specialist. Nucl. Acids Res. 36, W465–W469 (2008).
pubmed: 18424797
pmcid: 2447785
doi: 10.1093/nar/gkn180
Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552 (2000).
pubmed: 10742046
doi: 10.1093/oxfordjournals.molbev.a026334
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
pubmed: 20525638
doi: 10.1093/sysbio/syq010
Chevenet, F., Brun, C., Bañuls, A.-L., Jacq, B. & Christen, R. TreeDyn: Towards dynamic graphics and annotations for analyses of trees. BMC Bioinform. 7, 439 (2006).
doi: 10.1186/1471-2105-7-439
Balvočiūtė, M. & Huson, D. H. SILVA, RDP, Greengenes, NCBI and OTT—How do these taxonomies compare?. BMC Genomics https://doi.org/10.1186/s12864-017-3501-4 (2017).
doi: 10.1186/s12864-017-3501-4
pubmed: 28361695
pmcid: 5374703
McDonald, D. et al. Greengenes2 unifies microbial data in a single reference tree. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01845-1 (2023).
doi: 10.1038/s41587-023-01845-1
pubmed: 37853258
pmcid: 10344774
Park, S.-C. & Won, S. Evaluation of 16S rRNA databases for taxonomic assignments using a mock community. Genomics Inform. 16, e24 (2018).
pubmed: 30602085
pmcid: 6440677
doi: 10.5808/GI.2018.16.4.e24
Sinha, R. et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat. Biotechnol. 35, 1077–1086 (2017).
pubmed: 28967885
pmcid: 5839636
doi: 10.1038/nbt.3981
O’Sullivan, D. M. et al. An inter-laboratory study to investigate the impact of the bioinformatics component on microbiome analysis using mock communities. Sci. Rep. 11, 10590 (2021).
pubmed: 34012005
pmcid: 8134577
doi: 10.1038/s41598-021-89881-2
Straub, D. et al. Interpretations of environmental microbial community studies are biased by the selected 16S rRNA (Gene) amplicon sequencing pipeline. Front. Microbiol. 11, 550420 (2020).
pubmed: 33193131
pmcid: 7645116
doi: 10.3389/fmicb.2020.550420
Amos, G. C. A. et al. Developing standards for the microbiome field. Microbiome. 8, 98 (2020).
pubmed: 32591016
pmcid: 7320585
doi: 10.1186/s40168-020-00856-3
Scherz, V., Greub, G. & Bertelli, C. Building up a clinical microbiota profiling: A quality framework proposal. Crit. Rev. Microbiol. 48(3), 356–375 (2021).
pubmed: 34752719
doi: 10.1080/1040841X.2021.1975642
Mirzayi, C. et al. Reporting guidelines for human microbiome research: The STORMS checklist. Nat. Med. 27, 1885–1892 (2021).
pubmed: 34789871
pmcid: 9105086
doi: 10.1038/s41591-021-01552-x