Contamination source modeling with SCRuB improves cancer phenotype prediction from microbiome data.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date: Dec 2023
History:
received: 17 May 2022
accepted: 23 Jan 2023
pmc-release: 16 Sep 2024
pubmed: 18 Mar 2023
medline: 18 Mar 2023
entrez: 17 Mar 2023
Status: ppublish
Abstract
Sequencing-based approaches for the analysis of microbial communities are susceptible to contamination, which could mask biological signals or generate artifactual ones. Methods for in silico decontamination using controls are routinely used but do not make optimal use of information shared across samples, and they cannot handle either taxa that only partially originate from contamination or leakage of biological material into controls. Here we present Source tracking for Contamination Removal in microBiomes (SCRuB), a probabilistic in silico decontamination method that incorporates shared information across multiple samples and controls to precisely identify and remove contamination. We validate the accuracy of SCRuB in multiple data-driven simulations and experiments, including induced contamination, and demonstrate that it outperforms state-of-the-art methods by an average of 15-20 times. We showcase the robustness of SCRuB across multiple ecosystems, data types and sequencing depths. Demonstrating its applicability to microbiome research, SCRuB facilitates improved predictions of host phenotypes, most notably the prediction of treatment response in melanoma patients using decontaminated tumor microbiome data.
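The abstract's point that all-or-nothing filtering cannot handle taxa that only partially originate from contamination can be illustrated with a toy contrast. The sketch below is not SCRuB's model: SCRuB is described as probabilistic and as sharing information across samples and controls, whereas this sketch simply pools negative controls into one contamination profile and treats the contamination fraction `alpha` as a known, hypothetical input. All function names are hypothetical.

```python
"""Toy contrast between whole-taxon removal and proportional subtraction.

Hypothetical illustration only; this is NOT the SCRuB algorithm.
"""
from typing import List


def normalize(counts: List[float]) -> List[float]:
    """Convert read counts to relative abundances that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero count vector")
    return [c / total for c in counts]


def contamination_profile(controls: List[List[int]]) -> List[float]:
    """Pool negative-control counts per taxon and normalize them."""
    pooled = [sum(column) for column in zip(*controls)]
    return normalize(pooled)


def remove_whole_taxa(sample: List[int], profile: List[float]) -> List[int]:
    """Zero out every taxon seen in the controls (all-or-nothing filtering)."""
    return [0 if g > 0 else x for x, g in zip(sample, profile)]


def subtract_expected(sample: List[int], profile: List[float], alpha: float) -> List[float]:
    """Subtract only the reads expected to come from contamination.

    `alpha` is the assumed fraction of the sample's reads that are
    contaminants; here it is a hypothetical, user-supplied value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = sum(sample)
    # Expected contaminant reads for taxon j are alpha * n * profile[j];
    # clip at zero so no taxon ends up with negative counts.
    return [max(x - alpha * n * g, 0.0) for x, g in zip(sample, profile)]
```

For example, with controls `[[10, 0, 0, 30], [10, 0, 0, 10]]` and a sample `[400, 300, 200, 100]` at `alpha = 0.1`, whole-taxon removal zeroes the first taxon even though most of its 400 reads would be expected to be genuine, while the subtraction removes only about 33 of them. Estimating `alpha` and the contamination profile jointly from many samples and controls is the harder problem that the abstract attributes to SCRuB.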
Identifiers
pubmed: 36928429
doi: 10.1038/s41587-023-01696-w
pii: 10.1038/s41587-023-01696-w
pmc: PMC10504420
mid: NIHMS1881945
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1820-1828
Grants
Agency: NCI NIH HHS
ID: R01 CA245894
Country: United States
Agency: NICHD NIH HHS
ID: R01 HD106017
Country: United States
Copyright information
© 2023. The Author(s), under exclusive licence to Springer Nature America, Inc.