Contamination source modeling with SCRuB improves cancer phenotype prediction from microbiome data.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
Dec 2023
History:
received: 17 05 2022
accepted: 23 01 2023
pmc-release: 16 09 2024
pubmed: 18 03 2023
medline: 18 03 2023
entrez: 17 03 2023
Status: ppublish

Abstract

Sequencing-based approaches for the analysis of microbial communities are susceptible to contamination, which could mask biological signals or generate artifactual ones. Methods for in silico decontamination using controls are routinely used, but do not make optimal use of information shared across samples and cannot handle taxa that only partially originate in contamination or leakage of biological material into controls. Here we present Source tracking for Contamination Removal in microBiomes (SCRuB), a probabilistic in silico decontamination method that incorporates shared information across multiple samples and controls to precisely identify and remove contamination. We validate the accuracy of SCRuB in multiple data-driven simulations and experiments, including induced contamination, and demonstrate that it outperforms state-of-the-art methods by an average of 15-20 times. We showcase the robustness of SCRuB across multiple ecosystems, data types and sequencing depths. Demonstrating its applicability to microbiome research, SCRuB facilitates improved predictions of host phenotypes, most notably the prediction of treatment response in melanoma patients using decontaminated tumor microbiome data.

Identifiers

pubmed: 36928429
doi: 10.1038/s41587-023-01696-w
pii: 10.1038/s41587-023-01696-w
pmc: PMC10504420
mid: NIHMS1881945

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1820-1828

Grants

Agency: NCI NIH HHS
ID: R01 CA245894
Country: United States
Agency: NICHD NIH HHS
ID: R01 HD106017
Country: United States

Copyright information

© 2023. The Author(s), under exclusive licence to Springer Nature America, Inc.

References

Salter, S. J. et al. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 12, 87 (2014).
pubmed: 25387460 pmcid: 4228153
Weyrich, L. S. et al. Laboratory contamination over time during low-biomass sample analysis. Mol. Ecol. Resour. 19, 982–996 (2019).
pubmed: 30887686 pmcid: 6850301
Kim, D. et al. Optimizing methods and dodging pitfalls in microbiome research. Microbiome 5, 52 (2017).
pubmed: 28476139 pmcid: 5420141
Eisenhofer, R. et al. Contamination in low microbial biomass microbiome studies: issues and recommendations. Trends Microbiol. 27, 105–117 (2019).
pubmed: 30497919
Weiss, S. et al. Tracking down the sources of experimental contamination in microbiome studies. Genome Biol. 15, 564 (2014).
pubmed: 25608874 pmcid: 4311479
Aagaard, K. et al. The placenta harbors a unique microbiome. Sci. Transl. Med. 6, 237ra65 (2014).
pubmed: 24848255 pmcid: 4929217
Parnell, L. A. et al. Microbial communities in placentas from term normal pregnancy exhibit spatially variable profiles. Sci. Rep. 7, 11200 (2017).
pubmed: 28894161 pmcid: 5593928
Seferovic, M. D. et al. Visualization of microbes by 16S in situ hybridization in term and preterm placentas without intraamniotic infection. Am. J. Obstet. Gynecol. 221, 146.e1–146.e23 (2019).
pubmed: 31055031
de Goffau, M. C. et al. Human placenta has no microbiome but can contain potential pathogens. Nature 572, 329–334 (2019).
pubmed: 31367035 pmcid: 6697540
Leiby, J. S. et al. Lack of detection of a human placenta microbiome in samples from preterm and term deliveries. Microbiome 6, 196 (2018).
pubmed: 30376898 pmcid: 6208038
Kuperman, A. A. et al. Deep microbial analysis of multiple placentas shows no evidence for a placental microbiome. BJOG 127, 159–169 (2020).
pubmed: 31376240
Sinha, R., Abnet, C. C., White, O., Knight, R. & Huttenhower, C. The microbiome quality control project: baseline study design and future directions. Genome Biol. 16, 276 (2015).
pubmed: 26653756 pmcid: 4674991
Edmonds, K. & Williams, L. The role of the negative control in microbiome analyses. FASEB J. 31, 940.3 (2017).
Schierwagen, R. et al. Trust is good, control is better: technical considerations in blood microbiome analysis. Gut 69, 1362–1363 (2020).
pubmed: 31203205
de Goffau, M. C. et al. Recognizing the reagent microbiome. Nat. Microbiol. 3, 851–853 (2018).
pubmed: 30046175
van der Horst, J. et al. Sterile paper points as a bacterial DNA-contamination source in microbiome profiles of clinical samples. J. Dent. 41, 1297–1301 (2013).
pubmed: 24135296
Olomu, I. N. et al. Elimination of ‘kitome’ and ‘splashome’ contamination results in lack of detection of a unique placental microbiome. BMC Microbiol. 20, 157 (2020).
pubmed: 32527226 pmcid: 7291729
Nejman, D. et al. The human tumor microbiome is composed of tumor type-specific intracellular bacteria. Science 368, 973–980 (2020).
pubmed: 32467386 pmcid: 7757858
Pinto-Ribeiro, I. et al. Evaluation of the use of formalin-fixed and paraffin-embedded archive gastric tissues for microbiota characterization using next-generation sequencing. Int. J. Mol. Sci. 21, 1096 (2020).
pubmed: 32046034 pmcid: 7037826
Poore, G. D. et al. Microbiome analyses of blood and tissues suggest cancer diagnostic approach. Nature 579, 567–574 (2020).
pubmed: 32214244 pmcid: 7500457
Wang, J. et al. Translocation of vaginal microbiota is involved in impairment and protection of uterine health. Nat. Commun. 12, 4191 (2021).
pubmed: 34234149 pmcid: 8263591
Lam, S. Y. et al. Technical challenges regarding the use of formalin-fixed paraffin embedded (FFPE) tissue specimens for the detection of bacterial alterations in colorectal cancer. BMC Microbiol. 21, 297 (2021).
pubmed: 34715774 pmcid: 8555202
Allali, I. et al. Gut microbiome compositional and functional differences between tumor and non-tumor adjacent tissues from cohorts from the US and Spain. Gut Microbes 6, 161–172 (2015).
pubmed: 25875428 pmcid: 4615176
Marotz, C. et al. SARS-CoV-2 detection status associates with bacterial community composition in patients and the hospital environment. Microbiome 9, 132 (2021).
pubmed: 34103074 pmcid: 8186369
Richardson, M., Gottel, N., Gilbert, J. A. & Lax, S. Microbial similarity between students in a common dormitory environment reveals the forensic potential of individual microbial signatures. mBio 10, e01054-19 (2019).
pubmed: 31363029 pmcid: 6667619
Chen, Q.-L. et al. Rare microbial taxa as the major drivers of ecosystem multifunctionality in long-term fertilized soils. Soil Biol. Biochem. 141, 107686 (2020).
Smirnova, E., Huzurbazar, S. & Jafari, F. PERFect: PERmutation Filtering test for microbiome data. Biostatistics 20, 615–631 (2019).
pubmed: 29917060
Davis, N. M., Proctor, D. M., Holmes, S. P., Relman, D. A. & Callahan, B. J. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome 6, 226 (2018).
pubmed: 30558668 pmcid: 6298009
McKnight, D. T. et al. microDecon: a highly accurate read‐subtraction tool for the post‐sequencing removal of contamination in metabarcoding studies. Environ. DNA 1, 14–25 (2019).
Shenhav, L. et al. FEAST: fast expectation-maximization for microbial source tracking. Nat. Methods 16, 627–632 (2019).
pubmed: 31182859 pmcid: 8535041
Knights, D. et al. Bayesian community-wide culture-independent microbial source tracking. Nat. Methods 8, 761–763 (2011).
pubmed: 21765408 pmcid: 3791591
Minich, J. J. et al. Quantifying and understanding well-to-well contamination in microbiome research. mSystems 4, e00186-19 (2019).
pubmed: 31239396 pmcid: 6593221
Lou, Y. C. et al. Using strain-resolved analysis to identify contamination in metagenomics data. Preprint at bioRxiv https://doi.org/10.1101/2022.01.16.476537 (2022).
An, U. et al. STENSL: Microbial Source Tracking with ENvironment SeLection. mSystems 7, e0099521 (2022).
pubmed: 36047699
Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).
pubmed: 31341288 pmcid: 7015180
Karstens, L. et al. Controlling for contaminants in low-biomass 16S rRNA gene sequencing experiments. mSystems 4, e00290-19 (2019).
pubmed: 31164452 pmcid: 6550369
Flores, R. et al. Collection media and delayed freezing effects on microbial composition of human stool. Microbiome 3, 33 (2015).
pubmed: 26269741 pmcid: 4534027
Adams, R. I., Bateman, A. C., Bik, H. M. & Meadow, J. F. Microbiota of the indoor environment: a meta-analysis. Microbiome 3, 49 (2015).
pubmed: 26459172 pmcid: 4604073
Lou, Y. C. et al. Infant gut strain persistence is associated with maternal origin, phylogeny, and traits including surface adhesion and iron acquisition. Cell Rep. Med. 2, 100393 (2021).
pubmed: 34622230 pmcid: 8484513
Hornung, B. V. H., Zwittink, R. D. & Kuijper, E. J. Issues and current standards of controls in microbiome research. FEMS Microbiol. Ecol. 95, fiz045 (2019).
pubmed: 30997495 pmcid: 6469980
Gonzalez, A. et al. Qiita: rapid, web-enabled microbiome meta-analysis. Nat. Methods 15, 796–798 (2018).
pubmed: 30275573 pmcid: 6235622
Minich, J. J. et al. Host biology, ecology and the environment influence microbial biomass and diversity in 101 marine fish species. Nat. Commun. 13, 6978 (2022).
pubmed: 36396943 pmcid: 9671965
Shaffer, J. P. et al. Standardized multi-omics of Earth’s microbiomes reveals microbial and metabolite diversity. Nat. Microbiol. 7, 2128–2150 (2022).
pubmed: 36443458 pmcid: 9712116
Chase, J. et al. Geography and location are the primary drivers of office microbiome composition. mSystems 1, e00022-16 (2016).
pubmed: 27822521 pmcid: 5069741
Ramirez, K. S. et al. Biogeographic patterns in below-ground diversity in New York City’s Central Park are similar to those observed globally. Proc. Biol. Sci. 281, 20141988 (2014).
pubmed: 25274366 pmcid: 4213626
Hanes, D. et al. The gastrointestinal and microbiome impact of a resistant starch blend from potato, banana, and apple fibers: a randomized clinical trial using smart caps. Front. Nutr. 9, 987216 (2022).
pubmed: 36245486 pmcid: 9559413
Shaffer, J. P. et al. A comparison of DNA/RNA extraction protocols for high-throughput sequencing of microbial communities. Biotechniques 70, 149–159 (2021).
pubmed: 33512248 pmcid: 7931620
Ruiz-Calderon, J. F. et al. Walls talk: microbial biogeography of homes spanning urbanization. Sci. Adv. 2, e1501061 (2016).
pubmed: 26933683 pmcid: 4758746
Robin, X. et al. pROC: an open-source package for R and S to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).
pubmed: 21414208 pmcid: 3068975
Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
pubmed: 27214047 pmcid: 4927377
Annavajhala, M. K. et al. Oral and gut microbial diversity and immune regulation in patients with HIV on antiretroviral therapy. mSphere 5, e00798-19 (2020).
pubmed: 32024712 pmcid: 7002309
Graspeuntner, S., Loeper, N., Künzel, S., Baines, J. F. & Rupp, J. Selection of validated hypervariable regions is crucial in 16S-based microbiota studies of the female genital tract. Sci. Rep. 8, 9678 (2018).
pubmed: 29946153 pmcid: 6018735
Herlemann, D. P. et al. Transitions in bacterial communities along the 2000 km salinity gradient of the Baltic Sea. ISME J. 5, 1571–1579 (2011).
pubmed: 21472016 pmcid: 3176514
Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
pubmed: 24485249 pmcid: 4053721
Austin, G. I. et al. Contamination benchmark using human-derived samples. NCBI https://www.ncbi.nlm.nih.gov/bioproject/PRJNA905430 (2022).
Austin, G. I., Shenhav, L. & Korem, T. SCRuB. GitHub https://github.com/Shenhav-and-Korem-labs/SCRuB (2023).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
pubmed: 20808728 pmcid: 2929880
Shenhav, L., Korem, T. & Austin, G. Contamination source modeling with SCRuB improves cancer phenotype prediction from microbiome data. Code Ocean https://doi.org/10.24433/CO.2307706.v1 (2023).
Wickham, H. et al. Welcome to the tidyverse. J. Open Source Softw. 4, 1686 (2019).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Krishnapuram, B. et al.) 785–794 (ACM, 2016).

Authors

George I Austin (GI)

Department of Computer Science, Columbia University, New York, NY, USA.
Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.

Heekuk Park (H)

Division of Infectious Diseases, Columbia University Irving Medical Center, New York, NY, USA.

Yoli Meydan (Y)

Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.

Dwayne Seeram (D)

Division of Infectious Diseases, Columbia University Irving Medical Center, New York, NY, USA.

Tanya Sezin (T)

Department of Dermatology, Columbia University Irving Medical Center, New York, NY, USA.

Yue Clare Lou (YC)

Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA.

Brian A Firek (BA)

Department of Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.

Michael J Morowitz (MJ)

Department of Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.

Jillian F Banfield (JF)

Department of Earth and Planetary Science, University of California, Berkeley, CA, USA.
Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA.
Innovative Genomics Institute, University of California, Berkeley, CA, USA.
Chan Zuckerberg Biohub, San Francisco, CA, USA.

Angela M Christiano (AM)

Department of Dermatology, Columbia University Irving Medical Center, New York, NY, USA.
Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, USA.

Itsik Pe'er (I)

Department of Computer Science, Columbia University, New York, NY, USA.
Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
Data Science Institute, Columbia University, New York, NY, USA.

Anne-Catrin Uhlemann (AC)

Division of Infectious Diseases, Columbia University Irving Medical Center, New York, NY, USA.

Liat Shenhav (L)

Center for Studies in Physics and Biology, Rockefeller University, New York, NY, USA. lshenhav@rockefeller.edu.

Tal Korem (T)

Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA. tal.korem@columbia.edu.
Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA. tal.korem@columbia.edu.
CIFAR Azrieli Global Scholars program, CIFAR, Toronto, Canada. tal.korem@columbia.edu.

MeSH classifications