Readsynth: short-read simulation for consideration of composition-biases in reduced metagenome sequencing approaches.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
15 May 2024
History:
received: 09 01 2024
accepted: 10 05 2024
medline: 16 5 2024
pubmed: 16 5 2024
entrez: 15 5 2024
Status: epublish

Abstract

The application of reduced metagenomic sequencing approaches holds promise as a middle ground between targeted amplicon sequencing and whole metagenome sequencing approaches but has not been widely adopted as a technique. A major barrier to adoption is the lack of read simulation software built to handle characteristic features of these novel approaches. Reduced metagenomic sequencing (RMS) produces unique patterns of fragmentation per genome that are sensitive to restriction enzyme choice, and the non-uniform size selection of these fragments may introduce novel challenges to taxonomic assignment as well as relative abundance estimates. Through the development and application of simulation software, readsynth, we compare simulated metagenomic sequencing libraries with existing RMS data to assess the influence of multiple library preparation and sequencing steps on downstream analytical results. Based on read depth per position, readsynth achieved 0.79 Pearson's correlation and 0.94 Spearman's correlation to these benchmarks. Application of a novel estimation approach, fixed length taxonomic ratios, improved quantification accuracy of simulated human gut microbial communities when compared to estimates of mean or median coverage. We investigate the possible strengths and weaknesses of applying the RMS technique to profiling microbial communities via simulations with readsynth. The choice of restriction enzymes and size selection steps in library prep are non-trivial decisions that bias downstream profiling and quantification. The simulations investigated in this study illustrate the possible limits of preparing metagenomic libraries with a reduced representation sequencing approach, but also allow for the development of strategies for producing and handling the sequence data produced by this promising application.
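The abstract notes that RMS fragmentation depends on restriction enzyme choice and on size selection. As a rough illustration only, the sketch below performs a toy in silico double digest with a hard size window. It is not readsynth's implementation: the enzymes (EcoRI and MseI), the function names, and the hard length cutoff are assumptions chosen for the example, whereas the abstract describes size selection as non-uniform.

```python
import re

# Illustrative enzymes only (not named in the abstract): recognition motif
# and the offset of the cut within the motif on the forward strand.
ENZYMES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}


def cut_sites(seq, enzymes=ENZYMES):
    """Return sorted (cut_position, enzyme_name) pairs on the forward strand."""
    sites = []
    for name, (motif, offset) in enzymes.items():
        # A lookahead pattern also finds overlapping motif occurrences.
        for match in re.finditer(f"(?={motif})", seq):
            sites.append((match.start() + offset, name))
    return sorted(sites)


def double_digest(seq, min_len, max_len, enzymes=ENZYMES):
    """Keep fragments between adjacent cut sites made by two different
    enzymes whose length falls inside [min_len, max_len]."""
    sites = cut_sites(seq, enzymes)
    kept = []
    for (start, first), (end, second) in zip(sites, sites[1:]):
        length = end - start
        if first != second and min_len <= length <= max_len:
            kept.append((start, end, seq[start:end]))
    return kept
```

Changing the motifs in `ENZYMES` or the `min_len`/`max_len` window changes which loci survive for each genome, which is the kind of composition bias the abstract refers to.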

Identifiers

pubmed: 38750423
doi: 10.1186/s12859-024-05809-3
pii: 10.1186/s12859-024-05809-3

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

191

Copyright information

© 2024. The Author(s).

Authors

Ryan Kuster (R)

Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA. rkuster@utk.edu.

Margaret Staton (M)

Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA.
