Dadasnake, a Snakemake implementation of DADA2 to process amplicon sequencing data for microbial ecology.

MeSH terms: High-Throughput Nucleotide Sequencing; Microbiota; Phylogeny; RNA, Ribosomal, 16S / genetics; Software

Keywords: R; community structure; denoising; exact sequence variants; microbiome; pipeline; rRNA gene sequence analysis

Journal

GigaScience

ISSN: 2047-217X

Abbreviated title: Gigascience

Country: United States

NLM ID: 101596872

Publication information

Publication date:
30 Nov 2020

History:

received: 22 May 2020

revised: 20 Oct 2020

accepted: 5 Nov 2020

entrez: 30 Nov 2020

pubmed: 1 Dec 2020

medline: 26 Oct 2021

Status: ppublish

Abstract

BACKGROUND

Amplicon sequencing of phylogenetic marker genes, e.g., 16S, 18S, or ITS ribosomal RNA sequences, is still the most commonly used method to determine the composition of microbial communities. Microbial ecologists often have expert knowledge on their biological question and data analysis in general, and most research institutes have computational infrastructures to use the bioinformatics command line tools and workflows for amplicon sequencing analysis, but requirements of bioinformatics skills often limit the efficient and up-to-date use of computational resources.

RESULTS

We present dadasnake, a user-friendly, 1-command Snakemake pipeline that wraps the preprocessing of sequencing reads and the delineation of exact sequence variants by using the favorably benchmarked and widely used DADA2 algorithm with a taxonomic classification and the post-processing of the resultant tables, including hand-off in standard formats. The suitability of the provided default configurations is demonstrated using mock community data from bacteria and archaea, as well as fungi.

CONCLUSIONS

By use of Snakemake, dadasnake makes efficient use of high-performance computing infrastructures. Easy user configuration guarantees flexibility of all steps, including the processing of data from multiple sequencing platforms. It is easy to install dadasnake via conda environments. dadasnake is available at https://github.com/a-h-b/dadasnake.

Identifiers

DOI: 10.1093/gigascience/giaa135 PMID: 33252655 PMC: PMC7702218

pubmed: 33252655

pii: 6011256

doi: 10.1093/gigascience/giaa135

pmc: PMC7702218

Chemical substances

RNA, Ribosomal, 16S 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng


Authors

Christina Weißbecker

Beatrix Schnabel

Anna Heintz-Buschart
