Dadasnake, a Snakemake implementation of DADA2 to process amplicon sequencing data for microbial ecology.
Keywords:
R
community structure
denoising
exact sequence variants
microbiome
pipeline
rRNA gene sequence analysis
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date: 30 November 2020
History:
received: 22 May 2020
revised: 20 October 2020
accepted: 5 November 2020
entrez: 30 November 2020
pubmed: 1 December 2020
medline: 26 October 2021
Publication status: ppublish
Abstract
Amplicon sequencing of phylogenetic marker genes, e.g., 16S, 18S, or ITS ribosomal RNA sequences, is still the most commonly used method to determine the composition of microbial communities. Microbial ecologists often have expert knowledge of their biological questions and of data analysis in general, and most research institutes have the computational infrastructure to run bioinformatics command-line tools and workflows for amplicon sequencing analysis. However, the bioinformatics skills these tools require often limit the efficient and up-to-date use of computational resources. We present dadasnake, a user-friendly, single-command Snakemake pipeline that wraps four stages: preprocessing of sequencing reads, delineation of exact sequence variants using the favorably benchmarked and widely used DADA2 algorithm, taxonomic classification, and post-processing of the resulting tables, including hand-off in standard formats. The suitability of the provided default configurations is demonstrated using mock community data from bacteria and archaea, as well as fungi. Because it is built on Snakemake, dadasnake makes efficient use of high-performance computing infrastructure. Easy user configuration allows flexibility in all steps, including the processing of data from multiple sequencing platforms. dadasnake is easy to install via conda environments and is available at https://github.com/a-h-b/dadasnake.
Identifiers
pubmed: 33252655
pii: 6011256
doi: 10.1093/gigascience/giaa135
pmc: PMC7702218
Chemical substances
RNA, Ribosomal, 16S (registry number: 0)
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2020. Published by Oxford University Press GigaScience.