A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses.
amplicon data analysis
bioinformatics
environmental DNA
metabarcoding
pipeline
review
Journal: Molecular Ecology Resources
ISSN: 1755-0998
Abbreviated title: Mol Ecol Resour
Country: England
NLM ID: 101465604
Publication information
Publication date:
07 Aug 2023
History:
revised: 05 Jun 2023
received: 10 Feb 2023
accepted: 06 Jul 2023
pubmed: 07 Aug 2023
medline: 07 Aug 2023
entrez: 07 Aug 2023
Status:
aheadofprint
Abstract
Environmental DNA (eDNA) metabarcoding has gained growing attention as a strategy for monitoring biodiversity in ecology. However, identifying taxa through metabarcoding requires sophisticated processing of high-throughput sequencing data from taxonomically informative DNA barcodes. Various sets of universal and taxon-specific primers have been developed, extending the use of metabarcoding across archaea, bacteria and eukaryotes. Accordingly, a multitude of metabarcoding data analysis tools and pipelines have also been developed. Several of these workflows are often designed to process the same type of amplicon sequencing data, which makes choosing among the plethora of existing pipelines somewhat confusing. Nevertheless, each pipeline has its own philosophy, strengths and limitations, which should be weighed against the aims of a given study and the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems and particular attributes of thirty-two amplicon processing pipelines to help users select a pipeline for their metabarcoding projects.
Identifiers
pubmed: 37548515
doi: 10.1111/1755-0998.13847
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: NIGMS NIH HHS
ID: P20 GM103449
Country: United States
Copyright information
© 2023 John Wiley & Sons Ltd.
References
Albanese, D., Fontana, P., de Filippo, C., Cavalieri, D., & Donati, C. (2015). MICCA: A complete and accurate software for taxonomic profiling of metagenomic data. Scientific Reports, 5(1), 1-7. https://doi.org/10.1038/srep09743
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25(17), 3389-3402. https://doi.org/10.1093/nar/25.17.3389
Amir, A., McDonald, D., Navas-Molina, J. A., Kopylova, E., Morton, J. T., Zech, X. Z., Kightley, E. P., Thompson, L. R., Hyde, E. R., Gonzalez, A., & Knight, R. (2017). Deblur rapidly resolves single-nucleotide community sequence patterns. mSystems, 2(2), e00191-16. https://doi.org/10.1128/mSystems
Andújar, C., Creedy, T. J., Arribas, P., López, H., Salces-Castellano, A., Pérez-Delgado, A. J., Vogler, A. P., & Emerson, B. C. (2021). Validated removal of nuclear pseudogenes and sequencing artefacts from mitochondrial metabarcode data. Molecular Ecology Resources, 21(6), 1772-1787. https://doi.org/10.1111/1755-0998.13337
Anslan, S., Bahram, M., Hiiesalu, I., & Tedersoo, L. (2017). PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data. Molecular Ecology Resources, 17(6), e234-e240. https://doi.org/10.1111/1755-0998.12692
Anslan, S., Mikryukov, V., Armolaitis, K., Ankuda, J., Lazdina, D., Makovskis, K., Vesterdal, L., Schmidt, I. K., & Tedersoo, L. (2021). Highly comparable metabarcoding results from MGI-tech and Illumina sequencing platforms. PeerJ, 9, e12254. https://doi.org/10.7717/peerj.12254
Anslan, S., Nilsson, R. H., Wurzbacher, C., Baldrian, P., Tedersoo, L., & Bahram, M. (2018). Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding. MycoKeys, 39, 29-40. https://doi.org/10.3897/mycokeys.39.28109
Anslan, S., & Tedersoo, L. (2015). Performance of cytochrome c oxidase subunit I (COI), ribosomal DNA large subunit (LSU) and internal transcribed spacer 2 (ITS2) in DNA barcoding of Collembola. European Journal of Soil Biology, 69, 1-7. https://doi.org/10.1016/j.ejsobi.2015.04.001
Ansorge, R., Birolo, G., James, S. A., & Telatin, A. (2021). Dadaist2: A toolkit to automate and simplify statistical analysis and plotting of metabarcoding experiments. International Journal of Molecular Sciences, 22(10), 5309. https://doi.org/10.3390/ijms22105309
Antich, A., Palacín, C., Turon, X., & Wangensteen, O. S. (2022). DnoisE: Distance denoising by entropy. An open-source parallelizable alternative for denoising sequence datasets. PeerJ, 10, e12758. https://doi.org/10.7717/peerj.12758
Antich, A., Palacín, C., Wangensteen, O. S., & Turon, X. (2021). To denoise or to cluster, that is not the question: Optimizing pipelines for COI metabarcoding and metaphylogeography. BMC Bioinformatics, 22(1), 177. https://doi.org/10.1186/s12859-021-04115-6
Asbun, A. A., Besseling, M. A., Balzano, S., van Bleijswijk, J. D. L., Witte, H. J., Villanueva, L., & Engelmann, J. C. (2020). Cascabel: A scalable and versatile amplicon sequence data analysis pipeline delivering reproducible and documented results. Frontiers in Genetics, 11, 489357. https://doi.org/10.3389/fgene.2020.489357
Bai, J., Jhaney, I., & Wells, J. (2019). Developing a reproducible microbiome data analysis pipeline using the Amazon web services cloud for a cancer research group: Proof-of-concept study. JMIR Medical Informatics, 7(4), e14667. https://doi.org/10.2196/14667
Bailet, B., Apothéloz-Perret-Gentil, L., Baričević, A., Chonova, T., Franc, A., Frigerio, J. M., Kelly, M., Mora, D., Pfannkuchen, M., Proft, S., Ramon, M., Vasselon, V., Zimmermann, J., & Kahlert, M. (2020). Diatom DNA metabarcoding for ecological assessment: Comparison among bioinformatics pipelines used in six European countries reveals the need for standardization. Science of the Total Environment, 745, 140948. https://doi.org/10.1016/j.scitotenv.2020.140948
Baloğlu, B., Chen, Z., Elbrecht, V., Braukmann, T., MacDonald, S., & Steinke, D. (2021). A workflow for accurate metabarcoding using nanopore MinION sequencing. Methods in Ecology and Evolution, 12(5), 794-804. https://doi.org/10.1111/2041-210X.13561
Baltrušis, P., Halvarsson, P., & Höglund, J. (2022). Estimation of the impact of three different bioinformatic pipelines on sheep nemabiome analysis. Parasites & Vectors, 15(1), 1-12. https://doi.org/10.1186/s13071-022-05399-0
Banchi, E., Ametrano, C. G., Greco, S., Stanković, D., Muggia, L., & Pallavicini, A. (2020). PLANiTS: A curated sequence reference dataset for plant ITS DNA metabarcoding. Database, 2020, baz155. https://doi.org/10.1093/database/baz155
Ben-David, T., Melamed, S., Gerson, U., & Morin, S. (2007). ITS2 sequences as barcodes for identifying and analyzing spider mites (Acari: Tetranychidae). Experimental and Applied Acarology, 41(3), 169-181.
Bengtsson-Palme, J., Ryberg, M., Hartmann, M., Branco, S., Wang, Z., Godhe, A., de Wit, P., Sánchez-García, M., Ebersberger, I., de Sousa, F., Amend, A., & Nilsson, R. H. (2013). Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods in Ecology and Evolution, 4(10), 914-919. https://doi.org/10.1111/2041-210X.12073
Bernard, M., Rué, O., Mariadassou, M., & Pascal, G. (2021). FROGS: A powerful tool to analyse the diversity of fungi with special management of internal transcribed spacers. Briefings in Bioinformatics, 22(6), bbab318. https://doi.org/10.1093/bib/bbab318
Bokulich, N. A., Kaehler, B. D., Rideout, J. R., Dillon, M., Bolyen, E., Knight, R., Huttley, G. A., & Caporaso, J. G. (2018). Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin. Microbiome, 6(1), 1-17. https://doi.org/10.1186/s40168-018-0470-z
Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114-2120. https://doi.org/10.1093/bioinformatics/btu170
Bolyen, E., Rideout, J. R., Dillon, M. R., Bokulich, N. A., Abnet, C. C., Al-Ghalith, G. A., Alexander, H., Alm, E. J., Arumugam, M., Asnicar, F., Bai, Y., Bisanz, J. E., Bittinger, K., Brejnrod, A., Brislawn, C. J., Brown, C. T., Callahan, B. J., Caraballo-Rodríguez, A. M., Chase, J., … Caporaso, J. G. (2019). Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology, 37(8), 852-857. https://doi.org/10.1038/s41587-019-0209-9
Boyer, F., Mercier, C., Bonin, A., Le Bras, Y., Taberlet, P., & Coissac, E. (2016). Obitools: A unix-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16(1), 176-182. https://doi.org/10.1111/1755-0998.12428
Brandt, M. I., Trouche, B., Quintric, L., Günther, B., Wincker, P., Poulain, J., & Arnaud-Haond, S. (2021). Bioinformatic pipelines combining denoising and clustering tools allow for more comprehensive prokaryotic and eukaryotic metabarcoding. Molecular Ecology Resources, 21(6), 1904-1921. https://doi.org/10.1111/1755-0998.13398
Brown, S. P., Veach, A. M., Rigdon-Huss, A. R., Grond, K., Lickteig, S. K., Lothamer, K., Oliver, A. K., & Jumpponen, A. (2015). Scraping the bottom of the barrel: Are rare high throughput sequences artifacts? Fungal Ecology, 13, 221-225.
Bruce, K., Blackman, R. C., Bourlat, S. J., Hellström, M., Bakker, J., Bista, I., Bohmann, K., Bouchez, A., Brys, R., Clark, K., Elbrecht, V., & Deiner, K. (2021). A Practical Guide to DNA-Based Methods for Biodiversity Assessment. Pensoft Advanced Books. https://doi.org/10.3897/ab.e68634
Buchner, D., Macher, T.-H., & Leese, F. (2022). APSCALE: Advanced pipeline for simple yet comprehensive analyses of DNA metabarcoding data. Bioinformatics, 38(20), 4817-4819. https://doi.org/10.1093/bioinformatics/btac588
Callahan, B. J., Grinevich, D., Thakur, S., Balamotis, M. A., & Yehezkel, T. B. (2021). Ultra-accurate microbial amplicon sequencing with synthetic long reads. Microbiome, 9(1), 130. https://doi.org/10.1186/s40168-021-01072-3
Callahan, B. J., McMurdie, P. J., & Holmes, S. P. (2017). Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal, 11(12), 2639-2643. https://doi.org/10.1038/ismej.2017.119
Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581-583. https://doi.org/10.1038/nmeth.3869
Callahan, B. J., Wong, J., Heiner, C., Oh, S., Theriot, C. M., Gulati, A. S., McGill, S. K., & Dougherty, M. K. (2019). High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution. Nucleic Acids Research, 47(18), e103. https://doi.org/10.1093/nar/gkz569
Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Peña, A. G., Goodrich, J. K., Gordon, J. I., Huttley, G. A., Kelley, S. T., Knights, D., Koenig, J. E., Ley, R. E., Lozupone, C. A., McDonald, D., Muegge, B. D., Pirrung, M., … Knight, R. (2010). QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5), 335-336. https://doi.org/10.1038/nmeth.f.303
Carlsen, T., Aas, A. B., Lindner, D., Vrålstad, T., Schumacher, T., & Kauserud, H. (2012). Don't make a mista(g)ke: Is tag switching an overlooked source of error in amplicon pyrosequencing studies? Fungal Ecology, 5(6), 747-749.
Carøe, C., & Bohmann, K. (2020). Tagsteady: A metabarcoding library preparation protocol to avoid false assignment of sequences to samples. Molecular Ecology Resources, 20(6), 1620-1631. https://doi.org/10.1111/1755-0998.13227
Castaño, C., Berlin, A., Brandström Durling, M., Ihrmark, K., Lindahl, B. D., Stenlid, J., Clemmensen, K. E., & Olson, Å. (2020). Optimized metabarcoding with Pacific biosciences enables semi-quantitative analysis of fungal communities. New Phytologist, 228(3), 1149-1158. https://doi.org/10.1111/nph.16731
CBOL Plant Working Group, Hollingsworth, P. M., Hajibabaei, M., Ratnasingham, S., Chase, M., Cowan, R. S., Erickson, D. L., Fazekas, A. J., Graham, S. W., James, K. E., Kim, K.-J., Kress, W. J., Schneider, H., van Alphenstahl, J., Barrett, S. C. H., van den Berg, C., Bogarín, D., Burgess, K. S., Cameron, K. M., … Little, D. P. (2009). A DNA barcode for land plants. Proceedings of the National Academy of Sciences, 106(31), 12794-12797. https://doi.org/10.1073/pnas.0905845106
Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). Fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), i884-i890. https://doi.org/10.1093/bioinformatics/bty560
Compson, Z. G., McClenaghan, B., Singer, G. A., Fahner, N. A., & Hajibabaei, M. (2020). Metabarcoding from microbes to mammals: Comprehensive bioassessment on a global scale. Frontiers in Ecology and Evolution, 8, 581835. https://doi.org/10.3389/fevo.2020.581835
Copeland, M., Soh, J., Puca, A., Manning, M., & Gollob, D. (2015). Microsoft Azure and cloud computing. In Microsoft Azure (pp. 3-26). Apress.
Couton, M., Baud, A., Daguin-Thiébaut, C., Corre, E., Comtet, T., & Viard, F. (2021). High-throughput sequencing on preservative ethanol is effective at jointly examining infraspecific and taxonomic diversity, although bioinformatics pipelines do not perform equally. Ecology and Evolution, 11(10), 5533-5546. https://doi.org/10.1002/ece3.7453
Creedy, T. J., Andújar, C., Meramveliotakis, E., Noguerales, V., Overcast, I., Papadopoulou, A., Morlon, H., Vogler, A. P., Emerson, B. C., & Arribas, P. (2022). Coming of age for COI metabarcoding of whole organism community DNA: Towards bioinformatic harmonisation. Molecular Ecology Resources, 22(3), 847-861. https://doi.org/10.1111/1755-0998.13502
Curd, E. E., Gold, Z., Kandlikar, G. S., Gomer, J., Ogden, M., O'Connell, T., Pipes, L., Schweizer, T. M., Rabichow, L., Lin, M., Shi, B., & Meyer, R. S. (2019). Anacapa toolkit: An environmental DNA toolkit for processing multilocus metabarcode datasets. Methods in Ecology and Evolution, 10(9), 1469-1475. https://doi.org/10.1111/2041-210X.13214
de Santiago, A., Pereira, T. J., Mincks, S. L., & Bik, H. M. (2022). Dataset complexity impacts both MOTU delimitation and biodiversity estimates in eukaryotic 18S rRNA metabarcoding studies. Environmental DNA, 4(2), 363-384. https://doi.org/10.1002/edn3.255
di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. https://doi.org/10.1038/nbt.3820
Djemiel, C., Dequiedt, S., Karimi, B., Cottin, A., Girier, T., El Djoudi, Y., Wincker, P., Lelièvre, M., Mondy, S., Chemidlin Prévost-Bouré, N., Maron, P.-A., Ranjard, L., & Terrat, S. (2020). BIOCOM-PIPE: A new user-friendly metabarcoding pipeline for the characterization of microbial diversity from 16S, 18S and 23S rRNA gene amplicons. BMC Bioinformatics, 21(1), 492. https://doi.org/10.1186/s12859-020-03829-3
Djemiel, C., Plassard, D., Terrat, S., Crouzet, O., Sauze, J., Mondy, S., Nowak, V., Wingate, L., Ogée, J., & Maron, P. A. (2020). μgreen-db: A reference database for the 23S rRNA gene of eukaryotic plastids and cyanobacteria. Scientific Reports, 10(1), 1-11. https://doi.org/10.1038/s41598-020-62555-1
Durling, M. B., Clemmensen, K. E., Stenlid, J., & Lindahl, B. (2011). SCATA: An efficient bioinformatic pipeline for species identification and quantification after high-throughput sequencing of tagged amplicons. https://scata.mykopat.slu.se/
Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460-2461. https://doi.org/10.1093/bioinformatics/btq461
Edgar, R. C. (2016a). SINTAX: A simple non-Bayesian taxonomy classifier for 16S and ITS sequences. bioRxiv, 74161. https://doi.org/10.1101/074161
Edgar, R. C. (2016b). UNOISE2: Improved error-correction for Illumina 16S and ITS amplicon sequencing. bioRxiv, 81257. https://doi.org/10.1101/081257
Edgar, R. C. (2017). Accuracy of microbial community diversity estimated by closed- and open-reference OTUs. PeerJ, 5, e3889. https://doi.org/10.7717/peerj.3889
Edgar, R. C. (2018a). Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences. PeerJ, 6, e4652. https://doi.org/10.7717/peerj.4652
Edgar, R. C. (2018b). UNCROSS2: Identification of cross-talk in 16S rRNA OTU tables. bioRxiv, 400762. https://doi.org/10.1101/400762
Edgar, R. C., & Flyvbjerg, H. (2015). Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics, 31(21), 3476-3482. https://doi.org/10.1093/bioinformatics/btv401
Edgar, R. C., Haas, B. J., Clemente, J. C., Quince, C., & Knight, R. (2011). UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27(16), 2194-2200. https://doi.org/10.1093/bioinformatics/btr381
Elbrecht, V., Taberlet, P., Dejean, T., Valentini, A., Usseglio-Polatera, P., Beisel, J. N., Coissac, E., Boyer, F., & Leese, F. (2016). Testing the potential of a ribosomal 16S marker for DNA metabarcoding of insects. PeerJ, 4, e1966. https://doi.org/10.7717/peerj.1966
Escudié, F., Auer, L., Bernard, M., Mariadassou, M., Cauquil, L., Vidal, K., Maman, S., Hernandez-Raquet, G., Combes, S., & Pascal, G. (2018). FROGS: Find, rapidly, OTUs with galaxy solution. Bioinformatics, 34(8), 1287-1294. https://doi.org/10.1093/bioinformatics/btx791
Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38, 276-278. https://doi.org/10.1038/s41587-020-0439-x
Frøslev, T. G., Kjøller, R., Bruun, H. H., Ejrnaes, R., Brunbjerg, A. K., Pietroni, C., & Hansen, A. J. (2017). Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nature Communications, 8(1), 1-11. https://doi.org/10.1038/s41467-017-01312-x
Furneaux, B., Bahram, M., Rosling, A., Yorou, N. S., & Ryberg, M. (2021). Long- and short-read metabarcoding technologies reveal similar spatiotemporal structures in fungal communities. Molecular Ecology Resources, 21(6), 1833-1849. https://doi.org/10.1111/1755-0998.13387
Galaxy Community. (2022). The galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Research, 50(W1), W345-W351. https://doi.org/10.1093/nar/gkac247
Glassman, S. I., & Martiny, J. B. (2018). Broadscale ecological patterns are robust to use of exact sequence variants versus operational taxonomic units. mSphere, 3(4), e00148-18. https://doi.org/10.1128/mSphere.00148-18
Gold, Z., Curd, E. E., Goodwin, K. D., Choi, E. S., Frable, B. W., Thompson, A. R., Walker, H. J., Jr., Burton, R. S., Kacev, D., Martz, L. D., & Barber, P. H. (2021). Improving metabarcoding taxonomic assignment: A case study of fishes in a large marine ecosystem. Molecular Ecology Resources, 21(7), 2546-2564. https://doi.org/10.1111/1755-0998.13450
González, A., Dubut, V., Corse, E., Mekdad, R., Dechatre, T., Castet, U., Hebert, R., & Meglécz, E. (2023). VTAM: A robust pipeline for validating metabarcoding data using controls. Computational and Structural Biotechnology Journal, 21, 1151-1156. https://doi.org/10.1016/j.csbj.2023.01.034
Gweon, H. S., Oliver, A., Taylor, J., Booth, T., Gibbs, M., Read, D. S., Griffiths, R. I., & Schonrogge, K. (2015). PIPITS: An automated pipeline for analyses of fungal internal transcribed spacer sequences from the Illumina sequencing platform. Methods in Ecology and Evolution, 6(8), 973-980. https://doi.org/10.1111/2041-210X.12399
Hajibabaei, M., Shokralla, S., Zhou, X., Singer, G. A., & Baird, D. J. (2011). Environmental barcoding: A next-generation sequencing approach for biomonitoring applications using river benthos. PLoS One, 6(4), e17497. https://doi.org/10.1371/journal.pone.0017497
Harrison, J. P., Chronopoulou, P. M., Salonen, I. S., Jilbert, T., & Koho, K. A. (2021). 16S and 18S rRNA gene metabarcoding provide congruent information on the responses of sediment communities to eutrophication. Frontiers in Marine Science, 8, 708716. https://doi.org/10.3389/fmars.2021.708716
Hebert, P. D. N., Cywinska, A., Ball, S. L., & deWaard, J. R. (2003). Biological identifications through DNA barcodes. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270, 313-321. https://doi.org/10.1098/rspb.2002.2218
Heeger, F., Bourne, E. C., Baschien, C., Yurkov, A., Bunk, B., Spröer, C., Overmann, J., Mazzoni, C. J., & Monaghan, M. T. (2018). Long-read DNA metabarcoding of ribosomal RNA in the analysis of fungi from aquatic environments. Molecular Ecology Resources, 18(6), 1500-1514. https://doi.org/10.1111/1755-0998.12937
Hildebrand, F., Tadeo, R., Voigt, A. Y., Bork, P., & Raes, J. (2014). LotuS: An efficient and user-friendly OTU processing pipeline. Microbiome, 2(1), 1-7. https://doi.org/10.1186/2049-2618-2-30
Hleap, J. S., Littlefair, J. E., Steinke, D., Hebert, P. D., & Cristescu, M. E. (2021). Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes. Molecular Ecology Resources, 21(7), 2190-2203. https://doi.org/10.1111/1755-0998.13407
Hupfauf, S., Etemadi, M., Juárez, M. F.-D., Gómez-Brandón, M., Insam, H., & Podmirseg, S. M. (2020). CoMA: An intuitive and user-friendly pipeline for amplicon-sequencing data analysis. PLoS One, 15(12), e0243241. https://doi.org/10.1371/journal.pone.0243241
Huse, S. M., Welch, D. M., Morrison, H. G., & Sogin, M. L. (2010). Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environmental Microbiology, 12(7), 1889-1898. https://doi.org/10.1111/j.1462-2920.2010.02193.x
Huson, D. H., Auch, A. F., Qi, J., & Schuster, S. C. (2007). MEGAN analysis of metagenomic data. Genome Research, 17(3), 377-386. https://doi.org/10.1101/gr.5969107
Hussain, A., & Aleem, M. (2018). GoCJ: Google cloud jobs dataset for distributed and cloud computing infrastructures. Data, 3(4), 38. https://doi.org/10.3390/data3040038
Kaehler, B. D., Bokulich, N. A., McDonald, D., Knight, R., Caporaso, J. G., & Huttley, G. A. (2019). Species abundance information improves sequence taxonomy classification accuracy. Nature Communications, 10(1), 4643. https://doi.org/10.1038/s41467-019-12669-6
Kang, W., Anslan, S., Börner, N., Schwarz, A., Schmidt, R., Künzel, S., Rioual, P., Echeverría-Galindo, P., Vences, M., Wang, J., & Schwalb, A. (2021). Diatom metabarcoding and microscopic analyses from sediment samples at Lake Nam Co, Tibet: The effect of sample-size and bioinformatics on the identified communities. Ecological Indicators, 121, 107070. https://doi.org/10.1016/j.ecolind.2020.107070
Knight, R., Vrbanac, A., Taylor, B. C., Aksenov, A., Callewaert, C., Debelius, J., Gonzalez, A., Kosciolek, T., McCall, L.-I., McDonald, D., Melnik, A. V., Morton, J. T., Navas, J., Quinn, R. A., Sanders, J. G., Swafford, A. D., Thompson, L. R., Tripathi, A., Xu, Z. Z., … Dorrestein, P. C. (2018). Best practices for analysing microbiomes. Nature Reviews. Microbiology, 16(7), 410-422. https://doi.org/10.1038/s41579-018-0029-9
Köster, J., & Rahmann, S. (2012). Snakemake: A scalable bioinformatics workflow engine. Bioinformatics, 28(19), 2520-2522. https://doi.org/10.1093/bioinformatics/bts480
Kurtzer, G. M., Sochat, V., & Bauer, M. W. (2017). Singularity: Scientific containers for mobility of compute. PLoS One, 12(5), e0177459. https://doi.org/10.1371/journal.pone.0177459
Laehnemann, D., Borkhardt, A., & McHardy, A. C. (2016). Denoising DNA deep sequencing data: High-throughput sequencing errors and their correction. Briefings in Bioinformatics, 17(1), 154-179. https://doi.org/10.1093/bib/bbv029
Lear, G., Dickie, I., Banks, J., Boyer, S., Buckley, H. L., Buckley, T. R., Cruickshank, R., & Holdaway, R. (2018). Methods for the extraction, storage, amplification and sequencing of DNA from environmental samples. New Zealand Journal of Ecology, 42(1), 10-50A. https://doi.org/10.20417/nzjecol.42.9
Lindgreen, S. (2012). AdapterRemoval: Easy cleaning of next-generation sequencing reads. BMC Research Notes, 5(1), 1-7. https://doi.org/10.1186/1756-0500-5-337
Liu, J., & Zhang, H. (2021). Combining multiple markers in environmental DNA metabarcoding to assess deep-sea benthic biodiversity. Frontiers in Marine Science, 8, 684955. https://doi.org/10.3389/fmars.2021.684955
Loos, D., Zhang, L., Beemelmanns, C., Kurzai, O., & Panagiotou, G. (2021). DAnIEL: A user-friendly web server for fungal ITS amplicon sequencing data. Frontiers in Microbiology, 12, 720513. https://doi.org/10.3389/fmicb.2021.720513
Mahé, F., Czech, L., Stamatakis, A., Quince, C., de Vargas, C., Dunthorn, M., & Rognes, T. (2022). Swarm v3: Towards tera-scale amplicon clustering. Bioinformatics, 38(1), 267-269. https://doi.org/10.1093/bioinformatics/btab493
Marquina, D., Esparza-Salas, R., Roslin, T., & Ronquist, F. (2019). Establishing arthropod community composition using metabarcoding: Surprising inconsistencies between soil samples and preservative ethanol and homogenate from malaise trap catches. Molecular Ecology Resources, 19(6), 1516-1530. https://doi.org/10.1111/1755-0998.13071
Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. Journal, 17(1), 10-12. https://doi.org/10.14806/ej.17.1.200
Mathon, L., Valentini, A., Guérin, P. E., Normandeau, E., Noel, C., Lionnet, C., Boulanger, E., Thuiller, W., Bernatchez, L., Mouillot, D., Dejean, T., & Manel, S. (2021). Benchmarking bioinformatic tools for fast and accurate eDNA metabarcoding species identification. Molecular Ecology Resources, 21(7), 2565-2579. https://doi.org/10.1111/1755-0998.13430
McGee, K. M., Robinson, C. V., & Hajibabaei, M. (2019). Gaps in DNA-based biomonitoring across the globe. Frontiers in Ecology and Evolution, 7, 337. https://doi.org/10.3389/fevo.2019.00337
Mikryukov, V., Anslan, S., & Tedersoo, L. (2022). NextITS: A pipeline for metabarcoding fungi and other eukaryotes with full-length ITS sequenced with PacBio. https://github.com/vmikk/NextITS
Minerovic, A. D., Potapova, M. G., Sales, C. M., Price, J. R., & Enache, M. D. (2020). 18S-V9 DNA metabarcoding detects the effect of water-quality impairment on stream biofilm eukaryotic assemblages. Ecological Indicators, 113, 106225. https://doi.org/10.1016/j.ecolind.2020.106225
Miya, M., Gotoh, R. O., & Sado, T. (2020). MiFish metabarcoding: A high-throughput approach for simultaneous detection of multiple fish species from environmental DNA and other samples. Fisheries Science, 86(6), 939-970. https://doi.org/10.1007/s12562-020-01461-x
Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., & Köster, J. (2021). Sustainable data analysis with Snakemake. F1000Research, 10, 33. https://doi.org/10.12688/f1000research.29032.2
Mousavi-Derazmahalleh, M., Stott, A., Lines, R., Peverley, G., Nester, G., Simpson, T., Zawierta, M., De la Pierre, M., Bunce, M., & Christophersen, C. T. (2021). eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA sequences exploiting Nextflow and singularity. Molecular Ecology Resources, 21(5), 1697-1704. https://doi.org/10.1111/1755-0998.13356
Nearing, J. T., Douglas, G. M., Comeau, A. M., & Langille, M. G. (2018). Denoising the Denoisers: An independent evaluation of microbiome sequence error-correction approaches. PeerJ, 6, e5364. https://doi.org/10.7717/peerj.5364
Nilsson, R. H., Anslan, S., Bahram, M., Wurzbacher, C., Baldrian, P., & Tedersoo, L. (2019). Mycobiome diversity: High-throughput sequencing and identification of fungi. Nature Reviews. Microbiology, 17(2), 95-109. https://doi.org/10.1038/s41579-018-0116-y
Nilsson, R. H., Wurzbacher, C., Bahram, M., Coimbra, V. R., Larsson, E., Tedersoo, L., Eriksson, J., Ritter, C. D., Svantesson, S., Sánchez-García, M., Ryberg, M., & Abarenkov, K. (2016). Top 50 most wanted fungi. MycoKeys, 12, 29-40. https://doi.org/10.3897/mycokeys.12.7553
Özkurt, E., Fritscher, J., Soranzo, N., Ng, D. Y. K., Davey, R. P., Bahram, M., & Hildebrand, F. (2022). LotuS2: An ultrafast and highly accurate tool for amplicon sequencing analysis. Microbiome, 10(1), 176. https://doi.org/10.1186/s40168-022-01365-1
Palmer, J. M., Jusino, M. A., Banik, M. T., & Lindner, D. L. (2018). Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data. PeerJ, 6, e4925. https://doi.org/10.7717/peerj.4925
Pauvert, C., Buee, M., Laval, V., Edel-Hermann, V., Fauchery, L., Gautier, A., Lesur, I., Vallance, J., & Vacher, C. (2019). Bioinformatics matters: The accuracy of plant and soil fungal community data is highly dependent on the metabarcoding pipeline. Fungal Ecology, 41, 23-33. https://doi.org/10.1016/j.funeco.2019.03.005
Pollock, J., Glendinning, L., Wisedchanwet, T., & Watson, M. (2018). The madness of microbiome: Attempting to find consensus “best practice” for 16S microbiome studies. Applied and Environmental Microbiology, 84(7), e02627-17. https://doi.org/10.1128/AEM.02627-17
Porter, T. M., & Hajibabaei, M. (2018). Automated high throughput animal CO1 metabarcode classification. Scientific Reports, 8(1), 4226. https://doi.org/10.1038/s41598-018-22505-4
Porter, T. M., & Hajibabaei, M. (2020). Putting COI metabarcoding in context: The utility of exact sequence variants (ESVs) in biodiversity analysis. Frontiers in Ecology and Evolution, 8, 248. https://doi.org/10.3389/fevo.2020.00248
Porter, T. M., & Hajibabaei, M. (2021). Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets. BMC Bioinformatics, 22(1), 1-20. https://doi.org/10.1186/s12859-021-04180-x
Porter, T. M., & Hajibabaei, M. (2022). MetaWorks: A flexible, scalable bioinformatic pipeline for high-throughput multi-marker biodiversity assessments. PLoS One, 17(9), e0274260. https://doi.org/10.1371/journal.pone.0274260
Prodan, A., Tremaroli, V., Brolin, H., Zwinderman, A. H., Nieuwdorp, M., & Levin, E. (2020). Comparing bioinformatic pipelines for microbial 16S rRNA amplicon sequencing. PLoS One, 15(1), e0227434. https://doi.org/10.1371/journal.pone.0227434
Reeder, J., & Knight, R. (2009). The 'rare biosphere': A reality check. Nature Methods, 6(9), 636-637. https://doi.org/10.1038/nmeth0909-636
Reitmeier, S., Hitch, T. C., Treichel, N., Fikas, N., Hausmann, B., Ramer-Tait, A. E., Neuhaus, K., Berry, D., Haller, D., Lagkouvardos, I., & Clavel, T. (2021). Handling of spurious sequences affects the outcome of high-throughput 16S rRNA gene amplicon profiling. ISME Communications, 1(1), 1-12. https://doi.org/10.1038/s43705-021-00033-z
Richardson, R. T., Bengtsson-Palme, J., & Johnson, R. M. (2017). Evaluating and optimizing the performance of software commonly used for the taxonomic classification of DNA metabarcoding sequence data. Molecular Ecology Resources, 17(4), 760-769. https://doi.org/10.1111/1755-0998.12628
Rimet, F., Gusev, E., Kahlert, M., Kelly, M. G., Kulikovskiy, M., Maltsev, Y., Mann, D. G., Pfannkuchen, M., Trobajo, R., Vasselon, V., Zimmermann, J., & Bouchez, A. (2019). Diat.barcode, an open-access curated barcode library for diatoms. Scientific Reports, 9(1), 15116. https://doi.org/10.1038/s41598-019-51500-6
Rivers, A. R., Weber, K. C., Gardner, T. G., Liu, S., & Armstrong, S. D. (2018). ITSxpress: Software to rapidly trim internally transcribed spacer sequences with quality scores for marker gene analysis. F1000Research, 7, 7. https://doi.org/10.12688/f1000research.15704.1
Rodriguez-Martinez, S., Klaminder, J., Morlock, M. A., Dalén, L., & Huang, D. T. (2022). The topological nature of tag jumping in environmental DNA metabarcoding studies. Molecular Ecology Resources, 23, 621-631. https://doi.org/10.1111/1755-0998.13745
Rognes, T., Flouri, T., Nichols, B., Quince, C., & Mahé, F. (2016). VSEARCH: A versatile open source tool for metagenomics. PeerJ, 4, e2584. https://doi.org/10.7717/peerj.2584
Rosen, G. L., Reichenberger, E. R., & Rosenfeld, A. M. (2011). NBC: The naive Bayes classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics, 27(1), 127-129. https://doi.org/10.1093/bioinformatics/btq619
Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E. B., Lesniewski, R. A., Oakley, B. B., Parks, D. H., Robinson, C. J., Sahl, J. W., Stres, B., Thallinger, G. G., Van Horn, D., & Weber, C. F. (2009). Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75(23), 7537-7541. https://doi.org/10.1128/AEM.01541-09
Schnell, I. B., Bohmann, K., & Gilbert, M. T. P. (2015). Tag jumps illuminated: Reducing sequence-to-sample misidentifications in metabarcoding studies. Molecular Ecology Resources, 15(6), 1289-1303. https://doi.org/10.1111/1755-0998.12402
Singer, G. A. C., Fahner, N. A., Barnes, J. G., McCarthy, A., & Hajibabaei, M. (2019). Comprehensive biodiversity analysis via ultra-deep patterned flow cell technology: A case study of eDNA metabarcoding seawater. Scientific Reports, 9(1), 5991. https://doi.org/10.1038/s41598-019-42455-9
Song, H., Buhay, J. E., Whiting, M. F., & Crandall, K. A. (2008). Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proceedings of the National Academy of Sciences of the United States of America, 105(36), 13486-13491. https://doi.org/10.1073/pnas.0803076105
Staats, M., Arulandhu, A. J., Gravendeel, B., Holst-Jensen, A., Scholtens, I., Peelen, T., Prins, T. W., & Kok, E. (2016). Advances in DNA metabarcoding for food and wildlife forensic species identification. Analytical and Bioanalytical Chemistry, 408(17), 4615-4630. https://doi.org/10.1007/s00216-016-9595-8
Straub, D., Blackwell, N., Langarica-Fuentes, A., Peltzer, A., Nahnsen, S., & Kleindienst, S. (2020). Interpretations of environmental microbial community studies are biased by the selected 16S rRNA (gene) amplicon sequencing pipeline. Frontiers in Microbiology, 11, 550420. https://doi.org/10.3389/fmicb.2020.550420
Taberlet, P., Bonin, A., Zinger, L., & Coissac, E. (2018). Environmental DNA: For biodiversity research and monitoring. Oxford University Press. https://doi.org/10.1093/oso/9780198767220.001.0001
Taberlet, P., Coissac, E., Hajibabaei, M., & Rieseberg, L. H. (2012). Environmental DNA. Molecular Ecology, 21(8), 1789-1793. https://doi.org/10.1111/j.1365-294X.2012.05542.x
Taberlet, P., Coissac, E., Pompanon, F., Brochmann, C., & Willerslev, E. (2012). Towards next-generation biodiversity assessment using DNA metabarcoding. Molecular Ecology, 21(8), 2045-2050. https://doi.org/10.1111/j.1365-294X.2012.05470.x
Taberlet, P., Coissac, E., Pompanon, F., Gielly, L., Miquel, C., Valentini, A., Vermat, T., Corthier, G., Brochmann, C., & Willerslev, E. (2007). Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Research, 35(3), e14. https://doi.org/10.1093/nar/gkl938
Tedersoo, L., Albertsen, M., Anslan, S., & Callahan, B. (2021). Perspectives and benefits of high-throughput long-read sequencing in microbial ecology. Applied and Environmental Microbiology, 87(17), e00626-21. https://doi.org/10.1128/AEM.00626-21
Tedersoo, L., & Anslan, S. (2019). Towards PacBio-based pan-eukaryote metabarcoding using full-length ITS sequences. Environmental Microbiology Reports, 11(5), 659-668. https://doi.org/10.1111/1758-2229.12776
Tedersoo, L., Bahram, M., Zinger, L., Nilsson, R. H., Kennedy, P. G., Yang, T., Anslan, S., & Mikryukov, V. (2022). Best practices in metabarcoding of fungi: From experimental design to results. Molecular Ecology, 31(10), 2769-2795. https://doi.org/10.1111/mec.16460
Terrat, S., Djemiel, C., Journay, C., Karimi, B., Dequiedt, S., Horrigue, W., Maron, P. A., Chemidlin Prévost-Bouré, N., & Ranjard, L. (2020). ReClustOR: A re-clustering tool using an open-reference method that improves operational taxonomic unit definition. Methods in Ecology and Evolution, 11(1), 168-180. https://doi.org/10.1111/2041-210X.13316
Thompson, L. R., Anderson, S. R., den Uyl, P. A., Patin, N. V., Lim, S. J., Sanderson, G., & Goodwin, K. D. (2022). Tourmaline: A containerized workflow for rapid and iterable amplicon sequence analysis using QIIME 2 and Snakemake. GigaScience, 11, giac066. https://doi.org/10.1093/gigascience/giac066
Thomsen, P. F., & Sigsgaard, E. E. (2019). Environmental DNA metabarcoding of wild flowers reveals diverse communities of terrestrial arthropods. Ecology and Evolution, 9(4), 1665-1679. https://doi.org/10.1002/ece3.4809
Vasar, M., Davison, J., Neuenkamp, L., Sepp, S.-K., Young, J. P. W., Moora, M., & Öpik, M. (2021). User-friendly bioinformatics pipeline gDAT (graphical downstream analysis tool) for analysing rDNA sequences. Molecular Ecology Resources, 21(4), 1380-1392. https://doi.org/10.1111/1755-0998.13340
Větrovský, T., Baldrian, P., & Morais, D. (2018). SEED 2: A user-friendly platform for amplicon high-throughput sequencing data analyses. Bioinformatics, 34(13), 2292-2294. https://doi.org/10.1093/bioinformatics/bty071
Vu, D., Nilsson, R. H., & Verkley, G. J. M. (2022). Dnabarcoder: An open-source software package for analysing and predicting DNA sequence similarity cutoffs for fungal sequence identification. Molecular Ecology Resources, 22, 2793-2809. https://doi.org/10.1111/1755-0998.13651
Wang, Q., Garrity, G. M., Tiedje, J. M., & Cole, J. R. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73(16), 5261-5267. https://doi.org/10.1128/AEM.00062-07
Weißbecker, C., Schnabel, B., & Heintz-Buschart, A. (2020). Dadasnake, a Snakemake implementation of DADA2 to process amplicon sequencing data for microbial ecology. GigaScience, 9(12), giaa135. https://doi.org/10.1093/gigascience/giaa135
Weigand, H., Beermann, A. J., Čiampor, F., Costa, F. O., Csabai, Z., Duarte, S., Geiger, M. F., Grabowski, M., Rimet, F., Rulik, B., Strand, M., Szucsich, N., Weigand, A. M., Willassen, E., Wyler, S. A., Bouchez, A., Borja, A., Čiamporová-Zaťovičová, Z., Ferreira, S., … Ekrem, T. (2019). DNA barcode reference libraries for the monitoring of aquatic biota in Europe: Gap-analysis and recommendations for future work. Science of the Total Environment, 678, 499-524. https://doi.org/10.1016/j.scitotenv.2019.04.247
Westfall, K. M., Therriault, T. W., & Abbott, C. L. (2020). A new approach to molecular biosurveillance of invasive species using DNA metabarcoding. Global Change Biology, 26(2), 1012-1022. https://doi.org/10.1111/gcb.14886
Wratten, L., Wilm, A., & Göke, J. (2021). Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nature Methods, 18(10), 1161-1168. https://doi.org/10.1038/s41592-021-01254-9
Zafeiropoulos, H., Gargan, L., Hintikka, S., Pavloudi, C., & Carlsson, J. (2021). The dark mAtteR iNvestigator (DARN) tool: Getting to know the known unknowns in COI amplicon data. Metabarcoding and Metagenomics, 5, e69657. https://doi.org/10.3897/mbmg.5.69657
Zafeiropoulos, H., Viet, H. Q., Vasileiadou, K., Potirakis, A., Arvanitidis, C., Topalis, P., Pavloudi, C., & Pafilis, E. (2020). PEMA: A flexible pipeline for environmental DNA Metabarcoding analysis of the 16S/18S ribosomal RNA, ITS, and COI marker genes. GigaScience, 9(3), giaa022. https://doi.org/10.1093/gigascience/giaa022
Zinger, L., Lionnet, C., Benoiston, A. S., Donald, J., Mercier, C., & Boyer, F. (2021). metabaR: An R package for the evaluation and improvement of DNA metabarcoding data quality. Methods in Ecology and Evolution, 12(4), 586-592. https://doi.org/10.1111/2041-210X.13552