PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S ribosomal RNA, ITS, and COI marker genes.

Animals; Archaea; Bacteria; DNA Barcoding, Taxonomic / methods; DNA, Environmental / chemistry; Electron Transport Complex IV / genetics; Fungi; Metagenomics / methods; Plants; RNA, Ribosomal, 16S / genetics; RNA, Ribosomal, 18S / genetics; Reference Standards; Sensitivity and Specificity; Software

Docker; HPC; container; eDNA; high performance computing; metabarcoding; pipeline; singularity

Journal

GigaScience

ISSN: 2047-217X

Abbreviated title: Gigascience

Country: United States

NLM ID: 101596872

Publication information

Publication date:
2020-03-01

History:

received: 2019-11-18

revised: 2020-01-05

accepted: 2020-02-14

entrez: 2020-03-13

pubmed: 2020-03-13

medline: 2021-01-28

Status: ppublish

Abstract

BACKGROUND

Environmental DNA and metabarcoding allow the identification of a mixture of species and launch a new era in bio- and eco-assessment. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these, a plethora of tools are available; each tool's execution parameters need to be tailored to reflect each experiment's idiosyncrasy. Adding to this complexity, the computation capacity of high-performance computing systems is frequently required for such analyses. To address the difficulties, bioinformatic pipelines need to combine state-of-the-art technologies and algorithms with an easy to get-set-use framework, allowing researchers to tune each study. Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. Likewise, programming languages specialized for big data pipelines incorporate features like roll-back checkpoints and on-demand partial pipeline execution.

FINDINGS

PEMA is a containerized assembly of key metabarcoding analysis tools that requires low effort in setting up, running, and customizing to researchers' needs. Based on third-party tools, PEMA performs read pre-processing, (molecular) operational taxonomic unit clustering, amplicon sequence variant inference, and taxonomy assignment for 16S and 18S ribosomal RNA, as well as ITS and COI marker gene data. Owing to its simplified parameterization and checkpoint support, PEMA allows users to explore alternative algorithms for specific steps of the pipeline without the need for a complete re-execution. PEMA was evaluated against both mock communities and previously published datasets and achieved results of comparable quality.

CONCLUSIONS

A high-performance computing-based approach was used to develop PEMA; however, it can be used on personal computers as well. PEMA's time-efficient performance and good results will allow it to be used for accurate environmental DNA metabarcoding analysis, thus enhancing the applicability of next-generation biodiversity assessment studies.
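
The checkpoint idea mentioned in the abstract (re-running one step with an alternative algorithm without repeating the earlier steps) can be illustrated with a minimal Python sketch. This is not PEMA's code or interface: the function `run_pipeline`, the step names, and the toy "clustering" functions are all hypothetical, and the reads are made-up strings. It only shows the general pattern of saving each step's output and resuming from a chosen step.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, data, checkpoint_dir, start_from=None):
    """Run (name, function) steps in order, saving a JSON checkpoint after each.

    If start_from names a step, the output of the step before it is loaded
    from disk and every earlier step is skipped. (Hypothetical helper.)
    """
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    names = [name for name, _ in steps]
    first = 0
    if start_from is not None:
        first = names.index(start_from)
        if first > 0:
            previous = checkpoint_dir / f"{names[first - 1]}.json"
            data = json.loads(previous.read_text())
    executed = []
    for name, func in steps[first:]:
        data = func(data)
        (checkpoint_dir / f"{name}.json").write_text(json.dumps(data))
        executed.append(name)
    return data, executed


def preprocess(reads):
    # Toy pre-processing: uppercase the reads and drop very short ones.
    return [r.upper() for r in reads if len(r) >= 4]


def cluster_exact(reads):
    # Toy clustering: count identical sequences.
    counts = {}
    for r in reads:
        counts[r] = counts.get(r, 0) + 1
    return counts


def cluster_by_prefix(reads):
    # Alternative toy clustering: group reads by their first four bases.
    counts = {}
    for r in reads:
        counts[r[:4]] = counts.get(r[:4], 0) + 1
    return counts


reads = ["acgtac", "ACGTAC", "acg", "ACGTTT"]  # made-up example reads

with tempfile.TemporaryDirectory() as workdir:
    # Full run: both steps execute and are checkpointed.
    table_a, ran_a = run_pipeline(
        [("preprocess", preprocess), ("cluster", cluster_exact)], reads, workdir
    )
    # Partial run: swap the clustering step and resume from it,
    # reusing the stored pre-processing output.
    table_b, ran_b = run_pipeline(
        [("preprocess", preprocess), ("cluster", cluster_by_prefix)],
        None,
        workdir,
        start_from="cluster",
    )

print(ran_a, table_a)
print(ran_b, table_b)
```

In the second call only the `cluster` step runs, which is the behaviour the abstract describes as exploring alternative algorithms without a complete re-execution. A real pipeline would checkpoint files on disk rather than small JSON values.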


Identifiers

DOI: 10.1093/gigascience/giaa022

PMID: 32161947

PMC: PMC7066391

pii: 5803335

Chemical substances

DNA, Environmental 0

RNA, Ribosomal, 16S 0

RNA, Ribosomal, 18S 0

Electron Transport Complex IV EC 1.9.3.1

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Comments and corrections

Type: ErratumIn

Authors

Haris Zafeiropoulos (H)

Ha Quoc Viet (HQ)

Katerina Vasileiadou (K)

Antonis Potirakis (A)

Christos Arvanitidis (C)

Pantelis Topalis (P)

Christina Pavloudi (C)

Evangelos Pafilis (E)
