elPrep 4: A multithreaded framework for sequence analysis.


Journal

PLoS One
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2019
History:
received: 23 Nov 2018
accepted: 27 Jan 2019
entrez: 14 Feb 2019
pubmed: 14 Feb 2019
medline: 13 Nov 2019
Status: epublish

Abstract

We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing Sequence Alignment/Map (SAM) files in the Go programming language. elPrep 4 includes multiple new features that allow us to process all of the preparation steps defined by the GATK Best Practices pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep's parallel execution framework, vastly improving runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data and up to 7.4x faster on WGS data than running the same pipeline with GATK 4, while using fewer compute resources.

Identifiers

pubmed: 30759172
doi: 10.1371/journal.pone.0209523
pii: PONE-D-18-33656
pmc: PMC6373927

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0209523

Competing interests statement

The authors have the following interests: This work is funded by IMEC vzw. Charlotte Herzeel, Pascal Costanza, Dries Decap, Jan Fostier and Wilfried Verachtert are employees of IMEC vzw, Belgium; Dries Decap and Jan Fostier are employees of Ghent University, Ghent, Belgium. There are no patents, products in development or marketed products to declare. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.

Authors

Charlotte Herzeel

ExaScience Life Lab, IMEC, Leuven, Belgium.

Pascal Costanza

ExaScience Life Lab, IMEC, Leuven, Belgium.

Dries Decap

ExaScience Life Lab, IMEC, Leuven, Belgium.
Department of Information Technology, Ghent University - IMEC, Ghent, Belgium.

Jan Fostier

ExaScience Life Lab, IMEC, Leuven, Belgium.
Department of Information Technology, Ghent University - IMEC, Ghent, Belgium.

Wilfried Verachtert

ExaScience Life Lab, IMEC, Leuven, Belgium.

Similar articles

Selecting optimal software code descriptors-The case of Java.

Yegor Bugayenko, Zamira Kholmatova, Artem Kruglov et al.
Software; Algorithms; Programming Languages

Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Yoan Martínez-López, Paulina Phoobane, Yanaima Jauriga et al.
Blood-Brain Barrier; Machine Learning; Humans; Support Vector Machine; Software

MeSH classifications