elPrep 4: A multithreaded framework for sequence analysis.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date:
2019
History:
received: 2018-11-23
accepted: 2019-01-27
entrez: 2019-02-14
pubmed: 2019-02-14
medline: 2019-11-13
Status:
epublish
Abstract
We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep 4 includes multiple new features allowing us to process all of the preparation steps defined by the GATK Best Practice pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep's parallel execution framework to vastly improve the runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data, and up to 7.4x faster for WGS data compared to running the same pipeline with GATK 4, while utilizing fewer compute resources.
Identifiers
pubmed: 30759172
doi: 10.1371/journal.pone.0209523
pii: PONE-D-18-33656
pmc: PMC6373927
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0209523
Conflict of interest statement
The authors have the following interests: This work is funded by IMEC vzw. Charlotte Herzeel, Pascal Costanza, Dries Decap, Jan Fostier and Wilfried Verachtert are employees of IMEC vzw, Belgium; Dries Decap and Jan Fostier are employees of Ghent University, Ghent, Belgium. There are no patents, products in development or marketed products to declare. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.