PaCBAM: fast and scalable processing of whole exome and targeted sequencing data.
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
26 Dec 2019
History:
received: 27 Mar 2019
accepted: 11 Dec 2019
entrez: 28 Dec 2019
pubmed: 28 Dec 2019
medline: 21 Apr 2020
Status:
epublish
Abstract
BACKGROUND
Interrogation of whole exome and targeted sequencing NGS data is rapidly becoming a preferred approach for exploring large cohorts in the research setting and, importantly, in the context of precision medicine. Data retrieval and processing at the single-base and genomic-region level still constitute major bottlenecks in NGS data analysis, so fast and scalable tools are needed.
RESULTS
PaCBAM is a command-line tool written in C and designed to characterize genomic regions and single-nucleotide positions from whole exome and targeted sequencing data. PaCBAM computes depth of coverage and allele-specific pileup statistics, implements a fast and scalable multi-core computational engine, introduces an innovative and efficient on-the-fly duplicate-read filtering strategy, and provides comprehensive text output files and visual reports. We demonstrate that PaCBAM exploits parallel computation resources better than existing tools, resulting in substantial reductions in processing time and memory usage and hence enabling fast, efficient exploration of large datasets.
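To make "depth of coverage" and "allele-specific pileup" concrete, here is a minimal Python sketch over toy data. It is not PaCBAM's implementation (PaCBAM is written in C and reads NGS alignment files). The read representation (ungapped `(start, strand, sequence)` tuples), the function names, and the duplicate rule (same start, strand, and length) are all assumptions made for illustration only.

```python
from collections import Counter


def pileup(reads, region_start, region_end, drop_duplicates=True):
    """Count bases per position for region_start <= position < region_end.

    reads: iterable of (start, strand, sequence) tuples describing ungapped
    alignments on 0-based coordinates (a toy stand-in for real alignments).
    """
    counts = {pos: Counter() for pos in range(region_start, region_end)}
    seen = set()
    for start, strand, seq in reads:
        # Assumed duplicate rule: same start, strand, and read length.
        key = (start, strand, len(seq))
        if drop_duplicates:
            if key in seen:
                continue  # skip the duplicate while streaming through reads
            seen.add(key)
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in counts:
                counts[pos][base] += 1
    return counts


def depth(counts):
    """Depth of coverage is the total number of bases seen at each position."""
    return {pos: sum(c.values()) for pos, c in counts.items()}


reads = [
    (100, "+", "ACGT"),
    (100, "+", "ACGT"),  # duplicate of the first read under the assumed rule
    (101, "-", "CGTA"),
    (102, "+", "TTAC"),
]
counts = pileup(reads, 100, 106)
cov = depth(counts)
```

With duplicates dropped, position 102 has depth 3 with two `G` and one `T`; with `drop_duplicates=False`, the repeated read is counted again and raises the depth at positions 100 to 103 by one.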
CONCLUSIONS
PaCBAM is a fast and scalable tool designed to process genomic regions from NGS data files and generate comprehensive coverage and pileup statistics for downstream analysis. The tool can be easily integrated into NGS processing pipelines and is available from Bitbucket and the Docker/Singularity hubs.
Identifiers
pubmed: 31878881
doi: 10.1186/s12864-019-6386-6
pii: 10.1186/s12864-019-6386-6
pmc: PMC6933905
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1018