ScaleQC: a scalable lossy to lossless solution for NGS data compression.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 11 2020
History:
received: 21 10 2019
revised: 25 04 2020
accepted: 20 05 2020
pubmed: 28 5 2020
medline: 20 2 2021
entrez: 28 5 2020
Status: ppublish

Abstract

Per-base quality values in Next Generation Sequencing data take up a significant portion of storage even after compression. Lossy compression technologies can further reduce the space used by quality values. However, many applications still require lossless compression, so sequencing data must be prepared in multiple file formats for different applications. We developed a scalable lossy-to-lossless compression solution for quality values named ScaleQC (Scalable Quality value Compression). ScaleQC provides so-called bit-stream level scalability: the bit-stream it compresses losslessly can be further truncated to lower data rates without incurring an expensive transcoding operation. Despite its scalability, ScaleQC achieves compression performance comparable to that of existing lossless or lossy compressors at both lossless and lossy data rates. ScaleQC has been integrated with SAMtools as a special quality value encoding mode for CRAM. Its source code can be obtained from our integrated SAMtools (https://github.com/xmuyulab/samtools), which depends on the integrated HTSlib (https://github.com/xmuyulab/htslib). Supplementary data are available at Bioinformatics online.
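
The idea of bit-stream level scalability can be illustrated with a toy bit-plane layout. This is only a sketch of the general concept and is not the ScaleQC codec: the function names (encode, truncate, decode), the 6-bit value range, and the absence of any entropy coding are all assumptions made for illustration. Quality values are stored as layers from the most significant bit to the least significant, so keeping every layer is lossless, while dropping trailing layers yields a coarser version without re-encoding.

```python
# Toy illustration only: NOT the ScaleQC codec. Quality values are split into
# bit planes (most significant first) so that a prefix of the stream still decodes.
from typing import List

BITS = 6  # assumption: every quality value fits in 6 bits (0..63)


def encode(quals: List[int]) -> List[List[int]]:
    """Return one layer per bit plane, most significant plane first."""
    if any(q < 0 or q >= (1 << BITS) for q in quals):
        raise ValueError("quality value out of range")
    return [[(q >> b) & 1 for q in quals] for b in range(BITS - 1, -1, -1)]


def truncate(layers: List[List[int]], keep: int) -> List[List[int]]:
    """Lower the data rate by dropping trailing layers; nothing is re-encoded."""
    return layers[:keep]


def decode(layers: List[List[int]], n: int) -> List[int]:
    """Rebuild n values from the layers present; missing low bits become zero."""
    vals = [0] * n
    for i, plane in enumerate(layers):
        shift = BITS - 1 - i
        for j, bit in enumerate(plane):
            vals[j] |= bit << shift
    return vals


quals = [40, 37, 12, 2, 33]
stream = encode(quals)
lossless = decode(stream, len(quals))             # all layers kept
lossy = decode(truncate(stream, 3), len(quals))   # only the top 3 bit planes kept
```

With all six layers the decoder returns the input exactly; with three layers each value is rounded down to a multiple of 8, which is the lossy end of the same stream.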

Identifiers

pubmed: 32458976
pii: 5847594
doi: 10.1093/bioinformatics/btaa543

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

4551-4559

Copyright information

© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Rongshan Yu (R)

Digital Fujian Institute of Healthcare and Biomedical Big Data, School of Informatics, Xiamen University, Xiamen 316005, China.
Aginome Scientific, Xiamen 316005, China.

Wenxian Yang (W)

Aginome Scientific, Xiamen 316005, China.
