GABAC: an arithmetic coding solution for genomic data.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
01 04 2020
History:
received: 13 06 2019
revised: 10 11 2019
accepted: 09 12 2019
pubmed: 13 12 2019
medline: 17 09 2020
entrez: 13 12 2019
Status: ppublish
Abstract
In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM. The GABAC library is written in C++. We also provide a command line application which exercises all features provided by the library. GABAC can be downloaded from https://github.com/mitogen/gabac. Supplementary data are available at Bioinformatics online.
Identifiers
pubmed: 31830243
pii: 5674036
doi: 10.1093/bioinformatics/btz922
pmc: PMC7141842
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
2275-2277
Copyright information
© The Author(s) 2019. Published by Oxford University Press.