GABAC: an arithmetic coding solution for genomic data.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 04 2020
History:
received: 13 06 2019
revised: 10 11 2019
accepted: 09 12 2019
pubmed: 13 12 2019
medline: 17 09 2020
entrez: 13 12 2019
Status: ppublish

Abstract

In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM. The GABAC library is written in C++. We also provide a command line application which exercises all features provided by the library. GABAC can be downloaded from https://github.com/mitogen/gabac. Supplementary data are available at Bioinformatics online.
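The abstract names binarization schemes and context-adaptive binary arithmetic coding as GABAC's building blocks. The sketch below illustrates how those two ideas fit together. It is a minimal example built on several assumptions and does not reproduce GABAC's API:

- All function and class names are hypothetical.
- Truncated unary and order-0 Exp-Golomb were picked as two common binarizations for illustration.
- A simple count-based probability estimator replaces the finite-state estimator that real CABAC uses.
- The sketch computes the ideal code length an adaptive binary coder would approach. It does not emit an actual arithmetic-coded bitstream.

```python
import math
from typing import List


def truncated_unary(value: int, c_max: int) -> List[int]:
    """Truncated unary: `value` ones, then a terminating zero unless value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    bits = [1] * value
    if value < c_max:
        bits.append(0)  # the terminator is dropped at the maximum, saving one bin
    return bits


def exp_golomb0(value: int) -> List[int]:
    """Order-0 Exp-Golomb: n leading zeros, then the (n+1)-bit binary form of value+1."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = value + 1
    n = code.bit_length() - 1
    return [0] * n + [int(b) for b in format(code, "b")]


class AdaptiveBitModel:
    """Count-based probability estimate for one binary context.

    Assumption: a stand-in for CABAC's finite-state estimator, used only for illustration.
    """

    def __init__(self) -> None:
        self.counts = [1, 1]  # initial pseudo-counts for bit 0 and bit 1

    def cost_and_update(self, bit: int) -> float:
        """Return the ideal cost in bits of coding `bit`, then adapt the estimate."""
        p = self.counts[bit] / (self.counts[0] + self.counts[1])
        self.counts[bit] += 1
        return -math.log2(p)


def estimated_bits(values: List[int], c_max: int) -> float:
    """Ideal adaptive code length of TU-binarized symbols, one context per bin index."""
    models = [AdaptiveBitModel() for _ in range(c_max)]
    total = 0.0
    for v in values:
        for i, bit in enumerate(truncated_unary(v, c_max)):
            total += models[i].cost_and_update(bit)
    return total
```

For example, `truncated_unary(2, 4)` yields `[1, 1, 0]` and `exp_golomb0(3)` yields `[0, 0, 1, 0, 0]`. A stream of 100 zeros binarized with `c_max=4` uses a single context. That context quickly learns that bit 0 is likely, so `estimated_bits` totals log2(101), about 6.7 bits, instead of 100 bins at one bit each. This adaptivity to skewed statistics is why context-adaptive binary arithmetic coding suits the repetitive descriptor streams found in sequencing data.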

Identifiers

pubmed: 31830243
pii: 5674036
doi: 10.1093/bioinformatics/btz922
pmc: PMC7141842

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2275-2277

Copyright information

© The Author(s) 2019. Published by Oxford University Press.

Authors

Jan Voges (J)

Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, 30167 Hannover, Germany.

Tom Paridaens (T)

IDLab, Ghent University-imec, 9050 Ghent, Belgium.

Fabian Müntefering (F)

Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, 30167 Hannover, Germany.

Liudmila S Mainzer (LS)

National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Brian Bliss (B)

National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Mingyu Yang (M)

Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Idoia Ochoa (I)

Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Jan Fostier (J)

IDLab, Ghent University-imec, 9050 Ghent, Belgium.

Jörn Ostermann (J)

Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, 30167 Hannover, Germany.

Mikel Hernaez (M)

Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
