LEC-Codec: Learning-Based Genome Data Compression.
Journal
IEEE/ACM transactions on computational biology and bioinformatics
ISSN: 1557-9964
Abbreviated title: IEEE/ACM Trans Comput Biol Bioinform
Country: United States
NLM ID: 101196755
Publication information
Publication date:
03 Oct 2024
History:
medline: 04 Oct 2024
pubmed: 04 Oct 2024
entrez: 03 Oct 2024
Status:
aheadofprint
Abstract
In this paper, we propose a Learning-based gEnome Codec (LEC), which is designed for high efficiency and enhanced flexibility. The LEC integrates several advanced technologies, including Group of Bases (GoB) compression, multi-stride coding and bidirectional prediction, all of which are aimed at optimizing the balance between coding complexity and performance in lossless compression. The model applied in our proposed codec is data-driven, based on deep neural networks to infer probabilities for each symbol, enabling fully parallel encoding and decoding with configured complexity for diverse applications. Based upon a set of configurations on compression ratios and inference speed, experimental results show that the proposed method is very efficient in terms of compression performance and provides improved flexibility in real-world applications.
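The abstract does not describe the implementation, so the following is only a minimal sketch of the general principle behind model-based lossless coding: a model assigns a probability to each symbol, and an entropy coder can spend close to -log2(p) bits on it. The sketch assumes a simple adaptive count model as a stand-in for the paper's deep neural network, and a hypothetical `group_bases` helper that packs several bases into one symbol, loosely in the spirit of Group of Bases (GoB) coding. All names and parameters here are illustrative assumptions, not the authors' method.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: input contains only these four bases


def group_bases(seq, group_size):
    """Pack each run of `group_size` bases into one integer symbol.

    Hypothetical helper; assumes len(seq) is a multiple of group_size.
    Each symbol lies in the range [0, 4 ** group_size).
    """
    symbols = []
    for i in range(0, len(seq), group_size):
        value = 0
        for base in seq[i:i + group_size]:
            value = value * 4 + ALPHABET.index(base)
        symbols.append(value)
    return symbols


def ideal_code_length_bits(symbols, alphabet_size, order=2):
    """Return the sum of -log2 p(symbol | context) over the sequence.

    The probabilities come from an adaptive count model with add-one
    smoothing, used here in place of a neural network predictor.
    """
    counts = defaultdict(lambda: [1] * alphabet_size)
    total_bits = 0.0
    for i, symbol in enumerate(symbols):
        context = tuple(symbols[max(0, i - order):i])
        table = counts[context]
        probability = table[symbol] / sum(table)
        total_bits += -math.log2(probability)
        # A decoder can apply the same update after decoding the symbol,
        # so encoder and decoder stay in sync and coding remains lossless.
        table[symbol] += 1
    return total_bits


if __name__ == "__main__":
    sequence = "ACGT" * 64
    grouped = group_bases(sequence, group_size=2)
    bits = ideal_code_length_bits(grouped, alphabet_size=16, order=1)
    print(f"{bits / len(sequence):.3f} bits per base (2.0 is the uncompressed baseline)")
```

In this sketch the group size and context order are the knobs that trade model size against prediction quality; the paper's own complexity configurations are not specified in the abstract.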
Identifiers
pubmed: 39361454
doi: 10.1109/TCBB.2024.3473899
Publication types
Journal Article
Languages
eng
Citation subsets
IM