The K-mer File Format: a standardized and compact disk representation of sets of k-mers.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
15 09 2022
History:
received: 18 03 2022
revised: 27 06 2022
accepted: 26 07 2022
pubmed: 30 7 2022
medline: 15 11 2022
entrez: 29 7 2022
Status: ppublish

Abstract

Bioinformatics applications increasingly rely on ad hoc disk storage of k-mer sets, e.g. for de Bruijn graphs or alignment indexes. Here, we introduce the K-mer File Format as a general lossless framework for storing and manipulating k-mer sets, realizing space savings of 3-5× compared to other formats, and bringing interoperability across tools. Format specification, C++/Rust API, tools: https://github.com/Kmer-File-Format/. Supplementary data are available at Bioinformatics online.
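
One common reason binary k-mer formats are smaller than plain text is that each nucleotide can be packed into 2 bits instead of one 8-bit character. The sketch below illustrates that general idea only. It is not the K-mer File Format layout, and the A/C/G/T mapping and the function names are assumptions made for illustration.

```python
# Illustrative sketch only: a hypothetical 2-bit nucleotide packing.
# The mapping below is an assumption, not taken from the format specification.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODING = {code: base for base, code in ENCODING.items()}


def pack_kmer(kmer: str) -> bytes:
    """Pack a k-mer into ceil(2k / 8) bytes, first base in the highest bits."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODING[base]
    n_bytes = (2 * len(kmer) + 7) // 8
    return value.to_bytes(n_bytes, "big")


def unpack_kmer(data: bytes, k: int) -> str:
    """Recover the k-mer string; k must be known because padding bits are zero."""
    value = int.from_bytes(data, "big")
    bases = []
    for _ in range(k):
        bases.append(DECODING[value & 3])  # read the lowest 2 bits
        value >>= 2
    return "".join(reversed(bases))
```

Under this packing a 31-mer occupies 8 bytes rather than 31 text characters, and the round trip through `pack_kmer` and `unpack_kmer` is lossless as long as `k` is stored alongside the data.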

Identifiers

pubmed: 35904548
pii: 6651834
doi: 10.1093/bioinformatics/btac528
pmc: PMC9477520

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

4423-4425

Grants

Agency: ANR Inception
ID: ANR-16-CONV-0005
Agency: PRAIRIE
ID: ANR-19-P3IA-0001
Agency: National Science Centre
ID: DEC-2019/33/B/ST6/02040
Agency: National Science Foundation
ID: 1453527
Agency: European Union's Horizon 2020 Research and Innovation Programme
Agency: Marie Skłodowska-Curie
ID: 956229

Copyright information

© The Author(s) 2022. Published by Oxford University Press.

Authors

Yoann Dufresne (Y)

Computational Biology Department, Institut Pasteur, Université Paris Cité, F-75015 Paris, France.

Teo Lemane (T)

Univ Rennes, Inria, CNRS, IRISA-UMR 6074, Rennes, France.

Pierre Marijon (P)

Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf 40225, Germany.

Pierre Peterlongo (P)

Univ Rennes, Inria, CNRS, IRISA-UMR 6074, Rennes, France.

Amatur Rahman (A)

Department of Computer Science and Engineering, The Pennsylvania State University, State College 16802, USA.

Marek Kokot (M)

Department of Algorithmics and Software, Silesian University of Technology, Gliwice, PL-44-100 Akademicka 16, Poland.

Paul Medvedev (P)

Department of Computer Science and Engineering, The Pennsylvania State University, State College 16802, USA.
Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College 16801, USA.
Huck Institutes of the Life Sciences, The Pennsylvania State University, State College 16802, USA.

Sebastian Deorowicz (S)

Department of Algorithmics and Software, Silesian University of Technology, Gliwice, PL-44-100 Akademicka 16, Poland.

Rayan Chikhi (R)

Computational Biology Department, Institut Pasteur, Université Paris Cité, F-75015 Paris, France.
