GBZ file format for pangenome graphs.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
15 Nov 2022
History:
received: 12 Jul 2022
revised: 6 Sep 2022
accepted: 30 Sep 2022
pubmed: 1 Oct 2022
medline: 19 Nov 2022
entrez: 30 Sep 2022
Status:
ppublish
Abstract
Pangenome graphs representing aligned genome assemblies are being shared in the text-based Graphical Fragment Assembly format. As the number of assemblies grows, there is a need for a file format that can store the highly repetitive data space efficiently. We propose the GBZ file format based on data structures used in the Giraffe short-read aligner. The format provides good compression, and the files can be efficiently loaded into in-memory data structures. We provide compression and decompression tools and libraries for using GBZ graphs, and we show that they can be efficiently used on a variety of systems. C++ and Rust implementations are available at https://github.com/jltsiren/gbwtgraph and https://github.com/jltsiren/gbwt-rs, respectively. Supplementary data are available at Bioinformatics online.
Identifiers
pubmed: 36179091
pii: 6731924
doi: 10.1093/bioinformatics/btac656
pmc: PMC9665857
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
5012-5018
Grants
Agency: NHGRI NIH HHS
ID: R01 HG010485
Country: United States
Agency: NIH HHS
ID: U01HG010961
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG010971
Country: United States
Agency: NIH HHS
ID: OT2 OD026682
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG010972
Country: United States
Copyright information
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.