PHONI: Streamed Matching Statistics with Multi-Genome References.


Journal

Proceedings. Data Compression Conference
ISSN: 2375-0383
Abbreviated title: Proc Data Compress Conf
Country: United States
NLM ID: 101662849

Publication information

Publication date:
Mar 2021
History:
entrez: 15 11 2021
pubmed: 16 11 2021
medline: 16 11 2021
Status: ppublish

Abstract

Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database. Our code is available at https://github.com/koeppl/phoni.
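For readers unfamiliar with the term, the following is a minimal sketch of what matching statistics are, not of the PHONI algorithm itself. It assumes the common definition: for each position i of the pattern, the length of the longest prefix of pattern[i:] that occurs in the text, together with one position in the text where it occurs. The function name `matching_statistics` and the brute-force use of `str.find` are illustrative choices; a compressed index such as the one described in the abstract would replace the plain text search.

```python
def matching_statistics(text: str, pattern: str) -> list[tuple[int, int]]:
    """Brute-force matching statistics (illustrative only, not PHONI).

    Returns one (pos, length) pair per pattern position i, where
    pattern[i:i + length] is the longest prefix of pattern[i:] occurring
    in text, and pos is the start of one such occurrence (-1 if length is 0).
    """
    result = []
    length = 0
    for i in range(len(pattern)):
        # If pattern[i-1 : i-1+length] occurs in text, then dropping its
        # first character still occurs, so we can start from length - 1.
        length = max(length - 1, 0)
        pos = text.find(pattern[i:i + length])
        # Extend the match one character at a time while it still occurs.
        while i + length < len(pattern):
            nxt = text.find(pattern[i:i + length + 1])
            if nxt == -1:
                break
            pos = nxt
            length += 1
        result.append((pos if length > 0 else -1, length))
    return result


if __name__ == "__main__":
    for i, (pos, length) in enumerate(matching_statistics("GATTACA", "TACAT")):
        print(i, pos, length)
```

A run of short lengths in the output marks a stretch of the pattern that matches the text poorly, which is the signal the abstract refers to when it speaks of a pattern becoming incompressible relative to the database.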

Identifiers

pubmed: 34778549
doi: 10.1109/dcc50243.2021.00027
pmc: PMC8583545
mid: NIHMS1750847

Publication types

Journal Article

Languages

eng

Pagination

193-202

Grants

Agency: NIAID NIH HHS
ID: R01 AI141810
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG011392
Country: United States

References

Sci Rep. 2020 May 6;10(1):7622
pubmed: 32376847
Nat Biotechnol. 2021 Apr;39(4):431-441
pubmed: 33257863
J Comput Biol. 2020 Apr;27(4):500-513
pubmed: 32181684
Algorithms Mol Biol. 2019 May 24;14:13
pubmed: 31149025
Sci Rep. 2019 Aug 7;9(1):11475
pubmed: 31391493
J Comput Biol. 2022 Feb;29(2):169-187
pubmed: 35041495
Bioinformatics. 2020 Aug 15;36(16):4399-4405
pubmed: 32277811

Authors

Christina Boucher (C)

U Florida Gainesville, USA.

Travis Gagie (T)

Dalhousie U Halifax, Canada.

I Tomohiro (I)

Kyutech Fukuoka, Japan.

Dominik Köppl (D)

TMDU Tokyo, Japan.

Ben Langmead (B)

Johns Hopkins U Baltimore, USA.

Giovanni Manzini (G)

U Piemonte Orientale Alessandria, Italy.

Gonzalo Navarro (G)

CeBiB, DCC, U Chile Santiago, Chile.

Alejandro Pacheco (A)

CeBiB, DCC, U Chile Santiago, Chile.

Massimiliano Rossi (M)

U Florida Gainesville, USA.
