Distance indexing and seed clustering in sequence graphs.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date: 1 July 2020
History:
entrez: 14 July 2020
pubmed: 14 July 2020
medline: 9 March 2021
Status: ppublish
Abstract
Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed alignments could belong to the same mapping. We have developed an algorithm for quickly calculating the minimum distance between positions on a sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical to use for a new generation of mapping algorithms based upon genome graphs. Our algorithms have been implemented as part of the vg toolkit and are available at https://github.com/vgteam/vg.
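The abstract does not spell out how the minimum distance index works. As a point of reference, here is a minimal Python sketch of the quantity the index is built to answer quickly: the minimum distance between two positions in a sequence graph. It computes that distance with a plain Dijkstra search, then clusters seeds by merging any pair that lies within a distance limit. This is a naive baseline, not the authors' distance index or clustering algorithm, which are designed to avoid a graph search per query and all-pairs comparisons. The forward-strand-only graph model and all names (`SequenceGraph`, `min_distance`, `cluster_seeds`) are hypothetical and chosen for illustration.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# A position is (node_id, offset): a 0-based offset into the node's sequence.
Position = Tuple[int, int]


class SequenceGraph:
    """Hypothetical toy sequence graph: forward strand only (no bidirected edges)."""

    def __init__(self) -> None:
        self.length: Dict[int, int] = {}    # node id -> sequence length in bases
        self.succ: Dict[int, List[int]] = {}  # node id -> successor node ids

    def add_node(self, node_id: int, seq_len: int) -> None:
        self.length[node_id] = seq_len
        self.succ.setdefault(node_id, [])

    def add_edge(self, src: int, dst: int) -> None:
        self.succ[src].append(dst)

    def min_distance(self, a: Position, b: Position) -> Optional[int]:
        """Fewest bases stepped from a forward to b, or None if b is unreachable.

        Naive Dijkstra search per query; the paper's index exists to avoid this.
        """
        (u, i), (v, j) = a, b
        if u == v and j >= i:
            return j - i  # b is downstream of a on the same node
        # dist[x] = bases from a to the start (offset 0) of node x
        dist: Dict[int, float] = {}
        heap: List[Tuple[float, int]] = []
        exit_cost = self.length[u] - i  # bases from a to the end of node u
        for s in self.succ[u]:
            if exit_cost < dist.get(s, math.inf):
                dist[s] = exit_cost
                heapq.heappush(heap, (exit_cost, s))
        while heap:
            d, x = heapq.heappop(heap)
            if d > dist[x]:
                continue  # stale heap entry
            if x == v:
                return int(d) + j  # first pop of v is its shortest distance
            for y in self.succ[x]:
                nd = d + self.length[x]  # traverse all of node x
                if nd < dist.get(y, math.inf):
                    dist[y] = nd
                    heapq.heappush(heap, (nd, y))
        return None


def cluster_seeds(graph: SequenceGraph, seeds: List[Position], limit: int) -> List[List[int]]:
    """Group seed indices whose minimum distance (either direction) is <= limit."""
    parent = list(range(len(seeds)))

    def find(k: int) -> int:
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # All-pairs comparison: quadratic in the number of seeds (baseline only).
    for p in range(len(seeds)):
        for q in range(p + 1, len(seeds)):
            found = [d for d in (graph.min_distance(seeds[p], seeds[q]),
                                 graph.min_distance(seeds[q], seeds[p])) if d is not None]
            if found and min(found) <= limit:
                parent[find(p)] = find(q)
    groups: Dict[int, List[int]] = {}
    for k in range(len(seeds)):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())


# Usage: a SNP bubble (nodes 2 and 3) followed by a long node 5.
g = SequenceGraph()
for node_id, seq_len in [(1, 10), (2, 1), (3, 1), (4, 10), (5, 100), (6, 10)]:
    g.add_node(node_id, seq_len)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6)]:
    g.add_edge(src, dst)
seeds = [(1, 2), (4, 5), (6, 3)]
print(g.min_distance((1, 2), (4, 5)))  # 14
print(cluster_seeds(g, seeds, 20))     # [[0, 1], [2]]
```

Either path through the bubble adds one base, so the first two seeds are 14 bases apart and fall in one cluster, while the third seed sits past the 100-base node and stays alone under a limit of 20.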
Identifiers
pubmed: 32657356
pii: 5870464
doi: 10.1093/bioinformatics/btaa446
pmc: PMC7355256
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
i146-i153
Grants
Agency: NHGRI NIH HHS
ID: R01 HG010485
Country: United States
Agency: NHGRI NIH HHS
ID: T32 HG008345
Country: United States
Agency: NHLBI NIH HHS
ID: U01 HL137183
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010053
Country: United States
Copyright information
© The Author(s) 2020. Published by Oxford University Press.