Distance indexing and seed clustering in sequence graphs.

MeSH terms: Algorithms; Cluster Analysis; Genome; Sequence Analysis, DNA; Software

Journal

Bioinformatics (Oxford, England)

ISSN: 1367-4811

Abbreviated title: Bioinformatics

Country: England

NLM ID: 9808944

Publication information

Publication date:
1 July 2020

History:

entrez: 14 July 2020

pubmed: 14 July 2020

medline: 9 March 2021

Status: ppublish

Abstract

Graph representations of genomes can express more genetic variation than standard linear genomes and can therefore better represent a population. However, because genome graphs are more complex than linear genomes, some operations that are trivial on linear genomes become much more difficult on genome graphs. Calculating distance is one such operation: it is simple in a linear genome but complicated in a graph. In read mapping algorithms, such distance calculations are fundamental to determining whether seed alignments could belong to the same mapping. We have developed an algorithm for quickly calculating the minimum distance between positions on a sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical to use for a new generation of mapping algorithms based on genome graphs. Our algorithms have been implemented as part of the vg toolkit and are available at https://github.com/vgteam/vg.
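The abstract names two operations: a minimum-distance query between positions on a sequence graph, and clustering of seeds by that distance. The sketch below only illustrates what those two operations compute on a toy graph; it is not the paper's method. Assumptions, all hypothetical: the graph is simplified to a directed graph read on one strand (vg graphs are bidirected), a position is a `(node_id, offset)` pair, distances are answered by plain Dijkstra search instead of a minimum distance index, and two seeds are linked when their minimum distance in either direction is at most `limit`, with clusters taken as the transitive closure of that relation via union-find. The names `min_distance` and `cluster_seeds` are illustrative.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy types: a position is (node_id, offset within the node).
Position = Tuple[int, int]
Lengths = Dict[int, int]     # node_id -> sequence length in bases
Succ = Dict[int, List[int]]  # node_id -> successor node_ids (one strand only)


def min_distance(lengths: Lengths, succ: Succ, a: Position, b: Position) -> float:
    """Fewest bases walked forward from position a to position b (inf if unreachable).

    Plain Dijkstra search for illustration; NOT the paper's minimum distance index.
    """
    (u, i), (v, j) = a, b
    best = float("inf")
    if u == v and j >= i:
        best = j - i  # b lies downstream of a on the same node
    # dist[x] = bases walked from a to the first base of node x
    dist: Dict[int, float] = {}
    heap: List[Tuple[float, int]] = []
    for w in succ.get(u, []):
        d = lengths[u] - i  # walk off the rest of node u
        if d < dist.get(w, float("inf")):
            dist[w] = d
            heapq.heappush(heap, (d, w))
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        if x == v:
            # The first pop of v is final; add the offset into v.
            return min(best, d + j)
        for y in succ.get(x, []):
            nd = d + lengths[x]  # traverse all of node x
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return best


def cluster_seeds(lengths: Lengths, succ: Succ,
                  seeds: List[Position], limit: int) -> List[List[int]]:
    """Group seed indices: transitive closure of 'minimum distance <= limit'."""
    parent = list(range(len(seeds)))

    def find(k: int) -> int:
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # Naive all-pairs comparison (quadratic number of searches).
    for p in range(len(seeds)):
        for q in range(p + 1, len(seeds)):
            d = min(min_distance(lengths, succ, seeds[p], seeds[q]),
                    min_distance(lengths, succ, seeds[q], seeds[p]))
            if d <= limit:
                parent[find(p)] = find(q)  # merge the two clusters

    groups: Dict[int, List[int]] = {}
    for k in range(len(seeds)):
        groups.setdefault(find(k), []).append(k)
    return list(groups.values())


if __name__ == "__main__":
    # Toy bubble: node 1 branches to nodes 2 and 3, which rejoin at node 4.
    lengths = {1: 5, 2: 3, 3: 10, 4: 4}
    succ = {1: [2, 3], 2: [4], 3: [4]}
    print(min_distance(lengths, succ, (1, 4), (4, 0)))  # 4, via the shorter node 2
    seeds = [(1, 2), (2, 0), (4, 3)]
    print(cluster_seeds(lengths, succ, seeds, 5))  # [[0, 1], [2]]
    print(cluster_seeds(lengths, succ, seeds, 6))  # [[0, 1, 2]]
```

In the toy bubble, seeds 0 and 1 are 3 bases apart and seeds 1 and 2 are 6 bases apart, so raising `limit` from 5 to 6 places all three seeds in one cluster even though seeds 0 and 2 are 9 bases apart. The quadratic number of graph searches in this naive version is the kind of cost that a precomputed distance index is meant to reduce.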

Identifiers

DOI: 10.1093/bioinformatics/btaa446 PMID: 32657356 PMC: PMC7355256

pii: 5870464

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Pagination

i146-i153

Grants

Agency: NHGRI NIH HHS

ID: R01 HG010485

Country: United States

Agency: NHGRI NIH HHS

ID: T32 HG008345

Country: United States

Agency: NHLBI NIH HHS

ID: U01 HL137183

Country: United States

Agency: NHGRI NIH HHS

ID: R01 HG010053

Country: United States

Authors

Xian Chang

Jordan Eizenga

Adam M Novak

Jouni Sirén

Benedict Paten

Similar articles

Comprehensive comparative analysis and development of molecular markers for Lasianthus species based on complete chloroplast genome sequences.

Selecting optimal software code descriptors-The case of Java.

Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Fasciola hepatica and Fasciola hybrid form co-existence in yak from Tibet of China: application of rDNA internal transcribed spacer.
