Distance indexing and seed clustering in sequence graphs.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 07 2020
History:
entrez: 14 7 2020
pubmed: 14 7 2020
medline: 9 3 2021
Status: ppublish

Abstract

Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed alignments could belong to the same mapping. We have developed an algorithm for quickly calculating the minimum distance between positions on a sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical to use for a new generation of mapping algorithms based upon genome graphs. Our algorithms have been implemented as part of the vg toolkit and are available at https://github.com/vgteam/vg.
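To make the two tasks named in the abstract concrete, here is a minimal Python sketch, not the authors' method. It computes the minimum distance between two positions with a plain Dijkstra search rather than the paper's minimum distance index, and it clusters seeds by comparing every pair. The graph encoding (a dict of node lengths, a dict of forward-only edges, positions as `(node, offset)` tuples) and the names `min_distance` and `cluster_seeds` are assumptions made for illustration; they are not part of the vg toolkit API.

```python
import heapq

def min_distance(lengths, edges, start, end):
    """Fewest bases from `start` to `end`, or None if `end` is unreachable.

    Assumption: forward strand only; each position is (node_id, offset).
    """
    (n1, o1), (n2, o2) = start, end
    if n1 == n2 and o2 >= o1:
        return o2 - o1  # both positions on one node, reachable directly
    best = {}  # node -> settled distance from `start` to the node's first base
    heap = [(lengths[n1] - o1, v) for v in edges.get(n1, ())]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if u in best:
            continue
        best[u] = d
        if u == n2:
            return d + o2
        for v in edges.get(u, ()):
            if v not in best:
                heapq.heappush(heap, (d + lengths[u], v))
    return None

def cluster_seeds(lengths, edges, seeds, limit):
    """Group seeds so that seeds within `limit` bases share a cluster."""
    parent = list(range(len(seeds)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seeds)):
        for j in range(i + 1, len(seeds)):
            # Check both directions, since the graph is directed.
            candidates = (
                min_distance(lengths, edges, seeds[i], seeds[j]),
                min_distance(lengths, edges, seeds[j], seeds[i]),
            )
            reachable = [d for d in candidates if d is not None]
            if reachable and min(reachable) <= limit:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seed in enumerate(seeds):
        clusters.setdefault(find(i), []).append(seed)
    return sorted(clusters.values())

# Hypothetical toy graph: a bubble where node 1 branches to nodes 2 and 3,
# which both rejoin at node 4.
LENGTHS = {1: 4, 2: 1, 3: 3, 4: 5}
EDGES = {1: [2, 3], 2: [4], 3: [4]}
```

On this toy graph, `min_distance(LENGTHS, EDGES, (1, 2), (4, 1))` takes the shorter branch through node 2. The pairwise loop in `cluster_seeds` runs one graph search per seed pair, which is the cost that a precomputed distance index is meant to avoid.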

Identifiers

pubmed: 32657356
pii: 5870464
doi: 10.1093/bioinformatics/btaa446
pmc: PMC7355256

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

i146-i153

Grants

Agency: NHGRI NIH HHS
ID: R01 HG010485
Country: United States
Agency: NHGRI NIH HHS
ID: T32 HG008345
Country: United States
Agency: NHLBI NIH HHS
ID: U01 HL137183
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010053
Country: United States

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Xian Chang (X)

Department of Biomolecular Engineering, University of California Santa Cruz Genomics Institute, Santa Cruz, CA 95060, USA.

Jordan Eizenga (J)

Department of Biomolecular Engineering, University of California Santa Cruz Genomics Institute, Santa Cruz, CA 95060, USA.

Adam M Novak (AM)

Department of Biomolecular Engineering, University of California Santa Cruz Genomics Institute, Santa Cruz, CA 95060, USA.

Jouni Sirén (J)

Department of Biomolecular Engineering, University of California Santa Cruz Genomics Institute, Santa Cruz, CA 95060, USA.

Benedict Paten (B)

Department of Biomolecular Engineering, University of California Santa Cruz Genomics Institute, Santa Cruz, CA 95060, USA.
