Haplotype-aware graph indexes.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
15 Jan 2020
History:
received: 20 Feb 2019
revised: 29 May 2019
accepted: 18 Jul 2019
pubmed: 14 Aug 2019
medline: 18 Aug 2020
entrez: 14 Aug 2019
Status: ppublish

Abstract

The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes.

We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.

Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and https://github.com/jltsiren/gcsa2. Supplementary data are available at Bioinformatics online.
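To make the idea of a haplotype-aware path index concrete, here is a toy sketch in Python. It is not the gbwt library's API or its construction algorithm. It assumes haplotypes are given as lists of positive integer node ids (forward orientation only) and reserves 0 as an endmarker. Every visit to a node is sorted by the reverse of the path prefix leading to it, with ties broken by path id. The successors of those visits form the node's record, and the number of haplotypes containing a query path is counted by LF-mapping from node to node. The names `build_toy_gbwt` and `count_occurrences` are hypothetical. Construction here is quadratic and the records are uncompressed, so this only illustrates the query logic, not a scalable implementation.

```python
from collections import defaultdict

ENDMARKER = 0  # reserved; real node ids are assumed to be positive integers


def build_toy_gbwt(paths):
    """Build uncompressed GBWT-style node records and LF-mapping offsets.

    Every visit to node v is keyed by the reverse of the path prefix that
    leads to v (ties broken by path id). A node's record lists the successor
    of each of its visits in that key order.
    """
    visits = defaultdict(list)  # node -> [(reverse_prefix, path_id, successor)]
    for pid, path in enumerate(paths):
        for j, v in enumerate(path):
            reverse_prefix = tuple(reversed(path[:j]))
            successor = path[j + 1] if j + 1 < len(path) else ENDMARKER
            visits[v].append((reverse_prefix, pid, successor))

    records = {}  # node -> successors of its visits, in sorted visit order
    offsets = {}  # (predecessor, node) -> first rank in node's record reached from predecessor
    for v, node_visits in visits.items():
        node_visits.sort()  # sorting by reverse prefix groups visits by predecessor
        records[v] = [succ for _, _, succ in node_visits]
        for rank, (reverse_prefix, _, _) in enumerate(node_visits):
            pred = reverse_prefix[0] if reverse_prefix else ENDMARKER
            offsets.setdefault((pred, v), rank)
    return records, offsets


def count_occurrences(records, offsets, query):
    """Count how many times `query` occurs as a subpath of the indexed haplotypes."""
    if not query or query[0] not in records:
        return 0
    v = query[0]
    sp, ep = 0, len(records[v])  # range of all visits to the first node
    for w in query[1:]:
        if w not in records:
            return 0
        record = records[v]
        base = offsets.get((v, w), 0)
        # LF-mapping: a visit to v at rank i that continues to w lands at rank
        # base + (number of earlier v-visits that also continue to w) in w's record.
        sp = base + record[:sp].count(w)
        ep = base + record[:ep].count(w)
        if sp >= ep:
            return 0
        v = w
    return ep - sp


if __name__ == "__main__":
    # Three haplotypes through two bubbles: 1 -> {2|3} -> 4 -> {5|6} -> 7
    haplotypes = [[1, 2, 4, 5, 7], [1, 3, 4, 6, 7], [1, 2, 4, 5, 7]]
    records, offsets = build_toy_gbwt(haplotypes)
    print(count_occurrences(records, offsets, [2, 4, 5]))  # 2
    print(count_occurrences(records, offsets, [3, 4, 6]))  # 1
    print(count_occurrences(records, offsets, [2, 4, 6]))  # 0: graph path, no haplotype
```

In this small bubble graph, 2 → 4 → 6 is a valid path in the graph but a recombination that no input haplotype follows, so its count is 0. This is the distinction between potential and observed haplotypes that the abstract describes.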

Identifiers

pubmed: 31406990
pii: 5538990
doi: 10.1093/bioinformatics/btz575
pmc: PMC7223266

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

400-407

Grants

Agency: Wellcome Trust
Country: United Kingdom
Agency: Wellcome Trust
ID: 207492/Z/17/Z
Country: United Kingdom
Agency: NHLBI NIH HHS
ID: U01 HL137183
Country: United States

Copyright information

© The Author(s) 2019. Published by Oxford University Press.

Authors

Jouni Sirén (J)

UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA.
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

Erik Garrison (E)

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.

Adam M Novak (AM)

UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA.

Benedict Paten (B)

UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA.

Richard Durbin (R)

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.
Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK.