Haplotype-aware graph indexes.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date: 2020-01-15
History:
received: 2019-02-20
revised: 2019-05-29
accepted: 2019-07-18
pubmed: 2019-08-14
medline: 2020-08-18
entrez: 2019-08-14
Status: ppublish
Abstract
The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes. We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and https://github.com/jltsiren/gcsa2. Supplementary data are available at Bioinformatics online.
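The distinction the abstract draws between all graph paths and true haplotypes can be illustrated with a toy example. The sketch below is not the GBWT or vg implementation; the graph, node labels, haplotypes, and the value of K are all invented for illustration. It enumerates every path through a small graph with two variant sites, then compares the k-mers spelled by all paths with the k-mers spelled by the two haplotypes only.

```python
# Toy variation graph with two biallelic sites (nodes 2/3 and 5/6).
# All node labels and haplotypes are hypothetical, chosen for illustration.
nodes = {1: "GAT", 2: "T", 3: "C", 4: "ACA", 5: "G", 6: "A", 7: "TTC"}
edges = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}

# Two "observed" haplotypes, each stored as a path of node identifiers.
haplotypes = [[1, 2, 4, 5, 7], [1, 3, 4, 6, 7]]


def all_paths(node, end):
    """Enumerate every path from node to end in the acyclic toy graph."""
    if node == end:
        return [[node]]
    return [[node] + rest for nxt in edges[node] for rest in all_paths(nxt, end)]


def spell(path):
    """Concatenate node labels along a path."""
    return "".join(nodes[v] for v in path)


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


K = 6  # long enough for one k-mer to span both variant sites
graph_paths = all_paths(1, 7)
graph_kmers = set().union(*(kmers(spell(p), K) for p in graph_paths))
hap_kmers = set().union(*(kmers(spell(p), K) for p in haplotypes))

# k-mers that exist only on recombinant paths, not on any haplotype
recombinant_only = graph_kmers - hap_kmers

if __name__ == "__main__":
    print(len(graph_paths), "paths in the graph,", len(haplotypes), "haplotypes")
    print(len(graph_kmers), "graph k-mers,", len(hap_kmers), "haplotype k-mers")
    print("recombinant-only k-mers:", sorted(recombinant_only))
```

Every haplotype k-mer is also a graph k-mer, but the reverse does not hold: the extra k-mers come from paths that combine alleles never seen together. A haplotype-aware index lets a k-mer indexer avoid or discard such recombinant-only k-mers while keeping every k-mer that the haplotypes contain.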
Identifiers
pubmed: 31406990
pii: 5538990
doi: 10.1093/bioinformatics/btz575
pmc: PMC7223266
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
400-407
Grants
Agency: Wellcome Trust
Country: United Kingdom
Agency: Wellcome Trust
ID: 207492/Z/17/Z
Country: United Kingdom
Agency: NHLBI NIH HHS
ID: U01 HL137183
Country: United States
Copyright information
© The Author(s) 2019. Published by Oxford University Press.