The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences.
DFT
DWT
PAA
alignment
assembly
compressive genomics
data compression
data transformation
taxonomic classification
time series
Journal
Viruses
ISSN: 1999-4915
Abbreviated title: Viruses
Country: Switzerland
NLM ID: 101509722
Publication information
Publication date: 26 04 2019
History:
received: 30 03 2019
revised: 19 04 2019
accepted: 22 04 2019
entrez: 1 5 2019
pubmed: 1 5 2019
medline: 13 8 2020
Status: epublish
Abstract
Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
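To make the idea of a compressed numerical sequence representation concrete, the following is a minimal sketch of piecewise aggregate approximation (PAA, one of the keywords listed for this article) applied to a short read. The nucleotide-to-number mapping in `ENCODING`, the function names, and the example read are assumptions made for illustration only; the abstract does not state which encoding or parameters the authors used.

```python
# Hypothetical numeric encoding of nucleotides (an assumption for illustration;
# the abstract does not specify the mapping used in the study).
ENCODING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(seq):
    """Map a nucleotide string to a list of numbers."""
    return [ENCODING[base] for base in seq.upper()]


def paa(signal, n_segments):
    """Piecewise aggregate approximation.

    Splits the signal into n_segments contiguous, near-equal chunks and
    replaces each chunk with its mean, reducing the signal's length.
    """
    n = len(signal)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        segment = signal[start:end]
        reduced.append(sum(segment) / len(segment))
    return reduced


def euclidean(a, b):
    """Euclidean distance between two equal-length reduced representations."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


read = "ACGTACGTTTGACCAA"              # hypothetical 16-base read
reduced = paa(encode(read), 4)         # 16 values compressed to 4
print(reduced)                         # [2.5, 2.5, 3.0, 1.5]
```

In a sketch like this, two reads would be compared by computing `euclidean` on their reduced representations rather than on the full sequences, which is the kind of accelerated comparison the abstract alludes to.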
Identifiers
pubmed: 31035503
pii: v11050394
doi: 10.3390/v11050394
pmc: PMC6563281
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: Medical Research Council
ID: MC_UU_12014/12
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/H012419/1
Country: United Kingdom
Agency: Wellcome Trust
ID: 097820/Z/11/B
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/M001121/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: DTP studentship
Country: United Kingdom