The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences.

DFT; DWT; PAA; alignment; assembly; compressive genomics; data compression; data transformation; taxonomic classification; time series

Journal

Viruses
ISSN: 1999-4915
Abbreviated title: Viruses
Country: Switzerland
NLM ID: 101509722

Publication information

Publication date:
26 04 2019
History:
received: 30 03 2019
revised: 19 04 2019
accepted: 22 04 2019
entrez: 1 5 2019
pubmed: 1 5 2019
medline: 13 8 2020
Status: epublish

Abstract

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
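To make the idea of comparing compressed sequence approximations concrete, the following is a minimal sketch of piecewise aggregate approximation (PAA), one of the transformations named in the keywords. It is not the authors' implementation. The nucleotide-to-number mapping `BASE_VALUES` and the helper names `encode`, `paa` and `paa_distance` are assumptions made for illustration only; the abstract does not specify a numerical representation or a distance measure.

```python
import numpy as np

# Hypothetical nucleotide-to-number mapping (an assumption for illustration;
# the abstract does not state which numerical representation was used).
BASE_VALUES = {"A": 1.0, "C": -1.0, "G": 2.0, "T": -2.0}


def encode(seq):
    """Map a DNA string to a numeric series; unknown symbols become 0.0."""
    return np.array([BASE_VALUES.get(base, 0.0) for base in seq.upper()])


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each of
    n_segments near-equal consecutive chunks of the series."""
    series = np.asarray(series, dtype=float)
    if not 1 <= n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")
    chunks = np.array_split(series, n_segments)
    return np.array([chunk.mean() for chunk in chunks])


def paa_distance(a, b, n_segments):
    """Plain Euclidean distance between the PAA vectors of two series
    (no length-scaling factor is applied in this sketch)."""
    return float(np.linalg.norm(paa(a, n_segments) - paa(b, n_segments)))
```

Under these assumptions, two reads of 100 bases reduced to 10 segments are compared on 10 values each rather than 100, which is the kind of reduction in computational burden the abstract refers to.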

Identifiers

pubmed: 31035503
pii: v11050394
doi: 10.3390/v11050394
pmc: PMC6563281

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Grants

Agency: Medical Research Council
ID: MC_UU_12014/12
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/H012419/1
Country: United Kingdom
Agency: Wellcome Trust
ID: 097820/Z/11/B
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/M001121/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: DTP studentship
Country: United Kingdom

Authors

Avraam Tapinos (A)

School of Biological Sciences, The University of Manchester, Manchester M13 9PT, UK. avraam.tapinos@manchester.ac.uk.

Bede Constantinides (B)

School of Biological Sciences, The University of Manchester, Manchester M13 9PT, UK. bede.constantinides@manchester.ac.uk.
Modernising Medical Microbiology Consortium, Nuffield Department of Clinical Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DU, UK. bede.constantinides@manchester.ac.uk.

My V T Phan (MVT)

Department of Viroscience, Erasmus Medical Centre, Doctor Molewaterplein 40, 3015 GD Rotterdam, The Netherlands. v.t.m.phan@erasmusmc.nl.

Samaneh Kouchaki (S)

School of Biological Sciences, The University of Manchester, Manchester M13 9PT, UK. samaneh.kouchaki@eng.ox.ac.uk.
Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford OX3 7DQ, UK. samaneh.kouchaki@eng.ox.ac.uk.

Matthew Cotten (M)

Department of Viroscience, Erasmus Medical Centre, Doctor Molewaterplein 40, 3015 GD Rotterdam, The Netherlands. mlcotten13@gmail.com.
MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK. mlcotten13@gmail.com.
MRC/UVRI & LSHTM Uganda Research Unit Entebbe, P.O. Box 49 Entebbe, Uganda. mlcotten13@gmail.com.

David L Robertson (DL)

School of Biological Sciences, The University of Manchester, Manchester M13 9PT, UK. david.l.robertson@glasgow.ac.uk.
MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK. david.l.robertson@glasgow.ac.uk.
