A hybrid pipeline for reconstruction and analysis of viral genomes at multi-organ level.
JC polyomavirus
efficient pipeline
genome analysis
mitochondrial DNA
multi-organ sequencing
parvovirus B19
viral genomes
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date: 1 August 2020
History:
received: 17 January 2020
revised: 25 May 2020
accepted: 23 July 2020
entrez: 21 August 2020
pubmed: 21 August 2020
medline: 26 October 2021
Status: ppublish
Abstract
BACKGROUND
Advances in sequencing technologies have enabled the characterization of multiple microbial and host genomes, opening new frontiers of knowledge while kindling novel applications and research perspectives. Among these is the investigation of the viral communities residing in the human body and their impact on health and disease. To this end, the study of samples from multiple tissues is critical, yet the complexity of such analysis calls for a dedicated pipeline. We provide an automatic and efficient pipeline for identification, assembly, and analysis of viral genomes that combines the DNA sequence data from multiple organs. TRACESPipe relies on cooperation among 3 modalities: compression-based prediction, sequence alignment, and de novo assembly. The pipeline is ultra-fast and additionally provides secure transmission and storage of sensitive data.
FINDINGS
TRACESPipe performed outstandingly when tested on synthetic and ex vivo datasets, identifying and reconstructing all the viral genomes, including those with high levels of single-nucleotide polymorphisms. It also detected minimal levels of genomic variation between different organs.
CONCLUSIONS
TRACESPipe's unique ability to simultaneously process and analyze samples from different sources enables the evaluation of within-host variability. This opens up the possibility of investigating viral tissue tropism, evolution, fitness, and disease associations. Moreover, additional features such as DNA damage estimation and mitochondrial DNA reconstruction and analysis, as well as exogenous-source controls, expand the utility of this pipeline to other fields such as forensics and ancient DNA studies. TRACESPipe is released under GPLv3 and is available for free download at https://github.com/viromelab/tracespipe.
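The abstract names compression-based prediction as one of the three modalities but does not describe it. As a rough illustration of the general idea only, the sketch below ranks candidate reference genomes by normalized compression distance (NCD) to a sample, using Python's general-purpose zlib compressor. This is an assumption-laden stand-in, not TRACESPipe's actual method or models; the function names (`compressed_size`, `ncd`, `rank_references`) and the toy genomes are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: lower values mean more shared content."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_references(sample: bytes, references: dict[str, bytes]) -> list[tuple[str, float]]:
    """Sort hypothetical reference genomes from most to least similar to the sample."""
    scored = ((name, ncd(sample, seq)) for name, seq in references.items())
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy data (assumption): two unrelated random "genomes" and a sample
    # drawn from the first one. Real inputs would be sequencing reads.
    rng = random.Random(0)
    genome_a = bytes(rng.choice(b"ACGT") for _ in range(3000))
    genome_b = bytes(rng.choice(b"ACGT") for _ in range(3000))
    sample = genome_a[500:2500]
    candidates = {"toy_virus_A": genome_a, "toy_virus_B": genome_b}
    for name, distance in rank_references(sample, candidates):
        print(f"{name}\t{distance:.3f}")
```

In a pipeline of this kind, the best-ranked references would presumably be handed to the alignment and de novo assembly stages. zlib's 32 KB window makes it unsuitable for real genomes longer than that, which is one reason specialized DNA compression models are used in practice.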
Identifiers
pubmed: 32815536
pii: 5894824
doi: 10.1093/gigascience/giaa086
pmc: PMC7439602
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2020. Published by Oxford University Press.