Sequencing of animal viruses: quality data assurance for NGS bioinformatics.
Bioinformatics
NGS
Proficiency testing
Virology
Journal
Virology journal
ISSN: 1743-422X
Abbreviated title: Virol J
Country: England
NLM ID: 101231645
Publication information
Publication date: 2019-11-21
History:
received: 2019-07-17
accepted: 2019-09-16
entrez: 2019-11-23
pubmed: 2019-11-23
medline: 2020-04-29
Status: epublish
Abstract
BACKGROUND
Next generation sequencing (NGS) is becoming widely used in diagnostic and research laboratories and is now applied in a variety of disciplines, including veterinary virology. The NGS workflow comprises several steps: sample processing, library preparation, sequencing, and primary, secondary and tertiary bioinformatics (BI) analyses. The BI analyses form a complex process that is extremely difficult to standardize because of the variety of tools and metrics available. It is therefore essential to assess the comparability of results obtained with different methods and in different laboratories. To this end, we organized a proficiency test focused on the bioinformatics components of generating complete genome sequences of salmonid rhabdoviruses.
METHODS
Three partners, which performed virus sequencing using different commercial library preparation kits and NGS platforms, shared 75 raw datasets with each other. Each participant analyzed the datasets separately, producing consensus sequences with its own bioinformatics pipeline. Results were then compared to highlight discrepancies, and a subset of inconsistencies was investigated in more detail.
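The comparison step described above can be illustrated with a minimal Python sketch. It assumes the consensus sequences from the different pipelines have already been aligned to a common length, with '-' marking gaps. The names `discrepant_columns`, `percent_agreement` and the `lab_A`/`lab_B`/`lab_C` labels are hypothetical, and the agreement metric shown here is an illustrative assumption, not the reproducibility formula used by the authors.

```python
def discrepant_columns(aligned):
    """Return the 0-based alignment columns where the pipelines disagree.

    `aligned` maps a (hypothetical) pipeline label to its consensus sequence,
    pre-aligned to a common length with '-' marking gaps.
    """
    seqs = [s.upper() for s in aligned.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # A column is discrepant when the pipelines do not all report the same character.
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def percent_agreement(aligned):
    """Percentage of alignment columns on which all pipelines agree (assumed metric)."""
    length = len(next(iter(aligned.values())))
    return 100.0 * (length - len(discrepant_columns(aligned))) / length


# Toy example: one substitution (column 3) and one deletion (column 7) in lab_C.
consensus = {
    "lab_A": "ACGTACGTAC",
    "lab_B": "ACGTACGTAC",
    "lab_C": "ACGAACG-AC",
}
print(discrepant_columns(consensus))  # [3, 7]
print(percent_agreement(consensus))   # 80.0
```

A real comparison would also need to classify each discrepancy by genome region (termini, intergenic, coding) using an annotation, which this sketch leaves out.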
RESULTS
In total, we observed 526 discrepancies, of which 39.5% were located at genome termini, 14.1% in intergenic regions and 46.4% in coding regions. Among these, 10 SNPs and 99 indels caused changes in the protein products. Overall reproducibility was 99.94%. In the subset of inconsistencies investigated in more depth, manual curation appeared to be the most critical step affecting sequence comparability, suggesting that harmonizing this phase is crucial for obtaining comparable results. Analysis of a calibrator sample gave a BI accuracy of 99.983%.
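As a quick consistency check on the figures above, the reported percentages can be converted back into approximate counts. The abstract gives only the total (526) and the rounded percentages, so the per-region counts below are estimates derived from those numbers, not values reported by the authors.

```python
total = 526
# Percentages as reported in the abstract.
shares = {
    "genome termini": 39.5,
    "intergenic regions": 14.1,
    "coding regions": 46.4,
}

# Approximate count per region, rounded to the nearest whole discrepancy.
counts = {region: round(total * pct / 100) for region, pct in shares.items()}

print(counts)
# {'genome termini': 208, 'intergenic regions': 74, 'coding regions': 244}
print(sum(counts.values()))  # 526
```

The estimated counts add up to the reported total, so the three percentages are mutually consistent.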
CONCLUSIONS
We demonstrated the applicability and usefulness of BI proficiency testing for assuring the quality of NGS data, and we recommend wider implementation of such exercises to ensure uniform sequence data across virology laboratories.
Identifiers
pubmed: 31752912
doi: 10.1186/s12985-019-1223-8
pii: 10.1186/s12985-019-1223-8
pmc: PMC6868765
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
140