Detection of single nucleotide polymorphisms in virus genomes assembled from high-throughput sequencing data: large-scale performance testing of sequence analysis strategies.
Bioinformatic
Genomic
Plant
Variant
Virus
Journal
PeerJ
ISSN: 2167-8359
Abbreviated title: PeerJ
Country: United States
NLM ID: 101603425
Publication information
Publication date: 2023
History:
received: 2023-02-20
accepted: 2023-07-10
medline: 2023-08-22
pubmed: 2023-08-21
entrez: 2023-08-21
Status:
epublish
Abstract
Recent developments in high-throughput sequencing (HTS) technologies and bioinformatics have drastically changed research in virology, especially for virus discovery. Indeed, proper monitoring of the viral population requires information on the different isolates circulating in the studied area. For this purpose, HTS has greatly facilitated the sequencing of new genomes of detected viruses and their comparison. However, the bioinformatics analyses used to reconstruct genome sequences and detect single nucleotide polymorphisms (SNPs) can introduce bias, an issue that has not been widely addressed so far. Therefore, more knowledge is required on the limitations of predicting SNPs based on HTS-generated sequence samples. To address this issue, we compared the ability of 14 plant virology laboratories, each employing a different bioinformatics pipeline, to detect 21 variants of pepino mosaic virus (PepMV) in three samples through large-scale performance testing (PT) using three artificially designed datasets. To evaluate the impact of the bioinformatics analyses, they were divided into three key steps: read pre-processing, virus-isolate identification, and variant calling. Each step was evaluated independently through an original PT design that included discussion and validation between participants at each step. Overall, this work underlines key parameters influencing SNP detection and proposes recommendations for reliable variant calling for plant viruses. The identification of the closest reference, the mapping parameters, and manual validation of the detections were recognized as the analysis steps with the greatest impact on the success of SNP detection. Strategies to improve the prediction of SNPs are also discussed.
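As an illustration of how SNP detection could be scored in a performance test like the one summarized above, the following minimal Python sketch compares one laboratory's called SNPs with the variants known to be present in an artificial dataset. It is not the authors' scoring method: the `Snp` type, the `score_calls` function, and the example positions are hypothetical and chosen only for illustration.

```python
from typing import Dict, NamedTuple, Set


class Snp(NamedTuple):
    """A single nucleotide polymorphism (hypothetical representation)."""
    position: int  # 1-based position on the chosen reference genome
    ref: str       # reference base
    alt: str       # alternative base


def score_calls(expected: Set[Snp], called: Set[Snp]) -> Dict[str, float]:
    """Compare called SNPs with the variants known to be in the dataset."""
    true_pos = len(expected & called)   # expected variants that were detected
    false_neg = len(expected - called)  # expected variants that were missed
    false_pos = len(called - expected)  # calls that match no expected variant
    sensitivity = true_pos / len(expected) if expected else 0.0
    precision = true_pos / len(called) if called else 0.0
    return {
        "true_positives": true_pos,
        "false_negatives": false_neg,
        "false_positives": false_pos,
        "sensitivity": sensitivity,
        "precision": precision,
    }


# Made-up example: four designed variants, three calls from one pipeline.
expected_snps = {Snp(10, "A", "G"), Snp(20, "C", "T"),
                 Snp(30, "G", "A"), Snp(40, "T", "C")}
called_snps = {Snp(10, "A", "G"), Snp(20, "C", "T"), Snp(50, "A", "T")}
result = score_calls(expected_snps, called_snps)
```

Note that positions are only comparable when both sets refer to the same reference sequence, which is consistent with the abstract's point that the choice of the closest reference strongly affects the outcome.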
Identifiers
pubmed: 37601254
doi: 10.7717/peerj.15816
pii: 15816
pmc: PMC10439718
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e15816
Copyright information
© 2023 Rollin et al.
Conflict of interest statement
The authors declare that they have no competing interests.