Misclassifications in human papillomavirus databases.
Chimera
HPVChimera
Human papillomavirus
International HPV Reference center
Journal
Virology
ISSN: 1096-0341
Abbreviated title: Virology
Country: United States
NLM ID: 0110674
Publication information
Publication date: June 2021
History:
received: 12 December 2020
revised: 23 February 2021
accepted: 4 March 2021
pubmed: 18 March 2021
medline: 29 December 2021
entrez: 17 March 2021
Status: ppublish
Abstract
We assessed the quality of human papillomavirus (HPV) sequences in GenBank by looking for chimeras, "wrong-assembled" contigs and taxonomy errors. To do this, we used an open-source script (HPVChimera_Gb) that compared 25,638 HPV-related nucleotide sequences in GenBank with the 221 numbered HPV types and 220 additional complete HPV sequences. The script flagged 110 sequences with taxonomy/naming errors (sequences reported as an HPV type other than the one they actually corresponded to) and 1318 possibly chimeric sequences. Manual analysis found plausible explanations for most of these (e.g., a sequence spanning an integration site), but 114 sequences appeared to be chimeras (96 of the 114 were already flagged as "unverified" by GenBank) and 13 had taxonomy/naming errors. A comparison of all correct HPV sequences in GenBank suggested that about 800 unique putative HPV types exist. Systematic, regular work to eliminate chimeric sequences and taxonomy/naming errors could improve the quality and organization of HPV research.
Identifiers
pubmed: 33730650
pii: S0042-6822(21)00060-X
doi: 10.1016/j.virol.2021.03.002
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
57-66
Copyright information
Copyright © 2021 The Authors. Published by Elsevier Inc. All rights reserved.