Bioinformatics Pipeline for Human Papillomavirus Short Read Genomic Sequences Classification Using Support Vector Machine.


Journal

Viruses
ISSN: 1999-4915
Abbreviated title: Viruses
Country: Switzerland
NLM ID: 101509722

Publication information

Publication date:
30 06 2020
History:
received: 03 06 2020
revised: 26 06 2020
accepted: 27 06 2020
entrez: 8 7 2020
pubmed: 8 7 2020
medline: 27 2 2021
Status: epublish

Abstract

We recently developed a test based on the Agilent SureSelect target enrichment system that captures genomic fragments from 191 human papillomavirus (HPV) types for Illumina sequencing. This enriched whole genome sequencing (eWGS) assay provides an approach to identify all HPV types in a sample. Here we present a machine learning algorithm that calls HPV types based on the eWGS output. The algorithm, based on the support vector machine (SVM) technique, was trained on eWGS data from 122 control samples with known HPV types. The new algorithm demonstrated good performance in HPV type detection for designed samples with 25 or more HPV plasmid copies per sample. We compared the HPV typing results produced by the new algorithm for 261 residual epidemiologic samples with the typing results delivered by the standard HPV Linear Array (LA). The agreement between methods (97.4%) was substantial (kappa = 0.783). In addition, the new algorithm identified 428 instances of HPV types that the LA assay is not designed to detect. Overall, we have demonstrated that the bioinformatics pipeline is an accurate tool for calling HPV types by analyzing data generated by eWGS processing of DNA fragments extracted from control and epidemiological samples.
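
The abstract reports both raw agreement (97.4%) and Cohen's kappa (0.783). The gap between the two figures is typical when most type calls are negative in both assays, because kappa discounts the agreement expected by chance. The following is a minimal sketch of that calculation for a 2x2 table of per-type calls. The function name `cohen_kappa` and the counts passed to it are invented for illustration only; they are not the study's data and this is not the authors' code.

```python
def cohen_kappa(both_pos, only_a, only_b, both_neg):
    """Return (observed agreement, Cohen's kappa) for a 2x2 agreement table.

    both_pos: calls positive by both methods
    only_a:   calls positive by method A only
    only_b:   calls positive by method B only
    both_neg: calls negative by both methods
    """
    n = both_pos + only_a + only_b + both_neg
    if n == 0:
        raise ValueError("agreement table is empty")

    # Fraction of calls on which the two methods agree.
    p_observed = (both_pos + both_neg) / n

    # Marginal positive rates of each method.
    a_pos = (both_pos + only_a) / n
    b_pos = (both_pos + only_b) / n

    # Agreement expected by chance if the methods were independent.
    p_expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical counts (NOT from the paper): mostly negative calls.
agreement, kappa = cohen_kappa(both_pos=50, only_a=15, only_b=11, both_neg=924)
print(f"observed agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

With counts dominated by double negatives, chance agreement is already high, so kappa comes out well below the raw agreement.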

Identifiers

pubmed: 32629900
pii: v12070710
doi: 10.3390/v12070710
pmc: PMC7412107

Publication types

Evaluation Study
Journal Article
Research Support, U.S. Gov't, P.H.S.

Languages

eng

Citation subsets

IM

Authors

Alexandre Lomsadze (A)

Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech, Atlanta, GA 30332, USA.

Tengguo Li (T)

Division of High-Consequence Pathogens & Pathology, Centers for Disease Control and Prevention, Atlanta, GA 30329, USA.

Mangalathu S Rajeevan (MS)

Division of High-Consequence Pathogens & Pathology, Centers for Disease Control and Prevention, Atlanta, GA 30329, USA.

Elizabeth R Unger (ER)

Division of High-Consequence Pathogens & Pathology, Centers for Disease Control and Prevention, Atlanta, GA 30329, USA.

Mark Borodovsky (M)

Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech, Atlanta, GA 30332, USA.
School of Computational Science and Engineering, Georgia Tech, Atlanta, GA 30332, USA.
