On leveraging self-supervised learning for accurate HCV genotyping.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
05 Jul 2024
History:
received: 11 03 2024
accepted: 06 06 2024
medline: 5 7 2024
pubmed: 5 7 2024
entrez: 4 7 2024
Status: epublish

Abstract

Hepatitis C virus (HCV) is a major global health concern, affecting millions of individuals worldwide. While the existing literature predominantly focuses on disease classification using clinical data, there is a critical research gap concerning HCV genotyping based on genomic sequences. Accurate HCV genotyping is essential for patient management and treatment decisions. Although neural models excel at capturing complex patterns, they still face challenges, such as data scarcity, that are common in computational genomics. To overcome these challenges, this paper introduces a deep learning approach for HCV genotyping based on the graphical representation of nucleotide sequences, which outperforms classical approaches. Notably, it is effective for both partial and complete HCV genomes and addresses challenges associated with imbalanced datasets. Ten HCV genotypes were used in the analysis: 1a, 1b, 2a, 2b, 2c, 3a, 3b, 4, 5, and 6. The study uses Chaos Game Representation for 2D mapping of genomic sequences and self-supervised learning with a convolutional autoencoder for deep feature extraction, and the resulting performance for HCV genotyping is compared with that of various machine learning and deep learning models. These models provide a baseline against which the performance of the proposed approach and other models can be evaluated. The experimental results show a classification accuracy of over 99%, outperforming traditional deep learning models. This performance demonstrates the capability of the proposed model to accurately identify HCV genotypes in both partial and complete sequences and to cope with data scarcity for certain genotypes. The results of the proposed model are also compared with those of the NCBI genotyping tool.
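
The Chaos Game Representation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the corner layout, the starting point (0.5, 0.5), the grid resolution `k`, and the function names `cgr_points` and `cgr_image` are assumptions chosen for illustration, since the abstract does not state these details. Each nucleotide moves the current point halfway toward its corner of the unit square, and the visited points are then binned into a 2^k x 2^k grid of counts that could serve as the 2D input to an image model such as a convolutional autoencoder.

```python
# Corner layout is an assumed, commonly used convention; the abstract
# does not say which layout the study uses.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return one CGR coordinate per recognized nucleotide (A, C, G, T)."""
    x, y = 0.5, 0.5  # assumed starting point: centre of the unit square
    points = []
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N
        # Move halfway from the current point toward the base's corner.
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def cgr_image(sequence, k=3):
    """Bin the CGR points into a 2**k x 2**k grid of visit counts."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    for row in cgr_image("ACGTTGCAACGT", k=2):
        print(row)
```

Because the grid has a fixed size regardless of sequence length, the same mapping applies to partial and complete genomes; in practice the counts would typically be normalized before being passed to a network.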

Identifiers

pubmed: 38965254
doi: 10.1038/s41598-024-64209-y
pii: 10.1038/s41598-024-64209-y

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

15463

Copyright information

© 2024. The Author(s).

Authors

Ahmed M Fahmy (AM)

Computer Science program, School of Information Technology and Computer Science (ITCS), Nile University, Sheikh Zayed City, Egypt. studahmed91@gmail.com.

Muhammed S Hammad (MS)

Biomedical Engineering Department, Faculty of Engineering, Helwan University, Cairo, Egypt.

Mai S Mabrouk (MS)

Biomedical informatics program, School of Information Technology and Computer Science (ITCS), Nile University, Sheikh Zayed City, Egypt.

Walid I Al-Atabany (WI)

Biomedical informatics program, School of Information Technology and Computer Science (ITCS), Nile University, Sheikh Zayed City, Egypt.
