On leveraging self-supervised learning for accurate HCV genotyping.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
05 Jul 2024
History:
received: 11 Mar 2024
accepted: 06 Jun 2024
medline: 05 Jul 2024
pubmed: 05 Jul 2024
entrez: 04 Jul 2024
Status:
epublish
Abstract
Hepatitis C virus (HCV) is a major global health concern, affecting millions of individuals worldwide. While the existing literature predominantly focuses on disease classification using clinical data, there is a critical research gap concerning HCV genotyping based on genomic sequences. Accurate HCV genotyping is essential for patient management and treatment decisions. Although neural models excel at capturing complex patterns, they still face challenges such as data scarcity, which is common in computational genomics. To overcome these challenges, this paper introduces a deep learning approach for HCV genotyping based on the graphical representation of nucleotide sequences that outperforms classical approaches. Notably, it is effective for both partial and complete HCV genomes and addresses challenges associated with imbalanced datasets. Ten HCV genotypes and subtypes were used in the analysis: 1a, 1b, 2a, 2b, 2c, 3a, 3b, 4, 5, and 6. The study uses Chaos Game Representation (CGR) to map genomic sequences to 2D images and applies self-supervised learning with a convolutional autoencoder for deep feature extraction, which yields strong HCV genotyping performance compared with various machine learning and deep learning models. These models serve as baselines against which the performance of the proposed approach can be evaluated. The experimental results show a classification accuracy of over 99%, outperforming traditional deep learning models. This performance demonstrates that the proposed model can accurately identify HCV genotypes in both partial and complete sequences and can cope with data scarcity for certain genotypes. The results of the proposed model are also compared with those of the NCBI genotyping tool.
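The Chaos Game Representation step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the corner assignment (A, C, G, T on the corners of the unit square), the choice to skip ambiguous symbols, the image size, and the function names `cgr_points` and `cgr_image` are all assumptions made for illustration. Each nucleotide moves the current point halfway toward its corner, and the visited points are then counted on a square grid to give a 2D image that a convolutional model could take as input.

```python
# Illustrative sketch only: corner layout, image size and handling of
# ambiguous symbols are assumptions, not details taken from the paper.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence):
    """Return the CGR trajectory of a nucleotide sequence as (x, y) pairs."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # assumption: skip ambiguous symbols such as N
        # Move halfway from the current point toward the base's corner.
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def cgr_image(sequence, size=64):
    """Count CGR points on a size x size grid (a frequency-style CGR image)."""
    image = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] += 1
    return image


if __name__ == "__main__":
    toy = "ACGTTGCAACGT"  # toy sequence, not real HCV data
    img = cgr_image(toy, size=8)
    print(sum(map(sum, img)), "points placed on an 8x8 grid")
```

Because every point depends on the full prefix of the sequence, both partial and complete sequences map to images of the same fixed size, which is one reason this kind of representation suits image-based models.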
Identifiers
pubmed: 38965254
doi: 10.1038/s41598-024-64209-y
pii: 10.1038/s41598-024-64209-y
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
15463
Copyright information
© 2024. The Author(s).