DeepVirusClassifier: a deep learning tool for classifying SARS-CoV-2 based on viral subtypes within the Coronaviridae family.
Coronaviridae
Deep learning
SARS-CoV-2
Viral classification
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
05 Jul 2024
History:
received: 23 Aug 2023
accepted: 19 Mar 2024
medline: 6 Jul 2024
pubmed: 6 Jul 2024
entrez: 5 Jul 2024
Status:
epublish
Abstract
In this study, we present DeepVirusClassifier, a tool that accurately classifies Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) viral sequences among other subtypes of the Coronaviridae family. Classification is performed by a deep neural network built on convolutional neural networks (CNNs). Because viruses within the same family share similar genetic and structural characteristics, distinguishing them is harder and requires more robust models. Given the rapid evolution of viral genomes and the growing need for timely classification, we aimed to provide a robust and efficient tool that improves the accuracy of viral identification and classification, contributes to research in viral genomics, and assists in the surveillance of emerging viral strains. The proposed tool is based on a one-dimensional deep CNN and can be trained and tested on the Coronaviridae family, including SARS-CoV-2. We assessed the model's performance using several metrics, including the F1-score and the area under the receiver operating characteristic curve (AUROC). We also conducted artificial mutation tests to evaluate how well the model generalizes across sequence variations, and we compared the tool against the Basic Local Alignment Search Tool (BLAST) algorithm, including a comprehensive analysis of processing times. DeepVirusClassifier performed strongly across several evaluation metrics in both the training and testing phases, indicating a robust learning capacity. Notably, when tested on more than 10,000 viral sequences, the model showed a sensitivity above 99% for sequences with fewer than 2000 mutations. Compared with BLAST, the tool achieved higher accuracy and significantly shorter processing times. Furthermore, the results appear more reliable than those of the related works discussed in the article, suggesting that the tool has considerable potential for viral genomic research.
DeepVirusClassifier is a powerful tool for accurately classifying viral sequences, with a specific focus on SARS-CoV-2 and other subtypes within the Coronaviridae family. Rigorous evaluation and comparison with existing methods show the advantages of our model. The artificial mutation tests demonstrate that the tool can still identify sequences that carry variations, which is a meaningful contribution to viral classification and genomic research. As viral surveillance becomes increasingly critical, our model holds promise for the rapid and accurate identification of emerging viral strains.
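The artificial mutation test mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how mutations were introduced or how sensitivity was computed in the tool, so everything here is an assumption made for illustration: the names `mutate` and `sensitivity` are hypothetical, mutations are modeled as point substitutions at distinct random positions, and sensitivity is taken as true positives over all actual positives.

```python
import random

NUCLEOTIDES = "ACGT"


def mutate(sequence, n_mutations, rng):
    """Return a copy of `sequence` with `n_mutations` point substitutions.

    Assumption: each mutation replaces the base at a distinct random
    position with a different nucleotide (no insertions or deletions).
    """
    if n_mutations > len(sequence):
        raise ValueError("n_mutations cannot exceed the sequence length")
    bases = list(sequence)
    # Sample positions without replacement so exactly n_mutations sites change.
    for pos in rng.sample(range(len(bases)), n_mutations):
        alternatives = [b for b in NUCLEOTIDES if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


def sensitivity(predictions, labels, positive):
    """Fraction of truly positive samples that were predicted as positive."""
    pairs = list(zip(predictions, labels))
    true_pos = sum(1 for p, y in pairs if y == positive and p == positive)
    false_neg = sum(1 for p, y in pairs if y == positive and p != positive)
    total = true_pos + false_neg
    return true_pos / total if total else float("nan")
```

In a test of this kind, each mutated sequence would be passed to the trained classifier, and `sensitivity` would then be computed over the predictions for each mutation count.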
Identifiers
pubmed: 38969970
doi: 10.1186/s12859-024-05754-1
pii: 10.1186/s12859-024-05754-1
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
231
Copyright information
© 2024. The Author(s).