Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets.

COVID-19; SARS-CoV-2; T-cell epitope; ensemble learning; machine learning; peptide-based vaccines; random forest; voting ensemble

Journal

Diagnostics (Basel, Switzerland)
ISSN: 2075-4418
Abbreviated title: Diagnostics (Basel)
Country: Switzerland
NLM ID: 101658402

Publication information

Publication date:
26 10 2021
History:
received: 13 09 2021
revised: 20 10 2021
accepted: 21 10 2021
entrez: 27 11 2021
pubmed: 28 11 2021
medline: 28 11 2021
Status: epublish

Abstract

An ongoing outbreak of coronavirus disease 2019 (COVID-19), caused by the single-stranded RNA virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to a worldwide pandemic that continues to date. Vaccination has proven to be the most effective measure, by far, against COVID-19 and for combating the outbreak. Among all vaccine types, epitope-based peptide vaccines have received less attention and hold large untapped potential for improving vaccine safety and immunogenicity. The peptides used in such vaccines are chemically synthesized based on the amino acid sequences of antigenic proteins (T-cell epitopes) of the target pathogen. Identifying antigenic proteins through wet-lab experiments is difficult, expensive, and time-consuming. We propose an ensemble machine learning (ML) model that uses physicochemical properties of amino acids to predict T-cell epitopes (also known as immune-relevant determinants or antigenic determinants) against SARS-CoV-2. To train the model, we retrieved experimentally determined SARS-CoV-2 T-cell epitopes from the Immune Epitope Database and Analysis Resource (IEDB). On a test set of SARS-CoV-2 peptides (T-cell epitopes and non-epitopes) obtained from IEDB, the model achieved accuracy, AUC (area under the ROC curve), Gini, specificity, sensitivity, F-score, and precision of 98.20%, 0.991, 0.994, 0.971, 0.982, 0.990, and 0.981, respectively. An average accuracy of 97.98% was recorded in repeated 5-fold cross-validation. Comparison with five robust machine learning classifiers and with existing T-cell epitope prediction techniques, such as NetMHC and CTLpred, suggests that the proposed model performs better. The epitopes predicted by the model have a high probability of serving as peptide vaccine candidates, subject to in vitro and in vivo scientific assessment. The model would help the scientific community working on vaccine development save time when screening active T-cell epitope candidates of SARS-CoV-2 against inactive ones.
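The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions, with epitopes as the positive class, and the common convention Gini = 2 × AUC − 1; the abstract does not state which definitions the authors used. The names `ConfusionCounts`, `classification_metrics`, and `gini_from_auc`, and the counts in the usage example, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary test-set counts: epitope = positive, non-epitope = negative."""
    tp: int  # epitopes predicted as epitopes
    fp: int  # non-epitopes predicted as epitopes
    tn: int  # non-epitopes predicted as non-epitopes
    fn: int  # epitopes predicted as non-epitopes


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard threshold-based metrics (assumed definitions).

    No zero-division guard: each class must appear at least once among
    both the true labels and the predictions.
    """
    sensitivity = c.tp / (c.tp + c.fn)   # recall on epitopes
    specificity = c.tn / (c.tn + c.fp)   # recall on non-epitopes
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


def gini_from_auc(auc: float) -> float:
    """Gini coefficient under the common convention Gini = 2 * AUC - 1 (assumption)."""
    return 2.0 * auc - 1.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the paper.
    counts = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
    for name, value in classification_metrics(counts).items():
        print(f"{name}: {value:.3f}")
```

AUC is a ranking measure computed from predicted scores rather than from a single confusion matrix, which is why it is passed to `gini_from_auc` as an input here instead of being derived from the counts.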

Identifiers

pubmed: 34829338
pii: diagnostics11111990
doi: 10.3390/diagnostics11111990
pmc: PMC8617960

Publication types

Journal Article

Languages

English

Grants

Agency: Kuwait Foundation for Advancement of Sciences (KFAS)
ID: PR19-13NH-04

Authors

Syed Nisar Hussain Bukhari (SNH)

University Institute of Computing, Chandigarh University, NH-95, Chandigarh-Ludhiana Highway, Mohali 140413, India.

Amit Jain (A)

University Institute of Computing, Chandigarh University, NH-95, Chandigarh-Ludhiana Highway, Mohali 140413, India.

Ehtishamul Haq (E)

Department of Biotechnology, University of Kashmir, Srinagar 190006, India.

Abolfazl Mehbodniya (A)

Department of Electronics and Communication Engineering, Kuwait College of Science and Technology, Kuwait City 13133, Kuwait.

Julian Webber (J)

Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka 560-8531, Japan.
