USMPep: universal sequence models for major histocompatibility complex binding affinity prediction.
Binding affinity prediction
Language modeling
Major histocompatibility complex
Peptide data
Recurrent neural networks
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
02 Jul 2020
History:
received: 14 Nov 2019
accepted: 23 Jun 2020
entrez: 4 Jul 2020
pubmed: 4 Jul 2020
medline: 15 Aug 2020
Status:
epublish
Abstract
Immunotherapy is a promising route towards personalized cancer treatment. A key algorithmic challenge in this process is to decide whether a given peptide (neoepitope) binds to the major histocompatibility complex (MHC). This is an active area of research, and many MHC binding prediction algorithms can predict the MHC binding affinity for a given peptide with a high degree of accuracy. However, most state-of-the-art approaches use complicated training and model selection procedures, are restricted to peptides of a certain length, and/or rely on heuristics. We put forward USMPep, a simple recurrent neural network that matches state-of-the-art approaches on MHC class I binding prediction with a single, generic architecture and even a single set of hyperparameters, both on IEDB benchmark datasets and on the recent HPV dataset. Moreover, the algorithm is competitive even as a single model trained from scratch, while ensembling multiple regressors and language model pretraining can still slightly improve the performance. The direct application of the approach to MHC class II binding prediction shows solid performance despite limited training data. We demonstrate that competitive performance in MHC binding affinity prediction can be reached with a standard architecture and training procedure without relying on any heuristics.
Identifiers
pubmed: 32615972
doi: 10.1186/s12859-020-03631-1
pii: 10.1186/s12859-020-03631-1
pmc: PMC7330990
Chemical substances
Histocompatibility Antigens Class I
Histocompatibility Antigens Class II
Peptides
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
279