Large-Scale Modeling of Sparse Protein Kinase Activity Data.
Journal
Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060
Publication information
Publication date:
26 06 2023
History:
medline: 27 6 2023
pubmed: 9 6 2023
entrez: 9 6 2023
Status:
ppublish
Abstract
Protein kinases are a protein family that plays an important role in several complex diseases such as cancer and cardiovascular and immunological diseases. Protein kinases have conserved ATP binding sites, which when targeted can lead to similar activities of inhibitors against different kinases. This can be exploited to create multitarget drugs. On the other hand, selectivity (lack of similar activities) is desirable in order to avoid toxicity issues. There is a vast amount of protein kinase activity data in the public domain, which can be used in many different ways. Multitask machine learning models are expected to excel for these kinds of data sets because they can learn from implicit correlations between tasks (in this case activities against a variety of kinases). However, multitask modeling of sparse data poses two major challenges: (i) creating a balanced train-test split without data leakage and (ii) handling missing data. In this work, we construct a protein kinase benchmark set composed of two balanced splits without data leakage, using random and dissimilarity-driven cluster-based mechanisms, respectively. This data set can be used for benchmarking and developing protein kinase activity prediction models. Overall, the performance on the dissimilarity-driven cluster-based split is lower than on random split-based sets for all models, indicating poor generalizability of models. Nevertheless, we show that multitask deep learning models, on this very sparse data set, outperform single-task deep learning and tree-based models. Finally, we demonstrate that data imputation does not improve the performance of (multitask) models on this benchmark set.
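The second challenge named in the abstract, handling missing data in a sparse compound-by-kinase activity matrix, is often addressed by computing the loss over measured entries only. The following is a minimal sketch of that general idea, not the authors' implementation; the function name `masked_mse`, the use of `None` for unmeasured activities, and the toy values are all assumptions made for illustration.

```python
def masked_mse(y_true, y_pred):
    """Mean squared error over observed entries only.

    y_true: rows = compounds, columns = kinases (tasks); None marks a
            missing (unmeasured) activity. This encoding is an assumption.
    y_pred: predictions with the same shape, one value per cell.
    """
    total, count = 0.0, 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t is None:
                # Unmeasured compound-kinase pair: contributes no error.
                continue
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observed labels to score")
    return total / count


# Hypothetical toy data: 3 compounds x 2 kinases.
y_true = [[6.0, None],
          [None, 7.0],
          [5.0, 8.0]]
y_pred = [[6.5, 4.0],
          [3.0, 7.0],
          [5.0, 7.0]]

loss = masked_mse(y_true, y_pred)
```

Because missing cells are skipped rather than filled in, predictions for unmeasured pairs (4.0 and 3.0 here) do not affect the loss. Imputation, which the abstract reports did not improve performance on this benchmark, would instead fill those cells before training.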
Identifiers
pubmed: 37294674
doi: 10.1021/acs.jcim.3c00132
pmc: PMC10302492
Chemical substances
Proteins
0
Protein Kinases
EC 2.7.-
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
3688-3696
References
FEBS Lett. 1995 Aug 1;369(1):57-61
pubmed: 7641885
Nat Rev Drug Discov. 2021 Jul;20(7):551-569
pubmed: 34002056
Acc Chem Res. 2003 Jun;36(6):462-9
pubmed: 12809533
J Chem Inf Model. 2022 Jan 24;62(2):240-257
pubmed: 34905358
Chem Soc Rev. 2020 Jun 7;49(11):3525-3564
pubmed: 32356548
J Chem Inf Model. 2016 Jul 25;56(7):1237-42
pubmed: 27367556
J Chem Inf Model. 2013 Apr 22;53(4):783-90
pubmed: 23521722
J Chem Inf Model. 2016 Sep 26;56(9):1654-75
pubmed: 27482722
J Cheminform. 2023 Jan 6;15(1):3
pubmed: 36609528
J Chem Inf Model. 2014 Mar 24;54(3):735-43
pubmed: 24521231
J Chem Inf Model. 2017 Aug 28;57(8):2077-2088
pubmed: 28651433
ACS Omega. 2022 May 23;7(22):18374-18381
pubmed: 35694454
J Chem Inf Model. 2019 Mar 25;59(3):1197-1204
pubmed: 30753070
Biochim Biophys Acta. 2010 Mar;1804(3):440-4
pubmed: 19879387
Nat Commun. 2021 Jun 3;12(1):3307
pubmed: 34083538
Science. 2002 Dec 6;298(5600):1912-34
pubmed: 12471243
Mol Cancer Ther. 2008 Oct;7(10):3129-40
pubmed: 18852116
J Chem Inf Model. 2009 Feb;49(2):318-29
pubmed: 19434833
J Chem Inf Model. 2012 Apr 23;52(4):901-12
pubmed: 22414491
J Cheminform. 2017 Aug 14;9(1):45
pubmed: 29086168
J Health Econ. 2016 May;47:20-33
pubmed: 26928437
Curr Opin Cell Biol. 2002 Apr;14(2):230-6
pubmed: 11891123
J Chem Inf Model. 2019 Aug 26;59(8):3370-3388
pubmed: 31361484
J Biomed Inform. 2020 Aug;108:103484
pubmed: 32615159
Cell Syst. 2018 Sep 26;7(3):347-350.e1
pubmed: 30172842
Trends Biochem Sci. 2000 Dec;25(12):596-601
pubmed: 11116185
J Chem Inf Model. 2016 Dec 27;56(12):2353-2360
pubmed: 27958738
Drug Discov Today Technol. 2019 Dec;32-33:89-98
pubmed: 33386099
F1000Res. 2016 Jun 14;5:
pubmed: 27429748
J Cheminform. 2018 May 22;10(1):26
pubmed: 29789977
Nat Biotechnol. 2016 Jan;34(1):95-103
pubmed: 26501955
J Cheminform. 2022 Jun 7;14(1):32
pubmed: 35672779
Angew Chem Int Ed Engl. 2020 Aug 10;59(33):13764-13776
pubmed: 31889388