Leveraging multiple data types for improved compound-kinase bioactivity prediction.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
31 Aug 2024
History:
received: 8 Mar 2024
accepted: 21 Aug 2024
medline: 1 Sep 2024
pubmed: 1 Sep 2024
entrez: 31 Aug 2024
Status:
epublish
Abstract
Machine learning provides efficient ways to map compound-kinase interactions. However, diverse bioactivity data types, including single-dose and multi-dose-response assay results, present challenges. Traditional models utilize only multi-dose data, overlooking information contained in single-dose measurements. Here, we propose a machine learning methodology for compound-kinase activity prediction that leverages both single-dose and dose-response data. We demonstrate that our two-stage approach yields accurate activity predictions and significantly improves model performance compared to training solely on dose-response labels. This superior performance is consistent across five diverse machine learning methods. Using the best performing model, we carried out extensive experimental profiling on a total of 347 selected compound-kinase pairs, achieving a high hit rate of 40% and a negative predictive value of 78%. We show that these rates can be improved further by incorporating model uncertainty estimates into the compound selection process. By integrating multiple activity data types, we demonstrate that our approach holds promise for facilitating the development of training activity datasets in a more efficient and cost-effective way.
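The two validation metrics quoted in the abstract, hit rate and negative predictive value, can be illustrated with a minimal Python sketch. It assumes the standard definitions (hit rate as the fraction of predicted-active pairs confirmed active, negative predictive value as the fraction of predicted-inactive pairs confirmed inactive); the record does not state the exact activity thresholds used in the study. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
def hit_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted-active pairs that were confirmed active."""
    predicted_active = true_pos + false_pos
    if predicted_active == 0:
        raise ValueError("no predicted-active pairs to evaluate")
    return true_pos / predicted_active


def negative_predictive_value(true_neg: int, false_neg: int) -> float:
    """Fraction of predicted-inactive pairs that were confirmed inactive."""
    predicted_inactive = true_neg + false_neg
    if predicted_inactive == 0:
        raise ValueError("no predicted-inactive pairs to evaluate")
    return true_neg / predicted_inactive


# Toy counts for illustration only (NOT the study's data):
# 4 predicted-active pairs tested, 3 confirmed active;
# 5 predicted-inactive pairs tested, 4 confirmed inactive.
toy_hit_rate = hit_rate(true_pos=3, false_pos=1)
toy_npv = negative_predictive_value(true_neg=4, false_neg=1)
print(f"hit rate = {toy_hit_rate:.2f}, NPV = {toy_npv:.2f}")
```

Under these definitions, a 40% hit rate would mean that two in five of the tested predicted-active pairs were confirmed active.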
Identifiers
pubmed: 39217147
doi: 10.1038/s41467-024-52055-5
pii: 10.1038/s41467-024-52055-5
Chemical substances
Protein Kinase Inhibitors
0
Phosphotransferases
EC 2.7.-
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
7596
Copyright information
© 2024. The Author(s).