Leveraging multiple data types for improved compound-kinase bioactivity prediction.

MeSH terms: Machine Learning; Humans; Protein Kinase Inhibitors / pharmacology; Dose-Response Relationship, Drug; Phosphotransferases / metabolism; Algorithms; Drug Discovery / methods

Journal

Nature Communications

ISSN: 2041-1723

Abbreviated title: Nat Commun

Country: England

NLM ID: 101528555

Publication information

Publication date:
31 Aug 2024

History:

received: 08 Mar 2024

accepted: 21 Aug 2024

medline: 01 Sep 2024

pubmed: 01 Sep 2024

entrez: 31 Aug 2024

Status: epublish

Abstract

Machine learning provides efficient ways to map compound-kinase interactions. However, diverse bioactivity data types, including single-dose and multi-dose-response assay results, present challenges. Traditional models utilize only multi-dose data, overlooking information contained in single-dose measurements. Here, we propose a machine learning methodology for compound-kinase activity prediction that leverages both single-dose and dose-response data. We demonstrate that our two-stage approach yields accurate activity predictions and significantly improves model performance compared to training solely on dose-response labels. This superior performance is consistent across five diverse machine learning methods. Using the best performing model, we carried out extensive experimental profiling on a total of 347 selected compound-kinase pairs, achieving a high hit rate of 40% and a negative predictive value of 78%. We show that these rates can be improved further by incorporating model uncertainty estimates into the compound selection process. By integrating multiple activity data types, we demonstrate that our approach holds promise for facilitating the development of training activity datasets in a more efficient and cost-effective way.
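The abstract reports a hit rate and a negative predictive value (NPV) for the experimentally profiled compound-kinase pairs. The minimal Python sketch below shows how such metrics are commonly computed from binary predicted and measured activity calls. It assumes that "hit rate" means the fraction of pairs predicted active that the assay confirms as active (precision), and that NPV is the fraction of pairs predicted inactive that the assay confirms as inactive. The `Pair` class, the `select_confident` helper, the uncertainty threshold and the toy data are hypothetical illustrations, not the authors' code or data. Filtering by a fixed uncertainty cutoff is only one simple way of folding uncertainty estimates into compound selection.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Pair:
    """Hypothetical record for one experimentally tested compound-kinase pair."""
    predicted_active: bool  # model call, e.g. predicted activity above some cutoff
    measured_active: bool   # outcome of the experimental assay
    uncertainty: float      # model uncertainty estimate (assumed: lower = more confident)


def hit_rate(pairs: Sequence[Pair]) -> float:
    """Fraction of predicted-active pairs confirmed active (precision / PPV)."""
    predicted = [p for p in pairs if p.predicted_active]
    if not predicted:
        raise ValueError("no pairs predicted active")
    return sum(p.measured_active for p in predicted) / len(predicted)


def negative_predictive_value(pairs: Sequence[Pair]) -> float:
    """Fraction of predicted-inactive pairs confirmed inactive."""
    predicted = [p for p in pairs if not p.predicted_active]
    if not predicted:
        raise ValueError("no pairs predicted inactive")
    return sum(not p.measured_active for p in predicted) / len(predicted)


def select_confident(pairs: Sequence[Pair], max_uncertainty: float) -> list[Pair]:
    """Hypothetical selection rule: keep pairs at or below an uncertainty cutoff."""
    return [p for p in pairs if p.uncertainty <= max_uncertainty]


# Illustrative toy data only (not results from the study).
toy = [
    Pair(predicted_active=True, measured_active=True, uncertainty=0.1),
    Pair(predicted_active=True, measured_active=False, uncertainty=0.9),
    Pair(predicted_active=True, measured_active=True, uncertainty=0.2),
    Pair(predicted_active=False, measured_active=False, uncertainty=0.3),
    Pair(predicted_active=False, measured_active=True, uncertainty=0.8),
]

if __name__ == "__main__":
    print(hit_rate(toy))                   # 2 of 3 predicted actives confirmed
    print(negative_predictive_value(toy))  # 1 of 2 predicted inactives confirmed
    confident = select_confident(toy, max_uncertainty=0.5)
    print(hit_rate(confident), negative_predictive_value(confident))
```

In this toy set, restricting selection to low-uncertainty pairs removes one false positive and one false negative, so both metrics rise. This mirrors, in a deliberately simplified form, the kind of improvement the abstract attributes to using uncertainty estimates.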

Identifiers

DOI: 10.1038/s41467-024-52055-5 PMID: 39217147

pii: 10.1038/s41467-024-52055-5

Chemical substances

Protein Kinase Inhibitors 0

Phosphotransferases EC 2.7.-

Publication types

Journal Article

Languages

English

Pagination

7596

Authors

Ryan Theisen

Tianduanyi Wang

Balaguru Ravikumar

Rayees Rahman

Anna Cichońska

