Leveraging multiple data types for improved compound-kinase bioactivity prediction.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
31 Aug 2024
History:
received: 8 Mar 2024
accepted: 21 Aug 2024
medline: 1 Sep 2024
pubmed: 1 Sep 2024
entrez: 31 Aug 2024
Status: epublish

Abstract

Machine learning provides efficient ways to map compound-kinase interactions. However, diverse bioactivity data types, including single-dose and multi-dose-response assay results, present challenges. Traditional models utilize only multi-dose data, overlooking information contained in single-dose measurements. Here, we propose a machine learning methodology for compound-kinase activity prediction that leverages both single-dose and dose-response data. We demonstrate that our two-stage approach yields accurate activity predictions and significantly improves model performance compared to training solely on dose-response labels. This superior performance is consistent across five diverse machine learning methods. Using the best performing model, we carried out extensive experimental profiling on a total of 347 selected compound-kinase pairs, achieving a high hit rate of 40% and a negative predictive value of 78%. We show that these rates can be improved further by incorporating model uncertainty estimates into the compound selection process. By integrating multiple activity data types, we demonstrate that our approach holds promise for facilitating the development of training activity datasets in a more efficient and cost-effective way.
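
The hit rate and negative predictive value quoted in the abstract are standard confusion-matrix ratios. The following is a minimal sketch, assuming that "hit rate" means the fraction of predicted-active pairs confirmed active in the experiment and that negative predictive value means the fraction of predicted-inactive pairs confirmed inactive. The function name and the toy labels are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def hit_rate_and_npv(predicted_active: Sequence[bool],
                     measured_active: Sequence[bool]) -> Tuple[float, float]:
    """Return (hit rate, negative predictive value) for paired boolean labels."""
    if len(predicted_active) != len(measured_active):
        raise ValueError("label sequences must have the same length")
    tp = fp = tn = fn = 0
    for pred, meas in zip(predicted_active, measured_active):
        if pred and meas:
            tp += 1  # predicted active, confirmed active
        elif pred and not meas:
            fp += 1  # predicted active, measured inactive
        elif not pred and not meas:
            tn += 1  # predicted inactive, confirmed inactive
        else:
            fn += 1  # predicted inactive, measured active
    # Each ratio is undefined (NaN) when its denominator is zero.
    hit_rate = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return hit_rate, npv


# Toy labels for illustration only (assumed, not taken from the paper).
predicted = [True, True, True, True, True, False, False, False, False, False]
measured = [True, True, False, False, False, False, False, False, False, True]
hit_rate, npv = hit_rate_and_npv(predicted, measured)
print(f"hit rate = {hit_rate:.2f}, NPV = {npv:.2f}")
```

Under these assumed definitions, the hit rate is conditioned on the pairs the model nominated as active and the negative predictive value on the pairs it nominated as inactive, so the two numbers do not need to sum to one.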

Identifiers

pubmed: 39217147
doi: 10.1038/s41467-024-52055-5
pii: 10.1038/s41467-024-52055-5

Chemical substances

Protein Kinase Inhibitors 0
Phosphotransferases EC 2.7.-

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

7596

Copyright information

© 2024. The Author(s).

References

Cortés-Ciriano, I. et al. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. Med. Chem. Commun. 6, 24–50 (2015).
doi: 10.1039/C4MD00216D
Cichońska, A. et al. Crowdsourced mapping of unexplored target space of kinase inhibitors. Nat. Commun. 12, 3307 (2021).
doi: 10.1038/s41467-021-23165-1 pubmed: 34083538 pmcid: 8175708
Du, B. X. et al. Compound-protein interaction prediction by deep learning: databases, descriptors and models. Drug Discov. Today 27, 1350–1366 (2022).
doi: 10.1016/j.drudis.2022.02.023 pubmed: 35248748
De Simone, G., Sardina, D. S., Gulotta, M. R. & Perricone, U. KUALA: a machine learning-driven framework for kinase inhibitors repositioning. Sci. Rep. 12, 17877 (2022).
doi: 10.1038/s41598-022-22324-8 pubmed: 36284125 pmcid: 9595087
Born, J., Huynh, T., Stroobants, A., Cornell, W. D. & Manica, M. Active site sequence representations of human kinases outperform full sequence representations for affinity prediction and inhibitor generation: 3D effects in a 1D model. J. Chem. Inf. Model. 62, 240–257 (2021).
doi: 10.1021/acs.jcim.1c00889 pubmed: 34905358
Thafar, M. A. et al. Affinity2Vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning. Sci. Rep. 12, 4751 (2022).
doi: 10.1038/s41598-022-08787-9 pubmed: 35306525 pmcid: 8934358
Martin, E. & Mukherjee, P. Kinase-kernel models: accurate in silico screening of 4 million compounds across the entire human kinome. J. Chem. Inf. Model. 52, 156–170 (2012).
doi: 10.1021/ci200314j pubmed: 22133092
Nascimento, A. C., Prudêncio, R. B. & Costa, I. G. A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics 17, 1–16 (2016).
doi: 10.1186/s12859-016-0890-3
Cichońska, A. et al. Computational-experimental approach to drug-target interaction mapping: a case study on kinase inhibitors. PLoS Comput. Biol. 13, 1005678 (2017).
doi: 10.1371/journal.pcbi.1005678
Cichońska, A. et al. Learning with multiple pairwise kernels for drug bioactivity prediction. Bioinformatics 34, 509–518 (2018).
doi: 10.1093/bioinformatics/bty277
Öztürk, H., Özgür, A. & Ozkirimli, E. DeepDTA: deep drug-target binding affinity prediction. Bioinformatics 34, 821–829 (2018).
doi: 10.1093/bioinformatics/bty593
Kalemati, M., Zamani Emani, M. & Koohi, S. BiComp-DTA: Drug-target binding affinity prediction through complementary biological-related and compression-based featurization approach. PLoS Comput. Biol. 19, 1011036 (2023).
doi: 10.1371/journal.pcbi.1011036
Luo, Y., Liu, Y. & Peng, J. Calibrated geometric deep learning improves kinase-drug binding predictions. Nat. Mach. Intell. 5, 1390–1401 (2023).
doi: 10.1038/s42256-023-00751-0 pubmed: 38962391 pmcid: 11221792
Singh, R., Sledzieski, S., Bryson, B., Cowen, L. & Berger, B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc. Natl. Acad. Sci. USA 120, 2220778120 (2023).
doi: 10.1073/pnas.2220778120
David, L., Thakkar, A., Mercado, R. & Engkvist, O. Molecular representations in AI-driven drug discovery: a review and practical guide. J. Cheminform. 12, 1–22 (2020).
doi: 10.1186/s13321-020-00460-5
Kanev, G. K. et al. Predicting the target landscape of kinase inhibitors using 3D convolutional neural networks. PLoS Comput. Biol. 19, 1011301 (2023).
doi: 10.1371/journal.pcbi.1011301
Park, H. et al. AiKPro: deep learning model for kinome-wide bioactivity profiling using structure-based sequence alignments and molecular 3D conformer ensemble descriptors. Sci. Rep. 13, 10268 (2023).
doi: 10.1038/s41598-023-37456-8 pubmed: 37355672 pmcid: 10290719
Liu, C., Kutchukian, P., Nguyen, N. D., AlQuraishi, M. & Sorger, P. K. A hybrid structure-based machine learning approach for predicting kinase inhibition by small molecules. J. Chem. Inf. Model. 63, 5457–5472 (2023).
doi: 10.1021/acs.jcim.3c00347 pubmed: 37595065 pmcid: 10498990
Li, S. et al. PocketAnchor: Learning structure-based pocket representations for protein-ligand interaction prediction. Cell Syst. 14, 692–705 (2023).
doi: 10.1016/j.cels.2023.05.005 pubmed: 37516103
Elnaggar, A. et al. Prottrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2021).
doi: 10.1109/TPAMI.2021.3095381
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, 930–940 (2019).
doi: 10.1093/nar/gky1075
Zhu, T. et al. Hit identification and optimization in virtual screening: Practical recommendations based on a critical literature analysis. J. Med. Chem. 56, 6560–6572 (2013).
doi: 10.1021/jm301916b pubmed: 23688234 pmcid: 3772997
Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224–229 (2019).
doi: 10.1038/s41586-019-0917-9
Kim, S. et al. PubChem 2023 update. Nucleic Acids Res. 51, 1373–1380 (2023).
doi: 10.1093/nar/gkac956
Schölkopf, B. & Smola, A. J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge, 2002).
Pahikkala, T., Airola, A., Stock, M., De Baets, B. & Waegeman, W. Efficient regularized least-squares algorithms for conditional ranking on relational data. Machine Learning 93, 321–356 (2013).
doi: 10.1007/s10994-013-5354-7
Breiman, L. Random forests. Machine Learning 45, 5–32 (2001).
doi: 10.1023/A:1010933404324
Davis, M. I. et al. Comprehensive analysis of kinase inhibitor selectivity. Nat. Biotechnol. 29, 1046–1051 (2011).
doi: 10.1038/nbt.1990 pubmed: 22037378
Rasmussen, C. E. & Williams, C. K. I. Gaussian Processes for Machine Learning (MIT Press, Cambridge, 2006).
Berginski, M. E. et al. The Dark Kinase Knowledgebase: an online compendium of knowledge and experimental results of understudied kinases. Nucleic Acids Res. 49, 529–535 (2021).
doi: 10.1093/nar/gkaa853
Bender, A. et al. Evaluation guidelines for machine learning tools in the chemical sciences. Nat. Rev. Chem. 6, 428–442 (2022).
doi: 10.1038/s41570-022-00391-9 pubmed: 37117429
Ong, W. J. G., Kirubakaran, P. & Karanicolas, J. Poor generalization by current deep learning models for predicting binding affinities of kinase inhibitors. Preprint at https://www.biorxiv.org/content/10.1101/2023.09.04.556234v1 (2023).
Anastassiadis, T., Deacon, S. W., Devarajan, K., Ma, H. & Peterson, J. R. Comprehensive assay of kinase catalytic activity reveals features of kinase inhibitor selectivity. Nat. Biotechnol. 29, 1039–1045 (2011).
doi: 10.1038/nbt.2017 pubmed: 22037377 pmcid: 3230241
Metz, J. T. et al. Navigating the kinome. Nat. Chem. Biol. 7, 200–202 (2011).
doi: 10.1038/nchembio.530 pubmed: 21336281
Landrum, G. A. & Riniker, S. Combining IC50 or Ki values from different sources is a source of significant noise. J. Chem. Inf. Model. 64, 1560–1567 (2024).
Bento, A. P. et al. An open source chemical structure curation pipeline using RDKit. J. Cheminform. 12, 1–16 (2020).
doi: 10.1186/s13321-020-00456-1
Park, R. et al. Preference optimization for molecular language models. Preprint at https://arxiv.org/abs/2310.12304 (2023).
Kanev, G. K., de Graaf, C., Westerman, B. A., de Esch, I. J. & Kooistra, A. J. KLIFS: an overhaul after the first 5 years of supporting kinase research. Nucleic Acids Res. 49, 562–569 (2021).
doi: 10.1093/nar/gkaa895
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Fabian, M. A. et al. A small molecule-kinase interaction map for clinical kinase inhibitors. Nat. Biotechnol. 23, 329–336 (2005).
doi: 10.1038/nbt1068 pubmed: 15711537
Hill, A. V. The possible effects of the aggregation of the molecules of hemoglobin on its dissociation curves. J. Physiol. 40, iv–vii (1910).
Levenberg, K. A method for the solution of certain non-linear problems in least squares. Q. Appl. Math. 2, 164–168 (1944).
doi: 10.1090/qam/10666
Theisen, R., Wang, T., Ravikumar, B., Rahman, R. & Cichońska, A. Leveraging multiple data types for improved compound-kinase bioactivity prediction. Zenodo https://doi.org/10.5281/zenodo.12806494 (2024).

Authors

Ryan Theisen (R)

Harmonic Discovery Inc., New York City, NY, USA. rayees@harmonicdiscovery.com.

Tianduanyi Wang (T)

Harmonic Discovery Inc., New York City, NY, USA.

Balaguru Ravikumar (B)

Harmonic Discovery Inc., New York City, NY, USA.

Rayees Rahman (R)

Harmonic Discovery Inc., New York City, NY, USA.

Anna Cichońska (A)

Harmonic Discovery Inc., New York City, NY, USA. anna@harmonicdiscovery.com.
