Improving drug discovery with a hybrid deep generative model using reinforcement learning trained on a Bayesian docking approximation.
Bayesian regression
Discoidin domain receptor 1
Generative molecular models
Machine learning in drug design
Molecular docking
Journal
Journal of computer-aided molecular design
ISSN: 1573-4951
Abbreviated title: J Comput Aided Mol Des
Country: Netherlands
NLM ID: 8710425
Publication information
Publication date: 2023-11
History:
received: 2023-05-22
accepted: 2023-07-17
medline: 2023-09-18
pubmed: 2023-08-08
entrez: 2023-08-07
Status:
ppublish
Abstract
Generative approaches to molecular design have been an area of intense study in recent years as a way to generate new pharmaceuticals with desired properties. Often, though, these efforts are constrained by limited experimental activity data, resulting in either models that generate molecules with poor performance or models that are overfit and produce close analogs of known molecules. In this paper, we reduce this data dependency for the generation of new chemotypes by incorporating docking scores of known and de novo molecules to expand the applicability domain of the reward function and diversify the compounds generated during reinforcement learning. Our approach employs a deep generative model initially trained using a combination of limited known drug activity and an approximate docking score provided by a second, machine-learned Bayesian regression model, with final evaluation of high-scoring compounds by a full docking simulation. This strategy results in molecules with docking scores improved by 10-20% compared to molecules of similar size, while being 130× faster than a docking-only approach on a typical GPU workstation. We also show that the increased docking scores correlate with (1) docking poses whose interactions are similar to those of known inhibitors and (2) higher MM-GBSA binding energies, comparable to the energies of known DDR1 inhibitors, demonstrating that the Bayesian model contains sufficient information for the network to learn to interact efficiently with the binding pocket during reinforcement learning. This outcome shows that the combination of the learned latent molecular representation and the feature-based docking regression is sufficient for reinforcement learning to infer the relationship between the molecules and the receptor binding site, which suggests that our method can be a powerful tool for the discovery of new chemotypes with potential therapeutic applications.
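The workflow described in the abstract (a cheap Bayesian regression that approximates docking scores, used as a reinforcement learning reward, with full docking reserved for the best candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the conjugate Gaussian linear model, the blended reward, the `weight` parameter, the precision values `alpha` and `beta`, and the toy feature vectors are all assumptions made for illustration, and every function and class name below is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


class BayesianLinearSurrogate:
    """Linear regression with a Gaussian weight prior (illustrative stand-in
    for the Bayesian docking approximation; the paper's model may differ)."""

    def __init__(self, alpha: float = 1.0, beta: float = 25.0) -> None:
        self.alpha = alpha  # prior precision on the weights (assumed value)
        self.beta = beta    # observation-noise precision (assumed value)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]):
        d = len(X[0])
        # Posterior precision: A = alpha * I + beta * X^T X
        self.A = [[self.beta * sum(x[i] * x[j] for x in X)
                   + (self.alpha if i == j else 0.0)
                   for j in range(d)] for i in range(d)]
        # Posterior mean solves A @ mean = beta * X^T y
        rhs = [self.beta * sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
        self.mean = solve(self.A, rhs)
        return self

    def predict(self, x: Sequence[float]) -> Tuple[float, float]:
        """Return the predictive mean and variance of the docking score."""
        mu = sum(w * v for w, v in zip(self.mean, x))
        a_inv_x = solve(self.A, list(x))
        var = 1.0 / self.beta + sum(v * s for v, s in zip(x, a_inv_x))
        return mu, var


def rl_reward(features: Sequence[float], surrogate: BayesianLinearSurrogate,
              activity: Optional[float] = None, weight: float = 0.5) -> float:
    """Hypothetical reward: more negative docking scores are better, so the
    predicted score is negated and blended with known activity if available."""
    docking_term = -surrogate.predict(features)[0]
    if activity is None:
        return docking_term
    return weight * activity + (1.0 - weight) * docking_term


def shortlist_for_full_docking(candidates, surrogate, k: int) -> List[str]:
    """Keep the k candidates with the best (lowest) predicted docking score."""
    ranked = sorted(candidates, key=lambda c: surrogate.predict(c[1])[0])
    return [name for name, _ in ranked[:k]]


# Toy data, invented for this sketch: features are [bias, descriptor].
train_x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
train_y = [-6.0, -8.0, -10.0, -12.0]  # made-up docking scores
model = BayesianLinearSurrogate(alpha=1e-6, beta=100.0).fit(train_x, train_y)
candidates = [("a", [1.0, 0.5]), ("b", [1.0, 2.5]), ("c", [1.0, 1.5])]
picked = shortlist_for_full_docking(candidates, model, k=2)
```

In this sketch only the shortlisted names in `picked` would be passed on to a full docking run, which is where the speedup reported in the abstract would come from; the predictive variance returned by `predict` is unused here but could, under further assumptions, be used to down-weight uncertain predictions.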
Identifiers
pubmed: 37550462
doi: 10.1007/s10822-023-00523-3
pii: 10.1007/s10822-023-00523-3
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
507-517
Copyright information
© 2023. The Author(s), under exclusive licence to Springer Nature Switzerland AG.