Improving drug discovery with a hybrid deep generative model using reinforcement learning trained on a Bayesian docking approximation.

Bayesian regression; Discoidin domain receptor 1; Generative molecular models; Machine learning in drug design; Molecular docking

Journal

Journal of computer-aided molecular design
ISSN: 1573-4951
Abbreviated title: J Comput Aided Mol Des
Country: Netherlands
NLM ID: 8710425

Publication information

Publication date:
11 2023
History:
received: 22 05 2023
accepted: 17 07 2023
medline: 18 9 2023
pubmed: 8 8 2023
entrez: 7 8 2023
Status: ppublish

Abstract

Generative approaches to molecular design have been an area of intense study in recent years as a method to generate new pharmaceuticals with desired properties. Often, though, these efforts are constrained by limited experimental activity data, resulting in either models that generate molecules with poor performance or models that are overfit and produce close analogs of known molecules. In this paper, we reduce this data dependency for the generation of new chemotypes by incorporating docking scores of known and de novo molecules to expand the applicability domain of the reward function and diversify the compounds generated during reinforcement learning. Our approach employs a deep generative model initially trained using a combination of limited known drug activity and an approximate docking score provided by a second, machine-learned Bayesian regression model, with final evaluation of high-scoring compounds by a full docking simulation. This strategy results in molecules with docking scores improved by 10-20% compared to molecules of similar size, while being 130× faster than a docking-only approach on a typical GPU workstation. We also show that the increased docking scores correlate with (1) docking poses with interactions similar to those of known inhibitors and (2) higher MM-GBSA binding energies comparable to the energies of known DDR1 inhibitors, demonstrating that the Bayesian model contains sufficient information for the network to learn to efficiently interact with the binding pocket during reinforcement learning. This outcome shows that the combination of the learned latent molecular representation and the feature-based docking regression is sufficient for reinforcement learning to infer the relationship between the molecules and the receptor binding site, which suggests that our method can be a powerful tool for the discovery of new chemotypes with potential therapeutic applications.
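
The surrogate-then-dock idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the features, hyperparameters, or reward weighting, so the class `BayesianLinearSurrogate`, the helper `select_for_full_docking`, the precisions `alpha` and `beta`, and the synthetic three-feature data are all hypothetical. The sketch fits a conjugate Bayesian linear regression to "docking scores", predicts scores with uncertainty for new candidates, and shortlists the most favorable ones for a full docking run, assuming the common convention that more negative scores are better.

```python
import numpy as np


class BayesianLinearSurrogate:
    """Conjugate Bayesian linear regression with fixed precisions.

    alpha is the prior precision on the weights and beta the noise
    precision. Both are illustrative values, not taken from the paper.
    """

    def __init__(self, alpha=1.0, beta=25.0):
        self.alpha = alpha
        self.beta = beta

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_features = X.shape[1]
        # Posterior covariance S and posterior mean m of the weights.
        precision = self.alpha * np.eye(n_features) + self.beta * X.T @ X
        self.S_ = np.linalg.inv(precision)
        self.m_ = self.beta * self.S_ @ X.T @ y
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        mean = X @ self.m_
        # Predictive variance = noise variance + weight uncertainty.
        var = 1.0 / self.beta + np.einsum("ij,jk,ik->i", X, self.S_, X)
        return mean, np.sqrt(var)


def select_for_full_docking(pred_scores, k):
    """Return indices of the k most negative (most favorable) predictions."""
    return np.argsort(pred_scores)[:k]


# Synthetic stand-in for molecular features and docking scores (assumption).
rng = np.random.default_rng(0)
true_w = np.array([-2.0, 0.5, -1.0])
X_train = rng.normal(size=(200, 3))
y_train = X_train @ true_w + rng.normal(scale=0.2, size=200)

surrogate = BayesianLinearSurrogate().fit(X_train, y_train)

# Score new candidates cheaply, then shortlist a few for full docking.
X_new = rng.normal(size=(50, 3))
pred_mean, pred_std = surrogate.predict(X_new)
top = select_for_full_docking(pred_mean, k=5)
```

In a reinforcement learning loop, `pred_mean` (possibly combined with an activity term) would feed the reward, and only the shortlisted indices in `top` would be passed to the expensive docking program.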

Identifiers

pubmed: 37550462
doi: 10.1007/s10822-023-00523-3
pii: 10.1007/s10822-023-00523-3

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

507-517

Copyright information

© 2023. The Author(s), under exclusive licence to Springer Nature Switzerland AG.

Authors

Youjin Xiong (Y)

Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China.

Yiqing Wang (Y)

Icekredit Incorporated, Shanghai, 200120, China.

Yisheng Wang (Y)

Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China.

Chenmei Li (C)

Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China.

Peng Yusong (P)

Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China.

Junyu Wu (J)

Icekredit Incorporated, Shanghai, 200120, China.

Yiqing Wang (Y)

Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China.

Lingyun Gu (L)

Department of Information Systems Technology and Design, Singapore University of Technology and Design, Singapore, Singapore. gu_lingyun@icekredit.com.

Christopher J Butch (CJ)

Department of Biomedical Engineering, Nanjing University, Nanjing, 210093, China. chrisbutch@nju.edu.cn.
