Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study.

Keywords

AI; Artificial Intelligence; De novo design; Deep learning; Generative models; LBDD; Ligand-based drug design; Molecular docking; QSAR; Quantitative structure–activity relationship; Recurrent neural network; Reinforcement learning; SBDD; Structure-based drug design

Journal

Journal of cheminformatics
ISSN: 1758-2946
Abbreviated title: J Cheminform
Country: England
NLM ID: 101516718

Publication information

Publication date:
13 May 2021
History:
received: 2 March 2021
accepted: 2 May 2021
entrez: 14 May 2021
pubmed: 15 May 2021
medline: 15 May 2021
Status: epublish

Abstract

Deep generative models have shown the ability to devise both valid and novel chemistry, which could significantly accelerate the identification of bioactive compounds. Many current models, however, use molecular descriptors or ligand-based predictive methods to guide molecule generation towards a desirable property space. This restricts their application to relatively data-rich targets and neglects targets with too little data to train a predictor adequately. Moreover, ligand-based approaches often bias molecule generation towards previously established chemical space, which limits their ability to identify truly novel chemotypes. In this work, we assess the use of molecular docking via Glide (a structure-based approach) as a scoring function to guide the deep generative model REINVENT. We then compare model performance and behaviour with those obtained using a ligand-based scoring function. Additionally, we modify the previously published MOSES benchmarking dataset to remove any induced bias towards non-protonatable groups. We also propose a new metric to measure dataset diversity. This metric is less confounded by the distribution of heavy atom count than the commonly used internal diversity metric. Our main findings are as follows. When optimizing the docking score against DRD2, the model improves predicted ligand affinity beyond that of known DRD2 active molecules. In addition, the generated molecules occupy chemical and physicochemical space that is complementary to the space reached by the ligand-based approach. They also occupy physicochemical space that is novel compared with known DRD2 active molecules. Furthermore, the structure-based approach learns to generate molecules that satisfy crucial residue interactions. This information is available only when the protein structure is taken into account.
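The internal diversity metric mentioned above is usually computed from pairwise Tanimoto similarities between molecular fingerprints. The minimal sketch below follows the IntDiv_p form used in the MOSES benchmark:

IntDiv_p(G) = 1 − ((1/|G|²) Σ_{m₁,m₂∈G} T(m₁,m₂)^p)^(1/p)

The sum runs over all ordered pairs in a set G, including self-pairs. This sketch is not the authors' code, and it is not the new metric proposed in this work. To stay dependency-free, it assumes that each fingerprint is already given as a Python set of on-bit indices, for example taken from a Morgan fingerprint. It does not compute the fingerprints with a cheminformatics toolkit. The fingerprints in the example are hypothetical toy values.

```python
from itertools import product
from typing import AbstractSet, Sequence


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def internal_diversity(fps: Sequence[AbstractSet[int]], p: float = 1.0) -> float:
    """IntDiv_p: 1 - (mean of T^p over all ordered pairs, self-pairs included)^(1/p)."""
    n = len(fps)
    if n == 0:
        raise ValueError("need at least one fingerprint")
    total = sum(tanimoto(a, b) ** p for a, b in product(fps, repeat=2))
    return 1.0 - (total / n ** 2) ** (1.0 / p)


# Toy fingerprints (hypothetical on-bit sets, not real molecules).
print(internal_diversity([{1, 2, 3}, {1, 2, 3}]))  # identical set -> 0.0
print(internal_diversity([{1, 2}, {3, 4}]))        # fully disjoint set -> 0.5
```

The disjoint pair scores 0.5 rather than 1.0. In this formulation, each of the two self-pairs contributes a similarity of 1 to the average.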
Overall, this work demonstrates the advantage of using molecular docking rather than ligand-based predictors to guide de novo molecule generation. The advantage holds for predicted affinity, for novelty, and for the ability to identify key interactions between ligand and protein target. In practice, this approach could be applied in early hit generation campaigns to enrich a virtual library towards a particular target. It could also be applied in novelty-focused projects, where de novo molecule generation either has no prior ligand knowledge available or should not be biased by it.
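To make the idea of a docking-based scoring function concrete, the sketch below shows one way to turn a docking score into a reward for REINVENT-style reinforcement learning. Two parts of it are assumptions:

- The reverse-sigmoid transform and its `low`/`high` bounds (here −12 and −4) are hypothetical illustrations, not the settings used in this study.
- `sigma` and all log-likelihood values are also hypothetical.

The loss follows the augmented-likelihood objective introduced with the original REINVENT method. It is the squared difference between log P_prior + σ·S and log P_agent. Glide scores are more negative for better predicted binding, so the transform maps lower scores to rewards closer to 1.

```python
import math
from typing import Sequence


def reverse_sigmoid(score: float, low: float, high: float, k: float = 1.0) -> float:
    """Map a docking score to (0, 1): scores near `low` (better) -> ~1, near `high` -> ~0.

    Hypothetical logistic transform; the exact form and constants used in REINVENT may differ.
    """
    mid = (low + high) / 2.0
    z = k * 10.0 * (score - mid) / (high - low)
    # Numerically stable evaluation of 1 / (1 + e^z), avoiding overflow for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def augmented_likelihood_loss(
    prior_logp: Sequence[float],
    agent_logp: Sequence[float],
    scores: Sequence[float],
    sigma: float,
) -> float:
    """Batch mean of (log P_prior + sigma * S - log P_agent)^2."""
    if not (len(prior_logp) == len(agent_logp) == len(scores)) or not scores:
        raise ValueError("inputs must be non-empty and of equal length")
    losses = [(p + sigma * s - a) ** 2 for p, a, s in zip(prior_logp, agent_logp, scores)]
    return sum(losses) / len(losses)


# Hypothetical Glide-style docking scores for three sampled molecules.
docking = [-11.0, -8.0, -5.0]
rewards = [reverse_sigmoid(d, low=-12.0, high=-4.0) for d in docking]
print([round(r, 3) for r in rewards])

# Hypothetical sequence log-likelihoods from the prior and agent networks; sigma is illustrative.
loss = augmented_likelihood_loss([-30.0, -28.0, -35.0], [-29.0, -27.5, -36.0], rewards, sigma=60.0)
print(round(loss, 2))
```

The loss pulls the agent's likelihood towards the prior's likelihood shifted by the scaled reward. Molecules with better predicted docking scores therefore become more probable, while the agent stays anchored to the chemistry learned by the prior.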

Identifiers

pubmed: 33985583
doi: 10.1186/s13321-021-00516-0
pii: 10.1186/s13321-021-00516-0
pmc: PMC8117600

Publication types

Journal Article

Languages

eng

Pagination

39

Authors

Morgan Thomas (M)

Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.

Robert T Smith (RT)

Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK.

Noel M O'Boyle (NM)

Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK.

Chris de Graaf (C)

Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK. chris.degraaf@soseiheptares.com.

Andreas Bender (A)

Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK. ab454@cam.ac.uk.

MeSH classifications