A high quality, industrial data set for binding affinity prediction: performance comparison in different early drug discovery scenarios.
Docking
Drug design
Lead optimization
Machine learning
Journal
Journal of computer-aided molecular design
ISSN: 1573-4951
Titre abrégé: J Comput Aided Mol Des
Pays: Netherlands
ID NLM: 8710425
Informations de publication
Date de publication:
10 2022
10 2022
Historique:
received:
24
07
2022
accepted:
15
09
2022
pubmed:
25
9
2022
medline:
28
10
2022
entrez:
24
9
2022
Statut:
ppublish
Résumé
We release a new, high quality data set of 1162 PDE10A inhibitors with experimentally determined binding affinities together with 77 PDE10A X-ray co-crystal structures from a Roche legacy project. This data set is used to compare the performance of different 2D- and 3D-machine learning (ML) as well as empirical scoring functions for predicting binding affinities with high throughput. We simulate use cases that are relevant in the lead optimization phase of early drug discovery. ML methods perform well at interpolation, but poorly in extrapolation scenarios-which are most relevant to a real-world application. Moreover, we find that investing into the docking workflow for binding pose generation using multi-template docking is rewarded with an improved scoring performance. A combination of 2D-ML and 3D scoring using a modified piecewise linear potential shows best overall performance, combining information on the protein environment with learning from existing SAR data.
Identifiants
pubmed: 36153472
doi: 10.1007/s10822-022-00478-x
pii: 10.1007/s10822-022-00478-x
doi:
Substances chimiques
Ligands
0
Proteins
0
Banques de données
figshare
['10.6084/m9.figshare.20055014']
Types de publication
Journal Article
Langues
eng
Sous-ensembles de citation
IM
Pagination
753-765Informations de copyright
© 2022. The Author(s), under exclusive licence to Springer Nature Switzerland AG.
Références
Kuhn B, Fuchs JE, Reutlinger M et al (2011) Rationalizing tight ligand binding through cooperative interaction networks. J Chem Inf Model 51:3180–3198. https://doi.org/10.1021/ci200319e
doi: 10.1021/ci200319e
pubmed: 22087588
pmcid: 3246350
Chan L, Morris GM, Hutchison GR (2021) Understanding conformational entropy in small molecules. J Chem Theory Comput 17:2099–2106. https://doi.org/10.1021/acs.jctc.0c01213
doi: 10.1021/acs.jctc.0c01213
pubmed: 33759518
Tosstorff A, Cole JC, Taylor R et al (2020) Identification of noncompetitive protein–ligand interactions for structural optimization. J Chem Inf Model 60:6595–6611. https://doi.org/10.1021/acs.jcim.0c00858
doi: 10.1021/acs.jcim.0c00858
pubmed: 33085891
Bash PA, Singh UC, Langridge R, Kollman PA (1987) Free energy calculations by computer simulation. Science 236:564–568. https://doi.org/10.1126/science.3576184
doi: 10.1126/science.3576184
pubmed: 3576184
Hochuli J, Helbling A, Skaist T et al (2018) Visualizing convolutional neural network protein-ligand scoring. J Mol Graph Model 84:96–108. https://doi.org/10.1016/j.jmgm.2018.06.005
doi: 10.1016/j.jmgm.2018.06.005
pubmed: 29940506
pmcid: 6343664
Brown BP, Mendenhall J, Geanes AR, Meiler J (2021) General purpose structure-based drug discovery neural network score functions with human-interpretable pharmacophore maps. J Chem Inf Model 61:603–620. https://doi.org/10.1021/acs.jcim.0c01001
doi: 10.1021/acs.jcim.0c01001
pubmed: 33496578
pmcid: 7903419
Gomes J, Ramsundar B, Feinberg EN, Pande VS (2017) Atomic convolutional networks for predicting protein-ligand binding affinity. Arxiv. https://doi.org/10.48550/arXiv.1703.10603
Wang R, Fang X, Lu Y, Wang S (2004) The PDBbind database: collection of binding affinities for protein−ligand complexes with known three-dimensional structures. J Med Chem 47:2977–2980. https://doi.org/10.1021/jm030580l
doi: 10.1021/jm030580l
pubmed: 15163179
Liu T, Lin Y, Wen X et al (2007) BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res 35:D198–D201. https://doi.org/10.1093/nar/gkl999
doi: 10.1093/nar/gkl999
pubmed: 17145705
Affinity Data PDB code 2W9H. http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_pdbids.jsp?pdbids_submit=Search&pdbids=2W9H . Accessed 26 Jan 2022
Mysinger MM, Carchia M, John JI, Shoichet BK (2012) Directory of Useful Decoys, Enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem 55:6582–6594. https://doi.org/10.1021/jm300687e
doi: 10.1021/jm300687e
pubmed: 22716043
pmcid: 3405771
Rohrer SG, Baumann K (2009) Maximum Unbiased Validation (MUV) data sets for virtual screening based on pubchem bioactivity data. J Chem Inf Model 49:169–184. https://doi.org/10.1021/ci8002649
doi: 10.1021/ci8002649
pubmed: 19434821
Tran-Nguyen V-K, Jacquemard C, Rognan D (2020) LIT-PCBA: an unbiased data set for machine learning and virtual screening. J Chem Inf Model 60:4263–4273. https://doi.org/10.1021/acs.jcim.0c00155
doi: 10.1021/acs.jcim.0c00155
pubmed: 32282202
Chappie TA, Helal CJ, Hou X (2012) Current landscape of phosphodiesterase 10A (PDE10A) inhibition. J Med Chem 55:7299–7331. https://doi.org/10.1021/jm3004976
doi: 10.1021/jm3004976
pubmed: 22834877
Jones G, Willett P, Glen RC et al (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267:727–748. https://doi.org/10.1006/jmbi.1996.0897
doi: 10.1006/jmbi.1996.0897
pubmed: 9126849
Korb O, Stützle T, Exner TE (2009) Empirical scoring functions for advanced protein−ligand docking with PLANTS. J Chem Inf Model 49:84–96. https://doi.org/10.1021/ci800298z
doi: 10.1021/ci800298z
pubmed: 19125657
Tosstorff A, Cole JC, Bartelt R, Kuhn B (2021) Augmenting structure-based design with experimental protein-ligand interaction data: molecular recognition, interactive visualization, and rescoring. Chem Med Chem 16:3428–3438. https://doi.org/10.1002/cmdc.202100387
doi: 10.1002/cmdc.202100387
pubmed: 34342128
Feinberg EN, Sur D, Wu Z et al (2018) PotentialNet for molecular property prediction. ACS Central Sci 4:1520–1530. https://doi.org/10.1021/acscentsci.8b00507
doi: 10.1021/acscentsci.8b00507
Xiong Z, Wang D, Liu X et al (2020) Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J Med Chem 63:8749–8760. https://doi.org/10.1021/acs.jmedchem.9b00959
doi: 10.1021/acs.jmedchem.9b00959
pubmed: 31408336
Stumpfe D, Hu H, Bajorath J (2019) Evolving concept of activity cliffs. ACS Omega 4:14360–14368. https://doi.org/10.1021/acsomega.9b02221
doi: 10.1021/acsomega.9b02221
pubmed: 31528788
pmcid: 6740043
Thomas M, Smith RT, O’Boyle NM et al (2021) Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study. J Cheminformatics 13:39. https://doi.org/10.1186/s13321-021-00516-0
doi: 10.1186/s13321-021-00516-0
Wang L, Chambers J, Abel R (2019) Biomolecular simulations, methods and protocols. Methods Mol Biol 2022:201–232. https://doi.org/10.1007/978-1-4939-9608-7_9
doi: 10.1007/978-1-4939-9608-7_9
pubmed: 31396905
Yung-Chi C, Prusoff WH (1973) Relationship between the inhibition constant (KI) and the concentration of inhibitor which causes 50 per cent inhibition (IC ) of an enzymatic reaction. Biochem Pharmacol 22:3099–3108. https://doi.org/10.1016/0006-2952(73)90196-2
doi: 10.1016/0006-2952(73)90196-2
Proasis. Desert Scientific Software, Sydney, Australia
Landrum G RDKit. https://doi.org/10.5281/zenodo.5589557
Morgan HL (1965) The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. J Chem Doc 5:107–113. https://doi.org/10.1021/c160017a018
doi: 10.1021/c160017a018
Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Groom CR, Bruno IJ, Lightfoot MP, Ward SC (2016) The Cambridge Structural database. Acta Crystallogr Sect B Struct Sci Cryst Eng Mater 72:171–179. https://doi.org/10.1107/s2052520616003954
doi: 10.1107/S2052520616003954
Hawkins PCD, Skillman AG, Warren GL et al (2010) Conformer generation with OMEGA: algorithm and validation using high quality structures from the protein databank and cambridge structural database. J Chem Inf Model 50:572–584. https://doi.org/10.1021/ci100031x
doi: 10.1021/ci100031x
pubmed: 20235588
pmcid: 2859685
Cruciani G, Milletti F, Storchi L et al (2009) In silico pKa prediction and ADME profiling. Chem Biodivers 6:1812–1821. https://doi.org/10.1002/cbdv.200900153
doi: 10.1002/cbdv.200900153
pubmed: 19937818
Virtanen P, Gommers R, Oliphant TE et al (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17:261–272. https://doi.org/10.1038/s41592-019-0686-2
doi: 10.1038/s41592-019-0686-2
pubmed: 32015543
pmcid: 7056644
O’Boyle NM, Brewerton SC, Taylor R (2008) Using buriedness to improve discrimination between actives and inactives in docking. J Chem Inf Model 48:1269–1278. https://doi.org/10.1021/ci8000452
doi: 10.1021/ci8000452
pubmed: 18533645
Eldridge MD, Murray CW, Auton TR et al (1997) Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aid Mol Des 11:425–445. https://doi.org/10.1023/a:1007996124545
doi: 10.1023/A:1007996124545
Li M, Zhou J, Hu J et al (2021) DGL-lifesci: an open-source toolkit for deep learning on graphs in life science. ACS Omega 6:27233–27238. https://doi.org/10.1021/acsomega.1c04017
doi: 10.1021/acsomega.1c04017
pubmed: 34693143
pmcid: 8529678
Cleves AE, Johnson SR, Jain AN (2021) Synergy and complementarity between focused machine learning and physics-based simulation in affinity prediction. J Chem Inf Model 61:5948–5966. https://doi.org/10.1021/acs.jcim.1c01382
doi: 10.1021/acs.jcim.1c01382
pubmed: 34890185
Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9:90–95. https://doi.org/10.1109/mcse.2007.55
doi: 10.1109/MCSE.2007.55
Waskom ML (2021) Seaborn: statistical data visualization. J Open Source Softw 6:3021. https://doi.org/10.21105/joss.03021
doi: 10.21105/joss.03021
Schrödinger L The PyMOL Molecular Graphics System, Version~1.8
Kabsch W (2010) XDS. Acta Crystallogr Sect D Biol Crystallogr 66:125–132. https://doi.org/10.1107/s0907444909047337
doi: 10.1107/S0907444909047337
McCoy AJ, Grosse-Kunstleve RW, Adams PD et al (2007) Phaser crystallographic software. J Appl Crystallogr 40:658–674. https://doi.org/10.1107/s0021889807021206
doi: 10.1107/S0021889807021206
pubmed: 19461840
pmcid: 2483472
Emsley P, Lohkamp B, Scott WG, Cowtan K (2010) Features and development of coot. Acta Crystallogr Sect D Biol Crystallogr 66:486–501. https://doi.org/10.1107/s0907444910007493
doi: 10.1107/S0907444910007493
Murshudov GN, Skubák P, Lebedev AA et al (2011) REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr Sect D Biol Crystallogr 67:355–367. https://doi.org/10.1107/s0907444911001314
doi: 10.1107/S0907444911001314