A high quality, industrial data set for binding affinity prediction: performance comparison in different early drug discovery scenarios.


Journal

Journal of computer-aided molecular design
ISSN: 1573-4951
Titre abrégé: J Comput Aided Mol Des
Pays: Netherlands
ID NLM: 8710425

Informations de publication

Date de publication:
10 2022
Historique:
received: 24 07 2022
accepted: 15 09 2022
pubmed: 25 9 2022
medline: 28 10 2022
entrez: 24 9 2022
Statut: ppublish

Résumé

We release a new, high quality data set of 1162 PDE10A inhibitors with experimentally determined binding affinities together with 77 PDE10A X-ray co-crystal structures from a Roche legacy project. This data set is used to compare the performance of different 2D- and 3D-machine learning (ML) as well as empirical scoring functions for predicting binding affinities with high throughput. We simulate use cases that are relevant in the lead optimization phase of early drug discovery. ML methods perform well at interpolation, but poorly in extrapolation scenarios-which are most relevant to a real-world application. Moreover, we find that investing into the docking workflow for binding pose generation using multi-template docking is rewarded with an improved scoring performance. A combination of 2D-ML and 3D scoring using a modified piecewise linear potential shows best overall performance, combining information on the protein environment with learning from existing SAR data.

Identifiants

pubmed: 36153472
doi: 10.1007/s10822-022-00478-x
pii: 10.1007/s10822-022-00478-x
doi:

Substances chimiques

Ligands 0
Proteins 0

Banques de données

figshare
['10.6084/m9.figshare.20055014']

Types de publication

Journal Article

Langues

eng

Sous-ensembles de citation

IM

Pagination

753-765

Informations de copyright

© 2022. The Author(s), under exclusive licence to Springer Nature Switzerland AG.

Références

Kuhn B, Fuchs JE, Reutlinger M et al (2011) Rationalizing tight ligand binding through cooperative interaction networks. J Chem Inf Model 51:3180–3198. https://doi.org/10.1021/ci200319e
doi: 10.1021/ci200319e pubmed: 22087588 pmcid: 3246350
Chan L, Morris GM, Hutchison GR (2021) Understanding conformational entropy in small molecules. J Chem Theory Comput 17:2099–2106. https://doi.org/10.1021/acs.jctc.0c01213
doi: 10.1021/acs.jctc.0c01213 pubmed: 33759518
Tosstorff A, Cole JC, Taylor R et al (2020) Identification of noncompetitive protein–ligand interactions for structural optimization. J Chem Inf Model 60:6595–6611. https://doi.org/10.1021/acs.jcim.0c00858
doi: 10.1021/acs.jcim.0c00858 pubmed: 33085891
Bash PA, Singh UC, Langridge R, Kollman PA (1987) Free energy calculations by computer simulation. Science 236:564–568. https://doi.org/10.1126/science.3576184
doi: 10.1126/science.3576184 pubmed: 3576184
Hochuli J, Helbling A, Skaist T et al (2018) Visualizing convolutional neural network protein-ligand scoring. J Mol Graph Model 84:96–108. https://doi.org/10.1016/j.jmgm.2018.06.005
doi: 10.1016/j.jmgm.2018.06.005 pubmed: 29940506 pmcid: 6343664
Brown BP, Mendenhall J, Geanes AR, Meiler J (2021) General purpose structure-based drug discovery neural network score functions with human-interpretable pharmacophore maps. J Chem Inf Model 61:603–620. https://doi.org/10.1021/acs.jcim.0c01001
doi: 10.1021/acs.jcim.0c01001 pubmed: 33496578 pmcid: 7903419
Gomes J, Ramsundar B, Feinberg EN, Pande VS (2017) Atomic convolutional networks for predicting protein-ligand binding affinity. Arxiv. https://doi.org/10.48550/arXiv.1703.10603
Wang R, Fang X, Lu Y, Wang S (2004) The PDBbind database: collection of binding affinities for protein−ligand complexes with known three-dimensional structures. J Med Chem 47:2977–2980. https://doi.org/10.1021/jm030580l
doi: 10.1021/jm030580l pubmed: 15163179
Liu T, Lin Y, Wen X et al (2007) BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res 35:D198–D201. https://doi.org/10.1093/nar/gkl999
doi: 10.1093/nar/gkl999 pubmed: 17145705
Affinity Data PDB code 2W9H. http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_pdbids.jsp?pdbids_submit=Search&pdbids=2W9H . Accessed 26 Jan 2022
Mysinger MM, Carchia M, John JI, Shoichet BK (2012) Directory of Useful Decoys, Enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem 55:6582–6594. https://doi.org/10.1021/jm300687e
doi: 10.1021/jm300687e pubmed: 22716043 pmcid: 3405771
Rohrer SG, Baumann K (2009) Maximum Unbiased Validation (MUV) data sets for virtual screening based on pubchem bioactivity data. J Chem Inf Model 49:169–184. https://doi.org/10.1021/ci8002649
doi: 10.1021/ci8002649 pubmed: 19434821
Tran-Nguyen V-K, Jacquemard C, Rognan D (2020) LIT-PCBA: an unbiased data set for machine learning and virtual screening. J Chem Inf Model 60:4263–4273. https://doi.org/10.1021/acs.jcim.0c00155
doi: 10.1021/acs.jcim.0c00155 pubmed: 32282202
Chappie TA, Helal CJ, Hou X (2012) Current landscape of phosphodiesterase 10A (PDE10A) inhibition. J Med Chem 55:7299–7331. https://doi.org/10.1021/jm3004976
doi: 10.1021/jm3004976 pubmed: 22834877
Jones G, Willett P, Glen RC et al (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267:727–748. https://doi.org/10.1006/jmbi.1996.0897
doi: 10.1006/jmbi.1996.0897 pubmed: 9126849
Korb O, Stützle T, Exner TE (2009) Empirical scoring functions for advanced protein−ligand docking with PLANTS. J Chem Inf Model 49:84–96. https://doi.org/10.1021/ci800298z
doi: 10.1021/ci800298z pubmed: 19125657
Tosstorff A, Cole JC, Bartelt R, Kuhn B (2021) Augmenting structure-based design with experimental protein-ligand interaction data: molecular recognition, interactive visualization, and rescoring. Chem Med Chem 16:3428–3438. https://doi.org/10.1002/cmdc.202100387
doi: 10.1002/cmdc.202100387 pubmed: 34342128
Feinberg EN, Sur D, Wu Z et al (2018) PotentialNet for molecular property prediction. ACS Central Sci 4:1520–1530. https://doi.org/10.1021/acscentsci.8b00507
doi: 10.1021/acscentsci.8b00507
Xiong Z, Wang D, Liu X et al (2020) Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J Med Chem 63:8749–8760. https://doi.org/10.1021/acs.jmedchem.9b00959
doi: 10.1021/acs.jmedchem.9b00959 pubmed: 31408336
Stumpfe D, Hu H, Bajorath J (2019) Evolving concept of activity cliffs. ACS Omega 4:14360–14368. https://doi.org/10.1021/acsomega.9b02221
doi: 10.1021/acsomega.9b02221 pubmed: 31528788 pmcid: 6740043
Thomas M, Smith RT, O’Boyle NM et al (2021) Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study. J Cheminformatics 13:39. https://doi.org/10.1186/s13321-021-00516-0
doi: 10.1186/s13321-021-00516-0
Wang L, Chambers J, Abel R (2019) Biomolecular simulations, methods and protocols. Methods Mol Biol 2022:201–232. https://doi.org/10.1007/978-1-4939-9608-7_9
doi: 10.1007/978-1-4939-9608-7_9 pubmed: 31396905
Yung-Chi C, Prusoff WH (1973) Relationship between the inhibition constant (KI) and the concentration of inhibitor which causes 50 per cent inhibition (IC ) of an enzymatic reaction. Biochem Pharmacol 22:3099–3108. https://doi.org/10.1016/0006-2952(73)90196-2
doi: 10.1016/0006-2952(73)90196-2
Proasis. Desert Scientific Software, Sydney, Australia
Landrum G RDKit. https://doi.org/10.5281/zenodo.5589557
Morgan HL (1965) The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. J Chem Doc 5:107–113. https://doi.org/10.1021/c160017a018
doi: 10.1021/c160017a018
Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Groom CR, Bruno IJ, Lightfoot MP, Ward SC (2016) The Cambridge Structural database. Acta Crystallogr Sect B Struct Sci Cryst Eng Mater 72:171–179. https://doi.org/10.1107/s2052520616003954
doi: 10.1107/S2052520616003954
Hawkins PCD, Skillman AG, Warren GL et al (2010) Conformer generation with OMEGA: algorithm and validation using high quality structures from the protein databank and cambridge structural database. J Chem Inf Model 50:572–584. https://doi.org/10.1021/ci100031x
doi: 10.1021/ci100031x pubmed: 20235588 pmcid: 2859685
Cruciani G, Milletti F, Storchi L et al (2009) In silico pKa prediction and ADME profiling. Chem Biodivers 6:1812–1821. https://doi.org/10.1002/cbdv.200900153
doi: 10.1002/cbdv.200900153 pubmed: 19937818
Virtanen P, Gommers R, Oliphant TE et al (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17:261–272. https://doi.org/10.1038/s41592-019-0686-2
doi: 10.1038/s41592-019-0686-2 pubmed: 32015543 pmcid: 7056644
O’Boyle NM, Brewerton SC, Taylor R (2008) Using buriedness to improve discrimination between actives and inactives in docking. J Chem Inf Model 48:1269–1278. https://doi.org/10.1021/ci8000452
doi: 10.1021/ci8000452 pubmed: 18533645
Eldridge MD, Murray CW, Auton TR et al (1997) Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aid Mol Des 11:425–445. https://doi.org/10.1023/a:1007996124545
doi: 10.1023/A:1007996124545
Li M, Zhou J, Hu J et al (2021) DGL-lifesci: an open-source toolkit for deep learning on graphs in life science. ACS Omega 6:27233–27238. https://doi.org/10.1021/acsomega.1c04017
doi: 10.1021/acsomega.1c04017 pubmed: 34693143 pmcid: 8529678
Cleves AE, Johnson SR, Jain AN (2021) Synergy and complementarity between focused machine learning and physics-based simulation in affinity prediction. J Chem Inf Model 61:5948–5966. https://doi.org/10.1021/acs.jcim.1c01382
doi: 10.1021/acs.jcim.1c01382 pubmed: 34890185
Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9:90–95. https://doi.org/10.1109/mcse.2007.55
doi: 10.1109/MCSE.2007.55
Waskom ML (2021) Seaborn: statistical data visualization. J Open Source Softw 6:3021. https://doi.org/10.21105/joss.03021
doi: 10.21105/joss.03021
Schrödinger L The PyMOL Molecular Graphics System, Version~1.8
Kabsch W (2010) XDS. Acta Crystallogr Sect D Biol Crystallogr 66:125–132. https://doi.org/10.1107/s0907444909047337
doi: 10.1107/S0907444909047337
McCoy AJ, Grosse-Kunstleve RW, Adams PD et al (2007) Phaser crystallographic software. J Appl Crystallogr 40:658–674. https://doi.org/10.1107/s0021889807021206
doi: 10.1107/S0021889807021206 pubmed: 19461840 pmcid: 2483472
Emsley P, Lohkamp B, Scott WG, Cowtan K (2010) Features and development of coot. Acta Crystallogr Sect D Biol Crystallogr 66:486–501. https://doi.org/10.1107/s0907444910007493
doi: 10.1107/S0907444910007493
Murshudov GN, Skubák P, Lebedev AA et al (2011) REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr Sect D Biol Crystallogr 67:355–367. https://doi.org/10.1107/s0907444911001314
doi: 10.1107/S0907444911001314

Auteurs

Andreas Tosstorff (A)

Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland. a.tosstorff@gmail.com.

Markus G Rudolph (MG)

Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland.

Jason C Cole (JC)

Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, UK.

Michael Reutlinger (M)

Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland.

Christian Kramer (C)

Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland.

Hervé Schaffhauser (H)

Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland.

Agnès Nilly (A)

Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland.

Alexander Flohr (A)

Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland.

Bernd Kuhn (B)

Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland.

Articles similaires

Databases, Protein Protein Domains Protein Folding Proteins Deep Learning
Animals Hemiptera Insect Proteins Phylogeny Insecticides

Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Yoan Martínez-López, Paulina Phoobane, Yanaima Jauriga et al.
1.00
Blood-Brain Barrier Machine Learning Humans Support Vector Machine Software

Understanding the role of machine learning in predicting progression of osteoarthritis.

Simone Castagno, Benjamin Gompels, Estelle Strangmark et al.
1.00
Humans Disease Progression Machine Learning Osteoarthritis

Classifications MeSH