A practical guide to machine-learning scoring for structure-based virtual screening.
Journal
Nature protocols
ISSN: 1750-2799
Abbreviated title: Nat Protoc
Country: England
NLM ID: 101284307
Publication information
Publication date:
Nov 2023
History:
received: 8 Feb 2022
accepted: 3 Jul 2023
medline: 8 Nov 2023
pubmed: 17 Oct 2023
entrez: 16 Oct 2023
Status:
ppublish
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
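Sections (iii) and (iv) of the protocol outlined above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation (their code is in the repository cited in the abstract). The function names `stratified_split` and `enrichment_factor`, the random stratified partition, and the choice of enrichment factor as the metric are all assumptions made for illustration; the protocol itself may partition and evaluate differently.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices into training and test sets, class by class.

    Hypothetical helper: a random split that keeps the active/inactive
    ratio roughly equal in both sets. labels holds 1 (active) or 0 (inactive).
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def enrichment_factor(scores, labels, top_fraction=0.01):
    """Hit rate among the top-ranked molecules divided by the overall hit rate.

    Assumes a higher score means "more likely active". For docking scores
    where lower (more negative) is better, negate the scores first.
    """
    n = len(scores)
    n_top = max(1, round(n * top_fraction))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(labels[i] for i in ranked[:n_top])
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("the set contains no actives")
    return (hits_top / n_top) / (total_hits / n)
```

In use, a scoring function would be fitted on the training indices only, and `enrichment_factor` would then be computed on its scores for the held-out test indices, so that the reported performance reflects molecules the model has not seen.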
Identifiers
pubmed: 37845361
doi: 10.1038/s41596-023-00885-w
pii: 10.1038/s41596-023-00885-w
Chemical substances
Acetylcholinesterase (EC 3.1.1.7)
Ligands (0)
Publication types
Journal Article
Review
Languages
eng
Citation subsets
IM
Pagination
3460-3511
Copyright information
© 2023. Springer Nature Limited.
pubmed: 27490034
doi: 10.1002/minf.201400132
Wójcikowski, M., Kukiełka, M., Stepniewska-Dziubinska, M. M. & Siedlecki, P. Development of a protein-ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions. Bioinformatics 35, 1334–1341 (2019).
pubmed: 30202917
doi: 10.1093/bioinformatics/bty757
Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
pubmed: 29629118
doi: 10.1039/C7SC02664A
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
pubmed: 20426451
doi: 10.1021/ci100050t
Ballester, P. J. et al. Hierarchical virtual screening for the discovery of new molecular scaffolds in antibacterial hit identification. J. R. Soc. Interface 9, 3196–3207 (2012).
pubmed: 22933186
pmcid: 3481598
doi: 10.1098/rsif.2012.0569
Li, L. et al. Target-specific support vector machine scoring in structure-based virtual screening: computational validation, in vitro testing in kinases, and effects on lung cancer cell proliferation. J. Chem. Inf. Model. 51, 755–759 (2011).
pubmed: 21438548
pmcid: 3092157
doi: 10.1021/ci100490w
Durrant, J. D. & McCammon, J. A. NNScore: a neural-network-based scoring function for the characterization of protein–ligand complexes. J. Chem. Inf. Model. 50, 1865–1871 (2010).
pubmed: 20845954
pmcid: 2964041
doi: 10.1021/ci100244v
Durrant, J. D. & McCammon, J. A. NNScore 2.0: a neural-network receptor–ligand scoring function. J. Chem. Inf. Model. 51, 2897–2903 (2011).
pubmed: 22017367
pmcid: 3225089
doi: 10.1021/ci2003889
Wang, D. et al. Improving the virtual screening ability of target-specific scoring functions using deep learning methods. Front. Pharmacol. 10, 924 (2019).
pubmed: 31507420
pmcid: 6713720
doi: 10.3389/fphar.2019.00924
Ashtawy, H. M. & Mahapatra, N. R. Task-specific scoring functions for predicting ligand binding poses and affinity and for screening enrichment. J. Chem. Inf. Model. 58, 119–133 (2018).
pubmed: 29190087
doi: 10.1021/acs.jcim.7b00309
Turner, R. et al. Bayesian optimization is superior to random search for machine learning hyperparameter tuning: analysis of the Black-Box Optimization Challenge 2020. Proc. Mach. Learn. Res. 133, 3–26 (2021).
Cowen-Rivers, A. I. et al. HEBO: pushing the limits of sample-efficient hyperparameter optimisation. J. Artif. Intell. Res. 74, 1269–1349 (2022).
doi: 10.1613/jair.1.13643
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: a next-generation hyperparameter optimization framework. in The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’19), August 4–8, 2019, Anchorage, AK, USA. https://doi.org/10.1145/3292500.3330701 (2019).
Case, D. A. et al. The Amber biomolecular simulation programs. J. Comput. Chem. 26, 1668–1688 (2005).
pubmed: 16200636
pmcid: 1989667
doi: 10.1002/jcc.20290
Götz, A. W. et al. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. J. Chem. Theory Comput. 8, 1542–1555 (2012).
pubmed: 22582031
pmcid: 3348677
doi: 10.1021/ct200909j
Berendsen, H. J. C., van der Spoel, D. & van Drunen, R. GROMACS: a message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 91, 43–56 (1995).
doi: 10.1016/0010-4655(95)00042-E
Makarewicz, T. & Kaźmierkiewicz, R. Molecular dynamics simulation by GROMACS using GUI plugin for PyMOL. J. Chem. Inf. Model. 53, 1229–1234 (2013).
pubmed: 23611462
doi: 10.1021/ci400071x
van Dijk, M., Wassenaar, T. A. & Bonvin, A. M. J. J. A flexible, grid-enabled web portal for GROMACS molecular dynamics simulations. J. Chem. Theory Comput. 8, 3463–3472 (2012).
pubmed: 26592996
doi: 10.1021/ct300102d
Bietz, S., Urbaczek, S., Schulz, B. & Rarey, M. Protoss: a holistic approach to predict tautomers and protonation states in protein-ligand complexes. J. Cheminform. 6, 12 (2014).
pubmed: 24694216
pmcid: 4019353
doi: 10.1186/1758-2946-6-12
Sunseri, J. & Koes, D. R. Virtual screening with Gnina 1.0. Molecules 26, 7369 (2021).
pubmed: 34885952
pmcid: 8659095
doi: 10.3390/molecules26237369