A practical guide to machine-learning scoring for structure-based virtual screening.

Artificial Intelligence; Acetylcholinesterase; Ligands; Machine Learning; Algorithms; Molecular Docking Simulation

Journal

Nature Protocols

ISSN: 1750-2799

Abbreviated title: Nat Protoc

Country: England

NLM ID: 101284307

Publication information

Publication date:
Nov 2023

History:

received: 8 February 2022

accepted: 3 July 2023

medline: 8 November 2023

pubmed: 17 October 2023

entrez: 16 October 2023

Status: ppublish

Abstract

Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more specifically, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol. The code can be run on a single computer within 1 week and makes use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
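As an illustration of what evaluating an SF on a benchmark can involve, the sketch below computes an enrichment factor for a ranked list of molecules. The abstract does not name the metrics used in the protocol, so the choice of enrichment factor is an assumption made here because it is a common SBVS measure; the function name `enrichment_factor`, its parameters, and the toy scores are hypothetical and are not taken from the protocol's repository.

```python
from math import ceil


def enrichment_factor(scores, labels, fraction=0.01, higher_is_better=True):
    """Hit rate in the top-ranked fraction divided by the hit rate in the whole set.

    scores: one predicted score per molecule (hypothetical values here).
    labels: 1 for a known active, 0 for an inactive or decoy.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is needed to compute enrichment")
    # Size of the top-ranked subset, never smaller than one molecule.
    n_top = max(1, ceil(fraction * n_total))
    # Rank molecules from best to worst according to the score convention.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0],
                    reverse=higher_is_better)
    top_actives = sum(label for _, label in ranked[:n_top])
    return (top_actives / n_top) / (n_actives / n_total)


# Toy data: docking-style scores where a lower value means a better rank.
toy_scores = [-9.1, -8.7, -8.2, -7.9, -7.5, -7.0, -6.8, -6.1, -5.9, -5.2]
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
ef = enrichment_factor(toy_scores, toy_labels, fraction=0.2,
                       higher_is_better=False)
print(f"EF at top 20%: {ef:.2f}")
```

With these toy values, one of the two top-ranked molecules is active while 20% of the whole set is active, so the top 20% is enriched 2.5-fold over random selection. Scores from an ML classifier that outputs a probability of activity would instead use `higher_is_better=True`.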

Identifiers

DOI: 10.1038/s41596-023-00885-w PMID: 37845361

pii: 10.1038/s41596-023-00885-w

Chemical substances

Acetylcholinesterase EC 3.1.1.7

Ligands 0

Publication types

Journal Article; Review

Languages

eng

Pagination

3460-3511


Ballester, P. J., Schreyer, A. & Blundell, T. L. Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity? J. Chem. Inf. Model. 54, 944–955 (2014).

pubmed: 24528282 pmcid: 3966527 doi: 10.1021/ci500091r

Li, H., Leung, K.-S., Wong, M.-H. & Ballester, P. J. Improving AutoDock Vina using Random Forest: the growing accuracy of binding affinity prediction by the effective exploitation of larger data sets. Mol. Inform. 34, 115–126 (2015).

pubmed: 27490034 doi: 10.1002/minf.201400132

Wójcikowski, M., Kukiełka, M., Stepniewska-Dziubinska, M. M. & Siedlecki, P. Development of a protein-ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions. Bioinformatics 35, 1334–1341 (2019).

pubmed: 30202917 doi: 10.1093/bioinformatics/bty757

Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

pubmed: 29629118 doi: 10.1039/C7SC02664A

Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

pubmed: 20426451 doi: 10.1021/ci100050t

Ballester, P. J. et al. Hierarchical virtual screening for the discovery of new molecular scaffolds in antibacterial hit identification. J. R. Soc. Interface 9, 3196–3207 (2012).

pubmed: 22933186 pmcid: 3481598 doi: 10.1098/rsif.2012.0569

Li, L. et al. Target-specific support vector machine scoring in structure-based virtual screening: computational validation, in vitro testing in kinases, and effects on lung cancer cell proliferation. J. Chem. Inf. Model. 51, 755–759 (2011).

pubmed: 21438548 pmcid: 3092157 doi: 10.1021/ci100490w

Durrant, J. D. & McCammon, J. A. NNScore: a neural-network-based scoring function for the characterization of protein−ligand complexes. J. Chem. Inf. Model. 50, 1865–1871 (2010).

pubmed: 20845954 pmcid: 2964041 doi: 10.1021/ci100244v

Durrant, J. D. & McCammon, J. A. NNScore 2.0: a neural-network receptor–ligand scoring function. J. Chem. Inf. Model. 51, 2897–2903 (2011).

pubmed: 22017367 pmcid: 3225089 doi: 10.1021/ci2003889

Wang, D. et al. Improving the virtual screening ability of target-specific scoring functions using deep learning methods. Front. Pharmacol. 10, 924 (2019).

pubmed: 31507420 pmcid: 6713720 doi: 10.3389/fphar.2019.00924

Ashtawy, H. M. & Mahapatra, N. R. Task-specific scoring functions for predicting ligand binding poses and affinity and for screening enrichment. J. Chem. Inf. Model. 58, 119–133 (2018).

pubmed: 29190087 doi: 10.1021/acs.jcim.7b00309

Turner, R. et al. Bayesian optimization is superior to random search for machine learning hyperparameter tuning: analysis of the Black-Box Optimization Challenge 2020. Proc. Mach. Learn. Res. 133, 3–26 (2021).

Cowen-Rivers, A. I. et al. HEBO: pushing the limits of sample-efficient hyperparameter optimisation. J. Artif. Intell. Res. 74, 1269–1349 (2022).

doi: 10.1613/jair.1.13643

Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: a next-generation hyperparameter optimization framework. in The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’19), August 4–8, 2019, Anchorage, AK, USA. https://doi.org/10.1145/3292500.3330701 (2019).

Case, D. A. et al. The Amber biomolecular simulation programs. J. Comput. Chem. 26, 1668–1688 (2005).

pubmed: 16200636 pmcid: 1989667 doi: 10.1002/jcc.20290

Götz, A. W. et al. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. J. Chem. Theory Comput. 8, 1542–1555 (2012).

pubmed: 22582031 pmcid: 3348677 doi: 10.1021/ct200909j

Berendsen, H. J. C., van der Spoel, D. & van Drunen, R. GROMACS: a message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 91, 43–56 (1995).

doi: 10.1016/0010-4655(95)00042-E

Makarewicz, T. & Kaźmierkiewicz, R. Molecular dynamics simulation by GROMACS using GUI plugin for PyMOL. J. Chem. Inf. Model. 53, 1229–1234 (2013).

pubmed: 23611462 doi: 10.1021/ci400071x

van Dijk, M., Wassenaar, T. A. & Bonvin, A. M. J. J. A flexible, grid-enabled web portal for GROMACS molecular dynamics simulations. J. Chem. Theory Comput. 8, 3463–3472 (2012).

pubmed: 26592996 doi: 10.1021/ct300102d

Bietz, S., Urbaczek, S., Schulz, B. & Rarey, M. Protoss: a holistic approach to predict tautomers and protonation states in protein-ligand complexes. J. Cheminform. 6, 12 (2014).

pubmed: 24694216 pmcid: 4019353 doi: 10.1186/1758-2946-6-12

Sunseri, J. & Koes, D. R. Virtual screening with Gnina 1.0. Molecules 26, 7369 (2021).

pubmed: 34885952 pmcid: 8659095 doi: 10.3390/molecules26237369

Authors

Viet-Khoa Tran-Nguyen (VK)

Muhammad Junaid (M)

Saw Simeon (S)

Pedro J Ballester (PJ)

