Topological regression as an interpretable and efficient tool for quantitative structure-activity relationship modeling.


Journal

Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
13 Jun 2024
History:
received: 13 May 2023
accepted: 4 Jun 2024
medline: 14 Jun 2024
pubmed: 14 Jun 2024
entrez: 13 Jun 2024
Status: epublish

Abstract

Quantitative structure-activity relationship (QSAR) modeling is a powerful tool for drug discovery, yet the lack of interpretability of commonly used QSAR models hinders their application in molecular design. We propose a similarity-based regression framework, topological regression (TR), that offers a statistically grounded, computationally fast, and interpretable technique to predict drug responses. We compare the predictive performance of TR on 530 ChEMBL human target activity datasets against that of deep-learning-based QSAR models. Our results suggest that our sparse TR model can achieve performance equal to, if not better than, that of the deep-learning-based QSAR models, while providing more intuitive interpretation by extracting an approximate isometry between the chemical space of the drugs and their activity space.

Identifiers

pubmed: 38871711
doi: 10.1038/s41467-024-49372-0
pii: 10.1038/s41467-024-49372-0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

5072

Grants

Funding agency: National Science Foundation (NSF)
ID: 2007903
Funding agency: National Science Foundation (NSF)
ID: 2007418

Copyright information

© 2024. The Author(s).

References

Neves, B. J. et al. QSAR-based virtual screening: advances and applications in drug discovery. Front. Pharmacol. 9, 1275 (2018).
pubmed: 30524275 pmcid: 6262347 doi: 10.3389/fphar.2018.01275
Kwon, S., Bae, H., Jo, J. & Yoon, S. Comprehensive ensemble in QSAR prediction for drug discovery. BMC Bioinformatics 20, 1–12 (2019).
doi: 10.1186/s12859-019-3135-4
Cherkasov, A. et al. QSAR modeling: where have you been? Where are you going to? J. Medicinal Chem. 57, 4977–5010 (2014).
doi: 10.1021/jm4004285
Grisoni, F., Ballabio, D., Todeschini, R. & Consonni, V. Molecular descriptors for structure–activity applications: a hands-on approach. Methods Mol. Biol. 1800, 3–53 (2018).
Yap, C. W. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 32, 1466–1474 (2011).
pubmed: 21425294 doi: 10.1002/jcc.21707
Moriwaki, H., Tian, Y.-S., Kawashita, N. & Takagi, T. Mordred: a molecular descriptor calculator. J. Cheminform. 10, 1–14 (2018).
doi: 10.1186/s13321-018-0258-y
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inform. Modeling 50, 742–754 (2010).
doi: 10.1021/ci100050t
Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inform. Modeling 59, 3370–3388 (2019).
doi: 10.1021/acs.jcim.9b00237
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).
pubmed: 32084340 pmcid: 8349178 doi: 10.1016/j.cell.2020.01.021
Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. 19, 1342–1350 (2023).
Isert, C., Kromann, J. C., Stiefl, N., Schneider, G. & Lewis, R. A. Machine learning for fast, quantum mechanics-based approximation of drug lipophilicity. ACS Omega 8, 2046–2056 (2023).
pubmed: 36687099 pmcid: 9850743 doi: 10.1021/acsomega.2c05607
Wang, S., Guo, Y., Wang, Y., Sun, H. & Huang, J. SMILES-BERT: large scale unsupervised pre-training for molecular property prediction. In: Proc. 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics 429–436 (IEEE, 2019).
Karpov, P., Godin, G. & Tetko, I. V. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J. Cheminform. 12, 1–12 (2020).
doi: 10.1186/s13321-020-00423-w
Doshi-Velez, F. & Kim, B. Towards a rigorous science of interpretable machine learning. Preprint at https://arxiv.org/abs/1702.08608 (2017).
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In: International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 3319–3328 (PMLR, 2017).
Nembrini, S., König, I. R. & Wright, M. N. The revival of the Gini importance? Bioinformatics 34, 3711–3718 (2018).
pubmed: 29757357 pmcid: 6198850 doi: 10.1093/bioinformatics/bty373
Altmann, A., Toloşi, L., Sander, O. & Lengauer, T. Permutation importance: a corrected feature importance measure. Bioinformatics 26, 1340–1347 (2010).
pubmed: 20385727 doi: 10.1093/bioinformatics/btq134
Smilkov, D., Thorat, N., Kim, B., Viégas, F. & Wattenberg, M. SmoothGrad: removing noise by adding noise. Preprint at https://arxiv.org/abs/1706.03825 (2017).
Koh, P. W. & Liang, P. Understanding black-box predictions via influence functions. In: International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1885–1894 (PMLR, 2017).
Ribeiro, M. T., Singh, S. & Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. In: Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ed. Krishnapuram, B.) 1135–1144 (ACM, Digital Library, 2016).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inform. Process. Syst. 30 (2017).
Rodríguez-Pérez, R. & Bajorath, J. Interpretation of compound activity predictions from complex machine learning models using local approximations and Shapley values. J. Medicinal Chem. 63, 8761–8777 (2019).
doi: 10.1021/acs.jmedchem.9b01101
Mothilal, R. K., Sharma, A. & Tan, C. Explaining machine learning classifiers through diverse counterfactual explanations. In: Proc. 2020 Conference on Fairness, Accountability, and Transparency 607–617 (2020).
Wellawatte, G. P., Seshadri, A. & White, A. D. Model agnostic generation of counterfactual explanations for molecules. Chem. Sci. 13, 3697–3705 (2022).
pubmed: 35432902 pmcid: 8966631 doi: 10.1039/D1SC05259D
Marchese Robinson, R. L., Palczewska, A., Palczewski, J. & Kidley, N. Comparison of the predictive performance and interpretability of random forest and linear models on benchmark data sets. J. Chem. Inform. Modeling 57, 1773–1792 (2017).
doi: 10.1021/acs.jcim.6b00753
Polishchuk, P. Interpretation of quantitative structure–activity relationship models: past, present, and future. J. Chem. Inform. Modeling 57, 2618–2639 (2017).
doi: 10.1021/acs.jcim.7b00274
Balfer, J. & Bajorath, J. Visualization and interpretation of support vector machine activity predictions. J. Chem. Inform. Modeling 55, 1136–1147 (2015).
doi: 10.1021/acs.jcim.5b00175
Sheridan, R. P. Interpretation of QSAR models by coloring atoms according to changes in predicted activity: how robust is it? J. Chem. Inform. Modeling 59, 1324–1337 (2019).
doi: 10.1021/acs.jcim.8b00825
Shoombuatong, W. et al. Towards the revival of interpretable QSAR models. In: Advances in QSAR Modeling: Applications in Pharmaceutical, Chemical, Food, Agricultural and Environmental Sciences 3–55 (Springer, 2017).
Xiong, Z. et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J. Medicinal Chem. 63, 8749–8760 (2019).
doi: 10.1021/acs.jmedchem.9b00959
Baldassarre, F. & Azizpour, H. Explainability techniques for graph convolutional networks. Preprint at https://arxiv.org/abs/1905.13686 (2019).
Weber, J. K. et al. Simplified, interpretable graph convolutional neural networks for small molecule activity prediction. J. Comput.-Aided Mol. Des. 36, 391–404 (2021).
Ding, H., Takigawa, I., Mamitsuka, H. & Zhu, S. Similarity-based machine learning methods for predicting drug–target interactions: a brief review. Briefings Bioinform. 15, 734–747 (2014).
doi: 10.1093/bib/bbt056
Yamanishi, Y., Araki, M., Gutteridge, A., Honda, W. & Kanehisa, M. Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 24, 232–240 (2008).
doi: 10.1093/bioinformatics/btn162
Gajewicz-Skretna, A., Furuhama, A., Yamamoto, H. & Suzuki, N. Generating accurate in silico predictions of acute aquatic toxicity for a range of organic chemicals: Towards similarity-based machine learning methods. Chemosphere 280, 130681 (2021).
pubmed: 34162070 doi: 10.1016/j.chemosphere.2021.130681
Jacob, L. & Vert, J.-P. Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 24, 2149–2156 (2008).
pubmed: 18676415 pmcid: 2553441 doi: 10.1093/bioinformatics/btn409
Patlewicz, G., Helman, G., Pradeep, P. & Shah, I. Navigating through the minefield of read-across tools: a review of in silico tools for grouping. Comput. Toxicol. 3, 1–18 (2017).
doi: 10.1016/j.comtox.2017.05.003
Wawer, M., Peltason, L., Weskamp, N., Teckentrup, A. & Bajorath, J. Structure–activity relationship anatomy by network-like similarity graphs and local structure–activity relationship indices. J. Medicinal Chem. 51, 6075–6084 (2008).
doi: 10.1021/jm800867g
Keiser, M. J. et al. Relating protein pharmacology by ligand chemistry. Nat. Biotechnol. 25, 197–206 (2007).
pubmed: 17287757 doi: 10.1038/nbt1284
Lo, Y.-C. et al. Large-scale chemical similarity networks for target profiling of compounds identified in cell-based chemical screens. PLoS Comput. Biol. 11, 1004153 (2015).
doi: 10.1371/journal.pcbi.1004153
Lounkine, E. et al. Large-scale prediction and testing of drug activity on side-effect targets. Nature 486, 361–367 (2012).
pubmed: 22722194 pmcid: 3383642 doi: 10.1038/nature11159
Keiser, M. J. et al. Predicting new molecular targets for known drugs. Nature 462, 175–181 (2009).
pubmed: 19881490 pmcid: 2784146 doi: 10.1038/nature08506
He, X., Cai, D. & Niyogi, P. Laplacian score for feature selection. Adv. Neural Inform. Process. Syst. 18 (2005).
Sheikhpour, R., Sarram, M. A., Gharaghani, S. & Chahooki, M. A. Z. Feature selection based on graph laplacian by using compounds with known and unknown activities. J. Chemometrics 31, 2899 (2017).
doi: 10.1002/cem.2899
Valizade Hasanloei, M. A., Sheikhpour, R., Sarram, M. A., Sheikhpour, E. & Sharifi, H. A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities. J. Comput.-Aided Mol. Des. 32, 375–384 (2018).
pubmed: 29280033 doi: 10.1007/s10822-017-0094-6
Cruz-Monteagudo, M. et al. Activity cliffs in drug discovery: Dr. Jekyll or Mr. Hyde? Drug Discov. Today 19, 1069–1080 (2014).
pubmed: 24560935 doi: 10.1016/j.drudis.2014.02.003
Stumpfe, D., Hu, H. & Bajorath, J. Evolving concept of activity cliffs. ACS Omega 4, 14360–14368 (2019).
pubmed: 31528788 pmcid: 6740043 doi: 10.1021/acsomega.9b02221
Maggiora, G. M. On outliers and activity cliffs: why QSAR often disappoints. J. Chem. Inform. Modeling 46, 1535 (2006).
doi: 10.1021/ci060117s
Hu, H. & Bajorath, J. Simplified activity cliff network representations with high interpretability and immediate access to SAR information. J. Comput.-Aided Mol. Des. 34, 943–952 (2020).
pubmed: 32500478 pmcid: 7367913 doi: 10.1007/s10822-020-00319-9
Weinberger, K. Q., Blitzer, J. & Saul, L. Distance metric learning for large margin nearest neighbor classification. Adv. Neural Inform. Process. Syst. 18 (2005).
Weinberger, K. Q. & Tesauro, G. In: Artificial Intelligence and Statistics (eds Meila, M. & Shen, X.) 612–619 (PMLR, 2007).
Kireeva, N. V., Ovchinnikova, S. I., Kuznetsov, S. L., Kazennov, A. M. & Tsivadze, A. Y. Impact of distance-based metric learning on classification and visualization model performance and structure–activity landscapes. J. Comput.-aided Mol. Des. 28, 61–73 (2014).
pubmed: 24493411 doi: 10.1007/s10822-014-9719-1
Horvath, D., Marcou, G. & Varnek, A. In: Advances in QSAR Modeling: Applications in Pharmaceutical, Chemical, Food, Agricultural and Environmental Sciences (ed. Roy, K.) 167–199 (Springer Verlag, 2017).
Fröhlich, H., Wegner, J. K., Sieker, F. & Zell, A. Kernel functions for attributed molecular graphs—a new similarity-based approach to ADME prediction in classification and regression. QSAR Combinatorial Sci. 25, 317–326 (2006).
doi: 10.1002/qsar.200510135
Mohr, J. A., Jain, B. J. & Obermayer, K. Molecule kernels: a descriptor-and alignment-free quantitative structure–activity relationship approach. J. Chem. Inform. Modeling 48, 1868–1881 (2008).
doi: 10.1021/ci800144y
Charlton, M., Fotheringham, S. & Brunsdon, C. Geographically Weighted Regression Vol. 2, White paper (National Centre for Geocomputation, National University of Ireland Maynooth, 2009).
Johnson, R. A. & Wichern, D. W. Applied Multivariate Statistical Analysis, 5th edn. (Prentice Hall, NJ, 2002).
Gaulton, A. et al. The ChEMBL database in 2017. Nucleic Acids Res. 45, 945–954 (2017).
doi: 10.1093/nar/gkw1074
Bosc, N., Atkinson, F., Felix, E., Gaulton, A., Hersey, A. & Leach, A. R. Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J. Cheminform. 11, 1–16 (2019).
Carroll, R. J. & Ruppert, D. Prediction and tolerance intervals with transformation and/or weighting. Technometrics 33, 197–210 (1991).
doi: 10.1080/00401706.1991.10484807
Asmussen, S., Jensen, J. L. & Rojas-Nandayapa, L. On the Laplace transform of the lognormal distribution. Methodol. Comput. Appl. Probab. 18, 441–458 (2016).
doi: 10.1007/s11009-014-9430-7
Fotheringham, A. S., Brunsdon, C. & Charlton, M. Geographically Weighted Regression: the Analysis of Spatially Varying Relationships (John Wiley & Sons, 2003).
Zhang, R., Nolte, D., Sanchez-Villalobos, C., Ghosh, S. & Pal, R. Topological Regression as an interpretable and efficient tool for Quantitative Structure-Activity Relationship Modeling. Zenodo https://doi.org/10.5281/zenodo.10929477 (2024).

Authors

Ruibo Zhang (R)

Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA.

Daniel Nolte (D)

Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA.

Cesar Sanchez-Villalobos (C)

Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA.

Souparno Ghosh (S)

Department of Statistics, University of Nebraska - Lincoln, Lincoln, NE, 68588, USA. sghosh5@unl.edu.

Ranadip Pal (R)

Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA. ranadip.pal@ttu.edu.

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Lisanne N van Merendonk, Kübra Akgöl, Bastiaan Nuijen
1.00
Humans Antineoplastic Agents Administration, Oral Drug Costs Counterfeit Drugs

Smoking Cessation and Incident Cardiovascular Disease.

Jun Hwan Cho, Seung Yong Shin, Hoseob Kim et al.
1.00
Humans Male Smoking Cessation Cardiovascular Diseases Female
Humans United States Aged Cross-Sectional Studies Medicare Part C
1.00
Humans Yoga Low Back Pain Female Male

MeSH classifications