Will we ever be able to accurately predict solubility?


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
18 Mar 2024
History:
received: 01 09 2023
accepted: 29 02 2024
medline: 19 3 2024
pubmed: 19 3 2024
entrez: 19 3 2024
Status: epublish

Abstract

Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performance, but their reliability may be deceptive when they are used prospectively. This study investigates the origins of these discrepancies along three directions: a historical perspective, an analysis of the aqueous solubility dataverse, and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aimed at producing models that are useful to bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute, and data sources. The models obtained here and the quality-assessed datasets are publicly available.

Identifiers

pubmed: 38499581
doi: 10.1038/s41597-024-03105-6
pii: 10.1038/s41597-024-03105-6

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

303

Copyright information

© 2024. The Author(s).

Authors

P Llompart (P)

Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France.
IDD/CADD, Sanofi, Vitry-Sur-Seine, France.

C Minoletti (C)

IDD/CADD, Sanofi, Vitry-Sur-Seine, France.

S Baybekov (S)

Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France.

D Horvath (D)

Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France.

G Marcou (G)

Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France. g.marcou@unistra.fr.

A Varnek (A)

Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France.

MeSH classifications