Will we ever be able to accurately predict solubility?
Journal
Scientific data
ISSN: 2052-4463
Titre abrégé: Sci Data
Pays: England
ID NLM: 101640192
Informations de publication
Date de publication:
18 Mar 2024
18 Mar 2024
Historique:
received:
01
09
2023
accepted:
29
02
2024
medline:
19
3
2024
pubmed:
19
3
2024
entrez:
19
3
2024
Statut:
epublish
Résumé
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performances, but their reliability may be deceiving when used prospectively. This study investigates the origins of these discrepancies, following three directions: a historical perspective, an analysis of the aqueous solubility dataverse and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performances. We also propose a workflow to cure aqueous solubility data aiming at producing useful models for bench chemist. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute and data sources. The herein obtained models, and quality-assessed datasets are publicly available.
Identifiants
pubmed: 38499581
doi: 10.1038/s41597-024-03105-6
pii: 10.1038/s41597-024-03105-6
doi:
Types de publication
Journal Article
Langues
eng
Sous-ensembles de citation
IM
Pagination
303Informations de copyright
© 2024. The Author(s).
Références
Kennedy, T. Managing the drug discovery/development interface. Drug Discov. Today 2, 436–444 (1997).
doi: 10.1016/S1359-6446(97)01099-4
Kola, I. & Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 3, 711–716 (2004).
pubmed: 15286737
doi: 10.1038/nrd1470
Millard, J., Alvarez-Núñez, F. & Yalkowsky, S. Solubilization by cosolvents. Establishing useful constants for the log-linear model. Int. J. Pharm. 245, 153–166 (2002).
pubmed: 12270252
doi: 10.1016/S0378-5173(02)00334-4
Jouyban, A. & Abolghassemi Fakhree, M. A. Solubility prediction methods for drug/drug like molecules. Recent Pat. Chem. Eng. 1, 220–231 (2008).
doi: 10.2174/2211334710801030220
van de Waterbeemd, H. Improving compound quality through in vitro and in silico physicochemical profiling. Chem. Biodivers. 6, 1760–1766 (2009).
pubmed: 19937820
doi: 10.1002/cbdv.200900056
Llompart, P. et al Will we ever be able to accurately predict solubility? Recherche Data Gouv https://doi.org/10.57745/CZVZIA (2023)
Wang, J. & Hou, T. Recent advances on aqueous solubility prediction. Comb. Chem. High Throughput Screen. 14, 328–338 (2011).
pubmed: 21470182
doi: 10.2174/138620711795508331
Elder, D. P., Holm, R. & Diego, H. L. Use of pharmaceutical salts and cocrystals to address the issue of poor solubility. Int. J. Pharm. 453, 88–100 (2013). de.
pubmed: 23182973
doi: 10.1016/j.ijpharm.2012.11.028
Saal, C. & Petereit, A. C. Optimizing solubility: Kinetic versus thermodynamic solubility temptations and risks. Eur. J. Pharm. Sci. 47, 589–595 (2012).
pubmed: 22885099
doi: 10.1016/j.ejps.2012.07.019
Wang, J. et al. Development of reliable aqueous solubility models and their application in druglike analysis. J. Chem. Inf. Model. 47, 1395–1404 (2007).
pubmed: 17569522
doi: 10.1021/ci700096r
Johnson, S. R. & Zheng, W. Recent progress in the computational prediction of aqueous solubility and absorption. AAPS J. 8, E27–E40 (2006).
pubmed: 16584131
pmcid: 2751421
doi: 10.1208/aapsj080104
Delaney, J. S. Predicting aqueous solubility from structure. Drug Discov. Today 10, 289–295 (2005).
pubmed: 15708748
doi: 10.1016/S1359-6446(04)03365-3
OECD. Test No. 105: Water Solubility. OECD Guidelines for the Testing of Chemicals, Section 1 https://read.oecd-ilibrary.org/environment/test-no-105-water-solubility_9789264069589-en (1995).
Llinàs, A., Glen, R. C. & Goodman, J. M. Solubility Challenge: Can You Predict Solubilities of 32 Molecules Using a Database of 100 Reliable Measurements? J. Chem. Inf. Model. 48, 1289–1303 (2008).
pubmed: 18624401
doi: 10.1021/ci800058v
Stuart, M. & Box, K. Chasing Equilibrium: Measuring the Intrinsic Solubility of Weak Acids and Bases. Anal. Chem. 77, 983–990 (2005).
pubmed: 15858976
doi: 10.1021/ac048767n
Huuskonen, J., Rantanen, J. & Livingstone, D. Prediction of aqueous solubility for a diverse set of organic compounds based on atom-type electrotopological state indices. Eur. J. Med. Chem. 35, 1081–1088 (2000).
pubmed: 11248406
doi: 10.1016/S0223-5234(00)01186-7
Yalkowsky, RM & Dannenfleser, SH. Aquasol database of aqueous solubility. Version 5. https://hero.epa.gov/hero/index.cfm/reference/details/reference_id/5348039 (2009).
Bloch, D. Computer Software Review. Review of PHYSPROP Database (Version 1.0). ACS Publications https://pubs.acs.org/doi/pdf/10.1021/ci00024a602 (2004) https://doi.org/10.1021/ci00024a602 .
Dalanay, J. S. ESOL: Estimating Aqueous Solubility Directly from Molecular Structure. J. Chem. Inf. Comput. Sci. 44, 1000–1005 (2004).
doi: 10.1021/ci034243x
US EPA. EPI Suite. https://www.epa.gov/tsca-screening-tools/epi-suitetm-estimation-program-interface
Wang, J., Hou, T. & Xu, X. Aqueous Solubility Prediction Based on Weighted Atom Type Counts and Solvent Accessible Surface Areas. J. Chem. Inf. Model. 49, 571–581 (2009).
pubmed: 19226181
doi: 10.1021/ci800406y
Boobier, S., Hose, D. R. J., Blacker, A. J. & Nguyen, B. N. Machine learning with physicochemical relationships: solubility prediction in organic solvents and water. Nat. Commun. 11, 5753 (2020).
pubmed: 33188226
pmcid: 7666209
doi: 10.1038/s41467-020-19594-z
Tetko, I. V., Tanchuk, V. Y., Kasheva, T. N. & Villa, A. E. P. Estimation of Aqueous Solubility of Chemical Compounds Using E-State Indices. J. Chem. Inf. Comput. Sci. 41, 1488–1493 (2001).
pubmed: 11749573
doi: 10.1021/ci000392t
Avdeef, A. Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database. ADMET DMPK 8, 29 (2020).
pubmed: 35299775
pmcid: 8915599
doi: 10.5599/admet.766
Sorkun, M. C., Khetan, A. & Er, S. AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds. Sci. Data 6, 143 (2019).
pubmed: 31395888
pmcid: 6687799
doi: 10.1038/s41597-019-0151-1
Sushko, I. et al. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J. Comput. Aided Mol. Des. 25, 533–554 (2011).
pubmed: 21660515
pmcid: 3131510
doi: 10.1007/s10822-011-9440-2
Panapitiya, G. et al. Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction. ACS Omega 7, 15695–15710 (2022).
pubmed: 35571767
pmcid: 9096921
doi: 10.1021/acsomega.2c00642
Wiercioch, M. & Kirchmair, J. Dealing with a data-limited regime: Combining transfer learning and transformer attention mechanism to increase aqueous solubility prediction performance. Artif. Intell. Life Sci. 1, 100021 (2021).
Lowe, C. N. et al. Transparency in Modeling through Careful Application of OECD’s QSAR/QSPR Principles via a Curated Water Solubility Data Set. Chem. Res. Toxicol. 36, 465–478 (2023).
pubmed: 36877669
doi: 10.1021/acs.chemrestox.2c00379
Francoeur, P. G. & Koes, D. R. SolTranNet-A Machine Learning Tool for Fast Aqueous Solubility Prediction. J. Chem. Inf. Model. 61, 2530–2536 (2021).
pubmed: 34038123
pmcid: 8900744
doi: 10.1021/acs.jcim.1c00331
Sluga, J., Venko, K., Drgan, V. & Novič, M. QSPR Models for Prediction of Aqueous Solubility: Exploring the Potency of Randić-type Indices. Croat. Chem. Acta 93 (2020).
Meng, J. et al. Boosting the predictive performance with aqueous solubility dataset curation. Sci. Data 9, 71 (2022).
pubmed: 35241693
pmcid: 8894363
doi: 10.1038/s41597-022-01154-3
Lee, S. et al. Novel Solubility Prediction Models: Molecular Fingerprints and Physicochemical Features vs Graph Convolutional Neural Networks. ACS Omega 7, 12268–12277 (2022).
pubmed: 35449985
pmcid: 9016862
doi: 10.1021/acsomega.2c00697
Schrödinger. QikProp. (2015).
United States National Library of Medicine. ChemIDplus advanced. https://pubchem.ncbi.nlm.nih.gov/source/ChemIDplus (2011).
Kühne, R., Ebert, R.-U., Kleint, F., Schmidt, G. & Schüürmann, G. Group contribution methods to estimate water solubility of organic chemicals. Chemosphere 30, 2061–2077 (1995).
doi: 10.1016/0045-6535(95)00084-L
OECD. eChemPortal: The Global Portal to Information on Chemical Substances, https://www.echemportal.org/echemportal/ (2023).
European Chemicals Agency. ECHA. https://echa.europa.eu/fr/ (2023).
Irmann, F. Eine einfache Korrelation zwischen Wasserlöslichkeit und Struktur von Kohlenwasserstoffen und Halogenkohlenwasserstoffen. Chem. Ing. Tech. 37, 789–798 (1965).
doi: 10.1002/cite.330370802
Hansch, C., Quinlan, J. E. & Lawrence, G. L. Linear free-energy relationship between partition coefficients and the aqueous solubility of organic liquids. J. Org. Chem. 33, 347–350 (1968).
doi: 10.1021/jo01265a071
Yalkowsky, S. H. & Valvani, S. C. Solubility and partitioning I: Solubility of nonelectrolytes in water. J. Pharm. Sci. 69, 912–922 (1980).
pubmed: 7400936
doi: 10.1002/jps.2600690814
Ran, Y. & Yalkowsky, S. H. Prediction of drug solubility by the general solubility equation (GSE). J. Chem. Inf. Comput. Sci. 41, 354–357 (2001).
pubmed: 11277722
doi: 10.1021/ci000338c
Hansen, N. T., Kouskoumvekaki, I., Jørgensen, F. S., Brunak, S. & Jónsdóttir, S. Ó. Prediction of pH-Dependent Aqueous Solubility of Druglike Molecules. J. Chem. Inf. Model. 46, 2601–2609 (2006).
pubmed: 17125200
doi: 10.1021/ci600292q
ChemAxon. Marvin. https://chemaxon.com/products/marvin (2023).
Johnson, S. R., Chen, X.-Q., Murphy, D. & Gudmundsson, O. A Computational Model for the Prediction of Aqueous Solubility That Includes Crystal Packing, Intrinsic Solubility, and Ionization Effects. Mol. Pharm. 4, 513–523 (2007).
pubmed: 17539661
doi: 10.1021/mp070030+
Hopfinger, A. J., Esposito, E. X., Llinàs, A., Glen, R. C. & Goodman, J. M. Findings of the Challenge To Predict Aqueous Solubility. ACS Publications https://pubs.acs.org/doi/pdf/10.1021/ci800436c (2008).
Lusci, A., Pollastri, G. & Baldi, P. Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J. Chem. Inf. Model. 53, 1563–1575 (2013).
pubmed: 23795551
pmcid: 3739985
doi: 10.1021/ci400187y
Huuskonen, J., Livingstone, D. J. & Manallack, D. T. Prediction of drug solubility from molecular structure using a drug-like training set. SAR QSAR Environ. Res. 19, 191–212 (2008).
pubmed: 18484495
doi: 10.1080/10629360802083855
Zhou, D., Alelyunas, Y. & Liu, R. Scores of Extended Connectivity Fingerprint as Descriptors in QSPR Study of Melting Point and Aqueous Solubility. J. Chem. Inf. Model. 48, 981–987 (2008).
pubmed: 18465850
doi: 10.1021/ci800024c
Erić, S., Kalinić, M., Popović, A., Zloh, M. & Kuzmanovski, I. Prediction of aqueous solubility of drug-like molecules using a novel algorithm for automatic adjustment of relative importance of descriptors implemented in counter-propagation artificial neural networks. Int. J. Pharm. 437, 232–241 (2012).
pubmed: 22940210
doi: 10.1016/j.ijpharm.2012.08.022
Llinas, A. & Avdeef, A. Solubility Challenge Revisited after Ten Years, with Multilab Shake-Flask Data, Using Tight (SD ∼ 0.17 log) and Loose (SD ∼ 0.62 log) Test Sets. J. Chem. Inf. Model. 59, 3036–3040 (2019).
pubmed: 31042031
doi: 10.1021/acs.jcim.9b00345
Llinas, A., Oprisiu, I. & Avdeef, A. Findings of the Second Challenge to Predict Aqueous Solubility. J. Chem. Inf. Model. 60, 4791–4803 (2020).
pubmed: 32794744
doi: 10.1021/acs.jcim.0c00701
Hewitt, M. et al. In silico prediction of aqueous solubility: the solubility challenge. J. Chem. Inf. Model. 49, 2572–2587 (2009).
pubmed: 19877720
doi: 10.1021/ci900286s
Goh, G. B., Hodas, N., Siegel, C. & Vishnu, A. SMILES2vec: Predicting Chemical Properties from Text Representations. Preprint at arXiv:1712.02034 (2018).
Cui, Q. et al. Improved Prediction of Aqueous Solubility of Novel Compounds by Going Deeper With Deep Learning. Front. Oncol. 10 (2020).
Maziarka, Ł. et al. Molecule Attention Transformer. (2020).
Lovrić, M. et al. Machine learning in prediction of intrinsic aqueous solubility of drug-like compounds: Generalization, complexity, or predictive ability? J. Chemom. 35, e3349 (2021).
doi: 10.1002/cem.3349
Kohavi, R. & Wolpert, D. H. in International Conference on Machine Learning Bias Plus Variance Decomposition for Zero-One Loss Function (1996).
Dwork, C. et al. The reusable holdout: Preserving validity in adaptive data analysis. Science 349, 636–638 (2015).
pubmed: 26250683
doi: 10.1126/science.aaa9375
Breiman, L. & Spector, P. Submodel Selection and Evaluation in Regression. The X-Random Case. Int. Stat. Rev. Rev. Int. Stat. 60, 291–319 (1992).
doi: 10.2307/1403680
Rao, R. B., Fung, G. & Rosales, R. in Proceedings of the 2008 SIAM International Conference on Data Mining (SDM) On the Dangers of Cross-Validation. An Experimental Evaluation. 588–596 (Society for Industrial and Applied Mathematics, 2008).
Rytting, E., Lentz, K. A., Chen, X. Q., Qian, F. & Vakatesh S. Aqueous and cosolvent solubility data for drug-like organic compounds. AAPS J. 7, E78–105, https://doi.org/10.1208/aapsj070110 (2005).
Heid, E. et al. Chemprop: A Machine Learning Package for Chemical Property Prediction. J. Chem. Inf. Model. 64, 9–17, https://doi.org/10.1021/acs.jcim.3c01250 (2024).
Chevillard, F. et al. In Silico Prediction of Aqueous Solubility: A Multimodel Protocol Based on Chemical Similarity. Mol. Pharm. 9, 3127–3135 (2012).
pubmed: 23072744
doi: 10.1021/mp300234q
Cao, D.-S., Xu, Q.-S., Liang, Y.-Z., Chen, X. & Li, H.-D. Prediction of aqueous solubility of druglike organic compounds using partial least squares, back‐propagation network and support vector machine. J. Chemometrics. 24, 584–595 (2010).
doi: 10.1002/cem.1321
Ruggiu, F., Marcou, G., Varnek, A. & Horvath, D. ISIDA Property-Labelled Fragment Descriptors. Mol. Inform. 29, 855–868 (2010).
pubmed: 27464350
doi: 10.1002/minf.201000099
Ferguson, A. L., Debenedetti, P. G. & Panagiotopoulos, A. Z. Solubility and Molecular Conformations of n-Alkane Chains in Water. J. Phys. Chem. B 113, 6405–6414 (2009).
pubmed: 19361179
doi: 10.1021/jp811229q
Birch, H., Redman, A. D., Letinski, D. J., Lyon, D. Y. & Mayer, P. Determining the water solubility of difficult-to-test substances: A tutorial review. Anal. Chim. Acta 1086, 16–28 (2019).
pubmed: 31561791
doi: 10.1016/j.aca.2019.07.034
Marcou, G., Horvath, D. & Solov, V. Interpretability of SAR/QSAR Models of any Complexity by Atomic Contributions. Mol Inf.
OECD. Principles For The Validation, For Regulatory Purposes, of QSAR models. https://www2.oecd.org/chemicalsafety/risk-assessment/37849783.pdf (2004).
Dearden, J. C. In silico prediction of aqueous solubility. Expert Opin. Drug Discov. 1, 31–52 (2006).
pubmed: 23506031
doi: 10.1517/17460441.1.1.31
ChemAxon. JChem Base, version 22.19.0 (2022).
Ayers, M. ChemSpider: The Free Chemical Database. Royal Society of Chemistry https://www.chemspider.com (2023)
CAS. SciFinder. https://scifinder.cas.org (2023).
OECD, eChemPortal, https://www.echemportal.org/echemportal/ .
Kim, S. et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 49, D1388–D1395 (2021).
pubmed: 33151290
doi: 10.1093/nar/gkaa971
Groom, C. R., Bruno, I. J., Lightfoot, M. P. & Ward, S. C. The Cambridge Structural Database. Acta Crystallogr. Sect. B Struct. Sci. Cryst. Eng. Mater. 72, 171–179 (2016).
doi: 10.1107/S2052520616003954
Pedretti, A., Mazzolari, A., Gervasoni, S., Fumagalli, L. & Vistoli, G. The VEGA suite of programs: an versatile platform for cheminformatics and drug design projects. Bioinformatics. 37, 1174–1175 (2021).
pubmed: 33289523
doi: 10.1093/bioinformatics/btaa774
US EPA. User’s Guide for T.E.S.T. (version 4.2) (Toxicity Estimation Software Tool) A Program to Estimate Toxicity from Molecular Structure. https://www.epa.gov/chemical-research/users-guide-test-version-42-toxicity-estimation-software-tool-program-estimate (2016).
Mansouri, K., Grulke, C. M., Judson, R. S. & Williams, A. J. OPERA models for predicting physicochemical properties and environmental fate endpoints. J. Cheminformatics 10, 10 (2018).
doi: 10.1186/s13321-018-0263-1
Lin, A. et al. Mapping of the Available Chemical Space versus the Chemical Universe of Lead-Like Compounds. ChemMedChem 13, 540–554 (2018).
pubmed: 29154440
doi: 10.1002/cmdc.201700561
Bonachera, F. Isida/fragmentor 2017 user guide. 25.
Gaspar, H. A., Baskin, I. I., Marcou, G., Horvath, D. & Varnek, A. GTM-Based QSAR Models and Their Applicability Domains. Mol. Inform. 34, 348–356 (2015).
pubmed: 27490381
doi: 10.1002/minf.201400153
Pedregosa, F. et al Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 2825–2830 (2011).
Chemical Computing Group ULC. Molecular Operating Environment (MOE). (2022).
Liu, F. T., Ting, K. M. & Zhou, Z.-H. in 2008 Eighth IEEE International Conference on Data Mining. Isolation Forest. 413–422 (2008).
Huuskonen, J., Salo, M. & Taskinen, J. Neural Network Modeling for Estimation of the Aqueous Solubility of Structurally Related Drugs. J. Pharm. Sci. 86, 450–454 (1997).
pubmed: 9109047
doi: 10.1021/js960358m
Bruneau, P. Search for Predictive Generic Model of Aqueous Solubility Using Bayesian Neural Nets. J. Chem. Inf. Comput. Sci. 41, 1605–1616 (2001).
pubmed: 11749587
doi: 10.1021/ci010363y
Liu, R. & So, S.-S. Development of Quantitative Structure−Property Relationship Models for Early ADME Evaluation in Drug Discovery. 1. Aqueous Solubility. J. Chem. Inf. Comput. Sci. 41, 1633–1639 (2001).
pubmed: 11749590
doi: 10.1021/ci010289j
Klamt, A., Eckert, F., Hornig, M., Beck, M. E. & Bürger, T. Prediction of aqueous solubility of drugs and pesticides with COSMO-RS. J. Comput. Chem. 23, 275–281 (2002).
pubmed: 11924739
doi: 10.1002/jcc.1168
Engkvist, O. & Wrede, P. High-Throughput, In Silico Prediction of Aqueous Solubility Based on One- and Two-Dimensional Descriptors. J. Chem. Inf. Comput. Sci. 42, 1247–1249 (2002).
pubmed: 12377015
doi: 10.1021/ci0202685
Chen, X., Cho, S. J., Li, Y. & Venkatesh, S. Prediction of aqueous solubility of organic compounds using a quantitative structure–property relationship. J. Pharm. Sci. 91, 1838–1852 (2002).
pubmed: 12115811
doi: 10.1002/jps.10178
Wegner, J. K. & Zell, A. Prediction of Aqueous Solubility and Partition Coefficient Optimized by a Genetic Algorithm Based Descriptor Selection Method. J. Chem. Inf. Comput. Sci. 43, 1077–1084 (2003).
pubmed: 12767167
doi: 10.1021/ci034006u
Cheng, A. & Merz, K. M. Prediction of Aqueous Solubility of a Diverse Set of Compounds Using Quantitative Structure−Property Relationships. J. Med. Chem. 46, 3572–3580 (2003).
pubmed: 12904062
doi: 10.1021/jm020266b
Yan, A. & Gasteiger, J. Prediction of Aqueous Solubility of Organic Compounds by Topological Descriptors. QSAR Comb. Sci. 22, 821–829 (2003).
doi: 10.1002/qsar.200330822
Lind, P. & Maltseva, T. Support vector machines for the estimation of aqueous solubility. J. Chem. Inf. Comput. Sci. 43, 1855–1859 (2003).
pubmed: 14632433
doi: 10.1021/ci034107s
Yan, A., Gasteiger, J., Krug, M. & Anzali, S. Linear and nonlinear functions on modeling of aqueous solubility of organic compounds by two structure representation methods. J. Comput. Aided Mol. Des. 18, 75–87 (2004).
pubmed: 15287695
doi: 10.1023/B:jcam.0000030031.81235.05
Hou, T. J., Xia, K. & Zhang, W. ADME Evaluation in Drug Discovery. 4. Prediction of Aqueous Solubility Based on Atom Contribution Approach. J. Chem. Inf. Comput. Sci. 44, 266–275 (2004).
pubmed: 14741036
doi: 10.1021/ci034184n
Fröhlich, H., Wegner, J. K. & Zell, A. Towards Optimal Descriptor Subset Selection with Support Vector Machines in Classification and Regression. QSAR Comb. Sci. 23, 311–318 (2004).
doi: 10.1002/qsar.200410011
Votano, J. R., Parham, M., Hall, L. H., Kier, L. B. & Hall, L. M. Prediction of aqueous solubility based on large datasets using several QSPR models utilizing topological structure representation. Chem. Biodivers. 1, 1829–1841 (2004).
pubmed: 17191819
doi: 10.1002/cbdv.200490137
Clark, M. Generalized Fragment-Substructure Based Property Prediction Method. J. Chem. Inf. Model. 45, 30–38 (2005).
pubmed: 15667126
doi: 10.1021/ci049744c
Catana, C., Gao, H., Orrenius, C. & Stouten, P. F. W. Linear and nonlinear methods in modeling the aqueous solubility of organic compounds. J. Chem. Inf. Model. 45, 170–176 (2005).
pubmed: 15667142
doi: 10.1021/ci049797u
Wassvik, C. M., Holmén, A. G., Bergström, C. A. S., Zamora, I. & Artursson, P. Contribution of solid-state properties to the aqueous solubility of drugs. Eur. J. Pharm. Sci. 29, 294–305 (2006).
pubmed: 16949802
doi: 10.1016/j.ejps.2006.05.013
Schwaighofer, A. et al. Accurate Solubility Prediction with Error Bars for Electrolytes: A Machine Learning Approach. J. Chem. Inf. Model. 47, 407–424 (2007).
pubmed: 17243756
doi: 10.1021/ci600205g
Cheung, M., Johnson, S., Hecht, D. & Fogel, G. B. Quantitative structure-property relationships for drug solubility prediction using evolved neural networks. in 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence) 688–693 (2008). https://doi.org/10.1109/CEC.2008.4630870 .
Duchowicz, P. R., Talevi, A., Bruno-Blanch, L. E. & Castro, E. A. New QSPR study for the prediction of aqueous solubility of drug-like compounds. Bioorg. Med. Chem. 16, 7944–7955 (2008).
pubmed: 18701302
doi: 10.1016/j.bmc.2008.07.067
Hughes, L. D., Palmer, D. S., Nigsch, F. & Mitchell, J. B. O. Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P. J. Chem. Inf. Model. 48, 220–232 (2008).
pubmed: 18186622
doi: 10.1021/ci700307p
Du-Cuny, L., Huwyler, J., Wiese, M. & Kansy, M. Computational aqueous solubility prediction for drug-like compounds in congeneric series. Eur. J. Med. Chem. 43, 501–512 (2008).
pubmed: 17574307
doi: 10.1016/j.ejmech.2007.04.009
Obrezanova, O., Gola, J. M. R., Champness, E. J. & Segall, M. D. Automatic QSAR modeling of ADME properties: blood–brain barrier penetration and aqueous solubility. J. Comput. Aided Mol. Des. 22, 431–440 (2008).
pubmed: 18273554
doi: 10.1007/s10822-008-9193-8
Duchowicz, P. R. & Castro, E. A. QSPR Studies on Aqueous Solubilities of Drug-Like Compounds. Int. J. Mol. Sci. 10, 2558–2577 (2009).
pubmed: 19582218
pmcid: 2705505
doi: 10.3390/ijms10062558
Ghafourian, T. & Bozorgi, A. H. A. Estimation of drug solubility in water, PEG 400 and their binary mixtures using the molecular structures of solutes. Eur. J. Pharm. Sci. 40, 430–440 (2010).
pubmed: 20452421
doi: 10.1016/j.ejps.2010.04.016
Muratov, E. N. et al. New QSPR equations for prediction of aqueous solubility for military compounds. Chemosphere 79, 887–890 (2010).
pubmed: 20233619
doi: 10.1016/j.chemosphere.2010.02.030
Jain, P. & Yalkowsky, S. H. Prediction of aqueous solubility from SCRATCH. Int. J. Pharm. 385, 1–5 (2010).
pubmed: 19819319
doi: 10.1016/j.ijpharm.2009.10.003
Eric, S. et al. The importance of the accuracy of the experimental data for the prediction of solubility. J. Serbian Chem. Soc. 75, 483–495 (2010).
doi: 10.2298/JSC090809022E
Louis, B., Agrawal, V. K. & Khadikar, P. V. Prediction of intrinsic solubility of generic drugs using MLR, ANN and SVM analyses. Eur. J. Med. Chem. 45, 4018–4025 (2010).
pubmed: 20584562
doi: 10.1016/j.ejmech.2010.05.059
Fatemi, M., Heidari, A. & Ghorbanzadeh, M. Prediction of Aqueous Solubility of Drug-Like Compounds by Using an Artificial Neural Network and Least-Squares Support Vector Machine. Bull. Chem. Soc. Jpn. 83, 1338–1345 (2010).
doi: 10.1246/bcsj.20100074
Salahinejad, M., Le, T. C. & Winkler, D. A. Aqueous solubility prediction: do crystal lattice interactions help? Mol. Pharm. 10, 2757–2766 (2013).
pubmed: 23718811
doi: 10.1021/mp4001958
McDonagh, J. L., Nath, N., De Ferrari, L., van Mourik, T. & Mitchell, J. B. O. Uniting Cheminformatics and Chemical Theory To Predict the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules. J. Chem. Inf. Model. 54, 844–856 (2014).
pubmed: 24564264
pmcid: 3965570
doi: 10.1021/ci4005805
Kim, S., Jinich, A. & Aspuru-Guzik, A. MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes. J. Chem. Inf. Model. 57, 657–668 (2017).
pubmed: 28328209
doi: 10.1021/acs.jcim.6b00332
Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S. & Jensen, K. F. Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction. J. Chem. Inf. Model. 57, 1757–1772 (2017).
pubmed: 28696688
doi: 10.1021/acs.jcim.6b00601
Cho, H. & Choi, I. S. Enhanced Deep-Learning Prediction of Molecular Properties via Augmentation of Bond Topology. ChemMedChem 14, 1604–1609 (2019).
pubmed: 31389167
doi: 10.1002/cmdc.201900458
Cho, H. & Choi, I. S. Enhanced Deep-Learning Prediction of Molecular Properties via Augmentation of Bond Topology. Chem Med Chem 14, 1604 (2019).
Deng, T. & Jia, G. Prediction of aqueous solubility of compounds based on neural network. Mol. Phys. 118, e1600754 (2020).
doi: 10.1080/00268976.2019.1600754
Gao, P., Zhang, J., Sun, Y. & Yu, J. Accurate predictions of aqueous solubility of drug molecules via the multilevel graph convolutional network (MGCN) and SchNet architectures. Phys. Chem. Chem. Phys. 22, 23766–23772 (2020).
pubmed: 33063077
doi: 10.1039/D0CP03596C
Falcón-Cano, G., Molina, C. & Cabrera-Pérez, M. A. ADME prediction with KNIME: In silico aqueous solubility consensus model based on supervised recursive random forest approaches. ADMET DMPK 8, 251–273 (2020).
pubmed: 35300309
pmcid: 8915604
Shen, W. X. et al. Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations. Nat Mach Intell 3, 334–343 (2021).
doi: 10.1038/s42256-021-00301-6
Tosca, E. M., Bartolucci, R. & Magni, P. Application of Artificial Neural Networks to Predict the Intrinsic Solubility of Drug-Like Molecules. Pharmaceutics 13, 1101 (2021).
pubmed: 34371792
pmcid: 8309152
doi: 10.3390/pharmaceutics13071101
Wieder, O. et al. Improved Lipophilicity and Aqueous Solubility Prediction with Composite Graph Neural Networks. Molecules 26, 6185 (2021).
pubmed: 34684766
pmcid: 8539502
doi: 10.3390/molecules26206185
Chen, J.-H. & Tseng, Y. J. Different molecular enumeration influences in deep learning: an example using aqueous solubility. Briefings Bioinf 22, bbaa092 (2021).
doi: 10.1093/bib/bbaa092
Panapitiya, G. et al. Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations. ACS Omega 7, 15695–15710 (2022).
pubmed: 35571767
pmcid: 9096921
doi: 10.1021/acsomega.2c00642
Hou, Y., Wang, S., Bai, B., Chan, H. C. S. & Yuan, S. Accurate Physical Property Predictions via Deep Learning. Molecules 27, 1668 (2022).
pubmed: 35268770
pmcid: 8912091
doi: 10.3390/molecules27051668
Raevsky, O. A., Grigor’ev, V. Y., Polianczyk, D. E., Raevskaja, O. E. & Dearden, J. C. Calculation of aqueous solubility of crystalline un-ionized organic chemicals and drugs based on structural similarity and physicochemical descriptors. J Chem Inf Model. 54, 683–91, https://doi.org/10.1021/ci400692n (2014).
Schaper, K.-J., Kunz, B. & Raevsky, O. Analysis of water solubility data on the basis of HYBOT descriptors. Part 2. QSAR Comb. Sci. 22, 943–958, https://doi.org/10.1002/qsar.200330840 (2003).