Exploring Dimensionality Reduction Techniques for Deep Learning Driven QSAR Models of Mutagenicity.
QSAR
autoencoder
cheminformatics
deep learning
dimensionality reduction
grid search
hyperparameter optimisation
locally linear embedding
mutagenicity
principal component analysis
Journal
Toxics
ISSN: 2305-6304
Abbreviated title: Toxics
Country: Switzerland
NLM ID: 101639637
Publication information
Publication date: 30 Jun 2023
History:
received: 31 May 2023
revised: 28 Jun 2023
accepted: 28 Jun 2023
medline: 28 Jul 2023
pubmed: 28 Jul 2023
entrez: 28 Jul 2023
Status: epublish
Abstract
Dimensionality reduction techniques are crucial for enabling deep-learning-driven quantitative structure-activity relationship (QSAR) models to navigate higher-dimensional toxicological spaces; however, the choice of technique is often arbitrary and poorly explored. Six dimensionality reduction techniques (both linear and non-linear) were therefore applied to a higher-dimensional mutagenicity dataset and compared in their ability to power a simple deep-learning-driven QSAR model, following grid searches for optimal hyperparameter values. Comparatively simple linear techniques, such as principal component analysis (PCA), were found to be sufficient for optimal QSAR model performance, which indicated that the original dataset was at least approximately linearly separable (in accordance with Cover's theorem). However, certain non-linear techniques, such as kernel PCA and autoencoders, performed at closely comparable levels while being (especially in the case of autoencoders) more widely applicable to potentially non-linearly separable datasets. Analysis of the chemical space, in terms of XLogP and molecular weight, revealed that the vast majority of the testing data fell within the defined applicability domain, and that certain regions were measurably more problematic and degraded performance. The results also indicated, however, that certain dimensionality reduction techniques facilitated uniquely beneficial navigation of the chemical space.
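As a rough illustration of the workflow summarised above (a dimensionality reduction step feeding a simple neural network, with hyperparameters chosen by grid search), the following Python sketch uses scikit-learn. It is not the authors' code: the synthetic dataset from make_classification, the reducer set (only PCA and RBF kernel PCA; autoencoders, locally linear embedding and the other techniques are omitted), the MLPClassifier architecture and every grid value are assumptions chosen for brevity.

```python
# Minimal sketch (assumed setup, not the study's pipeline): compare a linear
# (PCA) and a non-linear (RBF kernel PCA) reducer in front of a small neural
# network, selecting the reducer and its hyperparameters by grid search.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a high-dimensional binary mutagenicity dataset
X, y = make_classification(n_samples=300, n_features=50,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA()),  # placeholder step; each grid entry swaps in a reducer
    ("clf", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                          random_state=0)),
])

# Each dict pairs one reducer with its own (assumed) hyperparameter grid
param_grid = [
    {"reduce": [PCA()],
     "reduce__n_components": [5, 10, 20]},
    {"reduce": [KernelPCA(kernel="rbf")],
     "reduce__n_components": [5, 10, 20],
     "reduce__gamma": [0.005, 0.02]},
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# The best pipeline is refitted on the full training split, then scored on held-out data
test_acc = search.score(X_test, y_test)
print("best reducer:", search.best_params_["reduce"])
print("best CV accuracy: %.3f" % search.best_score_)
print("held-out accuracy: %.3f" % test_acc)
```

Placing the reducer inside the Pipeline means it is refitted on each cross-validation training fold, so the validation folds never leak into the fitted projection.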
Identifiers
pubmed: 37505541
pii: toxics11070572
doi: 10.3390/toxics11070572
pmc: PMC10384850
Publication types
Journal Article
Languages
eng
Grants
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/T008709/1
Country: United Kingdom