Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations.
Journal
Chemical science
ISSN: 2041-6520
Abbreviated title: Chem Sci
Country: England
NLM ID: 101545951
Publication information
Publication date: 14 Feb 2019
History:
received: 19 Sep 2018
accepted: 17 Nov 2018
entrez: 8 Mar 2019
pubmed: 8 Mar 2019
medline: 8 Mar 2019
Status: epublish
Abstract
There has been a recent surge of interest in using machine learning across chemical space in order to predict properties of molecules or design molecules and materials with the desired properties. Most of this work relies on defining clever feature representations, in which the chemical graph structure is encoded in a uniform way such that predictions across chemical space can be made. In this work, we propose to exploit the powerful ability of deep neural networks to learn a feature representation from low-level encodings of a huge corpus of chemical structures. Our model borrows ideas from neural machine translation: it translates between two semantically equivalent but syntactically different representations of molecular structures, compressing the meaningful information both representations have in common in a low-dimensional representation vector. Once the model is trained, this representation can be extracted for any new molecule and utilized as a descriptor. In fair benchmarks against various human-engineered molecular fingerprints and graph-convolution models, our method shows competitive performance in modelling quantitative structure-activity relationships in all analysed datasets. Additionally, we show that our descriptor significantly outperforms all baseline molecular fingerprints in two ligand-based virtual screening tasks. Overall, our descriptors show the most consistent performance across all experiments. The continuity of the descriptor space and the existence of the decoder that permits deducing a chemical structure from an embedding vector allow for exploration of the space and open up new opportunities for compound optimization and idea generation.
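The abstract's use of the learned descriptor for ligand-based virtual screening can be illustrated with a minimal sketch. It assumes each molecule has already been encoded into a fixed-length continuous vector by the trained encoder (not shown here) and ranks a compound library by similarity to a query. Cosine similarity, the function names `cosine_similarity` and `rank_library`, and the toy vectors are illustrative assumptions, not the authors' code, metric, or data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length continuous descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as dissimilar to everything instead of dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(
    query: Sequence[float], library: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (compound_id, similarity) pairs, most similar to the query first."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Hypothetical 4-dimensional vectors standing in for learned embeddings;
    # they are not outputs of the trained model, and real descriptors are longer.
    query = [0.9, 0.1, -0.3, 0.4]
    library = {
        "cpd_A": [0.8, 0.2, -0.2, 0.5],
        "cpd_B": [-0.7, 0.6, 0.1, -0.2],
        "cpd_C": [0.1, 0.9, 0.4, 0.0],
    }
    for cid, sim in rank_library(query, library):
        print(f"{cid}\t{sim:.3f}")
```

Because the descriptor is real-valued rather than a bit vector, a continuous similarity such as cosine takes the place of the Tanimoto coefficient commonly used with binary fingerprints; trying a different metric only requires replacing `cosine_similarity`.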
Identifiers
pubmed: 30842833
doi: 10.1039/c8sc04175j
pii: c8sc04175j
pmc: PMC6368215
Publication types
Journal Article
Languages
eng
Pagination
1692-1701