Inverse mapping of quantum properties to structures for chemical space of small organic molecules.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
18 Jul 2024
History:
received: 14 09 2023
accepted: 01 07 2024
medline: 19 07 2024
pubmed: 19 07 2024
entrez: 18 07 2024
Status: epublish

Abstract

Computer-driven molecular design combines the principles of chemistry, physics, and artificial intelligence to identify chemical compounds with tailored properties. While quantum-mechanical (QM) methods, coupled with machine learning, already offer a direct mapping from 3D molecular structures to their properties, effective methodologies for the inverse mapping in chemical space remain elusive. We address this challenge by demonstrating the possibility of parametrizing a chemical space with a finite set of QM properties. Our proof-of-concept implementation achieves an approximate property-to-structure mapping, the QIM model (which stands for "Quantum Inverse Mapping"), by forcing a variational auto-encoder with a property encoder to obtain a common internal representation for both structures and properties. After validating this mapping for small drug-like molecules, we illustrate its capabilities with an explainability study as well as by the generation of de novo molecular structures with targeted properties and transition pathways between conformational isomers. Our findings thus provide a proof-of-principle demonstration aiming to enable the inverse property-to-structure design in diverse chemical spaces.
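
The shared-representation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the loss form, the weights `beta` and `gamma`, and the function names `kl_diag_gaussians` and `joint_loss` are assumptions made for illustration. It assumes both the structure encoder and the property encoder output a diagonal Gaussian over the same latent space, and that a KL term pulls the property latent toward the structure latent.

```python
import math


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians given as per-dimension lists."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def joint_loss(x, x_recon, struct_latent, prop_latent, beta=1.0, gamma=1.0):
    """Hypothetical joint objective (an assumption, not the published QIM loss).

    x, x_recon    : flattened structure descriptor and its reconstruction
    struct_latent : (mu, logvar) from the structure encoder
    prop_latent   : (mu, logvar) from the property encoder
    """
    mu_s, logvar_s = struct_latent
    mu_p, logvar_p = prop_latent
    # Mean squared reconstruction error of the structure descriptor.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)
    # Standard VAE regularizer: structure latent vs. a unit Gaussian prior.
    zeros = [0.0] * len(mu_s)
    prior = kl_diag_gaussians(mu_s, logvar_s, zeros, zeros)
    # Alignment term: property latent pulled toward the structure latent.
    align = kl_diag_gaussians(mu_p, logvar_p, mu_s, logvar_s)
    return recon + beta * prior + gamma * align
```

Under these assumptions, a perfect reconstruction with both latents equal to the unit Gaussian gives a loss of zero; any mismatch between the property latent and the structure latent raises the alignment term, which is what would let a decoder trained on structures also be driven from properties alone.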

Identifiers

pubmed: 39025883
doi: 10.1038/s41467-024-50401-1
pii: 10.1038/s41467-024-50401-1

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

6061

Grants

Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 956832

Copyright information

© 2024. The Author(s).

Authors

Alessio Fallani (A)

Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg. alessio.fallani.001@student.uni.lu.

Leonardo Medrano Sandonas (L)

Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg. leonardoms20@gmail.com.
Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany. leonardoms20@gmail.com.

Alexandre Tkatchenko (A)

Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg. alexandre.tkatchenko@uni.lu.

MeSH classifications