Efficient Exploration of Chemical Compound Space Using Active Learning for Prediction of Thermodynamic Properties of Alkane Molecules.


Journal

Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060

Publication information

Publication date:
13 11 2023
History:
medline: 14 11 2023
pubmed: 20 10 2023
entrez: 19 10 2023
Status: ppublish

Abstract

We introduce an exploratory active learning (AL) algorithm using Gaussian process regression and marginalized graph kernel (GPR-MGK) to sample chemical compound space (CCS) at minimal cost. Targeting 251,728 enumerated alkane molecules with 4-19 carbon atoms, we applied the AL algorithm to select a diverse and representative set of molecules and then conducted high-throughput molecular simulations on these selected molecules. To demonstrate the power of the AL algorithm, we built directed message-passing neural networks (D-MPNN) using simulation data as the training set to predict liquid densities, heat capacities, and vaporization enthalpies of the CCS. Validations show that D-MPNN models built on the smallest training set considered in this work, which consists of 313 molecules or 0.124% of the original CCS, predict the properties with
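
The exploratory selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain squared-exponential kernel on hypothetical numeric descriptors as a stand-in for the marginalized graph kernel, and a simple rule of picking the candidate with the largest Gaussian process posterior variance. The names `rbf`, `solve`, `posterior_variance`, `explore`, and the toy `pool` are all illustrative. Because the posterior variance depends only on the inputs, this kind of selection needs no property labels, which is consistent with running simulations only after the molecules have been chosen.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel (assumed stand-in for a graph kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def posterior_variance(x, selected, noise=1e-6):
    """GP posterior variance at x given the already selected inputs."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(selected)]
         for i, a in enumerate(selected)]
    k = [rbf(x, s) for s in selected]
    alpha = solve(K, k)
    return rbf(x, x) - sum(ki * ai for ki, ai in zip(k, alpha))

def explore(pool, budget, seed_index=0):
    """Greedily add the candidate the model is least certain about."""
    chosen = [seed_index]
    while len(chosen) < budget:
        selected = [pool[i] for i in chosen]
        best = max((i for i in range(len(pool)) if i not in chosen),
                   key=lambda i: posterior_variance(pool[i], selected))
        chosen.append(best)
    return chosen

# Toy one-dimensional "descriptor" pool with three clusters (hypothetical data).
pool = [[0.0], [0.1], [0.2], [5.0], [5.1], [10.0]]
chosen = explore(pool, budget=3)

# Fraction of the enumerated space covered by the smallest training set.
fraction_percent = 100 * 313 / 251_728
print(chosen, round(fraction_percent, 3))
```

On the toy pool, the greedy rule spreads its picks across the three clusters instead of sampling near-duplicates, which is the sense in which such a selection is diverse. The last lines reproduce the abstract's arithmetic: 313 of 251,728 molecules is about 0.124%.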

Identifiers

pubmed: 37857374
doi: 10.1021/acs.jcim.3c01430

Chemical substances

Alkanes 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

6515-6524

Authors

Yan Xiang (Y)

School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.

Yu-Hang Tang (YH)

Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States.
NVIDIA Corporation, Santa Clara, California 95051, United States.

Zheng Gong (Z)

School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.

Hongyi Liu (H)

School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.

Liang Wu (L)

School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.

Guang Lin (G)

Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States.

Huai Sun (H)

School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
