Efficient Exploration of Chemical Compound Space Using Active Learning for Prediction of Thermodynamic Properties of Alkane Molecules.
Journal
Journal of chemical information and modeling
ISSN: 1549-960X
Titre abrégé: J Chem Inf Model
Pays: United States
ID NLM: 101230060
Informations de publication
Date de publication:
13 11 2023
13 11 2023
Historique:
medline:
14
11
2023
pubmed:
20
10
2023
entrez:
19
10
2023
Statut:
ppublish
Résumé
We introduce an exploratory active learning (AL) algorithm using Gaussian process regression and marginalized graph kernel (GPR-MGK) to sample chemical compound space (CCS) at minimal cost. Targeting 251,728 enumerated alkane molecules with 4-19 carbon atoms, we applied the AL algorithm to select a diverse and representative set of molecules and then conducted high-throughput molecular simulations on these selected molecules. To demonstrate the power of the AL algorithm, we built directed message-passing neural networks (D-MPNN) using simulation data as the training set to predict liquid densities, heat capacities, and vaporization enthalpies of the CCS. Validations show that D-MPNN models built on the smallest training set considered in this work, which consists of 313 molecules or 0.124% of the original CCS, predict the properties with
Identifiants
pubmed: 37857374
doi: 10.1021/acs.jcim.3c01430
doi:
Substances chimiques
Alkanes
0
Types de publication
Journal Article
Research Support, Non-U.S. Gov't
Langues
eng
Sous-ensembles de citation
IM