A rapid feature selection method for catalyst design: Iterative Bayesian additive regression trees (iBART).


Journal

The Journal of chemical physics
ISSN: 1089-7690
Abbreviated title: J Chem Phys
Country: United States
NLM ID: 0375360

Publication information

Publication date:
28 Apr 2022
History:
entrez: 30 Apr 2022
pubmed: 1 May 2022
medline: 1 May 2022
Status: ppublish

Abstract

Feature selection (FS) methods are often used to develop data-driven descriptors (i.e., features) for rapidly predicting the functional properties of a physical or chemical system based on its composition and structure. FS algorithms identify descriptors from a candidate pool (i.e., feature space) built by feature engineering (FE) steps that construct complex features from the system's fundamental physical properties. Recursive FE, which involves repeated FE operations on the feature space, is necessary to build features with sufficient complexity to capture the physical behavior of a system. However, this approach creates a highly correlated feature space that contains millions or billions of candidate features. Such feature spaces are computationally demanding to process using traditional FS approaches, which often struggle with strong collinearity. Herein, we address this shortcoming by developing a new method that interleaves the FE and FS steps to progressively build and select powerful descriptors with reduced computational demand. We call this method iterative Bayesian additive regression trees (iBART), as it iterates between FE with unary/binary operators and FS with Bayesian additive regression trees (BART). The capabilities of iBART are illustrated by extracting descriptors for predicting metal-support interactions in catalysis, which we compare to those predicted in our previous work using other state-of-the-art FS methods (i.e., least absolute shrinkage and selection operator + l
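The interleaving of FE and FS described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the selection step below ranks features by absolute Pearson correlation with the target as a simple stand-in for BART, and the function names (`engineer`, `select`, `iterate_fe_fs`), the operator set, and the synthetic data are all assumptions made for illustration. The point it shows is only that pruning after each FE round keeps the candidate pool small instead of letting it grow combinatorially.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def engineer(features):
    """One FE round: keep the inputs and add unary (square) and binary (+, *) combinations."""
    out = dict(features)
    for name, col in features.items():
        out[f"({name})^2"] = [v * v for v in col]
    for (na, a), (nb, b) in combinations(features.items(), 2):
        out[f"({na}+{nb})"] = [u + v for u, v in zip(a, b)]
        out[f"({na}*{nb})"] = [u * v for u, v in zip(a, b)]
    return out


def select(features, y, keep):
    """Stand-in FS step (NOT BART): keep the `keep` features most correlated with y."""
    ranked = sorted(features, key=lambda n: abs(pearson(features[n], y)), reverse=True)
    return {n: features[n] for n in ranked[:keep]}


def iterate_fe_fs(features, y, rounds=2, keep=3):
    """Alternate FS and FE so the pool is pruned before each expansion."""
    current = select(features, y, keep)
    for _ in range(rounds):
        current = select(engineer(current), y, keep)
    return current


# Hypothetical synthetic data: the target is the product of two primary features.
rng = random.Random(0)
primary = {name: [rng.uniform(1.0, 2.0) for _ in range(50)] for name in ("a", "b", "c")}
y = [u * v for u, v in zip(primary["a"], primary["b"])]

result = iterate_fe_fs(primary, y, rounds=2, keep=3)
best_r = max(abs(pearson(col, y)) for col in result.values())
```

In this toy setting the product feature is constructed in the first FE round and survives selection, so `best_r` is essentially 1. With a real BART-based selector, the ranking would come from the model's variable-inclusion behavior rather than from a marginal correlation.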

Identifiers

pubmed: 35490030
doi: 10.1063/5.0090055

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

164105

Authors

Chun-Yen Liu (CY)

Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, USA.

Shengbin Ye (S)

Department of Statistics, Rice University, Houston, Texas 77005, USA.

Meng Li (M)

Department of Statistics, Rice University, Houston, Texas 77005, USA.

Thomas P Senftle (TP)

Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, USA.

MeSH classifications