A rapid feature selection method for catalyst design: Iterative Bayesian additive regression trees (iBART).
Journal
The Journal of chemical physics
ISSN: 1089-7690
Abbreviated title: J Chem Phys
Country: United States
NLM ID: 0375360
Publication information
Publication date:
28 Apr 2022
History:
entrez: 30 Apr 2022
pubmed: 1 May 2022
medline: 1 May 2022
Status:
ppublish
Abstract
Feature selection (FS) methods are often used to develop data-driven descriptors (i.e., features) for rapidly predicting the functional properties of a physical or chemical system from its composition and structure. FS algorithms identify descriptors from a candidate pool (i.e., feature space) built by feature engineering (FE) steps that construct complex features from the system's fundamental physical properties. Recursive FE, which repeatedly applies FE operations to the feature space, is necessary to build features complex enough to capture the physical behavior of a system. However, this approach creates a highly correlated feature space that contains millions or billions of candidate features. Such feature spaces are computationally demanding to process with traditional FS approaches, which often struggle with strong collinearity. Herein, we address this shortcoming by developing a new method that interleaves the FE and FS steps to progressively build and select powerful descriptors with reduced computational demand. We call this method iterative Bayesian additive regression trees (iBART), as it iterates between FE with unary/binary operators and FS with Bayesian additive regression trees (BART). The capabilities of iBART are illustrated by extracting descriptors for predicting metal-support interactions in catalysis, which we compare to those predicted in our previous work using other state-of-the-art FS methods (i.e., least absolute shrinkage and selection operator + l
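The abstract describes iBART only at a high level. The following is a minimal sketch of the general idea of interleaving FE and FS, not the authors' implementation, under several labeled assumptions: the operator sets (square, square root of the absolute value, +, -, *) are hypothetical; each FS step ranks features by absolute Pearson correlation with the target as a simple stand-in for BART, which is not implemented here; and `k`, the number of features kept per round, is a hypothetical parameter. The helper `pool_size` gives a rough count (ignoring duplicates) of how fast recursive FE grows without selection.

```python
import math
import random
import statistics

# Hypothetical operator sets chosen for illustration; not the paper's list.
UNARY = {"sq": lambda v: v * v, "sqrt": lambda v: math.sqrt(abs(v))}  # sqrt of |v| stays real
BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def pool_size(n, rounds, n_unary=len(UNARY), n_binary=len(BINARY)):
    """Rough candidate count after `rounds` of recursive FE with no selection.

    Each round keeps the n inputs, adds one feature per (unary op, feature)
    and one per (binary op, unordered pair); duplicates are not removed.
    """
    for _ in range(rounds):
        n = n + n_unary * n + n_binary * n * (n - 1) // 2
    return n


def abs_pearson(x, y):
    """|Pearson r|, used here as a simple stand-in for BART-based selection."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no signal
    return abs(sxy) / math.sqrt(sxx * syy)


def engineer(features):
    """One FE round: keep the inputs, add unary transforms and pairwise combinations."""
    out = dict(features)
    names = list(features)
    for name in names:
        for op, f in UNARY.items():
            out[f"{op}({name})"] = [f(v) for v in features[name]]
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for op, g in BINARY.items():
                out[f"({a}{op}{b})"] = [g(u, v) for u, v in zip(features[a], features[b])]
    return out


def select(features, y, k):
    """FS step: keep the k features that score highest against the target."""
    ranked = sorted(features, key=lambda name: abs_pearson(features[name], y), reverse=True)
    return {name: features[name] for name in ranked[:k]}


def interleaved_fe_fs(primary, y, rounds=2, k=5):
    """Alternate FE and FS so each FE round starts from only k survivors."""
    features = dict(primary)
    largest_pool = 0
    for _ in range(rounds):
        pool = engineer(features)
        largest_pool = max(largest_pool, len(pool))
        features = select(pool, y, k)
    return features, largest_pool


# Toy data (hypothetical): the target depends on x1**2 * x2; x3 is irrelevant.
random.seed(0)
primary = {name: [random.uniform(1.0, 2.0) for _ in range(50)] for name in ("x1", "x2", "x3")}
y = [a * a * b for a, b in zip(primary["x1"], primary["x2"])]

selected, largest = interleaved_fe_fs(primary, y, rounds=2, k=5)
print("largest pool screened (interleaved):", largest)
print("pool after 3 recursive FE rounds:", pool_size(3, rounds=3))
for name, values in selected.items():
    print(f"{name}: |r| = {abs_pearson(values, y):.3f}")
```

With three primary features and these operators, `pool_size` grows from 18 to 513 to 395,523 candidates over three recursive FE rounds, whereas the interleaved loop with `k = 5` never screens more than 45 candidates in a single round. The correlation scorer is deliberately crude and, like other marginal screens, does not address the collinearity problem the abstract raises; it only makes the control flow of alternating FE and FS concrete.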
Publication types
Journal Article
Languages
eng
Citation subsets
IM