Automatic feature engineering for catalyst design using small data without prior knowledge of target catalysis.


Journal

Communications chemistry
ISSN: 2399-3669
Abbreviated title: Commun Chem
Country: England
NLM ID: 101725670

Publication information

Publication date:
12 Jan 2024
History:
received: 8 Aug 2023
accepted: 8 Dec 2023
medline: 13 Jan 2024
pubmed: 13 Jan 2024
entrez: 12 Jan 2024
Status: epublish

Abstract

The empirical aspect of descriptor design in catalyst informatics, particularly when confronted with limited data, necessitates adequate prior knowledge for delving into unknown territories, thus presenting a logical contradiction. This study introduces a technique for automatic feature engineering (AFE) that works on small catalyst datasets, without reliance on specific assumptions or pre-existing knowledge about the target catalysis when designing descriptors and building machine-learning models. This technique generates numerous features through mathematical operations on general physicochemical features of catalytic components and extracts relevant features for the desired catalysis, essentially screening numerous hypotheses on a machine. AFE yields reasonable regression results for three types of heterogeneous catalysis: oxidative coupling of methane (OCM), conversion of ethanol to butadiene, and three-way catalysis, where only the training set is swapped. Moreover, through the application of active learning that combines AFE and high-throughput experimentation for OCM, we successfully visualize the machine's process of acquiring precise recognition of the catalyst design. Thus, AFE is a versatile technique for data-driven catalysis research and a key step towards fully automated catalyst discoveries.
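The generate-then-select idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the unary operations, the pairwise products, the ordinary least-squares model, the leave-one-out score, and the greedy forward selection are all assumptions chosen for brevity, and every function name and the synthetic data are hypothetical.

```python
import itertools
import numpy as np


def expand_features(X, names):
    """Generate candidate features: unary transforms, then pairwise products.

    The set of operations is an illustrative assumption, not the paper's.
    """
    unary = {
        "id": lambda v: v,
        "sq": lambda v: v ** 2,
        "sqrt": lambda v: np.sqrt(np.abs(v)),
        "log": lambda v: np.log(np.abs(v) + 1e-9),
    }
    cols, labels = [], []
    for j, name in enumerate(names):
        for op, fn in unary.items():
            cols.append(fn(X[:, j]))
            labels.append(f"{op}({name})")
    first_order = len(cols)
    for a, b in itertools.combinations(range(first_order), 2):
        cols.append(cols[a] * cols[b])
        labels.append(f"{labels[a]}*{labels[b]}")
    return np.column_stack(cols), labels


def loo_cv_rmse(F, y):
    """Leave-one-out RMSE of ordinary least squares with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), F])
    errors = []
    for i in range(n):
        mask = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        errors.append(y[i] - A[i] @ coef)
    return float(np.sqrt(np.mean(np.square(errors))))


def forward_select(F, y, max_features=3):
    """Greedily add the feature that most lowers the cross-validated error."""
    chosen, best = [], np.inf
    for _ in range(max_features):
        scores = {
            j: loo_cv_rmse(F[:, chosen + [j]], y)
            for j in range(F.shape[1])
            if j not in chosen
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate improves the score any further
        chosen.append(j_best)
        best = scores[j_best]
    return chosen, best


# Synthetic "small data": 40 samples, 3 primary features (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(40, 3))
y = 2.0 * X[:, 0] ** 2 * X[:, 1] + rng.normal(0.0, 0.01, size=40)

F, labels = expand_features(X, ["x0", "x1", "x2"])
chosen, rmse = forward_select(F, y)
print([labels[j] for j in chosen], rmse)
```

The target here is built from a product of transformed primary features, so the selection step has a well-defined feature to recover. Real catalyst data would need a more robust regressor and a more careful selection strategy than this greedy loop.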

Identifiers

pubmed: 38216711
doi: 10.1038/s42004-023-01086-y
pii: 10.1038/s42004-023-01086-y

Publication types

Journal Article

Languages

eng

Pagination

11

Grants

Agency: MEXT | JST | Core Research for Evolutional Science and Technology (CREST)
ID: JPMJCR17P2

Copyright information

© 2024. The Author(s).

Authors

Toshiaki Taniike (T)

Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan. taniike@jaist.ac.jp.

Aya Fujiwara (A)

Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan.

Sunao Nakanowatari (S)

Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan.

Fernando García-Escobar (F)

Department of Chemistry, Hokkaido University, North 10, West 8, Sapporo, 060-0810, Japan.

Keisuke Takahashi (K)

Department of Chemistry, Hokkaido University, North 10, West 8, Sapporo, 060-0810, Japan.

MeSH classifications