Fuelling the Digital Chemistry Revolution with Language Models.

Digital chemistry; Language models; Machine learning; Sandmeyer Award 2022; Synthetic Organic Chemistry

Journal

Chimia
ISSN: 0009-4293
Abbreviated title: Chimia (Aarau)
Country: Switzerland
NLM ID: 0373152

Publication information

Publication date:
09 Aug 2023
History:
received: 28 06 2023
accepted: 28 06 2023
medline: 4 12 2023
pubmed: 4 12 2023
entrez: 4 12 2023
Status: epublish

Abstract

The RXN for Chemistry project, initiated by IBM Research Europe - Zurich in 2017, aimed to develop a series of digital assets using machine learning techniques to promote the use of data-driven methodologies in synthetic organic chemistry. The research treats chemical reaction data as language records and frames the prediction of a synthetic organic chemistry reaction as a translation task between precursor and product languages. Over the years, the IBM Research team has developed language models for various applications, including forward reaction prediction, retrosynthesis, reaction classification, atom-mapping, procedure extraction from text, and inference of experimental protocols, along with the use of those protocols to program commercial automation hardware and implement an autonomous chemical laboratory. Furthermore, the project has recently incorporated biochemical data in training models for greener and more sustainable chemical reactions. The ease of constructing prediction models and continually enhancing them through data augmentation with minimal human intervention has led to the widespread adoption of language model technologies, facilitating the digitalization of chemistry in diverse industrial sectors such as pharmaceuticals and chemical manufacturing. This manuscript provides a concise overview of the scientific components that contributed to the Sandmeyer Award in 2022.
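
To make the "reaction as translation" framing concrete, the following is a minimal sketch of how a reaction written in SMILES notation could be split into a source sentence (precursors) and a target sentence (product) of space-separated tokens. It is not taken from the article: the regular expression, the function names `tokenize` and `to_translation_pair`, and the assumption that reactions are written as `precursors>>product` are illustrative choices, and the project's actual preprocessing may differ.

```python
import re

# Illustrative SMILES token pattern (an assumption, not the project's code):
# bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# bonds, branches, ring-closure digits and the reaction arrow.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, failing loudly on unknown symbols."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


def to_translation_pair(reaction_smiles: str) -> tuple[str, str]:
    """Turn 'precursors>>product' into (source sentence, target sentence)."""
    precursors, product = reaction_smiles.split(">>")
    return " ".join(tokenize(precursors)), " ".join(tokenize(product))


if __name__ == "__main__":
    # Acetic acid + ethanol -> ethyl acetate (textbook esterification).
    source, target = to_translation_pair("CC(=O)O.OCC>>CC(=O)OCC")
    print("source:", source)
    print("target:", target)
```

In this sketch a sequence-to-sequence model for forward reaction prediction would be trained to map the source sentence to the target sentence, and a retrosynthesis model would be trained on the reversed pairs.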

Identifiers

pubmed: 38047789
doi: 10.2533/chimia.2023.484

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

484-488

Grants

Agency: Swiss National Science Foundation
ID: 180544
Country: Switzerland

Copyright information

Copyright 2023 Antonio Cardinale, Alessandro Castrogiovanni, Theophile Gaudin, Joppe Geluykens, Teodoro Laino, Matteo Manica, Daniel Probst, Philippe Schwaller, Aleksandros Sobczyk, Alessandra Toniato, Alain C. Vaucher, Heiko Wolf, Federico Zipoli. License: This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors

Antonio Cardinale (A)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.

Alessandro Castrogiovanni (A)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.

Theophile Gaudin (T)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.

Joppe Geluykens (J)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.

Teodoro Laino (T)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland. teo@zurich.ibm.com.
National Center for Competence in Research-Catalysis (NCCR-Catalysis), Zurich, Switzerland.

Matteo Manica (M)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.

Daniel Probst (D)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.

Philippe Schwaller (P)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.
National Center for Competence in Research-Catalysis (NCCR-Catalysis), Zurich, Switzerland.

Aleksandros Sobczyk (A)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.
National Center for Competence in Research-Catalysis (NCCR-Catalysis), Zurich, Switzerland.

Alessandra Toniato (A)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.
National Center for Competence in Research-Catalysis (NCCR-Catalysis), Zurich, Switzerland.

Alain C Vaucher (AC)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.
National Center for Competence in Research-Catalysis (NCCR-Catalysis), Zurich, Switzerland.

Heiko Wolf (H)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.

Federico Zipoli (F)

IBM Research Europe - Zurich, Säumerstrasse 4, Rüschlikon, CH-8803, Switzerland.
National Center for Competence in Research-Catalysis (NCCR-Catalysis), Zurich, Switzerland.

MeSH classifications