Fuelling the Digital Chemistry Revolution with Language Models.
Digital chemistry
Language models
Machine learning
Sandmeyer Award 2022
Synthetic Organic Chemistry
Journal
Chimia
ISSN: 0009-4293
Abbreviated title: Chimia (Aarau)
Country: Switzerland
NLM ID: 0373152
Publication information
Publication date:
09 Aug 2023
History:
received: 28 Jun 2023
accepted: 28 Jun 2023
medline: 04 Dec 2023
pubmed: 04 Dec 2023
entrez: 04 Dec 2023
Status:
epublish
Abstract
The RXN for Chemistry project, initiated by IBM Research Europe - Zurich in 2017, aimed to develop a series of digital assets using machine learning techniques to promote the use of data-driven methodologies in synthetic organic chemistry. The research treats chemical reaction data as language records, so that predicting the outcome of a synthetic organic chemistry reaction becomes a translation task between precursor and product languages. Over the years, the IBM Research team has developed language models for various applications, including forward reaction prediction, retrosynthesis, reaction classification, atom-mapping, procedure extraction from text, and inference of experimental protocols, and has used the inferred protocols to program commercial automation hardware to implement an autonomous chemical laboratory. The project has also recently incorporated biochemical data in training models for greener and more sustainable chemical reactions. Because prediction models are easy to construct and can be continually enhanced through data augmentation with minimal human intervention, language model technologies have been widely adopted, facilitating the digitalization of chemistry in industrial sectors such as pharmaceuticals and chemical manufacturing. This manuscript provides a concise overview of the scientific components that contributed to the Sandmeyer Award in 2022.
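As an illustration of the "reaction as translation" idea described in the abstract, the following minimal sketch splits a reaction SMILES string into a precursor "sentence" and a product "sentence" of space-separated tokens, the kind of source/target pair a sequence-to-sequence model could be trained on. The token pattern is a simplified assumption made for this example, and the names `tokenize` and `to_translation_pair` are hypothetical; this is not the tokenizer or code used by the RXN for Chemistry project.

```python
import re

# Simplified SMILES token pattern (an assumption for illustration only):
# bracket atoms, two-letter halogens, common organic-subset atoms,
# aromatic atoms, branches, bonds, and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|\.|=|#|-|\+|\\|/|:|@|%\d{2}|\d)"
)


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse input that the simplified pattern cannot fully cover.
    if "".join(tokens) != smiles:
        raise ValueError("unsupported characters in SMILES: " + smiles)
    return tokens


def to_translation_pair(reaction_smiles):
    """Turn 'precursors>>product' into (source, target) token sentences."""
    precursors, separator, product = reaction_smiles.partition(">>")
    if not separator:
        raise ValueError("expected a reaction SMILES containing '>>'")
    return " ".join(tokenize(precursors)), " ".join(tokenize(product))


# Esterification of acetic acid with ethanol, written as a reaction SMILES.
source, target = to_translation_pair("CC(=O)O.OCC>>CC(=O)OCC")
print(source)
print(target)
```

In this framing, forward reaction prediction maps the source sentence to the target sentence, and retrosynthesis reverses the direction.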
Identifiers
pubmed: 38047789
doi: 10.2533/chimia.2023.484
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
484-488
Grants
Organization: Swiss National Science Foundation
ID: 180544
Country: Switzerland
Copyright information
Copyright 2023 Antonio Cardinale, Alessandro Castrogiovanni, Theophile Gaudin, Joppe Geluykens, Teodoro Laino, Matteo Manica, Daniel Probst, Philippe Schwaller, Aleksandros Sobczyk, Alessandra Toniato, Alain C. Vaucher, Heiko Wolf, Federico Zipoli. License: This work is licensed under a Creative Commons Attribution 4.0 International License.