Reagent prediction with a molecular transformer improves reaction data quality.
Journal
Chemical Science
ISSN: 2041-6520
Abbreviated title: Chem Sci
Country: England
NLM ID: 101545951
Publication information
Publication date: 22 Mar 2023
History:
received: 2022-12-09
accepted: 2023-02-12
entrez: 2023-03-27
pubmed: 2023-03-28
medline: 2023-03-28
Status: epublish
Abstract
Automated synthesis planning is key for efficient generative chemistry. Since reactions of given reactants may yield different products depending on conditions such as the chemical context imposed by specific reagents, computer-aided synthesis planning should benefit from recommendations of reaction conditions. Traditional synthesis planning software, however, typically proposes reactions without specifying such conditions, relying on human organic chemists who know the conditions to carry out suggested reactions. In particular, reagent prediction for arbitrary reactions, a crucial aspect of condition recommendation, has been largely overlooked in cheminformatics until recently. Here we employ the Molecular Transformer, a state-of-the-art model for reaction prediction and single-step retrosynthesis, to tackle this problem. We train the model on the US patents dataset (USPTO) and test it on Reaxys to demonstrate its out-of-distribution generalization capabilities. Our reagent prediction model also improves the quality of product prediction: the Molecular Transformer is able to substitute the reagents in the noisy USPTO data with reagents that enable product prediction models to outperform those trained on plain USPTO. This makes it possible to improve upon the state-of-the-art in reaction product prediction on the USPTO MIT benchmark.
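To make the task framing concrete, below is a minimal sketch of how reagent prediction can be cast as SMILES-to-SMILES translation, in the spirit of the Molecular Transformer's sequence preprocessing. The regex tokenizer follows the widely used pattern from Schwaller et al.; the example reaction and the exact source/target layout (reactants>>product as input, reagents as output) are illustrative assumptions, not the authors' verbatim pipeline.

```python
import re

# Regex-based SMILES tokenizer in the style commonly used for
# Molecular Transformer preprocessing (the authors' exact regex
# may differ slightly; this is an illustrative assumption).
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%\d{2}|\d)"
)

def tokenize_smiles(smiles: str) -> str:
    """Split a SMILES string into space-separated tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Sanity check: tokenization must be lossless.
    assert smiles == "".join(tokens), f"tokenization lost characters: {smiles}"
    return " ".join(tokens)

# Hypothetical training pair for reagent prediction: the model reads
# the reactants and the product, and must emit the reagents as SMILES.
reactants = "CC(=O)O.OCC"  # acetic acid + ethanol
product = "CC(=O)OCC"      # ethyl acetate
reagents = "OS(=O)(=O)O"   # sulfuric acid catalyst (example reagent)

src = tokenize_smiles(f"{reactants}>>{product}")
tgt = tokenize_smiles(reagents)
print(src)  # source sequence for the transformer
print(tgt)  # target sequence the model learns to generate
```

Under this framing, the same architecture used for forward reaction prediction is simply retargeted: instead of mapping reactants plus reagents to a product, it maps a reactant/product pair to the reagents that plausibly connect them.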
Identifiers
pubmed: 36970100
doi: 10.1039/d2sc06798f
pii: d2sc06798f
pmc: PMC10034139
Publication types
Journal Article
Languages
eng
Pagination
3235-3246
Copyright information
This journal is © The Royal Society of Chemistry.
Conflict of interest statement
There are no conflicts to declare.