SwinOCSR: end-to-end optical chemical structure recognition using a Swin Transformer.
Chemical Structure Recognition
Deep Learning
End-to-End Model
Swin Transformer
Journal
Journal of Cheminformatics
ISSN: 1758-2946
Abbreviated title: J Cheminform
Country: England
NLM ID: 101516718
Publication information
Publication date:
01 Jul 2022
History:
received: 09 Feb 2022
accepted: 12 Jun 2022
entrez: 01 Jul 2022
pubmed: 02 Jul 2022
medline: 02 Jul 2022
Status:
epublish
Abstract
Optical chemical structure recognition from scientific publications is essential for rediscovering a chemical structure. It is an extremely challenging problem, and current rule-based and deep-learning methods cannot achieve satisfactory recognition rates. Herein, we propose SwinOCSR, an end-to-end model based on a Swin Transformer. This model uses the Swin Transformer as the backbone to extract image features and introduces Transformer models to convert chemical information from publications into DeepSMILES. A novel chemical structure dataset was constructed to train and verify our method. Our proposed Swin Transformer-based model was extensively tested against the backbone of existing publicly available deep learning methods. The experimental results show that our model significantly outperforms the compared methods, demonstrating the model's effectiveness. Moreover, we used a focal loss to address the token imbalance problem in the text representation of the chemical structure diagram, and our model achieved an accuracy of 98.58%.
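The focal loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `gamma=2.0` and `alpha=1.0` defaults, and the toy three-token vocabulary are assumptions made for illustration. The sketch shows the general idea of scaling the cross-entropy of a token by `(1 - p_t) ** gamma`, so that tokens the model already predicts confidently (typically the frequent ones) contribute less to the loss than rare, poorly predicted tokens.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one token position."""
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one token position.

    gamma and alpha are illustrative defaults (assumed here),
    not values reported in the abstract.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified tokens.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = [5.0, 0.0, 0.0]  # model is confident about token 0
    hard = [0.0, 5.0, 0.0]  # model is wrong about token 0
    for name, logits in (("easy", easy), ("hard", hard)):
        ce = cross_entropy(logits, 0)
        fl = focal_loss(logits, 0)
        print(f"{name}: cross-entropy={ce:.4f} focal={fl:.6f}")
```

With `gamma=0` the expression reduces to ordinary cross-entropy; larger values of `gamma` down-weight easy tokens more strongly.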
Identifiers
pubmed: 35778754
doi: 10.1186/s13321-022-00624-5
pii: 10.1186/s13321-022-00624-5
pmc: PMC9248127
Publication types
Journal Article
Languages
eng
Pagination
41
Grants
Agency: Important Drug Development Fund, Ministry of Science and Technology of China
ID: 2018ZX09735002
Agency: National Key R&D Program of China
ID: 2016YFA0502304
Copyright information
© 2022. The Author(s).