Molecular Structure Extraction from Documents Using Deep Learning.


Journal

Journal of Chemical Information and Modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060

Publication information

Publication date:
25 03 2019
History:
pubmed: 14 2 2019
medline: 8 5 2020
entrez: 14 2 2019
Status: ppublish

Abstract

Chemical structure extraction from documents remains a hard problem because of both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting the performance of current approaches include the diversity in visual styles used by various software to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. We present end-to-end deep learning solutions for both segmenting molecular structures from documents and predicting chemical structures from the segmented images. This deep-learning-based approach does not require any handcrafted features, is learned directly from data, and is robust against variations in image quality and style. Using the deep learning approach described herein, we show that it is possible to perform well on both segmentation and prediction of low-resolution images containing moderately sized molecules found in journal articles and patents.

Identifiers

pubmed: 30758950
doi: 10.1021/acs.jcim.8b00669

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1017-1029

Authors

Joshua Staker (J)

Schrödinger, Inc., 101 SW Main Street, Portland, Oregon 97204, United States.

Kyle Marshall (K)

Schrödinger, Inc., 101 SW Main Street, Portland, Oregon 97204, United States.

Robert Abel (R)

Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, United States.

Carolyn M McQuaw (CM)

Schrödinger, Inc., 101 SW Main Street, Portland, Oregon 97204, United States.
