Molecular Structure Extraction from Documents Using Deep Learning.

Data Mining; Deep Learning; Documentation; Drug Discovery / methods

Journal

Journal of chemical information and modeling

ISSN: 1549-960X

Abbreviated title: J Chem Inf Model

Country: United States

NLM ID: 101230060

Publication information

Publication date:
25 03 2019

History:

pubmed: 14 2 2019

medline: 8 5 2020

entrez: 14 2 2019

Status: ppublish

Abstract

Chemical structure extraction from documents remains a hard problem because of both false-positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that generally perform reasonably well but still routinely encounter situations where recognition rates are unsatisfactory and systematic improvement is challenging. Complications affecting the performance of current approaches include the diversity of visual styles that different software packages use to render structures, the frequent use of ad hoc annotations, and image-quality problems such as low resolution and noise. We present end-to-end deep learning solutions both for segmenting molecular structures from documents and for predicting chemical structures from the segmented images. This deep-learning-based approach requires no handcrafted features, is learned directly from data, and is robust to variations in image quality and style. Using this approach, we show that it is possible to perform well on both segmentation and prediction for low-resolution images containing moderately sized molecules found in journal articles and patents.
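
The abstract does not say how a segmentation output is turned into individual structure images for the prediction step. As a minimal sketch under stated assumptions (not the authors' method), one common post-processing step is to threshold a per-pixel probability mask and group the surviving pixels into 4-connected regions, each yielding a bounding box that could be cropped and passed to a structure predictor. The function name `mask_to_boxes`, the 0.5 threshold, and the `min_pixels` noise filter are hypothetical choices made for illustration.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical box format: (row_min, col_min, row_max, col_max), inclusive.
BBox = Tuple[int, int, int, int]


def mask_to_boxes(mask: List[List[float]], threshold: float = 0.5,
                  min_pixels: int = 1) -> List[BBox]:
    """Group above-threshold pixels of a probability mask into 4-connected
    regions and return one bounding box per region (illustrative sketch)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[BBox] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or mask[r][c] < threshold:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                # Grow the bounding box to include this pixel.
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and mask[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop tiny regions, which are likely noise rather than structures.
            if count >= min_pixels:
                boxes.append((r0, c0, r1, c1))
    return boxes


if __name__ == "__main__":
    toy_mask = [
        [0.9, 0.8, 0.0, 0.0],
        [0.7, 0.0, 0.0, 0.6],
        [0.0, 0.0, 0.0, 0.9],
    ]
    print(mask_to_boxes(toy_mask))  # [(0, 0, 1, 1), (1, 3, 2, 3)]
```

Raising `min_pixels` trades recall for fewer false-positive crops, which is one simple lever on the segmentation false positives the abstract describes; a real pipeline would tune such thresholds on validation data.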

Identifiers

DOI: 10.1021/acs.jcim.8b00669 PMID: 30758950

Publication types

Journal Article

Languages

eng

Pagination

1017-1029

Authors

Joshua Staker (J)

Kyle Marshall (K)

Robert Abel (R)

Carolyn M McQuaw (CM)

Similar articles

Exploring structural diversity across the protein universe with The Encyclopedia of Domains.

ACL-DUNet: A tumor segmentation method based on multiple attention and densely connected breast ultrasound images.

A novel small molecule screening assay using normal human chondrocytes toward osteoarthritis drug discovery.

Deep learning-based automatic image classification of oral cancer cells acquiring chemoresistance in vitro.

MeSH classifications