PharmacoNER Tagger: a deep learning-based tool for automatically finding chemicals and drugs in Spanish medical texts.

machine learning; natural language processing; neural networks (computer)

Journal

Genomics & informatics
ISSN: 1598-866X
Abbreviated title: Genomics Inform
Country: Korea (South)
NLM ID: 101223836

Publication information

Publication date:
Jun 2019
History:
received: 15 Mar 2019
accepted: 27 May 2019
entrez: 16 Jul 2019
pubmed: 16 Jul 2019
medline: 16 Jul 2019
Status: ppublish

Abstract

Automatically detecting mentions of pharmaceutical drugs and chemical substances is key for the subsequent extraction of relations between chemicals and other biomedical entities such as genes, proteins, diseases, adverse reactions or symptoms. Identifying drug mentions is also a prerequisite for more complex tasks such as drug dosage recognition, determining the duration of medical treatments, or drug repurposing. Formally, this task is known as named entity recognition (NER): automatically identifying mentions of predefined entities of interest in running text. In the domain of medical texts, techniques based on hand-crafted rules and graph-based models can provide adequate performance for chemical entity recognition (CER). In recent years, however, the field of natural language processing has largely pivoted to deep learning, and state-of-the-art results for most natural language tasks are usually obtained with artificial neural networks. Competitive resources for drug name recognition in English medical texts are already available and heavily used, whereas for other languages such as Spanish these tools, although clearly needed, were missing. In this work, we adapt an existing neural NER system, NeuroNER, to the particular domain of Spanish clinical case texts, and extend the neural network so that it can take into account additional features beyond the plain text. NeuroNER can be considered a competitive baseline system for Spanish drug recognition and CER, promoted by the Spanish national plan for the advancement of language technologies (Plan TL). PharmacoNER Tagger can be accessed at https://github.com/PlanTL-SANIDAD/PharmacoNER.
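
To make the NER task definition above concrete, the following minimal Python sketch converts character-offset entity annotations into per-token BIO labels, a common input representation for sequence taggers. It is an illustration only and not the PharmacoNER Tagger or NeuroNER code: the function name `bio_tags`, the regex tokenizer, the example sentence and the label `CHEM` are all assumptions made for this example.

```python
import re

def bio_tags(text, spans):
    """Assign a B-/I-/O tag to every token in `text`.

    spans: list of (start, end, label) character offsets, end exclusive.
    The tokenizer is a simple regex (hypothetical choice): runs of word
    characters, or single punctuation marks.
    """
    tagged = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"  # default: token is outside any entity
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                # The first token of an entity gets B-, later ones get I-.
                tag = ("B-" if match.start() == start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged

# Hypothetical sentence and annotation, not taken from any corpus.
text = "Se administra sulfato de magnesio intravenoso."
mention = "sulfato de magnesio"
start = text.index(mention)
result = bio_tags(text, [(start, start + len(mention), "CHEM")])

for token, tag in result:
    print(f"{token}\t{tag}")
```

A neural tagger would be trained to predict the second column from the first; here the labels simply come from the supplied annotation.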

Identifiers

pubmed: 31307130
pii: GI.2019.17.2.e15
doi: 10.5808/GI.2019.17.2.e15
pmc: PMC6808625

Publication types

Journal Article

Languages

eng

Pagination

e15

References

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S1
pubmed: 25810766
Mol Inform. 2011 Jun;30(6-7):506-19
pubmed: 27467152
J Am Med Inform Assoc. 2017 May 1;24(3):596-606
pubmed: 28040687
Neural Comput. 1997 Nov 15;9(8):1735-80
pubmed: 9377276
J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S15
pubmed: 25810772
Chem Rev. 2017 Jun 28;117(12):7673-7761
pubmed: 28475312
Drug Discov Today. 2008 Sep;13(17-18):816-23
pubmed: 18602492
J Cheminform. 2011 May 16;3(1):17
pubmed: 21575201

Authors

Jordi Armengol-Estapé (J)

Universitat Politècnica de Catalunya (UPC), 08034 Barcelona, Spain.

Felipe Soares (F)

Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain.

Montserrat Marimon (M)

Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain.

Martin Krallinger (M)

Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain.
Centro Nacional de Investigaciones Oncológicas (CNIO), 28029 Madrid, Spain.

MeSH classifications