Supervised methods to extract clinical events from cardiology reports in Italian.
Information extraction
Natural language processing
Neural networks
Journal
Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413
Publication information
Publication date:
07 2019
History:
received: 25 11 2018
revised: 17 05 2019
accepted: 28 05 2019
pubmed: 1 6 2019
medline: 12 9 2020
entrez: 1 6 2019
Status:
ppublish
Abstract
Clinical narratives are a valuable source of information for both patient care and biomedical research. Given the unstructured nature of medical reports, specific automatic techniques are required to extract relevant entities from such texts. In the natural language processing (NLP) community, this task is often addressed by using supervised methods. To develop such methods, both reliably annotated corpora and elaborately designed features are needed. Despite recent advances in corpus collection and annotation, research on multiple domains and languages is still limited. In addition, to compute the features required for supervised classification, suitable language- and domain-specific tools are needed. In this work, we propose a novel application of recurrent neural networks (RNNs) for event extraction from medical reports written in Italian. To train and evaluate the proposed approach, we annotated a corpus of 75 cardiology reports for a total of 4365 mentions of relevant events and their attributes (e.g., the polarity). For the annotation task, we developed specific annotation guidelines, which are provided together with this paper. The RNN-based classifier was trained on a training set of 3335 events (60 documents). The resulting model was integrated into an NLP pipeline that uses a dictionary lookup approach to search for relevant concepts inside the text. A test set of 1030 events (15 documents) was used to evaluate and compare different pipeline configurations. As a main result, using the RNN-based classifier instead of the dictionary lookup approach increased recall from 52.4% to 88.9% and precision from 81.1% to 88.2%. Further, using the two methods in combination, we obtained a final recall, precision, and F1 score of 91.7%, 88.6%, and 90.1%, respectively.
These experiments indicate that integrating a well-performing RNN-based classifier with a standard knowledge-based approach can be a good strategy to extract information from clinical text in non-English languages.
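As a quick consistency check on the figures quoted in the abstract, the sketch below recomputes F1 as the harmonic mean of precision and recall and confirms that the training and test sets sum to the full corpus. Only the combined F1 (90.1%) is stated in the abstract; the F1 values printed for the other two configurations are derived here from the reported precision and recall and are not taken from the paper. The names `f1_score`, `configs`, and `f1_by_config` are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs as reported in the abstract, expressed as fractions.
configs = {
    "dictionary lookup": (0.524, 0.811),
    "RNN classifier": (0.889, 0.882),
    "combined": (0.917, 0.886),
}

# F1 per configuration; only the "combined" value is stated in the abstract.
f1_by_config = {name: f1_score(p, r) for name, (r, p) in configs.items()}

# Corpus split reported in the abstract.
train_events, test_events = 3335, 1030
train_docs, test_docs = 60, 15
total_events = train_events + test_events  # should match the 4365 annotated mentions
total_docs = train_docs + test_docs        # should match the 75 reports

for name, f1 in f1_by_config.items():
    print(f"{name}: F1 = {f1:.1%}")
print(f"events: {total_events}, documents: {total_docs}")
```

The combined configuration comes out at 90.1%, matching the reported F1, and the split totals agree with the corpus size given in the abstract.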
Identifiers
pubmed: 31150777
pii: S1532-0464(19)30139-X
doi: 10.1016/j.jbi.2019.103219
pmc: PMC6948016
mid: NIHMS1063850
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
103219
Grants
Agency: NIGMS NIH HHS
ID: R01 GM114355
Country: United States
Agency: NLM NIH HHS
ID: R01 LM012973
Country: United States
Copyright information
Copyright © 2019 Elsevier Inc. All rights reserved.