Supervised methods to extract clinical events from cardiology reports in Italian.


Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
07 2019
History:
received: 25 11 2018
revised: 17 05 2019
accepted: 28 05 2019
pubmed: 1 6 2019
medline: 12 9 2020
entrez: 1 6 2019
Status: ppublish

Abstract

Clinical narratives are a valuable source of information for both patient care and biomedical research. Given the unstructured nature of medical reports, specific automatic techniques are required to extract relevant entities from such texts. In the natural language processing (NLP) community, this task is often addressed by using supervised methods. To develop such methods, both reliably annotated corpora and elaborately designed features are needed. Despite the recent advances in corpus collection and annotation, research on multiple domains and languages is still limited. In addition, to compute the features required for supervised classification, suitable language- and domain-specific tools are needed. In this work, we propose a novel application of recurrent neural networks (RNNs) for event extraction from medical reports written in Italian. To train and evaluate the proposed approach, we annotated a corpus of 75 cardiology reports for a total of 4365 mentions of relevant events and their attributes (e.g., the polarity). For the annotation task, we developed specific annotation guidelines, which are provided together with this paper. The RNN-based classifier was trained on a training set including 3335 events (60 documents). The resulting model was integrated into an NLP pipeline that uses a dictionary lookup approach to search for relevant concepts inside the text. A test set of 1030 events (15 documents) was used to evaluate and compare different pipeline configurations. As a main result, using the RNN-based classifier instead of the dictionary lookup approach increased recall from 52.4% to 88.9%, and precision from 81.1% to 88.2%. Further, using the two methods in combination, we obtained final recall, precision, and F1 score of 91.7%, 88.6%, and 90.1%, respectively.
These experiments indicate that integrating a well-performing RNN-based classifier with a standard knowledge-based approach can be a good strategy to extract information from clinical text in non-English languages.
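
As a quick consistency check on the figures above, the F1 score can be recomputed from precision and recall as their harmonic mean. The sketch below is illustrative only and is not taken from the paper: the function name `f1_score` and the dictionary keys are assumptions, and the F1 values for the dictionary-only and RNN-only configurations are derived here from the reported precision and recall rather than stated in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract, expressed as fractions.
# The configuration labels are informal names chosen for this sketch.
reported = {
    "dictionary lookup": (0.811, 0.524),
    "RNN classifier": (0.882, 0.889),
    "combined": (0.886, 0.917),
}

f1_by_config = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_config.items():
    print(f"{name}: F1 = {100 * f1:.1f}%")

# The training and test sets should add up to the full annotated corpus.
assert 3335 + 1030 == 4365  # events
assert 60 + 15 == 75        # documents
```

For the combined configuration, the recomputed value rounds to the reported 90.1%, and the training and test set sizes sum to the full corpus of 4365 events in 75 documents.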

Identifiers

pubmed: 31150777
pii: S1532-0464(19)30139-X
doi: 10.1016/j.jbi.2019.103219
pmc: PMC6948016
mid: NIHMS1063850

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

103219

Grants

Agency: NIGMS NIH HHS
ID: R01 GM114355
Country: United States
Agency: NLM NIH HHS
ID: R01 LM012973
Country: United States

Copyright information

Copyright © 2019 Elsevier Inc. All rights reserved.

Authors

Natalia Viani (N)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100 Pavia (PV), Italy. Electronic address: natalia.viani01@universitadipavia.it.

Timothy A Miller (TA)

Boston Children's Hospital, 300 Longwood Avenue, Boston, MA 02115, United States; Harvard Medical School, 25 Shattuck St, Boston, MA 02115, United States.

Carlo Napolitano (C)

IRCCS Istituti Clinici Scientifici Maugeri, Via Salvatore Maugeri 10, 27100 Pavia (PV), Italy.

Silvia G Priori (SG)

IRCCS Istituti Clinici Scientifici Maugeri, Via Salvatore Maugeri 10, 27100 Pavia (PV), Italy; Department of Molecular Medicine, University of Pavia, Via Forlanini, 27100 Pavia (PV), Italy.

Guergana K Savova (GK)

Boston Children's Hospital, 300 Longwood Avenue, Boston, MA 02115, United States; Harvard Medical School, 25 Shattuck St, Boston, MA 02115, United States.

Riccardo Bellazzi (R)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100 Pavia (PV), Italy; IRCCS Istituti Clinici Scientifici Maugeri, Via Salvatore Maugeri 10, 27100 Pavia (PV), Italy.

Lucia Sacchi (L)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100 Pavia (PV), Italy.
