Supervised methods to extract clinical events from cardiology reports in Italian.

Data Mining / methods; Electronic Health Records; Heart Diseases; Humans; Italy; Natural Language Processing; Neural Networks, Computer; Semantics

Information extraction; Natural language processing; Neural networks

Journal

Journal of biomedical informatics

ISSN: 1532-0480

Abbreviated title: J Biomed Inform

Country: United States

NLM ID: 100970413

Publication information

Publication date:
07 2019

History:

received: 25 11 2018

revised: 17 05 2019

accepted: 28 05 2019

pubmed: 1 6 2019

medline: 12 9 2020

entrez: 1 6 2019

Status: ppublish

Abstract

Clinical narratives are a valuable source of information for both patient care and biomedical research. Given the unstructured nature of medical reports, specific automatic techniques are required to extract relevant entities from such texts. In the natural language processing (NLP) community, this task is often addressed with supervised methods. Developing such methods requires both reliably annotated corpora and elaborately designed features. Despite recent advances in corpus collection and annotation, research on multiple domains and languages is still limited. In addition, computing the features required for supervised classification calls for suitable language- and domain-specific tools. In this work, we propose a novel application of recurrent neural networks (RNNs) for event extraction from medical reports written in Italian. To train and evaluate the proposed approach, we annotated a corpus of 75 cardiology reports for a total of 4365 mentions of relevant events and their attributes (e.g., the polarity). For the annotation task, we developed specific annotation guidelines, which are provided together with this paper. The RNN-based classifier was trained on a training set including 3335 events (60 documents). The resulting model was integrated into an NLP pipeline that uses a dictionary lookup approach to search for relevant concepts inside the text. A test set of 1030 events (15 documents) was used to evaluate and compare different pipeline configurations. As a main result, using the RNN-based classifier instead of the dictionary lookup approach increased recall from 52.4% to 88.9%, and precision from 81.1% to 88.2%. Further, using the two methods in combination, we obtained final recall, precision, and F1 score of 91.7%, 88.6%, and 90.1%, respectively.
These experiments indicate that integrating a well-performing RNN-based classifier with a standard knowledge-based approach can be a good strategy to extract information from clinical text in non-English languages.
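
As a quick consistency check on the figures above, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only and is not the authors' evaluation code; the function and variable names are hypothetical. Only the combined F1 (90.1%) is reported in the abstract, so the F1 values it derives for the other two configurations are computed here from the reported precision and recall and were not stated by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract for the test set.
# The configuration names are labels chosen for this sketch.
reported = {
    "dictionary lookup": {"precision": 81.1, "recall": 52.4},
    "RNN classifier": {"precision": 88.2, "recall": 88.9},
    "combined": {"precision": 88.6, "recall": 91.7},
}

# F1 per configuration, rounded to one decimal like the reported figures.
f1_by_config = {
    name: round(f1_score(m["precision"], m["recall"]), 1)
    for name, m in reported.items()
}

# Corpus split as reported: training plus test should equal the full corpus.
train_events, test_events, total_events = 3335, 1030, 4365
train_docs, test_docs, total_docs = 60, 15, 75
split_is_consistent = (
    train_events + test_events == total_events
    and train_docs + test_docs == total_docs
)

for name, f1 in f1_by_config.items():
    print(f"{name}: F1 = {f1}%")
print("split consistent:", split_is_consistent)
```

The combined configuration rounds to 90.1%, which agrees with the reported F1, and the training and test counts add up to the 4365 events and 75 documents of the full corpus.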

Identifiers

DOI: 10.1016/j.jbi.2019.103219 PMID: 31150777 PMC: PMC6948016

pubmed: 31150777

pii: S1532-0464(19)30139-X

doi: 10.1016/j.jbi.2019.103219

pmc: PMC6948016

mid: NIHMS1063850

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

103219

Grants

Agency: NIGMS NIH HHS

ID: R01 GM114355

Country: United States

Agency: NLM NIH HHS

ID: R01 LM012973

Country: United States

Authors

Natalia Viani (N)

Timothy A Miller (TA)

Carlo Napolitano (C)

Silvia G Priori (SG)

Guergana K Savova (GK)

Riccardo Bellazzi (R)

Lucia Sacchi (L)
