Paraphrasing to improve the performance of Electronic Health Records Question Answering.

Journal

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science

ISSN: 2153-4063

Abbreviated title: AMIA Jt Summits Transl Sci Proc

Country: United States

NLM ID: 101539486

Publication information

Publication date:
2020

History:

entrez: 2 6 2020

pubmed: 2 6 2020

medline: 2 6 2020

Status: epublish

Abstract

This paper describes a paraphrasing approach to improve the performance of question answering (QA) for electronic health records (EHRs). QA systems for structured EHR data usually rely on semantic parsing, which aims to generate machine-understandable logical forms from free-text questions. Training semantic parsers requires large datasets of question-logical form (QL) pairs, which are labor-intensive to create. Considering the scarcity of large QL datasets in the clinical domain, we propose a framework for expanding an existing dataset using paraphrasing. We experiment with different heuristics for multiple sample sizes and iterations to assess the effect of adding paraphrasing to the task of semantic parsing. We found that adding paraphrases to an existing dataset based on TERTHRESHOLD scores improves performance in the majority (74%) of the experimental runs. Hence, the proposed paraphrasing-based framework has the potential to improve the performance of QA systems using a limited set of existing QL annotations.

Identifiers

PMID: 32477685 PMC: PMC7233085

pubmed: 32477685

pmc: PMC7233085

Publication types

Journal Article

Languages

eng

Pagination

626-635

Grants

Agency: NLM NIH HHS

ID: R00 LM012104

Country: United States

Copyright information

References

J Biomed Inform. 2017 Mar;67:69-79

pubmed: 28088527

LREC Int Conf Lang Resour Eval. 2016 May;2016:3772-3778

pubmed: 28503677

AMIA Annu Symp Proc. 2018 Apr 16;2017:1478-1487

pubmed: 29854217

AMIA Annu Symp Proc. 2020 Mar 04;2019:1207-1215

pubmed: 32308918