Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis.

Keywords: clinical natural language processing; data augmentation; information extraction; named entity recognition; natural language processing; neural language modeling; sequence tagging

Journal

JMIR Medical Informatics

ISSN: 2291-9694

Abbreviated title: JMIR Med Inform

Country: Canada

NLM ID: 101645109

Publication information

Publication date:
30 Apr 2021

History:

received: 31 Aug 2020

revised: 23 Dec 2020

accepted: 02 Mar 2021

pubmed: 06 Mar 2021

medline: 06 Mar 2021

entrez: 05 Mar 2021

Status: epublish

Abstract

The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. The aim of this study is to develop automated methods that enable access to FH data through natural language processing. We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems. Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%. Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.


Identifiers

DOI: 10.2196/24020

PMID: 33664015

PMC: PMC8092929

PII: v9i4e24020

Publication types

Journal Article

Languages

eng

Pagination

e24020

Comments and corrections

Type: ErratumIn

Copyright information

©Maciej Rybinski, Xiang Dai, Sonit Singh, Sarvnaz Karimi, Anthony Nguyen. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 30.04.2021.



Authors

Maciej Rybinski

Xiang Dai

Sonit Singh

Sarvnaz Karimi

Anthony Nguyen
