Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis.

clinical natural language processing; data augmentation; information extraction; named entity recognition; natural language processing; neural language modeling; sequence tagging

Journal

JMIR medical informatics
ISSN: 2291-9694
Abbreviated title: JMIR Med Inform
Country: Canada
NLM ID: 101645109

Publication information

Publication date:
30 Apr 2021
History:
received: 31 Aug 2020
accepted: 2 Mar 2021
revised: 23 Dec 2020
pubmed: 6 Mar 2021
medline: 6 Mar 2021
entrez: 5 Mar 2021
Status: epublish

Abstract

The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. The aim of this study is to develop automated methods that enable access to FH data through natural language processing. We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems. Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%. Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.

Identifiers

pubmed: 33664015
pii: v9i4e24020
doi: 10.2196/24020
pmc: PMC8092929

Publication types

Journal Article

Languages

eng

Pagination

e24020

Comments and corrections

Type: ErratumIn

Copyright information

©Maciej Rybinski, Xiang Dai, Sonit Singh, Sarvnaz Karimi, Anthony Nguyen. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 30.04.2021.

Authors

Maciej Rybinski (M)

Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia.

Xiang Dai (X)

Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia.
University of Sydney, Sydney, Australia.

Sonit Singh (S)

Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia.
Macquarie University, Sydney, Australia.

Sarvnaz Karimi (S)

Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia.

Anthony Nguyen (A)

Commonwealth Scientific and Industrial Research Organisation, Brisbane, Australia.

MeSH classifications