Extracting seizure frequency from epilepsy clinic notes: a machine reading approach to natural language processing.

Electronic Health Records; Epilepsy; Humans; Natural Language Processing; Retrospective Studies; Seizures

electronic medical record; epilepsy; natural language processing; question-answering

Journal

Journal of the American Medical Informatics Association : JAMIA

ISSN: 1527-974X

Abbreviated title: J Am Med Inform Assoc

Country: England

NLM ID: 9430800

Publication information

Publication date:
13 04 2022

History:

received: 26 11 2021

revised: 11 01 2022

accepted: 08 02 2022

pubmed: 23 2 2022

medline: 16 4 2022

entrez: 22 2 2022

Status: ppublish

Abstract

Seizure frequency and seizure freedom are among the most important outcome measures for patients with epilepsy. In this study, we aimed to automatically extract this clinical information from unstructured text in clinical notes. If successful, this could improve clinical decision-making for patients with epilepsy and allow for rapid, large-scale retrospective research.

We developed a finetuning pipeline for pretrained neural models to classify patients as being seizure-free and to extract text containing their seizure frequency and date of last seizure from clinical notes. We annotated 1000 notes for use as training and testing data and determined how well 3 pretrained neural models, BERT, RoBERTa, and Bio_ClinicalBERT, could identify and extract the desired information after finetuning.

The finetuned models (BERTFT, Bio_ClinicalBERTFT, and RoBERTaFT) achieved near-human performance when classifying patients as seizure-free, with BERTFT and Bio_ClinicalBERTFT achieving accuracy scores over 80%. All 3 models also achieved human performance when extracting seizure frequency and date of last seizure, with overall F1 scores over 0.80. The best combination of models was Bio_ClinicalBERTFT for classification and RoBERTaFT for text extraction. Most of the gains in performance due to finetuning required roughly 70 annotated notes.

Our novel machine reading approach to extracting important clinical outcomes performed at or near human performance on several tasks. This approach opens new possibilities to support clinical practice and conduct large-scale retrospective clinical research. Future studies can use our finetuning pipeline with minimal training annotations to answer new clinical questions.
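The abstract reports F1 scores for text extraction without stating how they were computed. As an illustration only, the sketch below shows a token-overlap F1 of the kind commonly used to score extractive question-answering output. The function name `token_f1`, the whitespace tokenization, and the example strings are assumptions made here, not details taken from the study.

```python
from collections import Counter


def token_f1(predicted: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span.

    Assumption: lowercase whitespace tokenization; the study's actual
    scoring procedure is not described in the abstract.
    """
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()

    # If either span is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction misses one reference token.
print(round(token_f1("2 seizures per month", "about 2 seizures per month"), 3))  # 0.889
```

Here precision is 4/4 and recall is 4/5, giving an F1 of 8/9. A score like this gives partial credit to a near-miss span, which an exact-match criterion would count as wrong.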

Identifiers

DOI: 10.1093/jamia/ocac018

PMID: 35190834

PMC: PMC9006692

pii: 6534112

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

873-881

Grants

Agency: NINDS NIH HHS

ID: K23 NS121520

Country: United States

Agency: NINDS NIH HHS

ID: 1DP1 OD029758

Country: United States

Agency: NINDS NIH HHS

ID: DP1 NS122038

Country: United States
