Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports.

electronic health records natural language processing neuroimaging

Journal

JMIR medical informatics

ISSN: 2291-9694

Abbreviated title: JMIR Med Inform

Country: Canada

NLM ID: 101645109

Publication information

Publication date:
21 Apr 2019

History:

received: 5 Sep 2018

revised: 26 Feb 2019

accepted: 30 Mar 2019

entrez: 9 May 2019

pubmed: 9 May 2019

medline: 9 May 2019

Status: epublish

Abstract

BACKGROUND: Silent brain infarction (SBI) is defined as the presence of 1 or more brain lesions, presumed to be caused by vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by enabling preventative treatment plans. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases from electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings interpreted by radiologists from neuroimaging reports.

OBJECTIVE: This study aimed to develop NLP systems to identify individuals with incidentally discovered SBIs from neuroimaging reports at 2 sites: Mayo Clinic and Tufts Medical Center.

METHODS: Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open source NLP pipeline MedTagger, developed by Mayo Clinic. Features for the rule-based system, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models included a convolutional neural network (CNN), random forest, support vector machine, and logistic regression. The performance of the NLP algorithm was compared with a manually created gold standard. The gold standard dataset includes 1000 radiology reports randomly retrieved from the 2 study sites (Mayo and Tufts) corresponding to patients with no prior or current diagnosis of stroke or dementia. Of the 1000 reports, 400 were randomly sampled and double read to determine interannotator agreement. The gold standard dataset was split equally into 3 subsets for training, development, and testing.

RESULTS: Among the 400 reports selected to determine interannotator agreement, 5 reports were removed due to invalid scan types. The interannotator agreements across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance in predicting SBI, with an accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score in predicting white matter disease (WMD), with an accuracy, sensitivity, specificity, PPV, and NPV of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively.

CONCLUSIONS: We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggested that detecting SBIs and WMDs from EHRs using NLP is highly feasible.
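
The five validation statistics reported above (accuracy, sensitivity, specificity, PPV, and NPV) are all derived from a 2x2 confusion matrix of system output against the gold standard. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The names `ConfusionCounts` and `validation_metrics` are hypothetical, and the counts in the usage lines are illustrative only and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of report-level predictions against the gold standard (hypothetical type)."""
    tp: int  # system positive, gold standard positive
    fn: int  # system negative, gold standard positive
    fp: int  # system positive, gold standard negative
    tn: int  # system negative, gold standard negative


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def validation_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the standard binary classification metrics from confusion counts."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # recall on positive reports
        "specificity": _ratio(c.tn, c.tn + c.fp),  # recall on negative reports
        "ppv": _ratio(c.tp, c.tp + c.fp),          # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not the study's data.
    example = ConfusionCounts(tp=45, fn=5, fp=2, tn=148)
    for name, value in validation_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, a specificity and PPV of 1.000 together imply that the system produced no false positives on the test set, while a sensitivity below 1 means some gold standard positive reports were missed.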

Identifiers

DOI: 10.2196/12109

PMID: 31066686

PMC: PMC6524454

pii: v7i2e12109

Publication types

Journal Article

Languages

eng

Pagination

e12109

Grants

Agency: NINDS NIH HHS

ID: R01 NS102233

Country: United States

Agency: NINDS NIH HHS

ID: U01 NS086294

Country: United States

Agency: NCATS NIH HHS

ID: U01 TR002062

Country: United States

Copyright information

©Sunyang Fu, Lester Y Leung, Yanshan Wang, Anne-Olivia Raulli, David F Kallmes, Kristin A Kinsman, Kristoff B Nelson, Michael S Clark, Patrick H Luetmer, Paul R Kingsbury, David M Kent, Hongfang Liu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 21.04.2019.

Authors

Sunyang Fu (S)

Lester Y Leung (LY)

Yanshan Wang (Y)

Anne-Olivia Raulli (AO)

David F Kallmes (DF)

Kristin A Kinsman (KA)

Kristoff B Nelson (KB)

Michael S Clark (MS)

Patrick H Luetmer (PH)

Paul R Kingsbury (PR)

David M Kent (DM)

Hongfang Liu (H)

MeSH classifications