Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports.
electronic health records
natural language processing
neuroimaging
Journal
JMIR medical informatics
ISSN: 2291-9694
Abbreviated title: JMIR Med Inform
Country: Canada
NLM ID: 101645109
Publication information
Publication date:
21 Apr 2019
History:
received: 5 Sep 2018
accepted: 30 Mar 2019
revised: 26 Feb 2019
entrez: 9 May 2019
pubmed: 9 May 2019
medline: 9 May 2019
Status:
epublish
Abstract
Abstract sections
BACKGROUND
Silent brain infarction (SBI) is defined as the presence of 1 or more brain lesions, presumed to be caused by vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by enabling preventative treatment plans. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases from electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings reported by radiologists in neuroimaging reports.
OBJECTIVE
This study aimed to develop NLP systems to identify individuals with incidentally discovered SBIs from neuroimaging reports at 2 sites: Mayo Clinic and Tufts Medical Center.
METHODS
Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open source NLP pipeline MedTagger, developed by Mayo Clinic. Features for rule-based systems, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models included a convolutional neural network (CNN), random forest, support vector machine, and logistic regression. The performance of the NLP algorithm was compared with a manually created gold standard. The gold standard dataset included 1000 radiology reports randomly retrieved from the 2 study sites (Mayo and Tufts), corresponding to patients with no prior or current diagnosis of stroke or dementia. Of the 1000 reports, 400 were randomly sampled and double read to determine interannotator agreement. The gold standard dataset was split equally into 3 subsets for training, development, and testing.
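As an illustration of how pointwise mutual information can surface words associated with a label, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization, report-level binary labels, and presence/absence counting; the function name `pmi_scores` and its parameters are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(reports, labels, positive=1):
    """Score each token by its PMI with the positive label.

    Assumption (illustrative only): a token "occurs" in a report if it
    appears at least once after lowercasing and whitespace splitting.
    """
    n = len(reports)
    n_pos = sum(1 for y in labels if y == positive)
    doc_freq = Counter()    # number of reports containing the token
    joint_freq = Counter()  # number of positive reports containing the token

    for text, y in zip(reports, labels):
        tokens = set(text.lower().split())
        doc_freq.update(tokens)
        if y == positive:
            joint_freq.update(tokens)

    scores = {}
    for tok, joint in joint_freq.items():
        # PMI(tok, pos) = log2( P(tok, pos) / (P(tok) * P(pos)) )
        p_joint = joint / n
        p_tok = doc_freq[tok] / n
        p_pos = n_pos / n
        scores[tok] = math.log2(p_joint / (p_tok * p_pos))
    return scores
```

Tokens that never occur in a positive report are omitted because their PMI is undefined (log of zero). In practice, rare tokens tend to receive inflated PMI values, so a minimum document frequency threshold is a common addition.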
RESULTS
Among the 400 reports selected to determine interannotator agreement, 5 reports were removed because of invalid scan types. The interannotator agreements across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance in predicting SBI, with an accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score in predicting white matter disease (WMD), with an accuracy, sensitivity, specificity, PPV, and NPV of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively.
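The five reported statistics all derive from a 2x2 confusion matrix comparing system output with the gold standard. The sketch below shows the standard definitions; it is illustrative only, the function name `binary_metrics` is hypothetical, and it assumes labels are encoded as 1 (finding present) and 0 (finding absent).

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, PPV, and NPV.

    Assumption (illustrative only): labels are 1 for positive, 0 for negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Under these definitions, a specificity and PPV of 1.000 together with a sensitivity below 1, as reported for the rule-based SBI system, correspond (up to rounding) to a classifier with no false positives and some false negatives.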
CONCLUSIONS
We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggested a high feasibility of detecting SBIs and WMDs from EHRs using NLP.
Identifiers
pubmed: 31066686
pii: v7i2e12109
doi: 10.2196/12109
pmc: PMC6524454
Publication types
Journal Article
Languages
eng
Pagination
e12109
Grants
Agency: NINDS NIH HHS
ID: R01 NS102233
Country: United States
Agency: NINDS NIH HHS
ID: U01 NS086294
Country: United States
Agency: NCATS NIH HHS
ID: U01 TR002062
Country: United States
Copyright information
©Sunyang Fu, Lester Y Leung, Yanshan Wang, Anne-Olivia Raulli, David F Kallmes, Kristin A Kinsman, Kristoff B Nelson, Michael S Clark, Patrick H Luetmer, Paul R Kingsbury, David M Kent, Hongfang Liu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 21.04.2019.