Early Prediction of Functional Outcomes After Acute Ischemic Stroke Using Unstructured Clinical Text: Retrospective Cohort Study.

Keywords: MetaMap; acute ischemic stroke; bag-of-words; extreme gradient boosting; machine learning; natural language processing; outcome prediction; text classification; unstructured clinical text

Journal

JMIR medical informatics
ISSN: 2291-9694
Abbreviated title: JMIR Med Inform
Country: Canada
NLM ID: 101645109

Publication information

Publication date:
17 Feb 2022
History:
received: 28 Apr 2021
revised: 17 Jul 2021
accepted: 2 Jan 2022
entrez: 17 Feb 2022
pubmed: 18 Feb 2022
medline: 18 Feb 2022
Status: epublish

Structured abstract

BACKGROUND
Several prognostic scores have been proposed to predict functional outcomes after an acute ischemic stroke (AIS). Most of these scores are based on structured information and have been used to develop prediction models via the logistic regression method. With the increased use of electronic health records and the progress in computational power, data-driven predictive modeling by using machine learning techniques is gaining popularity in clinical decision-making.
OBJECTIVE
We aimed to investigate whether machine learning models created by using unstructured text could improve the prediction of functional outcomes at an early stage after AIS.
METHODS
We identified all consecutive patients who were hospitalized for the first time for AIS from October 2007 to December 2019 by using a hospital stroke registry. The study population was randomly split into a training set (n=2885) and a test set (n=962). Free text in histories of present illness and computed tomography reports was transformed into input variables via natural language processing. Models were trained by using the extreme gradient boosting technique to predict a poor functional outcome at 90 days poststroke. Model performance on the test set was evaluated by using the area under the receiver operating characteristic curve (AUC).
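As an illustration of the text-to-variables step described above, the following is a minimal sketch of a bag-of-words transformation and a random split, written with the Python standard library only. It is not the authors' pipeline: the example notes, the function names, the tokenization rule, and the 0.25 test fraction are all assumptions made for illustration (the fraction is merely consistent with the reported set sizes, 962 of 3847), and the gradient boosting step itself is not reproduced here.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents, min_count=1):
    """Collect tokens that appear in at least min_count documents."""
    doc_counts = Counter(tok for doc in documents for tok in set(tokenize(doc)))
    return sorted(tok for tok, n in doc_counts.items() if n >= min_count)


def vectorize(text, vocabulary):
    """Turn one note into a vector of token counts, one entry per vocabulary term."""
    counts = Counter(tokenize(text))
    return [counts[tok] for tok in vocabulary]


def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and cut off a test portion."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical notes, invented for illustration only.
notes = [
    "Sudden onset right hemiparesis and aphasia",
    "CT shows no acute hemorrhage",
    "Mild left facial droop, alert and oriented",
    "CT shows old infarct, no hemorrhage",
]
vocab = build_vocabulary(notes)
matrix = [vectorize(note, vocab) for note in notes]

# Placeholder patient identifiers; only the count matches the reported cohort.
train_ids, test_ids = split_train_test(range(3847))
print(len(train_ids), len(test_ids))
```

Each row of `matrix` could then serve as the input variables for one patient in whatever classifier is chosen; the count-based representation is the simplest bag-of-words variant and ignores word order.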
RESULTS
The AUCs of text-only models ranged from 0.768 to 0.807 and were comparable to that of the model using National Institutes of Health Stroke Scale (NIHSS) scores (0.811). Models using both patient age and text achieved AUCs of 0.823 and 0.825, which were similar to those of the model containing age and NIHSS scores (0.841); the model containing preadmission comorbidities, level of consciousness, age, and neurological deficit (PLAN) scores (0.837); and the model containing Acute Stroke Registry and Analysis of Lausanne (ASTRAL) scores (0.840). Adding variables from clinical text improved the predictive performance of the model containing age and NIHSS scores, the model containing PLAN scores, and the model containing ASTRAL scores (the AUC increased from 0.841 to 0.861, from 0.837 to 0.856, and from 0.840 to 0.860, respectively).
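To make the AUC figures above concrete, the sketch below computes an AUC from its pairwise definition (the probability that a randomly chosen patient with a poor outcome receives a higher predicted score than a randomly chosen patient without one) and then tabulates the reported gains. The labels and scores are invented toy values, the function name is hypothetical, and the abstract does not say how the AUCs were computed or compared, so this is only an illustration of the metric.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 1 = poor outcome, 0 = good outcome.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
toy_auc = auc(labels, scores)

# AUCs reported in the abstract: (without text, with text).
reported = {
    "age + NIHSS": (0.841, 0.861),
    "PLAN": (0.837, 0.856),
    "ASTRAL": (0.840, 0.860),
}
gains = {name: round(after - before, 3) for name, (before, after) in reported.items()}
print(toy_auc, gains)
```

The pairwise loop is quadratic in the number of patients, which is fine for a small example; a rank-based formulation would be preferable for a full test set.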
CONCLUSIONS
Unstructured clinical text can be used to improve the performance of existing models for predicting poststroke functional outcomes. However, considering the different terminologies that are used across health systems, each individual health system may consider using the proposed methods to develop and validate its own models.

Identifiers

pubmed: 35175201
pii: v10i2e29806
doi: 10.2196/29806
pmc: PMC8895286

Publication types

Journal Article

Languages

eng

Pagination

e29806

Copyright information

©Sheng-Feng Sung, Cheng-Yang Hsieh, Ya-Han Hu. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 17.02.2022.

Authors

Sheng-Feng Sung (SF)

Division of Neurology, Department of Internal Medicine, Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi City, Taiwan.
Department of Nursing, Min-Hwei Junior College of Health Care Management, Tainan, Taiwan.

Cheng-Yang Hsieh (CY)

Department of Neurology, Tainan Sin Lau Hospital, Tainan, Taiwan.

Ya-Han Hu (YH)

Department of Information Management, National Central University, Taoyuan City, Taiwan.
