Mining fall-related information in clinical notes: Comparison of rule-based and novel word embedding-based machine learning approaches.

Falls; Natural language processing; Nursing informatics; Text mining; Word embedding models

Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
02 2019
History:
received: 04 06 2018
revised: 14 11 2018
accepted: 31 12 2018
pubmed: 15 1 2019
medline: 17 3 2020
entrez: 15 1 2019
Status: ppublish

Abstract

BACKGROUND
Natural language processing (NLP) of health-related data is still an expertise-demanding and resource-expensive process. We created a novel, open-source, rapid clinical text mining system called NimbleMiner. NimbleMiner combines several machine learning techniques (word embedding models and positive-only labels learning) to help a human rapidly perform text mining of clinical narratives with the aid of machine learning components.
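As a rough illustration of how word embeddings can support a human reviewer, the sketch below ranks candidate terms by cosine similarity to a seed term, so that a reviewer could accept or reject each suggestion. This is an assumption about one plausible use of embeddings, not a description of NimbleMiner's implementation: the tiny hand-written vectors, the term list, and the `most_similar` helper are all hypothetical, and a real system would use embeddings trained on the clinical corpus.

```python
import math

# Toy 3-dimensional "embeddings". Hypothetical values for illustration only;
# they are not derived from any clinical corpus.
EMBEDDINGS = {
    "fall": [0.9, 0.1, 0.0],
    "fell": [0.85, 0.15, 0.05],
    "slipped": [0.8, 0.2, 0.1],
    "walker": [0.3, 0.8, 0.1],
    "medication": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(seed, embeddings, top_n=3):
    """Return the top_n terms closest to the seed term, best first."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Candidate terms a reviewer might be asked to accept or reject.
candidates = most_similar("fall", EMBEDDINGS, top_n=2)
for term, score in candidates:
    print(f"{term}: {score:.3f}")
```

With these toy vectors, the two suggestions closest to "fall" are "fell" and "slipped", while unrelated terms such as "medication" rank last.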
OBJECTIVE
This manuscript describes the general system architecture and user interface and presents the results of a case study aimed at classifying fall-related information (including fall history, fall prevention interventions, and fall risk) in homecare visit notes.
METHODS
We extracted a corpus of homecare visit notes (n = 1,149,586) for 89,459 patients from a large US-based homecare agency. Using a gold standard testing dataset of 750 notes annotated by two human reviewers, we compared NimbleMiner's ability to classify documents according to whether they contain fall-related information with that of a previously developed rule-based NLP system.
RESULTS
NimbleMiner outperformed the rule-based system in almost all domains. The overall F-score was 85.8% compared to 81% for the rule-based system, with the best performance for identifying general fall history (F = 89% vs. F = 85.1% rule-based), followed by fall risk (F = 87% vs. F = 78.7% rule-based), fall prevention interventions (F = 88.1% vs. F = 78.2% rule-based), and fall within 2 days of the note date (F = 83.1% vs. F = 80.6% rule-based). The rule-based system achieved slightly better performance for fall within 2 weeks of the note date (F = 81.9% vs. F = 84% rule-based).
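For readers unfamiliar with the metric, the sketch below shows how an F-score can be computed for a binary document classifier against gold standard labels. The abstract does not state which F-score variant was used; this sketch assumes the balanced F1 (the harmonic mean of precision and recall). The label lists and the `precision_recall_f1` helper are hypothetical and do not reproduce the study's data.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and balanced F1 for binary labels (1 = positive)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for eight notes: 1 = note contains fall-related information.
gold = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy example there are three true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 0.75.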
DISCUSSION & CONCLUSIONS
NimbleMiner outperformed other systems aimed at fall information classification, including our previously developed rule-based approach. These promising results indicate that clinical text mining can be implemented without the large labeled datasets that other types of machine learning require. This is critical for domains with little NLP development, such as nursing or the allied health professions.

Identifiers

pubmed: 30639392
pii: S1532-0464(19)30021-8
doi: 10.1016/j.jbi.2019.103103

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

103103

Copyright information

Copyright © 2019 Elsevier Inc. All rights reserved.

Authors

Maxim Topaz (M)

School of Nursing & Data Science Institute, Columbia University, New York, NY, USA; The Visiting Nurse Service of New York, New York, NY, USA. Electronic address: mt3315@cumc.columbia.edu.

Ludmila Murga (L)

Cheryl Spencer Department of Nursing, University of Haifa, Haifa, Israel.

Katherine M Gaddis (KM)

School of Nursing, University of Pennsylvania, Philadelphia, PA, USA.

Margaret V McDonald (MV)

The Visiting Nurse Service of New York, New York, NY, USA.

Ofrit Bar-Bachar (O)

Cheryl Spencer Department of Nursing, University of Haifa, Haifa, Israel.

Yoav Goldberg (Y)

Department of Computer Science, Bar Ilan University, Tel Aviv, Israel.

Kathryn H Bowles (KH)

The Visiting Nurse Service of New York, New York, NY, USA; School of Nursing, University of Pennsylvania, Philadelphia, PA, USA.
