Use of Noisy Labels as Weak Learners to Identify Incompletely Ascertainable Outcomes: A Feasibility Study with Opioid-Induced Respiratory Depression.


Journal

medRxiv : the preprint server for health sciences
Abbreviated title: medRxiv
Country: United States
NLM ID: 101767986

Publication information

Publication date:
30 Jan 2024
History:
pubmed: 14 Feb 2024
medline: 14 Feb 2024
entrez: 14 Feb 2024
Status: epublish

Abstract

Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts' heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels to a large, observational data set. We chose the clinical condition of opioid-induced respiratory depression for our use case because it is rare, has no administrative codes to easily identify the condition, and typically requires at least some unstructured text to ascertain its presence. Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) for the development of a machine learning classifier for opioid-induced respiratory depression. Our approach included subject matter experts creating 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated performance of the Discriminative model with a hold-out test set of 599 independently reviewed patient records. The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0, and an AUC of 0.988 in the hold-out test set with a prevalence of 0.83% (5/599). All of the confirmed Cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities. Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly structured data.
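
The reported test-set figures can be cross-checked with simple arithmetic. The sketch below is not from the preprint: it assumes the standard definitions of sensitivity, F1, and accuracy, and back-solves the confusion matrix from the rounded values in the abstract (599 records, 5 confirmed cases, sensitivity 1.0, F1 0.417). The function name and the derived false-positive count are illustrative inferences, not figures reported by the authors.

```python
def confusion_from_reported(n, positives, sensitivity, f1):
    """Back-solve (TP, FP, FN, TN) from rounded summary metrics.

    Assumes the standard definitions:
      sensitivity = TP / (TP + FN)
      F1          = 2*TP / (2*TP + FP + FN)
    Because the inputs are rounded, the result is an inference.
    """
    tp = round(sensitivity * positives)
    fn = positives - tp
    # Rearranging the F1 definition: FP = 2*TP/F1 - 2*TP - FN
    fp = round(2 * tp / f1 - 2 * tp - fn)
    tn = n - positives - fp
    return tp, fp, fn, tn


# Values taken from the abstract (hold-out test set).
tp, fp, fn, tn = confusion_from_reported(n=599, positives=5,
                                         sensitivity=1.0, f1=0.417)

# Derived quantities, to compare against the reported accuracy of 0.977.
accuracy = (tp + tn) / (tp + fp + fn + tn)
f1_check = 2 * tp / (2 * tp + fp + fn)
ppv = tp / (tp + fp)  # positive predictive value; not reported in the abstract
```

Under these assumptions, the counts imply roughly 14 false positives among 19 flagged records, so a positive predictive value near 0.26, with the recomputed accuracy and F1 rounding to the reported 0.977 and 0.417. That is consistent with the abstract's framing of the classifier as a way to narrow manual review rather than replace it.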

Identifiers

pubmed: 38352435
doi: 10.1101/2024.01.29.24301963
pmc: PMC10863026

Publication types

Preprint

Languages

eng

Grants

Agency: AHRQ HHS
ID: K12 HS026395
Country: United States
Agency: NIH HHS
ID: S10 OD023680
Country: United States
Agency: NCATS NIH HHS
ID: UL1 TR000445
Country: United States

Comments and corrections

Type: UpdateIn

Authors

Alvin D Jeffery (AD)

School of Nursing, Vanderbilt University, Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA.

Daniel Fabbri (D)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Ruth M Reeves (RM)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA.

Michael E Matheny (ME)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA.

MeSH classifications