Detecting changes in the performance of a clinical machine learning tool over time.

MeSH terms: Adult; Humans; Longitudinal Studies; Time Factors; Benchmarking; Machine Learning; Emergency Service, Hospital

Keywords: Artificial intelligence; Machine learning; Performance drift; Statistical process control; Validation

Journal

EBioMedicine

ISSN: 2352-3964

Abbreviated title: EBioMedicine

Country: Netherlands

NLM ID: 101647039

Publication information

Publication date:
Nov 2023

History:

received: 27 May 2023

revised: 21 Sep 2023

accepted: 21 Sep 2023

medline: 13 Nov 2023

pubmed: 4 Oct 2023

entrez: 4 Oct 2023

Status: ppublish

Abstract

BACKGROUND

Excessive use of blood cultures (BCs) in Emergency Departments (EDs) results in low yields and high contamination rates, which are associated with increased antibiotic use and unnecessary diagnostics. Our team previously developed and validated a machine learning model to predict BC outcomes and enhance diagnostic stewardship. While the model showed promising initial results, concerns over performance drift due to evolving patient demographics, clinical practices, and outcome rates warrant continual monitoring and evaluation of such models.

METHODS

A real-time evaluation of the model's performance was conducted between October 2021 and September 2022. The model was integrated into Amsterdam UMC's Electronic Health Record system, predicting BC outcomes for all adult patients with BC draws in real time. The model's performance was assessed monthly using metrics including the Area Under the Curve (AUC), Area Under the Precision-Recall Curve (AUPRC), and Brier scores. Statistical Process Control (SPC) charts were used to monitor variation over time.

FINDINGS

Across 3,035 unique adult patient visits, the model achieved an average AUC of 0.78, AUPRC of 0.41, and a Brier score of 0.10 for predicting the outcome of BCs drawn in the ED. While specific population characteristics changed over time, no data points fell outside the statistical control limits for the AUC, AUPRC, or Brier score, indicating stable model performance. The average BC positivity rate during the study period was 13.4%.

INTERPRETATION

Despite significant changes in clinical practice, our BC stewardship tool exhibited stable performance, suggesting its robustness to changing environments. Using SPC charts for various metrics enables simple and effective monitoring of potential performance drift. Assessing the variation in outcome rates and population changes may guide the specific interventions, such as intercept correction or recalibration, that may be needed to maintain stable model performance over time. This study suggested no need to recalibrate or correct our BC stewardship tool.

FUNDING

No funding to disclose.
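The monthly monitoring described in the abstract can be illustrated with a small sketch. It is not the authors' code. The abstract does not say which type of SPC chart was used, so the individuals (XmR) chart below is an assumption, as are all function names and the synthetic monthly values. AUPRC is approximated by average precision.

```python
from statistics import mean


def auc(y_true, y_score):
    """ROC AUC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count pairwise wins; ties contribute half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """AUPRC estimate: mean precision at the rank of each positive case.

    Tied scores are not grouped; they keep their input order.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive case")
    return total / hits


def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return mean((p - y) ** 2 for y, p in zip(y_true, y_prob))


def xmr_limits(values):
    """Individuals (XmR) chart: lower limit, centre line, upper limit."""
    if len(values) < 2:
        raise ValueError("need at least two points to form a moving range")
    centre = mean(values)
    mr_bar = mean(abs(b - a) for a, b in zip(values, values[1:]))
    half_width = 2.66 * mr_bar  # 3 / d2, with d2 = 1.128 for ranges of two points
    return centre - half_width, centre, centre + half_width


def out_of_control(values):
    """Indices of points outside the control limits."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly AUC values (synthetic, not the study's data).
monthly_auc = [0.78, 0.79, 0.77, 0.78, 0.80, 0.76,
               0.78, 0.79, 0.77, 0.78, 0.79, 0.77]
lower, centre, upper = xmr_limits(monthly_auc)
print(f"centre={centre:.3f} limits=({lower:.3f}, {upper:.3f})")
print("out-of-control months:", out_of_control(monthly_auc))
```

In practice each monthly value would come from calling `auc`, `average_precision`, or `brier` on that month's predictions and outcomes, with one chart per metric. For the synthetic series above no month is flagged. Replacing the last value with 0.60 flags that month.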

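The abstract names intercept correction as one intervention that monitoring might call for. As a hedged illustration, one common form shifts every prediction by a constant on the logit scale so that the mean predicted risk matches the observed outcome rate. The sketch below is an assumption about how that could be done, not the study's method (the study found no correction was needed). The function names and example numbers are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def intercept_shift(y_true, y_prob, tol=1e-10):
    """Find delta so that sum(sigmoid(logit(p) + delta)) equals the number of events.

    This is the maximum-likelihood intercept of a logistic model that uses the
    original logit as a fixed offset. Solved by bisection, since the sum is
    monotonically increasing in delta.
    """
    events = sum(y_true)
    if not 0 < events < len(y_true):
        raise ValueError("need at least one event and one non-event")
    if any(not 0 < p < 1 for p in y_prob):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    offsets = [logit(p) for p in y_prob]
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(sigmoid(o + mid) for o in offsets) < events:
            lo = mid  # predictions still too low: shift upwards
        else:
            hi = mid
    return (lo + hi) / 2


def apply_shift(y_prob, delta):
    """Apply the logit-scale shift to each predicted probability."""
    return [sigmoid(logit(p) + delta) for p in y_prob]


# Hypothetical example: the model predicts 20% risk for everyone,
# but only 1 of 10 cultures turns out positive.
outcomes = [1] + [0] * 9
predicted = [0.2] * 10
delta = intercept_shift(outcomes, predicted)
corrected = apply_shift(predicted, delta)
print(f"delta={delta:.4f} mean corrected risk={sum(corrected) / len(corrected):.4f}")
```

A shift close to zero indicates that the average predicted risk already agrees with the observed rate. The shift changes calibration only; it leaves the ranking of patients, and therefore the AUC, unchanged.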
Identifiers

DOI: 10.1016/j.ebiom.2023.104823

PMID: 37793210

PMC: PMC10550508

pii: S2352-3964(23)00389-4

Publication types

Journal Article

Languages

eng

Pagination

104823

Conflict of interest statement

The authors declare no competing interests regarding this work.

Authors

Michiel Schinkel (M)

Anneroos W Boerman (AW)

Ketan Paranjape (K)

W Joost Wiersinga (WJ)

Prabath W B Nanayakkara (PWB)
