Detecting changes in the performance of a clinical machine learning tool over time.
Artificial intelligence
Machine learning
Performance drift
Statistical process control
Validation
Journal
EBioMedicine
ISSN: 2352-3964
Abbreviated title: EBioMedicine
Country: Netherlands
NLM ID: 101647039
Publication information
Publication date:
Nov 2023
History:
received: 2023-05-27
revised: 2023-09-21
accepted: 2023-09-21
medline: 2023-11-13
pubmed: 2023-10-04
entrez: 2023-10-04
Status:
ppublish
Abstract
BACKGROUND
Excessive use of blood cultures (BCs) in Emergency Departments (EDs) results in low yields and high contamination rates, associated with increased antibiotic use and unnecessary diagnostics. Our team previously developed and validated a machine learning model to predict BC outcomes and enhance diagnostic stewardship. While the model showed promising initial results, concerns over performance drift due to evolving patient demographics, clinical practices, and outcome rates warrant continual monitoring and evaluation of such models.
METHODS
The model's performance was evaluated in real time between October 2021 and September 2022. The model was integrated into Amsterdam UMC's Electronic Health Record system, where it predicted BC outcomes in real time for all adult patients with BC draws. Performance was assessed monthly using the Area Under the Receiver Operating Characteristic Curve (AUC), the Area Under the Precision-Recall Curve (AUPRC), and the Brier score. Statistical Process Control (SPC) charts were used to monitor variation in these metrics over time.
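The monitoring approach described above can be illustrated with a minimal sketch. The abstract does not state which type of SPC chart was used, so the individuals (XmR) chart below is an assumption, as are all function names and the example values; none of this is the authors' code or data. The sketch computes AUC and the Brier score for one period and flags monthly metric values that fall outside the control limits.

```python
from statistics import mean


def auc(y_true, y_prob):
    """Rank-based AUC (Mann-Whitney formulation); ties count as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return mean((p - y) ** 2 for y, p in zip(y_true, y_prob))


def xmr_limits(values):
    """Individuals (XmR) chart limits: centre line +/- 2.66 * mean moving range.

    Assumption: the chart type is chosen for illustration only.
    """
    centre = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre, centre + spread


def out_of_control(values):
    """Return the indices of points outside the control limits."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical monthly metric values, not taken from the study.
    monthly_metric = [10, 10, 10, 10, 10, 10, 10, 10, 10, 20]
    print(out_of_control(monthly_metric))
```

In practice, one series per metric (AUC, AUPRC, Brier score) would be passed to `out_of_control`, and an empty result for every series would correspond to the "no points outside the control range" finding.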
FINDINGS
Across 3,035 unique adult patient visits, the model achieved an average AUC of 0.78, AUPRC of 0.41, and a Brier score of 0.10 for predicting the outcome of BCs drawn in the ED. While specific population characteristics changed over time, no points outside the statistical control limits were detected for the AUC, AUPRC, or Brier score, indicating stable model performance. The average BC positivity rate during the study period was 13.4%.
INTERPRETATION
Despite significant changes in clinical practice, our BC stewardship tool exhibited stable performance, suggesting that it is robust to changing environments. Using SPC charts for various metrics enables simple and effective monitoring of potential performance drift. Assessing variation in outcome rates and population characteristics may guide the specific interventions, such as intercept correction or recalibration, that may be needed to maintain stable model performance over time. This study suggested no need to recalibrate or correct our BC stewardship tool.
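For settings where monitoring does indicate drift in the outcome rate, intercept correction can be sketched as follows. This is an illustrative assumption about how such a correction might be done, not the authors' procedure: it finds a single shift on the logit scale so that the mean predicted probability matches a newly observed event rate, which is the condition satisfied by the maximum-likelihood intercept update with the original logit as an offset. The function names and example values are hypothetical, and predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import mean


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of the logit function."""
    return 1 / (1 + math.exp(-z))


def intercept_shift(y_prob, observed_rate, tol=1e-10):
    """Find delta so that mean(sigmoid(logit(p) + delta)) equals observed_rate.

    Solved by bisection; the mean shifted prediction is monotone in delta.
    """
    logits = [logit(p) for p in y_prob]
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(sigmoid(z + mid) for z in logits) < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def apply_shift(y_prob, delta):
    """Apply the intercept shift to every predicted probability."""
    return [sigmoid(logit(p) + delta) for p in y_prob]


if __name__ == "__main__":
    # Hypothetical predictions and target event rate, not study data.
    predictions = [0.05, 0.10, 0.20, 0.40]
    delta = intercept_shift(predictions, observed_rate=0.25)
    print(round(mean(apply_shift(predictions, delta)), 4))
```

A shift of this kind changes calibration-in-the-large only; it leaves the ranking of patients, and therefore the AUC, unchanged.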
FUNDING
No funding to disclose.
Identifiers
pubmed: 37793210
pii: S2352-3964(23)00389-4
doi: 10.1016/j.ebiom.2023.104823
pmc: PMC10550508
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
104823
Copyright information
Copyright © 2023 The Author(s). Published by Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of interests: The authors declare no competing interests regarding this work.