Machine learning clinical prediction models for acute kidney injury: the impact of baseline creatinine on prediction efficacy.

Acute kidney injury; Alert fatigue; Artificial intelligence; Decision Support System, Clinical; Health personnel; Machine learning

Journal

BMC medical informatics and decision making
ISSN: 1472-6947
Abbreviated title: BMC Med Inform Decis Mak
Country: England
NLM ID: 101088682

Publication information

Publication date:
09 Oct 2023
History:
received: 24 May 2023
accepted: 22 Sep 2023
medline: 1 Nov 2023
pubmed: 10 Oct 2023
entrez: 9 Oct 2023
Status: epublish

Abstract

Abstract sections

BACKGROUND
Many machine learning (ML) models predict acute kidney injury (AKI) in hospitalised patients. Although a primary goal of these models is to support clinical decision-making, inconsistent methods of estimating baseline serum creatinine (sCr) may obscure how effective the models are in clinical practice. Until now, the performance of such models with different baselines has not been compared on a single dataset. In addition, AKI prediction models are known to produce a high rate of false positive (FP) events regardless of the baseline method, which warrants further exploration of FP events to identify potential underlying reasons.
OBJECTIVE
The first aim of this study was to assess the variance in performance of ML models using three methods of baseline sCr on a retrospective dataset. The second aim was to conduct an error analysis to gain insight into the underlying factors contributing to FP events.
MATERIALS AND METHODS
Intensive Care Unit (ICU) patients from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset were used, with the KDIGO (Kidney Disease: Improving Global Outcomes) definition applied to identify AKI episodes. Three methods of estimating baseline sCr were defined: (1) the minimum sCr, (2) the Modification of Diet in Renal Disease (MDRD) equation and the minimum sCr, and (3) the MDRD equation and the mean of preadmission sCr. For the first aim, a suite of ML models was developed for each baseline and the performance of the models was assessed. An analysis of variance was performed to test for significant differences between eXtreme Gradient Boosting (XGB) models across all baselines. To address the second aim, Explainable AI (XAI) methods were used to analyse the errors of the XGB model under Baseline 3.
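The baseline definitions and the creatinine part of the KDIGO definition can be illustrated with a minimal sketch. It is not the authors' code, and several points are assumptions the abstract does not state: the IDMS-traceable 4-variable MDRD coefficients, the conventional back-calculation target of 75 mL/min/1.73 m^2, the rule that the MDRD estimate is used only when no measured sCr is available, and all function and parameter names. KDIGO urine-output criteria are omitted.

```python
from statistics import mean
from typing import Optional, Sequence

# Assumption: conventional back-calculation target (mL/min/1.73 m^2),
# not stated in the abstract.
ASSUMED_EGFR = 75.0


def mdrd_back_calculated_scr(age: float, female: bool, black: bool = False) -> float:
    """Solve the 4-variable MDRD equation for sCr (mg/dL).

    eGFR = 175 * sCr**-1.154 * age**-0.203 * 0.742 (if female) * 1.212 (if Black)
    """
    factor = 175.0 * age ** -0.203
    if female:
        factor *= 0.742
    if black:
        factor *= 1.212
    return (ASSUMED_EGFR / factor) ** (-1.0 / 1.154)


def baseline_scr(method: int, prior_scr: Sequence[float], age: float, female: bool) -> float:
    """Hypothetical reading of Baselines 1 to 3.

    Assumption: for methods 2 and 3 the MDRD estimate is only a fallback
    when no measured sCr values are available.
    """
    if method == 1:
        return min(prior_scr)  # raises ValueError if no values exist
    if method not in (2, 3):
        raise ValueError("method must be 1, 2 or 3")
    if not prior_scr:
        return mdrd_back_calculated_scr(age, female)
    return min(prior_scr) if method == 2 else mean(prior_scr)


def meets_kdigo_scr_criteria(
    current_scr: float,
    baseline: float,
    lowest_scr_last_48h: Optional[float] = None,
) -> bool:
    """KDIGO creatinine criteria: sCr >= 1.5 x baseline, or a rise of
    >= 0.3 mg/dL within 48 hours."""
    relative_rise = current_scr >= 1.5 * baseline
    absolute_rise = (
        lowest_scr_last_48h is not None
        and current_scr - lowest_scr_last_48h >= 0.3
    )
    return relative_rise or absolute_rise
```

Because the AKI label depends on the ratio of the current sCr to the baseline, a lower baseline (such as a minimum) labels more admissions as AKI than a higher one (such as a mean), which is one way the choice of baseline can change both the labels and the measured model performance.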
RESULTS
Regarding the first aim, the discriminative metrics and calibration errors of the ML models varied when different baseline methods were adopted. Using Baseline 1 resulted in a 14% reduction in the F1 score relative to both Baseline 2 and Baseline 3. No significant difference was observed between the results for Baseline 2 and Baseline 3. For the second aim, analysis of the FP cohort with the XAI methods led to relabelling the data using the mean sCr in the 180 to 0 days pre-ICU as the preferred baseline sCr method. The XGB model using this relabelled data achieved an AUC of 0.85, recall of 0.63, precision of 0.54 and F1 score of 0.58. The cohort comprised 31,586 admissions, of which 5,473 (17.32%) had AKI.
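As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The sketch below uses only the values quoted in the abstract; the function names are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_positives_per_true_positive(precision: float) -> float:
    """FP / TP implied by a precision value, since precision = TP / (TP + FP)."""
    return (1 - precision) / precision


f1 = f1_score(precision=0.54, recall=0.63)
print(round(f1, 2))  # 0.58, matching the reported F1 score

fp_per_tp = false_positives_per_true_positive(0.54)
print(round(fp_per_tp, 2))  # 0.85
```

A precision of 0.54 implies roughly 0.85 false alerts for every correct one, which is the scale of FP burden the error analysis addresses.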
CONCLUSION
In the absence of a widely accepted method of estimating baseline sCr, AKI prediction studies need to consider the impact of different baseline methods on the effectiveness of ML models and their potential implications in real-world implementations. XAI methods can provide insight into why prediction errors occur, which can potentially improve the success rate of ML implementation in routine care.

Identifiers

pubmed: 37814311
doi: 10.1186/s12911-023-02306-0
pii: 10.1186/s12911-023-02306-0
pmc: PMC10563357

Chemical substances

Creatinine AYI8EX34EU

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

207

Copyright information

© 2023. BioMed Central Ltd., part of Springer Nature.

Authors

Amir Kamel Rahimi (A)

Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Herston, Brisbane, 4006, Australia. amir.kamel@uq.edu.au.
Digital Health Cooperative Research Centre, Australian Government, Sydney, NSW, Australia. amir.kamel@uq.edu.au.

Moji Ghadimi (M)

The School of Mathematics and Physics, The University of Queensland, St Lucia, Brisbane, 4072, Australia.

Anton H van der Vegt (AH)

Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Herston, Brisbane, 4006, Australia.

Oliver J Canfell (OJ)

Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Herston, Brisbane, 4006, Australia.
Digital Health Cooperative Research Centre, Australian Government, Sydney, NSW, Australia.
UQ Business School, The University of Queensland, St Lucia, Brisbane, 4072, Australia.

Jason D Pole (JD)

Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Herston, Brisbane, 4006, Australia.
Dalla Lana School of Public Health, The University of Toronto, Toronto, Canada.
ICES, Toronto, Canada.

Clair Sullivan (C)

Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Herston, Brisbane, 4006, Australia.
Metro North Hospital and Health Service, Department of Health, Queensland Government, Herston, Brisbane, 4006, Australia.

Sally Shrapnel (S)

Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Herston, Brisbane, 4006, Australia.
The School of Mathematics and Physics, The University of Queensland, St Lucia, Brisbane, 4072, Australia.
