Artificial intelligence versus surgeon gestalt in predicting risk of emergency general surgery.
Journal
The journal of trauma and acute care surgery
ISSN: 2163-0763
Abbreviated title: J Trauma Acute Care Surg
Country: United States
NLM ID: 101570622
Publication information
Publication date: October 1, 2023
History:
medline: September 25, 2023
pubmed: June 14, 2023
entrez: June 14, 2023
Status:
ppublish
Abstract
BACKGROUND
Artificial intelligence (AI) risk prediction algorithms such as the smartphone-available Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) for emergency general surgery (EGS) are superior to traditional risk calculators because they account for complex nonlinear interactions between variables, but how they compare to surgeons' gestalt remains unknown. Herein, we sought to: (1) compare POTTER to surgeons' surgical risk estimation and (2) assess how POTTER influences surgeons' risk estimation.
STUDY DESIGN
A total of 150 patients who underwent EGS at a large quaternary care center between May 2018 and May 2019 were prospectively followed up for 30-day postoperative outcomes (mortality, septic shock, ventilator dependence, bleeding requiring transfusion, pneumonia), and clinical cases were systematically created representing their initial presentation. POTTER's outcome predictions for each case were also recorded. Thirty acute care surgeons with diverse practice settings and levels of experience were then randomized into two groups: 15 surgeons (SURG) were asked to predict the outcomes without access to POTTER's predictions, while the remaining 15 (SURG-POTTER) were asked to predict the same outcomes after interacting with POTTER. Compared with actual patient outcomes, the area under the curve (AUC) methodology was used to assess the predictive performance of (1) POTTER versus SURG and (2) SURG versus SURG-POTTER.
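The AUC comparison described above reduces to a simple statistic: the probability that a randomly chosen patient who experienced the outcome was assigned a higher risk score than a randomly chosen patient who did not (the Mann-Whitney formulation of the ROC AUC). A minimal sketch follows; the function name and the toy mortality/risk-score data are illustrative, not taken from the study:

```python
def auc(outcomes, scores):
    """ROC AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs in which the positive case received the
    higher risk score (ties count as half a win)."""
    pos = [s for y, s in zip(outcomes, scores) if y == 1]
    neg = [s for y, s in zip(outcomes, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: observed 30-day mortality (1 = died) vs. predicted risk.
mortality = [0, 0, 1, 0, 1, 1, 0, 1]
predicted = [0.05, 0.20, 0.90, 0.10, 0.60, 0.80, 0.30, 0.15]
print(auc(mortality, predicted))  # 0.875
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which is why differences such as 0.880 (POTTER) versus 0.841 (SURG) for mortality represent a meaningful gap in discriminative ability.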
RESULTS
POTTER outperformed SURG in predicting all outcomes (mortality-AUC: 0.880 vs. 0.841; ventilator dependence-AUC: 0.928 vs. 0.833; bleeding-AUC: 0.832 vs. 0.735; pneumonia-AUC: 0.837 vs. 0.753) except septic shock (AUC: 0.816 vs. 0.820). SURG-POTTER outperformed SURG in predicting mortality (AUC: 0.870 vs. 0.841), bleeding (AUC: 0.811 vs. 0.735), pneumonia (AUC: 0.803 vs. 0.753) but not septic shock (AUC: 0.712 vs. 0.820) or ventilator dependence (AUC: 0.834 vs. 0.833).
CONCLUSION
The AI risk calculator POTTER outperformed surgeons' gestalt in predicting the postoperative mortality and outcomes of EGS patients and, when used, improved individual surgeons' risk prediction. Artificial intelligence algorithms, such as POTTER, could prove useful as a bedside adjunct to surgeons when preoperatively counseling patients.
LEVEL OF EVIDENCE
Prognostic and Epidemiological; Level II.
Identifiers
pubmed: 37314698
doi: 10.1097/TA.0000000000004030
pii: 01586154-202310000-00017
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
565-572
Copyright information
Copyright © 2023 Wolters Kluwer Health, Inc. All rights reserved.
References
Dencker EE, Bonde A, Troelsen A, Varadarajan KM, Sillesen M. Postoperative complications: an observational study of trends in the United States from 2012 to 2018. BMC Surg. 2021;21(1):393.
Ingraham AM, Cohen ME, Bilimoria KY, Raval MV, Ko CY, Nathens AB, Hall BL. Comparison of 30-day outcomes after emergency general surgery procedures: potential for targeted improvement. Surgery. 2010;148(2):217–238.
Havens JM, Peetz AB, Do WS, Cooper Z, Kelly E, Askari R, et al. The excess morbidity and mortality of emergency general surgery. J Trauma Acute Care Surg. 2015;78(2):306–311.
Ingraham AM, Cohen ME, Raval MV, Ko CY, Nathens AB. Comparison of hospital performance in emergency versus elective general surgery operations at 198 hospitals. J Am Coll Surg. 2010;212(1):20–28.e1.
Eappen S, Lane BH, Rosenberg B, Lipsitz SA, Sadoff D, Matheson D, et al. Relationship between occurrence of surgical complications and hospital finances. JAMA. 2013;309(15):1599–1606.
Bertsimas D, Dunn J, Velmahos GC, Kaafarani HMA. Surgical risk is not linear: derivation and validation of a novel, user-friendly, and machine-learning-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) calculator. Ann Surg. 2018;268(4):574–583.
El Hechi MW, Maurer LR, Levine J, Zhuo D, El Moheb M, Velmahos GC, et al. Validation of the artificial intelligence-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) calculator in emergency general surgery and emergency laparotomy patients. J Am Coll Surg. 2021;232(6):912–919.e1.
Maurer LR, Chetlur P, Zhuo D, El Hechi M, Velmahos GC, Dunn J, et al. Validation of the AI-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) calculator in patients 65 years and older. Ann Surg. 2020;277:e8–e15.
Tegmark M. Life 3.0: Being Human in the Age of Artificial Intelligence. 1st ed. Alfred A. Knopf; 2017.
Agrawal A, Gans JS, Goldfarb A. Exploring the impact of artificial intelligence: prediction versus judgment. Inf Econ Policy. 2019;47:1–6.
Loftus TJ, Tighe PJ, Filiberto AC, Efron PA, Brakenridge SC, Mohr AM, et al. Artificial intelligence and surgical decision-making. JAMA Surg. 2019;155(2):148–158.
Teres D, Lemeshow S. Why severity models should be used with caution. Crit Care Clin. 1994;10(1):93–110.
Ivanov J, Borger MA, David TE, Cohen G, Walton N, Naylor CD. Predictive accuracy study: comparing a statistical model to clinicians' estimates of outcomes after coronary bypass surgery. Ann Thorac Surg. 2000;70(1):162–168.
Kahneman D. Thinking, Fast and Slow. 1st ed. Farrar, Straus and Giroux; 2011.
Korteling JEH, van de Boer-Visschedijk GC, Blankendaal RAM, Boonekamp RC, Eikelboom AR. Human- versus artificial intelligence. Front Artif Intell. 2021;4:622364.
Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science. 1974;185(4157):1124–1131.
Bihorac A, Ozrazgat-Baslanti T, Ebadi A, Motaei A, Madkour M, Pardalos PM, et al. MySurgeryRisk: development and validation of a machine-learning risk algorithm for major complications and death after surgery. Ann Surg. 2019;269(4):652–662.
Dilaver NM, Gwilym BL, Preece R, Twine CP, Bosanquet DC. Systematic review and narrative synthesis of surgeons' perception of postoperative outcomes and risk. BJS Open. 2020;4(1):16–26.
Ferro GM, Sornette D. Stochastic representation decision theory: how probabilities and values are entangled dual characteristics in cognitive processes. PLoS One. 2020;15(12):e0243661.
Kingwell S. Predicting Complications After Spinal Surgery: Surgeons' Aided and Unaided Predictions. University of Ottawa; 2020.
Brennan M, Puri S, Ozrazgat-Baslanti T, Feng Z, Ruppert M, Hashemighouchani H, et al. Comparing clinical judgment with the MySurgeryRisk algorithm for preoperative risk assessment: a pilot usability study. Surgery. 2019;165(5):1035–1045.
Agrawal A, Gans JS, Goldfarb A. Prediction, judgment and complexity: a theory of decision making and artificial intelligence. SSRN Electron J.
Andrew Ng: What AI Can and Can't Do. Accessed June 13, 2022. https://hbr.org/2016/11/what-artificial-intelligence-can-and-cant-do-right-now
Lawrence DR, Palacios-González C, Harris J. Artificial intelligence. Camb Q Healthc Ethics. 2016;25(2):250–261.
Cath C, Wachter S, Mittelstadt B, Taddeo M, Floridi L. Artificial intelligence and the 'good society': the US, EU, and UK approach. Sci Eng Ethics. 2017;24(2):505–528.
Hamet P, Tremblay J. Artificial intelligence in medicine. Metab Clin Exp. 2017;69:S36–S40. doi:10.1016/j.metabol.2017.01.011.
Voskens FJ, Abbing JR, Ruys AT, Ruurda JP, Broeders IAMJ. A nationwide survey on the perceptions of general surgeons on artificial intelligence. Artificial Intelligence Surgery. 2022;2(1):8–17.
The big-data revolution in US health care: accelerating value and innovation | McKinsey. Accessed June 13, 2022. https://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/the-big-data-revolution-in-us-health-care
Rajpurkar P, Irvin J, Ball RL, Zhu K, Yang B, Mehta H, et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 2018;15(11):e1002686.