Can ChatGPT make surgical decisions with confidence similar to experienced knee surgeons?

Artificial intelligence; Decision making; Knee arthroplasty; Natural language processing

Journal

The Knee
ISSN: 1873-5800
Abbreviated title: Knee
Country: Netherlands
NLM ID: 9430798

Publication information

Publication date:
09 Sep 2024
History:
received: 12 Mar 2024
revised: 04 Aug 2024
accepted: 15 Aug 2024
medline: 10 Sep 2024
pubmed: 10 Sep 2024
entrez: 10 Sep 2024
Status: ahead of print

Abstract

BACKGROUND
Unicompartmental knee replacements (UKRs) have become an increasingly attractive option for end-stage single-compartment knee osteoarthritis (OA). However, there remains controversy in patient selection. Natural language processing (NLP) is a form of artificial intelligence (AI). We aimed to determine whether general-purpose open-source natural language programs can make decisions regarding a patient's suitability for a total knee replacement (TKR) or a UKR and how confident AI NLP programs are in surgical decision making.
METHODS
We conducted a case-based cohort study using data from a separate study, where participants (73 surgeons and AI NLP programs) were presented with 32 fictitious clinical case scenarios that simulated patients with predominantly medial knee OA who would require surgery. Using the overall UKR/TKR judgments of the 73 experienced knee surgeons as the gold standard reference, we calculated the sensitivity, specificity, and positive predictive value of AI NLP programs to identify whether a patient should undergo UKR.
RESULTS
There was disagreement between the surgeons and ChatGPT in only five scenarios (15.6%). With the 73 surgeons' decision as the gold standard, the sensitivity of ChatGPT in determining whether a patient should undergo UKR was 0.91 (95% confidence interval (CI): 0.71 to 0.98). The positive predictive value for ChatGPT was 0.87 (95% CI: 0.72 to 0.94). ChatGPT was more confident in its UKR decision making (surgeon mean confidence = 1.7, ChatGPT mean confidence = 2.4).
CONCLUSIONS
It has been demonstrated that ChatGPT can make surgical decisions, and exceeded the confidence of experienced knee surgeons with substantial inter-rater agreement when deciding whether a patient was most appropriate for a UKR.
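
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, and positive predictive value are computed from a 2x2 table when the surgeons' overall decision is treated as the reference. The class and function names are hypothetical. The counts are an assumption: they are not reported in the abstract and were chosen only so that they total 32 scenarios with five disagreements and reproduce the reported point estimates. The specificity they imply is illustrative only and is not a result from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts for 'should this patient undergo UKR?'.

    The reference standard is the surgeons' overall decision;
    the test is the AI program's decision.
    """
    tp: int  # program says UKR, surgeons say UKR
    fp: int  # program says UKR, surgeons say TKR
    fn: int  # program says TKR, surgeons say UKR
    tn: int  # program says TKR, surgeons say TKR

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def sensitivity(c: ConfusionCounts) -> float:
    """Share of reference-UKR scenarios that the program also labels UKR."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of reference-TKR scenarios that the program also labels TKR."""
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Share of program-UKR decisions that match the reference."""
    return c.tp / (c.tp + c.fp)


def disagreement_rate(c: ConfusionCounts) -> float:
    """Share of all scenarios where program and reference differ."""
    return (c.fp + c.fn) / c.total


# ASSUMPTION: illustrative counts, not taken from the paper.
example = ConfusionCounts(tp=20, fp=3, fn=2, tn=7)

if __name__ == "__main__":
    print(f"scenarios:     {example.total}")
    print(f"sensitivity:   {sensitivity(example):.2f}")
    print(f"PPV:           {positive_predictive_value(example):.2f}")
    print(f"disagreement:  {disagreement_rate(example):.1%}")
    # Implied by the assumed counts only; not a reported result.
    print(f"specificity:   {specificity(example):.2f}")
```

The sketch computes point estimates only. The abstract does not state which method produced the 95% confidence intervals, so no interval calculation is shown here.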

Identifiers

pubmed: 39255525
pii: S0968-0160(24)00149-2
doi: 10.1016/j.knee.2024.08.015

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

120-129

Copyright information

Copyright © 2024 IMPERIAL COLLEGE LONDON. Published by Elsevier B.V. All rights reserved.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Omar Musbahi (O)

MSk Lab, Sir Michael Uren Hub, Imperial College London, London, UK. Electronic address: o.musbahi19@imperial.ac.uk.

Martine Nurek (M)

Department of Surgery and Cancer, Imperial College London, London, UK.

Kyriacos Pouris (K)

MSk Lab, Sir Michael Uren Hub, Imperial College London, London, UK.

Martinique Vella-Baldacchino (M)

MSk Lab, Sir Michael Uren Hub, Imperial College London, London, UK.

Alex Bottle (A)

School of Public Health, Imperial College London, London, UK.

Caroline Hing (C)

St George's University Hospitals NHS Foundation Trust, London, UK.

Olga Kostopoulou (O)

Department of Surgery and Cancer, Imperial College London, London, UK; Institute of Global Health Innovation, Imperial College London, London, UK.

Justin P Cobb (JP)

MSk Lab, Sir Michael Uren Hub, Imperial College London, London, UK.

Gareth G Jones (GG)

MSk Lab, Sir Michael Uren Hub, Imperial College London, London, UK.

MeSH classifications