Diagnostic accuracy of a large language model in rheumatology: comparison of physician and ChatGPT-4.
Artificial intelligence
ChatGPT
Diagnostic process
Large language models
Rheumatology
Triage
Journal
Rheumatology international
ISSN: 1437-160X
Abbreviated title: Rheumatol Int
Country: Germany
NLM ID: 8206885
Publication information
Publication date: 24 Sep 2023
History:
received: 21 Aug 2023
accepted: 07 Sep 2023
medline: 24 Sep 2023
pubmed: 24 Sep 2023
entrez: 24 Sep 2023
Status: ahead of print
Abstract
Pre-clinical studies suggest that large language models (e.g., ChatGPT) could be used in the diagnostic process to distinguish inflammatory rheumatic diseases (IRD) from other diseases. We therefore aimed to assess the diagnostic accuracy of ChatGPT-4 in comparison to rheumatologists. For the analysis, the data set of Gräf et al. (2022) was used. Previous patient assessments were analyzed using ChatGPT-4 and compared to rheumatologists' assessments. ChatGPT-4 listed the correct diagnosis as the top diagnosis about as often as the rheumatologists (35% vs 39%, p = 0.30), and likewise among the top 3 diagnoses (60% vs 55%, p = 0.38). In IRD-positive cases, ChatGPT-4 provided the correct top diagnosis in 71% of cases vs 62% for the rheumatologists, and the correct diagnosis was among the top 3 in 86% (ChatGPT-4) vs 74% (rheumatologists). In non-IRD cases, ChatGPT-4 provided the correct top diagnosis in 15% of cases vs 27% for the rheumatologists, and the correct diagnosis was among the top 3 in 46% (ChatGPT-4) vs 45% (rheumatologists). If only the first suggested diagnosis was considered, ChatGPT-4 correctly classified 58% of cases as IRD compared to 56% for the rheumatologists (p = 0.52). ChatGPT-4 showed a slightly higher accuracy for the top 3 overall diagnoses than the rheumatologists' assessments. ChatGPT-4 was able to provide the correct differential diagnosis in a relevant number of cases and achieved better sensitivity than the rheumatologists in detecting IRDs, at the cost of lower specificity. These pilot results highlight the potential of this new technology as a triage tool for the diagnosis of IRD.
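The abstract compares ChatGPT-4 and rheumatologists on top-1 and top-3 accuracy (is the correct diagnosis first, or among the first three suggestions?) and on IRD sensitivity and specificity when only the first suggestion is considered. The Python sketch below shows how such metrics can be computed from ranked diagnosis lists. It is an illustration only, not the authors' analysis code: the `Case` structure, the function names, the `IRD_DIAGNOSES` label set and the four toy cases are all hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass

# Hypothetical set of diagnoses counted as IRD, for this toy example only.
IRD_DIAGNOSES = {"rheumatoid arthritis", "psoriatic arthritis", "polymyalgia rheumatica"}


@dataclass
class Case:
    true_diagnosis: str            # reference (final) diagnosis
    ranked_suggestions: list[str]  # suggested diagnoses, most likely first


def top_k_accuracy(cases: list[Case], k: int) -> float:
    """Share of cases whose true diagnosis appears among the first k suggestions."""
    hits = sum(c.true_diagnosis in c.ranked_suggestions[:k] for c in cases)
    return hits / len(cases)


def ird_sensitivity_specificity(cases: list[Case]) -> tuple[float, float]:
    """Classify each case as IRD / non-IRD from the first suggestion only
    (an assumption for illustration), then compute sensitivity and specificity."""
    tp = fn = tn = fp = 0
    for c in cases:
        actual_ird = c.true_diagnosis in IRD_DIAGNOSES
        predicted_ird = c.ranked_suggestions[0] in IRD_DIAGNOSES
        if actual_ird and predicted_ird:
            tp += 1
        elif actual_ird:
            fn += 1
        elif predicted_ird:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Four hypothetical cases (not study data).
cases = [
    Case("rheumatoid arthritis", ["rheumatoid arthritis", "psoriatic arthritis", "osteoarthritis"]),
    Case("psoriatic arthritis", ["rheumatoid arthritis", "psoriatic arthritis", "osteoarthritis"]),
    Case("fibromyalgia", ["polymyalgia rheumatica", "rheumatoid arthritis", "hypothyroidism"]),
    Case("osteoarthritis", ["osteoarthritis", "fibromyalgia", "rheumatoid arthritis"]),
]

print(top_k_accuracy(cases, 1))            # 0.5
print(top_k_accuracy(cases, 3))            # 0.75
print(ird_sensitivity_specificity(cases))  # (1.0, 0.5)
```

In this toy set, a rater who readily suggests an IRD first detects every IRD case (sensitivity 1.0) but mislabels one non-IRD case (specificity 0.5). This mirrors, in miniature, the trade-off the abstract describes.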
Identifiers
pubmed: 37742280
doi: 10.1007/s00296-023-05464-6
pii: 10.1007/s00296-023-05464-6
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© 2023. The Author(s).
References
Rheumadocs and Arbeitskreis Junge Rheumatologie (AGJR), Krusche M, Sewerin P, Kleyer A, Mucke J, Vossen D, et al. (2019) Facharztweiterbildung quo vadis? Z Rheumatol 78(8):692–697
Miloslavsky EM, Marston B (2022) The challenge of addressing the rheumatology workforce shortage. J Rheumatol 49(6):555–557
doi: 10.3899/jrheum.220300
Fuchs F, Morf H, Mohn J, Mühlensiepen F, Ignatyev Y, Bohr D (2023) Diagnostic delay stages and pre-diagnostic treatment in patients with suspected rheumatic diseases before special care consultation: results of a multicenter-based study. Rheumatol Int 43(3):495–502
doi: 10.1007/s00296-022-05223-z
Knitza J, Mohn J, Bergmann C, Kampylafka E, Hagen M, Bohr D (2021) Accuracy, patient-perceived usability, and acceptance of two symptom checkers (Ada and Rheport) in rheumatology: interim results from a randomized controlled crossover trial. Arthritis Res Ther 23(1):112
doi: 10.1186/s13075-021-02498-8
pubmed: 33849654
pmcid: 8042673
Gräf M, Knitza J, Leipe J, Krusche M, Welcker M, Kuhn S (2022) Comparison of physician and artificial intelligence-based symptom checker diagnostic accuracy. Rheumatol Int 42(12):2167–2176
doi: 10.1007/s00296-022-05202-4
pubmed: 36087130
pmcid: 9548469
Hügle T (2023) The wide range of opportunities for large language models such as ChatGPT in rheumatology. RMD Open 9(2):e003105
doi: 10.1136/rmdopen-2023-003105
pubmed: 37116985
pmcid: 10151992
Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW (2023) Large language models in medicine. Nat Med 29(8):1930–1940
doi: 10.1038/s41591-023-02448-8
pubmed: 37460753
Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C (2023) Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health 2(2):e0000198
doi: 10.1371/journal.pdig.0000198
pubmed: 36812645
pmcid: 9931230
Thirunavukarasu AJ, Hassan R, Mahmood S, Sanghera R, Barzangi K, El Mukashfi M (2023) Trialling a large language model (ChatGPT) in general practice with the applied knowledge test: observational study demonstrating opportunities and limitations in primary care. JMIR Med Educ 9:e46599
doi: 10.2196/46599
pubmed: 37083633
pmcid: 10163403
Verhoeven F, Wendling D, Prati C (2023) ChatGPT: when artificial intelligence replaces the rheumatologist in medical writing. Ann Rheum Dis 82(8):1015–1017
doi: 10.1136/ard-2023-223936
pubmed: 37041067
Ueda D, Mitsuyama Y, Takita H, Horiuchi D, Walston SL, Tatekawa H (2023) ChatGPT’s diagnostic performance from patient history and imaging findings on the diagnosis please quizzes. Radiology 308(1):e231040
doi: 10.1148/radiol.231040
pubmed: 37462501
Kanjee Z, Crowe B, Rodman A (2023) Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA 330(1):78
doi: 10.1001/jama.2023.8288
pubmed: 37318797
Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB (2023) Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 183(6):589–596
doi: 10.1001/jamainternmed.2023.1838
pubmed: 37115527
de Thurah A, Bosch P, Marques A, Meissner Y, Mukhtyar CB, Knitza J (2022) EULAR points to consider for remote care in rheumatic and musculoskeletal diseases. Ann Rheum Dis 81(8):1065–1071
doi: 10.1136/annrheumdis-2022-222341
pubmed: 35470160
(2023) Tools such as ChatGPT threaten transparent science; here are our ground rules for their use. Nature 613(7945):612