Diagnostic accuracy of a large language model in rheumatology: comparison of physician and ChatGPT-4.

Artificial intelligence ChatGPT Diagnostic process Large language models Rheumatology Triage

Journal

Rheumatology international
ISSN: 1437-160X
Titre abrégé: Rheumatol Int
Pays: Germany
ID NLM: 8206885

Informations de publication

Date de publication:
24 Sep 2023
Historique:
received: 21 08 2023
accepted: 07 09 2023
medline: 24 9 2023
pubmed: 24 9 2023
entrez: 24 9 2023
Statut: aheadofprint

Résumé

Pre-clinical studies suggest that large language models (i.e., ChatGPT) could be used in the diagnostic process to distinguish inflammatory rheumatic (IRD) from other diseases. We therefore aimed to assess the diagnostic accuracy of ChatGPT-4 in comparison to rheumatologists. For the analysis, the data set of Gräf et al. (2022) was used. Previous patient assessments were analyzed using ChatGPT-4 and compared to rheumatologists' assessments. ChatGPT-4 listed the correct diagnosis comparable often to rheumatologists as the top diagnosis 35% vs 39% (p = 0.30); as well as among the top 3 diagnoses, 60% vs 55%, (p = 0.38). In IRD-positive cases, ChatGPT-4 provided the top diagnosis in 71% vs 62% in the rheumatologists' analysis. Correct diagnosis was among the top 3 in 86% (ChatGPT-4) vs 74% (rheumatologists). In non-IRD cases, ChatGPT-4 provided the correct top diagnosis in 15% vs 27% in the rheumatologists' analysis. Correct diagnosis was among the top 3 in non-IRD cases in 46% of the ChatGPT-4 group vs 45% in the rheumatologists group. If only the first suggestion for diagnosis was considered, ChatGPT-4 correctly classified 58% of cases as IRD compared to 56% of the rheumatologists (p = 0.52). ChatGPT-4 showed a slightly higher accuracy for the top 3 overall diagnoses compared to rheumatologist's assessment. ChatGPT-4 was able to provide the correct differential diagnosis in a relevant number of cases and achieved better sensitivity to detect IRDs than rheumatologist, at the cost of lower specificity. The pilot results highlight the potential of this new technology as a triage tool for the diagnosis of IRD.

Identifiants

pubmed: 37742280
doi: 10.1007/s00296-023-05464-6
pii: 10.1007/s00296-023-05464-6
doi:

Types de publication

Journal Article

Langues

eng

Sous-ensembles de citation

IM

Informations de copyright

© 2023. The Author(s).

Références

Rheumadocs und Arbeitskreis Junge Rheumatologie (AGJR), Krusche M, Sewerin P, Kleyer A, Mucke J, Vossen D, u. a. Facharztweiterbildung quo vadis? Z Für Rheumatol. Oktober 2019;78(8):692–7.
Miloslavsky EM, Marston B (2022) The challenge of addressing the rheumatology workforce shortage. J Rheumatol Juni 49(6):555–557
doi: 10.3899/jrheum.220300
Fuchs F, Morf H, Mohn J, Mühlensiepen F, Ignatyev Y, Bohr D (2023) Diagnostic delay stages and pre-diagnostic treatment in patients with suspected rheumatic diseases before special care consultation: results of a multicenter-based study. Rheumatol Int März 43(3):495–502
doi: 10.1007/s00296-022-05223-z
Knitza J, Mohn J, Bergmann C, Kampylafka E, Hagen M, Bohr D (2021) Accuracy, patient-perceived usability, and acceptance of two symptom checkers (Ada and Rheport) in rheumatology: interim results from a randomized controlled crossover trial. Arthritis Res Ther 23(1):112
doi: 10.1186/s13075-021-02498-8 pubmed: 33849654 pmcid: 8042673
Gräf M, Knitza J, Leipe J, Krusche M, Welcker M, Kuhn S (2022) Comparison of physician and artificial intelligence-based symptom checker diagnostic accuracy. Rheumatol Int 42(12):2167–2176
doi: 10.1007/s00296-022-05202-4 pubmed: 36087130 pmcid: 9548469
Hügle T (2023) The wide range of opportunities for large language models such as ChatGPT in rheumatology. RMD Open 9(2):e003105
doi: 10.1136/rmdopen-2023-003105 pubmed: 37116985 pmcid: 10151992
Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW (2023) Large language models in medicine. Nat Med 29(8):1930–1940
doi: 10.1038/s41591-023-02448-8 pubmed: 37460753
Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C (2023) Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2(2):e0000198
doi: 10.1371/journal.pdig.0000198 pubmed: 36812645 pmcid: 9931230
Thirunavukarasu AJ, Hassan R, Mahmood S, Sanghera R, Barzangi K, El Mukashfi M (2023) Trialling a large language model (ChatGPT) in general practice with the applied knowledge test: observational study demonstrating opportunities and limitations in primary care. JMIR Med Educ 9:e46599
doi: 10.2196/46599 pubmed: 37083633 pmcid: 10163403
Verhoeven F, Wendling D, Prati C (2023) ChatGPT: when artificial intelligence replaces the rheumatologist in medical writing. Ann Rheum Dis 82(8):1015–1017
doi: 10.1136/ard-2023-223936 pubmed: 37041067
Ueda D, Mitsuyama Y, Takita H, Horiuchi D, Walston SL, Tatekawa H (2023) ChatGPT’s diagnostic performance from patient history and imaging findings on the diagnosis please quizzes. Radiology 308(1):e231040
doi: 10.1148/radiol.231040 pubmed: 37462501
Kanjee Z, Crowe B, Rodman A (2023) Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA 330(1):78
doi: 10.1001/jama.2023.8288 pubmed: 37318797
Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB (2023) Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 183(6):589–596
doi: 10.1001/jamainternmed.2023.1838 pubmed: 37115527
de Thurah A, Bosch P, Marques A, Meissner Y, Mukhtyar CB, Knitza J (2022) EULAR points to consider for remote care in rheumatic and musculoskeletal diseases. Ann Rheum Dis 81(8):1065–1071
doi: 10.1136/annrheumdis-2022-222341 pubmed: 35470160
Tools such as ChatGPT threaten transparent science; here are our ground rules for their use. Nature. Januar 2023;613(7945):612.

Auteurs

Martin Krusche (M)

Division of Rheumatology and Systemic Inflammatory Diseases, University Hospital Hamburg-Eppendorf (UKE), Hamburg, Germany. m.krusche@uke.de.

Johnna Callhoff (J)

Epidemiology Unit, German Rheumatism Research Centre, Berlin, Germany.
Institute for Social Medicine, Epidemiology and Health Economics, Charité Universitätsmedizin, Berlin, Germany.

Johannes Knitza (J)

Institute of Digital Medicine, University Hospital of Giessen and Marburg, Philipps University Marburg, Marburg, Germany.
Université Grenoble Alpes, AGEIS, Grenoble, France.

Nikolas Ruffer (N)

Division of Rheumatology and Systemic Inflammatory Diseases, University Hospital Hamburg-Eppendorf (UKE), Hamburg, Germany.

Classifications MeSH