Diagnostic accuracy of a large language model in rheumatology: comparison of physician and ChatGPT-4.

Keywords: Artificial intelligence; ChatGPT; Diagnostic process; Large language models; Rheumatology; Triage

Journal

Rheumatology International

ISSN: 1437-160X

Abbreviated title: Rheumatol Int

Country: Germany

NLM ID: 8206885

Publication information

Publication date:
24 Sep 2023

History:

received: 21 08 2023

accepted: 07 09 2023

medline: 24 9 2023

pubmed: 24 9 2023

entrez: 24 9 2023

Status: aheadofprint

Abstract

Pre-clinical studies suggest that large language models (e.g., ChatGPT) could be used in the diagnostic process to distinguish inflammatory rheumatic diseases (IRD) from other diseases. We therefore aimed to assess the diagnostic accuracy of ChatGPT-4 in comparison to rheumatologists. For the analysis, the data set of Gräf et al. (2022) was used. Previous patient assessments were analyzed using ChatGPT-4 and compared to rheumatologists' assessments. ChatGPT-4 listed the correct diagnosis about as often as rheumatologists, both as the top diagnosis (35% vs 39%, p = 0.30) and among the top 3 diagnoses (60% vs 55%, p = 0.38). In IRD-positive cases, ChatGPT-4 provided the correct top diagnosis in 71% of cases vs 62% in the rheumatologists' analysis, and the correct diagnosis was among the top 3 in 86% (ChatGPT-4) vs 74% (rheumatologists). In non-IRD cases, ChatGPT-4 provided the correct top diagnosis in 15% of cases vs 27% in the rheumatologists' analysis, and the correct diagnosis was among the top 3 in 46% (ChatGPT-4) vs 45% (rheumatologists). If only the first suggested diagnosis was considered, ChatGPT-4 correctly classified 58% of cases as IRD compared to 56% for the rheumatologists (p = 0.52). ChatGPT-4 showed slightly higher accuracy for the top 3 overall diagnoses than the rheumatologists' assessment. ChatGPT-4 was able to provide the correct differential diagnosis in a relevant number of cases and achieved better sensitivity to detect IRDs than the rheumatologists, at the cost of lower specificity. The pilot results highlight the potential of this new technology as a triage tool for the diagnosis of IRD.
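The metrics named in the abstract (top-1 and top-3 accuracy, sensitivity, specificity) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the `IRD_DIAGNOSES` set, and the four toy cases are hypothetical and contain no data from the study. The sketch assumes a case is classified as IRD when its first suggested diagnosis belongs to the IRD set.

```python
from typing import Sequence, Tuple

# Hypothetical grouping for illustration only; not the study's case definitions.
IRD_DIAGNOSES = {"rheumatoid arthritis", "psoriatic arthritis"}


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Share of cases whose true diagnosis appears among the first k suggestions."""
    hits = sum(1 for suggestions, gold in zip(ranked, truth) if gold in suggestions[:k])
    return hits / len(truth)


def sensitivity_specificity(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes at least one actual positive and one actual negative case.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cases: ranked suggestions per case and the final diagnosis.
ranked = [
    ["rheumatoid arthritis", "psoriatic arthritis", "osteoarthritis"],
    ["fibromyalgia", "osteoarthritis", "rheumatoid arthritis"],
    ["osteoarthritis", "rheumatoid arthritis", "fibromyalgia"],
    ["rheumatoid arthritis", "fibromyalgia", "osteoarthritis"],
]
truth = ["rheumatoid arthritis", "osteoarthritis", "psoriatic arthritis", "fibromyalgia"]

top1 = top_k_accuracy(ranked, truth, k=1)
top3 = top_k_accuracy(ranked, truth, k=3)

# IRD classification based only on the first suggestion.
predicted_ird = [suggestions[0] in IRD_DIAGNOSES for suggestions in ranked]
actual_ird = [gold in IRD_DIAGNOSES for gold in truth]
sens, spec = sensitivity_specificity(predicted_ird, actual_ird)

print(f"top-1 accuracy: {top1:.2f}, top-3 accuracy: {top3:.2f}")
print(f"sensitivity: {sens:.2f}, specificity: {spec:.2f}")
```

In this toy setup, a suggestion list that leans toward IRD diagnoses raises sensitivity but produces false positives among non-IRD cases, which is the trade-off the abstract describes.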

Identifiers

DOI: 10.1007/s00296-023-05464-6 PMID: 37742280

pubmed: 37742280

doi: 10.1007/s00296-023-05464-6

pii: 10.1007/s00296-023-05464-6

Publication types

Journal Article

Languages

eng

References

Rheumadocs und Arbeitskreis Junge Rheumatologie (AGJR), Krusche M, Sewerin P, Kleyer A, Mucke J, Vossen D et al (2019) Facharztweiterbildung quo vadis? Z Rheumatol 78(8):692–697

Miloslavsky EM, Marston B (2022) The challenge of addressing the rheumatology workforce shortage. J Rheumatol 49(6):555–557

doi: 10.3899/jrheum.220300

Fuchs F, Morf H, Mohn J, Mühlensiepen F, Ignatyev Y, Bohr D (2023) Diagnostic delay stages and pre-diagnostic treatment in patients with suspected rheumatic diseases before special care consultation: results of a multicenter-based study. Rheumatol Int 43(3):495–502

doi: 10.1007/s00296-022-05223-z

Knitza J, Mohn J, Bergmann C, Kampylafka E, Hagen M, Bohr D (2021) Accuracy, patient-perceived usability, and acceptance of two symptom checkers (Ada and Rheport) in rheumatology: interim results from a randomized controlled crossover trial. Arthritis Res Ther 23(1):112

doi: 10.1186/s13075-021-02498-8 pubmed: 33849654 pmcid: 8042673

Gräf M, Knitza J, Leipe J, Krusche M, Welcker M, Kuhn S (2022) Comparison of physician and artificial intelligence-based symptom checker diagnostic accuracy. Rheumatol Int 42(12):2167–2176

doi: 10.1007/s00296-022-05202-4 pubmed: 36087130 pmcid: 9548469

Hügle T (2023) The wide range of opportunities for large language models such as ChatGPT in rheumatology. RMD Open 9(2):e003105

doi: 10.1136/rmdopen-2023-003105 pubmed: 37116985 pmcid: 10151992

Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW (2023) Large language models in medicine. Nat Med 29(8):1930–1940

doi: 10.1038/s41591-023-02448-8 pubmed: 37460753

Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C (2023) Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2(2):e0000198

doi: 10.1371/journal.pdig.0000198 pubmed: 36812645 pmcid: 9931230

Thirunavukarasu AJ, Hassan R, Mahmood S, Sanghera R, Barzangi K, El Mukashfi M (2023) Trialling a large language model (ChatGPT) in general practice with the applied knowledge test: observational study demonstrating opportunities and limitations in primary care. JMIR Med Educ 9:e46599

doi: 10.2196/46599 pubmed: 37083633 pmcid: 10163403

Verhoeven F, Wendling D, Prati C (2023) ChatGPT: when artificial intelligence replaces the rheumatologist in medical writing. Ann Rheum Dis 82(8):1015–1017

doi: 10.1136/ard-2023-223936 pubmed: 37041067

Ueda D, Mitsuyama Y, Takita H, Horiuchi D, Walston SL, Tatekawa H (2023) ChatGPT’s diagnostic performance from patient history and imaging findings on the diagnosis please quizzes. Radiology 308(1):e231040

doi: 10.1148/radiol.231040 pubmed: 37462501

Kanjee Z, Crowe B, Rodman A (2023) Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA 330(1):78

doi: 10.1001/jama.2023.8288 pubmed: 37318797

Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB (2023) Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 183(6):589–596

doi: 10.1001/jamainternmed.2023.1838 pubmed: 37115527

de Thurah A, Bosch P, Marques A, Meissner Y, Mukhtyar CB, Knitza J (2022) EULAR points to consider for remote care in rheumatic and musculoskeletal diseases. Ann Rheum Dis 81(8):1065–1071

doi: 10.1136/annrheumdis-2022-222341 pubmed: 35470160

Tools such as ChatGPT threaten transparent science; here are our ground rules for their use (2023) Nature 613(7945):612

Authors

Martin Krusche (M)

Johnna Callhoff (J)

Johannes Knitza (J)

Nikolas Ruffer (N)

MeSH classifications