Evaluation of ChatGPT-Generated Differential Diagnosis for Common Diseases With Atypical Presentation: Descriptive Research.

Keywords: ChatGPT; atypical presentation; common disease; diagnosis; diagnostic accuracy; patient safety

Journal

JMIR Medical Education
ISSN: 2369-3762
Abbreviated title: JMIR Med Educ
Country: Canada
NLM ID: 101684518

Publication information

Publication date:
21 Jun 2024
History:
received: 27 Mar 2024
revised: 03 May 2024
accepted: 19 May 2024
medline: 25 Jun 2024
pubmed: 25 Jun 2024
entrez: 25 Jun 2024
Status: epublish

Abstract

Background
The persistence of diagnostic errors, despite advances in medical knowledge and diagnostics, highlights the importance of understanding atypical disease presentations and their contribution to mortality and morbidity. Artificial intelligence (AI), particularly generative pre-trained transformers like GPT-4, holds promise for improving diagnostic accuracy, but requires further exploration in handling atypical presentations.
Objective
This study aimed to assess the diagnostic accuracy of ChatGPT in generating differential diagnoses for atypical presentations of common diseases, with a focus on the model's reliance on patient history during the diagnostic process.
Methods
We used 25 clinical vignettes from the Journal of Generalist Medicine characterizing atypical manifestations of common diseases. Two general medicine physicians categorized the cases based on atypicality. ChatGPT was then used to generate differential diagnoses based on the clinical information provided. The concordance between AI-generated and final diagnoses was measured, with a focus on the top-ranked disease (top 1) and the top 5 differential diagnoses (top 5).
Results
ChatGPT's diagnostic accuracy decreased with an increase in atypical presentation. For category 1 (C1) cases, the concordance rates were 17% (n=1) for the top 1 and 67% (n=4) for the top 5. Categories 3 (C3) and 4 (C4) showed a 0% concordance for top 1 and markedly lower rates for the top 5, indicating difficulties in handling highly atypical cases. The χ² test revealed no significant difference in the top 1 differential diagnosis accuracy between less atypical (C1+C2) and more atypical (C3+C4) groups (χ²₁=2.07; n=25; P=.13). However, a significant difference was found in the top 5 analyses, with less atypical cases showing higher accuracy (χ²₁=4.01; n=25; P=.048).
Conclusions
ChatGPT-4 demonstrates potential as an auxiliary tool for diagnosing typical and mildly atypical presentations of common diseases. However, its performance declines with greater atypicality. The study findings underscore the need for AI systems to encompass a broader range of linguistic capabilities, cultural understanding, and diverse clinical scenarios to improve diagnostic utility in real-world settings.
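
The two quantities reported in the abstract, top 1/top 5 concordance and a χ² comparison between two groups, can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names are hypothetical, the exact case-insensitive string match is an assumed stand-in for whatever concordance judgment the study actually used, and the χ² function is the plain Pearson statistic without continuity correction, which the abstract does not specify. The example vignettes are invented for illustration and are not study data.

```python
from typing import Sequence


def top_k_hit(differential: Sequence[str], final_diagnosis: str, k: int) -> bool:
    """True if the final diagnosis is among the first k ranked candidates.

    Assumption: concordance is approximated by a case-insensitive exact match.
    """
    target = final_diagnosis.strip().lower()
    return any(d.strip().lower() == target for d in differential[:k])


def concordance_rate(cases: Sequence[tuple[Sequence[str], str]], k: int) -> float:
    """Fraction of cases whose final diagnosis appears within the top k."""
    if not cases:
        raise ValueError("no cases supplied")
    hits = sum(top_k_hit(diff, final, k) for diff, final in cases)
    return hits / len(cases)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical vignettes (ranked differential, final diagnosis); NOT study data.
example_cases = [
    (["pneumonia", "pulmonary embolism", "heart failure"], "pulmonary embolism"),
    (["migraine", "tension headache"], "giant cell arteritis"),
    (["appendicitis", "gastroenteritis"], "Appendicitis"),
]

top1 = concordance_rate(example_cases, k=1)  # only the third case matches at rank 1
top5 = concordance_rate(example_cases, k=5)  # first and third cases match within top 5
print(f"top 1: {top1:.2f}, top 5: {top5:.2f}")
```

Under the same arithmetic, the reported C1 rates of 17% (n=1) and 67% (n=4) are consistent with six C1 cases (1/6 and 4/6), although the abstract does not state the group size directly.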


Identifiers

pubmed: 38915174
pii: v10i1e58758
doi: 10.2196/58758

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e58758

Copyright information

© Kiyoshi Shikino, Taro Shimizu, Yuki Otsuka, Masaki Tago, Hiromizu Takahashi, Takashi Watari, Yosuke Sasaki, Gemmei Iizuka, Hiroki Tamura, Koichi Nakashima, Kotaro Kunitomo, Morika Suzuki, Sayaka Aoyama, Shintaro Kosaka, Teiko Kawahigashi, Tomohiro Matsumoto, Fumina Orihara, Toru Morikawa, Toshinori Nishizawa, Yoji Hoshina, Yu Yamamoto, Yuichiro Matsuo, Yuto Unoki, Hirofumi Kimura, Midori Tokushima, Satoshi Watanuki, Takuma Saito, Fumio Otsuka, Yasuharu Tokuda. Originally published in JMIR Medical Education (https://mededu.jmir.org).

Authors

Kiyoshi Shikino (K)

Department of General Medicine, Chiba University Hospital, Chiba, Japan.
Department of Community-Oriented Medical Education, Chiba University Graduate School of Medicine, Chiba, Japan.

Taro Shimizu (T)

Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi, Japan.

Yuki Otsuka (Y)

Department of General Medicine, Dentistry and Pharmaceutical Sciences, Okayama University Graduate School of Medicine, Okayama, Japan.

Masaki Tago (M)

Department of General Medicine, Saga University Hospital, Saga, Japan.

Hiromizu Takahashi (H)

Department of General Medicine, Juntendo University Hospital Faculty of Medicine, Tokyo, Japan.

Takashi Watari (T)

Integrated Clinical Education Center Hospital Integrated Clinical Education, Kyoto University Hospital, Kyoto, Japan.

Yosuke Sasaki (Y)

Department of General Medicine and Emergency Care, Toho University School of Medicine, Tokyo, Japan.

Gemmei Iizuka (G)

Center for Preventive Medical Sciences, Chiba University, Chiba, Japan.
Tama Family Clinic, Kanagawa, Japan.

Hiroki Tamura (H)

Department of General Medicine, Chiba University Hospital, Chiba, Japan.

Koichi Nakashima (K)

Department of General Medicine, Awa Regional Medical Center, Chiba, Japan.

Kotaro Kunitomo (K)

Department of General Medicine, National Hospital Organization Kumamoto Medical Center, Kumamoto, Japan.

Morika Suzuki (M)

Department of General Medicine, National Hospital Organization Kumamoto Medical Center, Kumamoto, Japan.
Department of Neurology, University of Utah, Salt Lake City, UT, United States.

Sayaka Aoyama (S)

Department of Internal Medicine, Mito Kyodo General Hospital, Ibaraki, Japan.

Shintaro Kosaka (S)

Tokyo Metropolitan Hiroo Hospital, Tokyo, Japan.

Teiko Kawahigashi (T)

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States.

Tomohiro Matsumoto (T)

Division of General Medicine, Nerima Hikarigaoka Hospital, Tokyo, Japan.

Fumina Orihara (F)

Division of General Medicine, Nerima Hikarigaoka Hospital, Tokyo, Japan.

Toru Morikawa (T)

Department of General Medicine, Nara City Hospital, Nara, Japan.

Toshinori Nishizawa (T)

Department of General Internal Medicine, St. Luke's International Hospital, Tokyo, Japan.

Yoji Hoshina (Y)

Department of Neurology, University of Utah, Salt Lake City, UT, United States.

Yu Yamamoto (Y)

Division of General Medicine, Center for Community Medicine, Jichi Medical University, Tochigi, Japan.

Yuichiro Matsuo (Y)

Department of Clinical Epidemiology and Health Economics, The Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.

Yuto Unoki (Y)

Department of General Internal Medicine, Iizuka Hospital, Fukuoka, Japan.

Hirofumi Kimura (H)

Department of General Internal Medicine, Iizuka Hospital, Fukuoka, Japan.

Midori Tokushima (M)

Saga Medical Career Support Center, Saga University Hospital, Saga, Japan.

Satoshi Watanuki (S)

Department of Emergency and General Medicine, Tokyo Metropolitan Tama Medical Center, Tokyo, Japan.

Takuma Saito (T)

Department of Emergency and General Medicine, Tokyo Metropolitan Tama Medical Center, Tokyo, Japan.

Fumio Otsuka (F)

Department of General Medicine, Dentistry and Pharmaceutical Sciences, Okayama University Graduate School of Medicine, Okayama, Japan.

Yasuharu Tokuda (Y)

Muribushi Okinawa Center for Teaching Hospitals, Okinawa, Japan.
Tokyo Foundation for Policy Research, Tokyo, Japan.
