Appraisal of ChatGPT's Aptitude for Medical Education: Comparative Analysis With Third-Year Medical Students in a Pulmonology Examination.
Keywords
ChatGPT
GPT
LLM
LLMs
NLP
artificial intelligence
comparative analysis
exam
examination
examinations
exams
generative
generative AI
generative artificial intelligence
language model
large language models
lung
lungs
medical education
natural language processing
pneumology
pulmonary
pulmonary medicine
respiration
respiratory
Journal
JMIR Medical Education
ISSN: 2369-3762
Abbreviated title: JMIR Med Educ
Country: Canada
NLM ID: 101684518
Publication information
Publication date:
23 Jul 2024
History:
received: 29 Sep 2023
revised: 05 Feb 2024
accepted: 26 Feb 2024
medline: 23 Jul 2024
pubmed: 23 Jul 2024
entrez: 23 Jul 2024
Status:
epublish
Abstract
BACKGROUND
The rapid evolution of ChatGPT has generated substantial interest and led to extensive discussions in both public and academic domains, particularly in the context of medical education.
OBJECTIVE
This study aimed to evaluate ChatGPT's performance in a pulmonology examination through a comparative analysis with that of third-year medical students.
METHODS
In this cross-sectional study, we conducted a comparative analysis of 2 distinct groups. The first group comprised 244 third-year medical students who had previously taken our institution's 2020 pulmonology examination, which was conducted in French. The second group consisted of ChatGPT-3.5, engaged in 2 separate sets of conversations: one without contextualization (V1) and one with contextualization (V2). In both V1 and V2, ChatGPT received the same set of questions administered to the students.
RESULTS
V1 demonstrated exceptional proficiency in radiology, microbiology, and thoracic surgery, surpassing the majority of medical students in these domains. However, it struggled with pathology, pharmacology, and clinical pneumology. In contrast, V2 consistently delivered more accurate responses across question categories, regardless of specialty. ChatGPT performed worse than medical students on multiple-choice questions, whereas V2 excelled at structured open-ended questions. Both ChatGPT conversations, particularly V2, outperformed students on questions of low and intermediate difficulty. Interestingly, students performed better on highly challenging questions. V1 did not achieve a passing score on the examination. Conversely, V2 passed the examination, outperforming 139 (62.1%) of the medical students.
CONCLUSIONS
Although ChatGPT has access to a comprehensive web-based data set, its performance closely mirrors that of an average medical student. Outcomes are influenced by question format, item complexity, and contextual nuances. The model struggles in medical contexts requiring information synthesis, advanced analytical aptitude, and clinical judgment, as well as in non-English-language assessments and when confronted with data outside mainstream internet sources.
Identifiers
pubmed: 39042876
pii: v10i1e52818
doi: 10.2196/52818
Publication types
Journal Article
Comparative Study
Languages
eng
Citation subsets
IM
Pagination
e52818
Copyright information
©Hela Cherif, Chirine Moussa, Abdel Mouhaymen Missaoui, Issam Salouage, Salma Mokaddem, Besma Dhahri. Originally published in JMIR Medical Education (https://mededu.jmir.org), 23.07.2024.