ChatGPT failed Taiwan's Family Medicine Board Exam.
Journal
Journal of the Chinese Medical Association : JCMA
ISSN: 1728-7731
Abbreviated title: J Chin Med Assoc
Country: Netherlands
NLM ID: 101174817
Publication information
Publication date: 1 August 2023
History:
medline: 11 August 2023
pubmed: 9 June 2023
entrez: 9 June 2023
Status: ppublish
Abstract
BACKGROUND
Chat Generative Pre-trained Transformer (ChatGPT; OpenAI Limited Partnership, San Francisco, CA, USA) is an artificial intelligence language model that has gained popularity because of its large database and its ability to interpret and respond to various queries. Researchers in different fields have tested it, and its performance varies by domain. We aimed to further test its ability in the medical field.
METHODS
We used questions from Taiwan's 2022 Family Medicine Board Exam, which combined Chinese and English, covered various question types, including reverse (negative-phrase) questions and multiple-choice questions, and focused mainly on general medical knowledge. We pasted each question into ChatGPT, recorded its response, and compared it with the correct answer provided by the exam board. We used SAS 9.4 (Cary, North Carolina, USA) and Excel to calculate the accuracy rate for each question type.
RESULTS
ChatGPT answered 52 of 125 questions correctly, an accuracy rate of 41.6%. Question length did not affect the accuracy rate. The accuracy rates were 45.5%, 33.3%, 58.3%, 50.0%, and 43.5% for negative-phrase questions, multiple-choice questions, mutually exclusive options, case scenario questions, and Taiwan's local policy-related questions, respectively, with no statistically significant difference observed.
CONCLUSION
ChatGPT's accuracy rate was too low to pass Taiwan's Family Medicine Board Exam. Possible reasons include the difficulty of the specialist exam and the relatively weak database of traditional Chinese language resources. However, ChatGPT performed acceptably on negative-phrase questions, mutually exclusive questions, and case scenario questions, and it can be a helpful tool for learning and exam preparation. Future research can explore ways to improve ChatGPT's accuracy rate for specialized exams and other domains.
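The scoring procedure described under METHODS (compare each recorded response with the exam board's answer key, then compute an accuracy rate per question type) can be sketched as follows. This is only an illustrative sketch and not the authors' SAS or Excel workflow: the function name `accuracy_by_type` and the three sample records are hypothetical. The only figures taken from the record are the 52 correct answers out of 125 questions.

```python
from collections import defaultdict


def accuracy_by_type(records):
    """Return the fraction of correct answers per question type.

    records: iterable of (question_type, model_answer, key_answer) tuples.
    Answers are compared case-insensitively after trimming whitespace.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for question_type, model_answer, key_answer in records:
        totals[question_type] += 1
        if model_answer.strip().upper() == key_answer.strip().upper():
            correct[question_type] += 1
    return {qtype: correct[qtype] / totals[qtype] for qtype in totals}


# Hypothetical graded records, for illustration only (not data from the study).
records = [
    ("case scenario", "A", "A"),
    ("case scenario", "B", "C"),
    ("negative-phrase", "d", "D"),
]
rates = accuracy_by_type(records)

# Overall accuracy reported in RESULTS: 52 correct out of 125 questions.
overall_percent = round(52 / 125 * 100, 1)
print(rates, overall_percent)
```

The per-type counts behind the reported percentages are not given in this record, so the sketch does not attempt to reproduce them.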
Identifiers
pubmed: 37294147
doi: 10.1097/JCMA.0000000000000946
pii: 02118582-990000000-00224
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
762-766
Comments and corrections
Type: CommentIn
Copyright information
Copyright © 2023, the Chinese Medical Association.
Conflict of interest statement
Conflicts of interest: Dr. Tzeng-Ji Chen and Dr. Shinn-Jang Hwang, editorial board members at the Journal of the Chinese Medical Association, have no roles in the peer review process or decision to publish this article. The other authors declare that they have no conflicts of interest related to the subject matter or materials discussed in this article.