ChatGPT failed Taiwan's Family Medicine Board Exam.

Artificial Intelligence; Databases, Factual; Family Practice; Taiwan; Academic Performance

Journal

Journal of the Chinese Medical Association : JCMA

ISSN: 1728-7731

Abbreviated title: J Chin Med Assoc

Country: Netherlands

NLM ID: 101174817

Publication information

Publication date:
1 August 2023

History:

medline: 11 August 2023

pubmed: 9 June 2023

entrez: 9 June 2023

Status: ppublish

Abstract

Chat Generative Pre-trained Transformer (ChatGPT; OpenAI Limited Partnership, San Francisco, CA, USA) is an artificial intelligence language model that has gained popularity because of its large training dataset and its ability to interpret and respond to a wide range of queries. Although researchers in different fields have tested it, its performance varies by domain. We aimed to further test its ability in the medical field. We used questions from Taiwan's 2022 Family Medicine Board Exam, which mixed Chinese and English, covered various question types (including reverse, that is, negatively phrased, questions and multiple-choice questions), and focused mainly on general medical knowledge. We pasted each question into ChatGPT, recorded its response, and compared that response with the correct answer provided by the exam board. We used SAS 9.4 (SAS Institute, Cary, North Carolina, USA) and Excel to calculate the accuracy rate for each question type. ChatGPT answered 52 of 125 questions correctly, an accuracy rate of 41.6%. Question length did not affect accuracy. Accuracy rates were 45.5% for negative-phrase questions, 33.3% for multiple-choice questions, 58.3% for questions with mutually exclusive options, 50.0% for case scenario questions, and 43.5% for questions on Taiwan's local policies, with no statistically significant difference between question types. ChatGPT's accuracy was not high enough to pass Taiwan's Family Medicine Board Exam. Possible reasons include the difficulty of the specialist exam and the relatively limited traditional Chinese language resources in its training data. However, ChatGPT performed acceptably on negative-phrase questions, questions with mutually exclusive options, and case scenario questions, and it may be a helpful tool for learning and exam preparation. Future research could explore ways to improve ChatGPT's accuracy on specialized exams and in other domains.
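The scoring step can be illustrated with a minimal sketch. It assumes that each question carries zero or more question-type tags and that a response counts as correct when it matches the exam board's key. The accuracy rate is the number of correct answers divided by the number of questions, so the overall figure is 52/125 = 41.6%. The study itself used SAS 9.4 and Excel. The Python functions `accuracy_rate` and `accuracy_by_type`, the tag names, and the sample records are hypothetical and used only for illustration; they are not the study's data or code.

```python
from collections import defaultdict


def accuracy_rate(correct: int, total: int) -> float:
    """Percentage of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def accuracy_by_type(records):
    """Compute accuracy per question type (hypothetical helper).

    Each record is (tags, model_answer, key_answer). A question may carry
    several tags (e.g. a case scenario that is also negatively phrased),
    so it is counted once under every tag it has.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for tags, given, expected in records:
        # Case- and whitespace-insensitive comparison with the answer key.
        is_correct = given.strip().upper() == expected.strip().upper()
        for tag in tags:
            totals[tag] += 1
            hits[tag] += is_correct  # bool adds as 0 or 1
    return {tag: accuracy_rate(hits[tag], totals[tag]) for tag in totals}


# Overall result reported in the abstract: 52 of 125 correct.
print(f"{accuracy_rate(52, 125):.1f}%")  # 41.6%

# Hypothetical records, for illustration only (not the study's data).
sample = [
    (("negative-phrase",), "B", "B"),
    (("negative-phrase", "case scenario"), "A", "C"),
    (("case scenario",), "D", "d"),
]
print(accuracy_by_type(sample))  # {'negative-phrase': 50.0, 'case scenario': 50.0}
```

Because the categories reported in the abstract overlap, each question is counted under every type it belongs to. This is why the per-type rates need not average to the overall rate.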

Identifiers

DOI: 10.1097/JCMA.0000000000000946 PMID: 37294147

pii: 02118582-990000000-00224

Publication types

Journal Article

Languages

eng

Pagination

762-766

Comments and corrections

Type: CommentIn

Conflict of interest statement

Conflicts of interest: Dr. Tzeng-Ji Chen and Dr. Shinn-Jang Hwang, editorial board members of the Journal of the Chinese Medical Association, had no role in the peer review process or the decision to publish this article. The other authors declare that they have no conflicts of interest related to the subject matter or materials discussed in this article.

Authors

Tzu-Ling Weng (TL)

Ying-Mei Wang (YM)

Samuel Chang (S)

Tzeng-Ji Chen (TJ)

Shinn-Jang Hwang (SJ)

Similar articles

AI-powered mechanisms as judges: Breaking ties in chess.

How Do Personal Attributes Shape AI Dependency in Chinese Higher Education Context? Insights from Needs Frustration Perspective.

An arithmetic operation P system based on symmetric ternary system.

Efficacy of a Wearable Activity Tracker With Step-by-Step Goal-Setting on Older Adults' Physical Activity and Sarcopenia Indicators: Clustered Trial.

MeSH classifications