Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.
AI
Chat Generative Pre-trained Transformer
ChatGPT
GPT-4
Generative Pre-trained Transformer 4
Japanese Medical Licensing Examination
artificial intelligence
clinical support
learning model
medical education
medical licensing
Journal
JMIR medical education
ISSN: 2369-3762
Abbreviated title: JMIR Med Educ
Country: Canada
NLM ID: 101684518
Publication information
Publication date:
29 Jun 2023
History:
received: 7 Apr 2023
accepted: 14 Jun 2023
revised: 11 May 2023
medline: 29 Jun 2023
pubmed: 29 Jun 2023
entrez: 29 Jun 2023
Status:
epublish
Abstract
The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied. This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages. This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE in 2023. A total of 254 questions were included in the final analysis, which were categorized into 3 types, namely general, clinical, and clinical sentence questions. The results indicated that GPT-4 outperformed GPT-3.5 in terms of accuracy, particularly for general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and specific disease questions. Furthermore, GPT-4 achieved the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages. GPT-4 could become a valuable tool for medical education and clinical support in non-English-speaking regions, such as Japan.
Identifiers
pubmed: 37384388
pii: v9i1e48002
doi: 10.2196/48002
pmc: PMC10365615
Publication types
Journal Article
Languages
eng
Pagination
e48002
Copyright information
©Soshi Takagi, Takashi Watari, Ayano Erabi, Kota Sakaguchi. Originally published in JMIR Medical Education (https://mededu.jmir.org), 29.06.2023.