Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.

Keywords: AI; Chat Generative Pre-trained Transformer; ChatGPT; GPT-4; Generative Pre-trained Transformer 4; Japanese Medical Licensing Examination; artificial intelligence; clinical support; learning model; medical education; medical licensing

Journal

JMIR Medical Education

ISSN: 2369-3762

Abbreviated title: JMIR Med Educ

Country: Canada

NLM ID: 101684518

Publication information

Publication date:
29 Jun 2023

History:

received: 07 Apr 2023

revised: 11 May 2023

accepted: 14 Jun 2023

medline: 29 Jun 2023

pubmed: 29 Jun 2023

entrez: 29 Jun 2023

Status: epublish

Abstract

The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied. This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages. This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE in 2023. A total of 254 questions were included in the final analysis, which were categorized into 3 types, namely general, clinical, and clinical sentence questions. The results indicated that GPT-4 outperformed GPT-3.5 in terms of accuracy, particularly for general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and specific disease questions. Furthermore, GPT-4 achieved the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages. GPT-4 could become a valuable tool for medical education and clinical support in non-English-speaking regions, such as Japan.

Identifiers

DOI: 10.2196/48002

PMID: 37384388

PMC: PMC10365615

pii: v9i1e48002

Publication types

Journal Article

Languages

eng

Pagination

e48002

Copyright information

©Soshi Takagi, Takashi Watari, Ayano Erabi, Kota Sakaguchi. Originally published in JMIR Medical Education (https://mededu.jmir.org), 29.06.2023.

Authors

Soshi Takagi (S)

Takashi Watari (T)

Ayano Erabi (A)

Kota Sakaguchi (K)

MeSH classifications