Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish Medical Final Examination.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
22 Nov 2023
History:
received: 14 09 2023
accepted: 07 11 2023
medline: 24 11 2023
pubmed: 23 11 2023
entrez: 22 11 2023
Status: epublish

Abstract

The study aimed to evaluate the performance of two Large Language Models (LLMs), ChatGPT (based on GPT-3.5) and GPT-4, with two temperature parameter values, on the Polish Medical Final Examination (MFE). The models were tested on three editions of the MFE (Spring 2022, Autumn 2022, and Spring 2023) in two language versions: English and Polish. The accuracies of both models were compared, and the relationships between the correctness of the answers and the metrics of the answers were investigated. The study demonstrated that GPT-4 outperformed GPT-3.5 in all three examinations regardless of the language used. GPT-4 achieved mean accuracies of 79.7% for both the Polish and English versions, passing all MFE versions. GPT-3.5 had mean accuracies of 54.8% for Polish and 60.3% for English; it passed none of the Polish versions with the temperature parameter equal to 0 and 2 of 3 with the temperature parameter equal to 1, while passing all English versions regardless of the temperature parameter value. The GPT-4 score was mostly lower than the average score of a medical student. There was a statistically significant correlation between the correctness of the answers and the index of difficulty for both models. The overall accuracy of both models was still suboptimal and worse than the average for medical students. This emphasizes the need for further improvements in LLMs before they can be reliably deployed in medical settings. These findings suggest an increasing potential for the use of LLMs in medical education.

Identifiers

pubmed: 37993519
doi: 10.1038/s41598-023-46995-z
pii: 10.1038/s41598-023-46995-z
pmc: PMC10665355

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

20512

Copyright information

© 2023. The Author(s).

Authors

Maciej Rosoł (M)

Faculty of Mechatronics, Institute of Metrology and Biomedical Engineering, Warsaw University of Technology, Boboli 8 Street, 02-525, Warsaw, Poland. maciej.rosol.dokt@pw.edu.pl.

Jakub S Gąsior (JS)

Department of Pediatric Cardiology and General Pediatrics, Medical University of Warsaw, Warsaw, Poland.

Jonasz Łaba (J)

Faculty of Mechatronics, Institute of Metrology and Biomedical Engineering, Warsaw University of Technology, Boboli 8 Street, 02-525, Warsaw, Poland.

Kacper Korzeniewski (K)

Faculty of Mechatronics, Institute of Metrology and Biomedical Engineering, Warsaw University of Technology, Boboli 8 Street, 02-525, Warsaw, Poland.

Marcel Młyńczak (M)

Faculty of Mechatronics, Institute of Metrology and Biomedical Engineering, Warsaw University of Technology, Boboli 8 Street, 02-525, Warsaw, Poland.
