Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
29 10 2023
History:
received: 26 07 2023
accepted: 24 10 2023
medline: 31 10 2023
pubmed: 30 10 2023
entrez: 30 10 2023
Status:
epublish
Abstract
This study compared the performance of humans, GPT-4.0, and GPT-3.5 in answering multiple-choice questions from the American Academy of Ophthalmology (AAO) Basic and Clinical Science Course (BCSC) self-assessment program, available at https://www.aao.org/education/self-assessments. In June 2023, text-based multiple-choice questions were submitted to GPT-4.0 and GPT-3.5. The AAO provides the percentage of humans who selected the correct answer, which was used as the human comparator. All questions were classified into 10 subspecialties and 3 practice areas (diagnostics/clinics, medical treatment, surgery). Across 1023 questions, GPT-4.0 achieved the best score (82.4%), followed by humans (75.7%) and GPT-3.5 (65.9%), with significant differences in accuracy (P < 0.0001 for all comparisons). Both GPT-4.0 and GPT-3.5 performed worst on surgery-related questions (74.6% and 57.0%, respectively). On difficult questions (answered incorrectly by > 50% of humans), both GPT models compared favorably with humans, although the difference did not reach significance. Answers from GPT-4.0 contained significantly fewer words than those from GPT-3.5 (160 ± 56 and 206 ± 77, respectively, P < 0.0001); however, incorrect responses were longer (P < 0.02). GPT-4.0 represented a substantial improvement over GPT-3.5 and outperformed humans in an AAO BCSC self-assessment test. However, ChatGPT remains limited by inconsistent performance across practice areas, particularly surgery.
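As a rough arithmetic check on the headline figures, the sketch below converts the reported accuracies into approximate counts of correct answers out of 1023 and runs an unpaired two-proportion z-test between GPT-4.0 and GPT-3.5. The abstract does not say which statistical test the authors used, so the choice of test, the function names, and the rounded counts are all assumptions made for illustration. Because both models answered the same questions, a paired test such as McNemar's would be more appropriate, but that needs per-question results that the abstract does not provide.

```python
import math

N_QUESTIONS = 1023  # total number of questions, from the abstract


def approx_correct(accuracy_pct, n=N_QUESTIONS):
    """Approximate count of correct answers implied by a reported percentage."""
    return round(accuracy_pct / 100 * n)


def two_proportion_z(x1, x2, n):
    """Unpaired two-proportion z-test with equal sample sizes n.

    Assumption: the two samples are treated as independent, which is only
    an approximation here because both models saw the same questions.
    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = x1 / n, x2 / n
    pooled = (x1 + x2) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Counts reconstructed from the reported percentages (approximate).
gpt4_correct = approx_correct(82.4)
gpt35_correct = approx_correct(65.9)

z, p = two_proportion_z(gpt4_correct, gpt35_correct, N_QUESTIONS)
print(f"GPT-4.0: {gpt4_correct}/{N_QUESTIONS}, GPT-3.5: {gpt35_correct}/{N_QUESTIONS}")
print(f"z = {z:.2f}, p < 0.0001: {p < 0.0001}")
```

Under these assumptions the difference between the two models is far beyond the P < 0.0001 threshold quoted in the abstract, which agrees with the reported result but does not reproduce the authors' analysis.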
Identifiers
pubmed: 37899405
doi: 10.1038/s41598-023-45837-2
pii: 10.1038/s41598-023-45837-2
pmc: PMC10613606
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
18562
Copyright information
© 2023. The Author(s).