Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology.

Humans; Ophthalmology; Self-Assessment; Academies and Institutes

Journal

Scientific Reports

ISSN: 2045-2322

Abbreviated title: Sci Rep

Country: England

NLM ID: 101563288

Publication information

Publication date:
29 10 2023

History:

received: 26 07 2023

accepted: 24 10 2023

medline: 31 10 2023

pubmed: 30 10 2023

entrez: 30 10 2023

Status: epublish

Abstract

To compare the performance of humans, GPT-4.0 and GPT-3.5 in answering multiple-choice questions from the American Academy of Ophthalmology (AAO) Basic and Clinical Science Course (BCSC) self-assessment program, available at https://www.aao.org/education/self-assessments. In June 2023, text-based multiple-choice questions were submitted to GPT-4.0 and GPT-3.5. The AAO provides the percentage of humans who selected the correct answer, which was analyzed for comparison. All questions were classified by 10 subspecialties and 3 practice areas (diagnostics/clinics, medical treatment, surgery). Out of 1023 questions, GPT-4.0 achieved the best score (82.4%), followed by humans (75.7%) and GPT-3.5 (65.9%), with significant differences in accuracy rates (P < 0.0001 for all comparisons). Both GPT-4.0 and GPT-3.5 showed their worst results in surgery-related questions (74.6% and 57.0% respectively). For difficult questions (answered incorrectly by > 50% of humans), both GPT models compared favorably with humans, without reaching significance. The word count for answers provided by GPT-4.0 was significantly lower than for those produced by GPT-3.5 (160 ± 56 and 206 ± 77 respectively, P < 0.0001); however, incorrect responses were longer (P < 0.02). GPT-4.0 represented a substantial improvement over GPT-3.5, achieving better performance than humans in an AAO BCSC self-assessment test. However, ChatGPT is still limited by inconsistency across different practice areas, especially when it comes to surgery.
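
As a rough plausibility check on the reported GPT-4.0 versus GPT-3.5 difference, the Python sketch below reconstructs approximate correct-answer counts from the published percentages and applies an unpaired two-proportion z-test. The abstract does not say which statistical test the authors used, and both models answered the same 1023 questions, so a paired test would suit the real per-question data better. The function name `two_proportion_z_test` and the rounded counts are assumptions made for illustration only.

```python
import math


def two_proportion_z_test(correct_a, correct_b, n):
    """Unpaired, two-sided z-test for two proportions with equal sample size n.

    Illustrative helper (not from the paper): returns the z statistic and
    the two-sided p-value under the normal approximation.
    """
    p_a = correct_a / n
    p_b = correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    standard_error = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / standard_error
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Values taken from the abstract; the counts are ASSUMED by rounding
# percentage * number of questions, since raw counts are not given.
N_QUESTIONS = 1023
gpt4_correct = round(0.824 * N_QUESTIONS)
gpt35_correct = round(0.659 * N_QUESTIONS)

z, p = two_proportion_z_test(gpt4_correct, gpt35_correct, N_QUESTIONS)
print(f"GPT-4.0 correct: {gpt4_correct}, GPT-3.5 correct: {gpt35_correct}")
print(f"z = {z:.2f}, two-sided p = {p:.3g}")
```

Under these assumptions the p-value falls well below 0.0001, in line with the significance level reported in the abstract. The human score cannot be checked the same way, because it is an average of per-question percentages rather than a single set of answers.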

Identifiers

DOI: 10.1038/s41598-023-45837-2

PMID: 37899405

PMCID: PMC10613606

Publication types

Journal Article

Languages

eng

Pagination

18562

Authors

Andrea Taloni

Massimiliano Borselli

Valentina Scarsi

Costanza Rossi

Giulia Coco

Vincenzo Scorcia

Giuseppe Giannaccare
