Can AI pass the written European Board Examination in Neurological Surgery? - Ethical and practical issues.

Keywords: Artificial intelligence; Bard; Bing; Board-certification; Chat gpt; EANS; Neurosurgery board examination

Journal

Brain & spine
ISSN: 2772-5294
Abbreviated title: Brain Spine
Country: Netherlands
NLM ID: 9918470888906676

Publication information

Publication date:
2024
History:
received: 27 11 2023
revised: 28 01 2024
accepted: 12 02 2024
medline: 21 3 2024
pubmed: 21 3 2024
entrez: 21 3 2024
Status: epublish

Abstract

Artificial intelligence (AI)-based large language models (LLMs) hold enormous potential in education and training. Recent publications have demonstrated that they can outperform human participants in written medical exams. We aimed to explore the accuracy of AI in the written part of the EANS board exam. Eighty-six representative single best answer (SBA) questions, each included at least ten times in prior EANS board exams, were selected by the current EANS board exam committee. By content, the questions were classified as 75 text-based (TB) and 11 image-based (IB); by structure, as 50 interpretation-weighted, 30 theory-based and 6 true-or-false. The questions were tested with ChatGPT 3.5, Bing and Bard. The AI and participant results were statistically analyzed with ANOVA tests in Stata SE 15 (StataCorp, College Station, TX). P-values <0.05 were considered statistically significant. The Bard LLM achieved the highest accuracy, answering 62% of questions correctly overall and 69% excluding IB questions, outperforming the human exam participants (59%, p = 0.67, and 59%, p = 0.42, respectively). Excluding IB questions, all LLMs scored highest on theory-based questions (ChatGPT: 79%; Bing: 83%; Bard: 86%), significantly better than the human exam participants (60%; p = 0.03). AI could not answer any IB question correctly. Based on representative SBA questions, AI passed the written EANS board exam and achieved results close to, or even better than, those of the human exam participants. Our results raise several ethical and practical issues, which may affect the current concept of the written EANS board exam.

Identifiers

pubmed: 38510593
doi: 10.1016/j.bas.2024.102765
pii: S2772-5294(24)00021-3
pmc: PMC10951784

Publication types

Journal Article

Languages

eng

Pagination

102765

Copyright information

© 2024 The Authors.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Felix C Stengel (FC)

Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland.

Martin N Stienen (MN)

Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland.

Marcel Ivanov (M)

Royal Hallamshire Hospital, Sheffield, United Kingdom.

María L Gandía-González (ML)

Hospital Universitario La Paz, Madrid, Spain.

Giovanni Raffa (G)

Division of Neurosurgery, BIOMORF Department, University of Messina, Messina, Italy.

Mario Ganau (M)

Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom.

Peter Whitfield (P)

South West Neurosurgery Centre, Plymouth, United Kingdom.

Stefan Motov (S)

Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland.

MeSH classifications