Experimental assessment of the performance of artificial intelligence in solving multiple-choice board exams in cardiology.


Journal

Swiss medical weekly
ISSN: 1424-3997
Abbreviated title: Swiss Med Wkly
Country: Switzerland
NLM ID: 100970884

Publication information

Publication date:
02 Oct 2024
History:
medline: 28 10 2024
pubmed: 28 10 2024
entrez: 28 10 2024
Status: epublish

Abstract

The aim of the present study was to evaluate the performance of various artificial intelligence (AI)-powered chatbots (commercially available in Switzerland up to June 2023) in solving a theoretical cardiology board exam and to compare their accuracy with that of human cardiology fellows.

A set of 88 multiple-choice cardiology exam questions was presented to the participating cardiology fellows and to the selected chatbots. The evaluation metrics were Top-1 and Top-2 accuracy, which assess the ability of chatbots and fellows to select the correct answer.

All 36 participating cardiology fellows passed the exam, with a median accuracy of 98% (IQR 91-99%, range 78% to 100%). The performance of the chatbots varied. Only one chatbot, Jasper quality, achieved the minimum pass rate of 73% correct answers. Across the chatbots, the median Top-1 accuracy was 47% (IQR 44-53%, range 42% to 73%). Top-2 accuracy provided a modest improvement, with a median of 67% (IQR 65-72%, range 61% to 82%). Even with this advantage, only two chatbots, Jasper quality and ChatGPT plus 4.0, would have passed the exam. Similar results were observed when picture-based questions were excluded from the dataset.

Overall, the study suggests that most current language-based chatbots have limitations in accurately solving theoretical medical board exams. Most of the widely available chatbots fell short of a passing score in a theoretical cardiology board exam, although a few showed promising results. Further improvements in artificial intelligence language models may lead to better performance in medical knowledge applications in the future.
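The Top-1 and Top-2 metrics can be illustrated with a minimal sketch. It assumes the usual definition, in which a question counts as correct under Top-k when the right option is among the first k ranked choices; the abstract does not spell out the exact scoring procedure. The function names and the four-question answer key are hypothetical toy data, not the study's material. Only the 73% pass threshold is taken from the abstract.

```python
PASS_THRESHOLD = 0.73  # minimum pass rate stated in the abstract


def top_k_accuracy(ranked_answers, answer_key, k=1):
    """Fraction of questions whose correct option is among the first k ranked choices."""
    if len(ranked_answers) != len(answer_key):
        raise ValueError("ranked_answers and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    hits = sum(
        1
        for choices, correct in zip(ranked_answers, answer_key)
        if correct in choices[:k]
    )
    return hits / len(answer_key)


def passes(accuracy, threshold=PASS_THRESHOLD):
    """True when the accuracy reaches the pass threshold."""
    return accuracy >= threshold


# Hypothetical toy data: four questions, each with a ranked first and second choice.
answer_key = ["A", "C", "B", "D"]
ranked_answers = [["A", "B"], ["B", "C"], ["B", "A"], ["A", "C"]]

top1 = top_k_accuracy(ranked_answers, answer_key, k=1)
top2 = top_k_accuracy(ranked_answers, answer_key, k=2)
print(f"Top-1: {top1:.2f}, passes: {passes(top1)}")
print(f"Top-2: {top2:.2f}, passes: {passes(top2)}")
```

In this toy example the second choice rescues one additional question, which mirrors the modest Top-2 improvement the abstract describes.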

Identifiers

pubmed: 39465318
pii: 3547
doi: 10.57187/s.3547

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

3547

Authors

Jessica Huwiler (J)

Heart Clinic Zurich, Zurich, Switzerland.
University of Zurich, Zurich, Switzerland.

Luca Oechslin (L)

Heart Clinic Zurich, Zurich, Switzerland.

Patric Biaggi (P)

Heart Clinic Zurich, Zurich, Switzerland.
University of Zurich, Zurich, Switzerland.

Felix C Tanner (FC)

University of Zurich, Zurich, Switzerland.
Swiss Society of Cardiology, Berne, Switzerland.

Christophe Alain Wyss (CA)

Heart Clinic Zurich, Zurich, Switzerland.
University of Zurich, Zurich, Switzerland.
Swiss Society of Cardiology, Berne, Switzerland.
