Experimental assessment of the performance of artificial intelligence in solving multiple-choice board exams in cardiology.

Humans; Cardiology / education; Artificial Intelligence; Educational Measurement / methods; Switzerland; Clinical Competence

Journal

Swiss medical weekly

ISSN: 1424-3997

Abbreviated title: Swiss Med Wkly

Country: Switzerland

NLM ID: 100970884

Publication information

Publication date:
02 Oct 2024

History:

medline: 28 Oct 2024

pubmed: 28 Oct 2024

entrez: 28 Oct 2024

Status: epublish

Abstract

This study evaluated the performance of various artificial intelligence (AI)-powered chatbots commercially available in Switzerland up to June 2023 in solving a theoretical cardiology board exam, and compared their accuracy with that of human cardiology fellows. A set of 88 multiple-choice cardiology exam questions was presented to the participating cardiology fellows and to the selected chatbots. The evaluation metrics included Top-1 and Top-2 accuracy, which assess the ability of chatbots and fellows to select the correct answer. All 36 participating cardiology fellows passed the exam, with a median accuracy of 98% (IQR 91-99%, range 78-100%). The performance of the chatbots, however, varied. Only one chatbot, Jasper quality, achieved the minimum pass rate of 73% correct answers. Across chatbots, the median Top-1 accuracy was 47% (IQR 44-53%, range 42-73%), while Top-2 accuracy provided a modest improvement, to a median of 67% (IQR 65-72%, range 61-82%). Even with this advantage, only two chatbots, Jasper quality and ChatGPT plus 4.0, would have passed the exam. Similar results were observed when picture-based questions were excluded from the dataset. Overall, the study suggests that most current language-based chatbots have limitations in accurately solving theoretical medical board exams: the widely available chatbots tested generally fell short of a passing score in a theoretical cardiology board exam, although a few showed promising results. Further improvements in AI language models may lead to better performance in medical knowledge applications in the future.
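The Top-1 and Top-2 accuracy metrics and the 73% pass threshold mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes (this is not described in the record) that each respondent supplies a ranked list of answer options per question, and that a question counts as correct under Top-k when the keyed answer appears among the first k choices. The function names and the four-question toy data are hypothetical illustrations, not study data or the authors' scoring code.

```python
PASS_THRESHOLD = 0.73  # minimum pass rate (73% correct) reported in the abstract


def top_k_accuracy(ranked_answers, correct_answers, k):
    """Fraction of questions whose keyed answer is among the first k ranked choices.

    Assumption (hypothetical): ranked_answers[i] is a respondent's ranked list of
    answer options for question i, most preferred first.
    """
    if len(ranked_answers) != len(correct_answers):
        raise ValueError("each question needs exactly one ranked answer list")
    hits = sum(
        1
        for ranking, correct in zip(ranked_answers, correct_answers)
        if correct in ranking[:k]
    )
    return hits / len(correct_answers)


def passes(accuracy, threshold=PASS_THRESHOLD):
    """True if the accuracy reaches the pass threshold."""
    return accuracy >= threshold


# Hypothetical toy data: 4 questions with options A-E (not study data).
correct = ["A", "C", "B", "E"]
responses = [
    ["A", "B"],  # first choice correct
    ["B", "C"],  # correct only as second choice
    ["D", "A"],  # correct answer not in top two
    ["E", "D"],  # first choice correct
]

top1 = top_k_accuracy(responses, correct, k=1)
top2 = top_k_accuracy(responses, correct, k=2)
print(top1, top2, passes(top1), passes(top2))
```

In this toy case the respondent scores 50% Top-1 and 75% Top-2, so it would fail on Top-1 but pass on Top-2, mirroring how a second-choice credit can lift a chatbot over the threshold.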

Identifiers

DOI: 10.57187/s.3547 PMID: 39465318

pii: 3547

Publication types

Journal Article

Languages

eng

Citation subsets

Pagination

3547

Authors

Jessica Huwiler (J)

Luca Oechslin (L)

Patric Biaggi (P)

Felix C Tanner (FC)

Christophe Alain Wyss (CA)

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Smoking Cessation and Incident Cardiovascular Disease.

Evaluation of Low-Value Services Across Major Medicare Advantage Insurers and Traditional Medicare.

Effectiveness of Virtual Yoga for Chronic Low Back Pain: A Randomized Clinical Trial.

MeSH classifications