Advantages and pitfalls in utilizing artificial intelligence for crafting medical examinations: a medical education pilot study with GPT-4.

Humans; Educational Measurement / methods; Pilot Projects; Artificial Intelligence; Education, Medical; Writing

Artificial intelligence; ChatGPT; Medical examinations; Multiple choice questions

Journal

BMC medical education

ISSN: 1472-6920

Abbreviated title: BMC Med Educ

Country: England

NLM ID: 101088679

Publication information

Publication date:
17 Oct 2023

History:

received: 06 Jul 2023

accepted: 07 Oct 2023

medline: 23 Oct 2023

pubmed: 18 Oct 2023

entrez: 17 Oct 2023

Status: epublish

Abstract

Writing multiple-choice question (MCQ) examinations for medical students is complex and time-consuming, and requires significant effort from clinical staff and faculty. Applying artificial intelligence algorithms to this area of medical education may be advisable. During March and April 2023, we used GPT-4, an OpenAI application, to write a 210-question MCQ examination based on an existing exam template. The output was thoroughly reviewed by specialist physicians who were blinded to the source of the questions. Algorithm mistakes and inaccuracies identified by the specialists were classified as stemming from age, gender, or geographical insensitivities. After a detailed prompt was entered, GPT-4 produced the test rapidly and effectively. Only 1 question (0.5%) was judged to be false; 15% of questions required revision. Errors in the AI-generated questions included the use of outdated or inaccurate terminology and age-sensitive, gender-sensitive, and geographically sensitive inaccuracies. Questions disqualified for a flawed methodological basis included elimination-based questions and questions that did not integrate knowledge with clinical reasoning. GPT-4 can be used as an adjunctive tool in creating multiple-choice medical examinations, yet rigorous inspection by specialist physicians remains pivotal.

Identifiers

DOI: 10.1186/s12909-023-04752-w

PMID: 37848913

PMC: PMC10580534

PII: 10.1186/s12909-023-04752-w

Publication types

Journal Article

Languages

eng

Pagination

772

Authors

Klang E

Portugez S

Gross R

Kassif Lerner R

Brenner A

Gilboa M

Ortal T

Ron S

Robinzon V

Meiri H

Segal G