The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses.


Journal

BMC research notes
ISSN: 1756-0500
Abbreviated title: BMC Res Notes
Country: England
NLM ID: 101462768

Publication information

Publication date:
03 Sep 2024
History:
received: 05 04 2024
accepted: 28 08 2024
medline: 4 9 2024
pubmed: 4 9 2024
entrez: 3 9 2024
Status: epublish

Abstract

The integration of artificial intelligence (AI) in healthcare education is inevitable. Understanding the proficiency of generative AI in different languages to answer complex questions is crucial for educational purposes. The study objective was to compare the performance of ChatGPT-4 and Gemini in answering Virology multiple-choice questions (MCQs) in English and Arabic, while assessing the quality of the generated content. Both AI models' responses to 40 Virology MCQs were assessed for correctness and quality based on the CLEAR tool designed for evaluation of AI-generated content. The MCQs were classified into lower and higher cognitive categories based on the revised Bloom's taxonomy. The study design considered the METRICS checklist for the design and reporting of generative AI-based studies in healthcare. ChatGPT-4 and Gemini performed better in English than in Arabic, with ChatGPT-4 consistently surpassing Gemini in correctness and CLEAR scores. ChatGPT-4 led Gemini with 80% vs. 62.5% correctness in English, compared to 65% vs. 55% in Arabic. For both AI models, superior performance in lower cognitive domains was reported. Both ChatGPT-4 and Gemini exhibited potential in educational applications; nevertheless, their performance varied across languages, highlighting the importance of continued development to ensure effective AI integration in healthcare education globally.

Identifiers

pubmed: 39228001
doi: 10.1186/s13104-024-06920-7
pii: 10.1186/s13104-024-06920-7

Publication types

Journal Article; Comparative Study

Languages

eng

Citation subsets

IM

Pagination

247

Copyright information

© 2024. The Author(s).

Authors

Malik Sallam (M)

Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan. malik.sallam@ju.edu.jo.
Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Queen Rania Al-Abdullah Street-Aljubeiha, P.O. Box: 13046, Amman, 11942, Jordan. malik.sallam@ju.edu.jo.

Kholoud Al-Mahzoum (K)

School of Medicine, The University of Jordan, Amman, 11942, Jordan.

Rawan Ahmad Almutawaa (RA)

School of Medicine, The University of Jordan, Amman, 11942, Jordan.

Jasmen Ahmad Alhashash (JA)

School of Medicine, The University of Jordan, Amman, 11942, Jordan.

Retaj Abdullah Dashti (RA)

School of Medicine, The University of Jordan, Amman, 11942, Jordan.

Danah Raed AlSafy (DR)

School of Medicine, The University of Jordan, Amman, 11942, Jordan.

Reem Abdullah Almutairi (RA)

School of Medicine, The University of Jordan, Amman, 11942, Jordan.

Muna Barakat (M)

Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan.
