Investigating the impact of innovative AI chatbot on post-pandemic medical education and clinical assistance: a comprehensive analysis.
Keywords
ChatGPT
artificial intelligence
junior doctor
large language model
surgical education
Journal
ANZ Journal of Surgery
ISSN: 1445-2197
Abbreviated title: ANZ J Surg
Country: Australia
NLM ID: 101086634
Publication information
Publication date: 21 Aug 2023
History:
Received: 24 Mar 2023
Revised: 4 Aug 2023
Accepted: 7 Aug 2023
MEDLINE: 21 Aug 2023
PubMed: 21 Aug 2023
Entrez: 21 Aug 2023
Status: ahead of print
Abstract
BACKGROUND
The COVID-19 pandemic has significantly disrupted the clinical experience and exposure of medical students and junior doctors. Integrating artificial intelligence (AI) into medical education has the potential to enhance learning and improve patient care. This study aimed to evaluate the effectiveness of three popular large language models (LLMs) as clinical decision-making support tools for junior doctors.
METHODS
A series of increasingly complex clinical scenarios was presented to ChatGPT, Google's Bard, and Bing's AI. Their responses were evaluated against standard clinical guidelines; for readability by the Flesch Reading Ease Score, the Flesch-Kincaid Grade Level, and the Coleman-Liau Index; and for reliability and suitability by the modified DISCERN score. Finally, the LLMs' outputs were rated by three experienced specialists on a Likert scale for accuracy, informativeness, and accessibility.
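For context, the three readability indices named above have standard published formulas. Below is a minimal sketch, not the authors' code, of how they could be computed for an LLM response; the vowel-group syllable counter is a rough heuristic, so scores will differ slightly from dedicated readability tools.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    n_letters = sum(len(w) for w in words)

    wps = n_words / sentences      # average words per sentence
    spw = n_syllables / n_words    # average syllables per word
    L = 100 * n_letters / n_words  # letters per 100 words
    S = 100 * sentences / n_words  # sentences per 100 words

    return {
        # Flesch Reading Ease: higher = easier to read.
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        # Coleman-Liau Index: grade estimate from letter counts.
        "coleman_liau_index": 0.0588 * L - 0.296 * S - 15.8,
    }

sample = ("The scaphoid is the most commonly fractured carpal bone. "
          "Initial management includes immobilisation and repeat imaging.")
print(readability(sample))
```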
RESULTS
In terms of readability and reliability, ChatGPT stood out among the three LLMs, recording the highest scores on the Flesch Reading Ease Score (31.2 ± 3.5), the Flesch-Kincaid Grade Level (13.5 ± 0.7), the Coleman-Liau Index (13), and DISCERN (62 ± 4.4). These results suggest statistically significantly greater comprehensibility and closer alignment with clinical guidelines in the medical advice given by ChatGPT. Bard followed closely behind, with Bing's AI trailing in all categories. The only non-significant differences (P > 0.05) were between the readability indices of ChatGPT and Bard, and between the Flesch Reading Ease scores of ChatGPT/Bard and Bing's AI.
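The abstract reports P-values but does not name the statistical test used. Purely as an illustration of how two models' readability scores could be compared, the sketch below runs a two-sample t-test on hypothetical per-scenario Flesch-Kincaid grades; the numbers are invented, not study data.

```python
from scipy import stats

# Hypothetical Flesch-Kincaid Grade Levels for five scenarios per model.
chatgpt_fkgl = [13.1, 14.2, 13.0, 13.9, 13.3]
bard_fkgl = [13.4, 13.8, 13.2, 14.1, 13.6]

# Two-sample t-test (assumed here for illustration only).
t, p = stats.ttest_ind(chatgpt_fkgl, bard_fkgl)
print(f"t = {t:.2f}, P = {p:.3f}")  # here P > 0.05: no significant difference
```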
CONCLUSIONS
This study demonstrates the potential of LLMs to foster self-directed, personalized learning and to bolster clinical decision-making support for junior doctors. However, further development is needed before they can be integrated into medical education.
Publication type
Journal Article
Language
eng
Citation subset
IM
Copyright information
© 2023 The Authors. ANZ Journal of Surgery published by John Wiley & Sons Australia, Ltd on behalf of Royal Australasian College of Surgeons.