Leveraging Large Language Models in the delivery of post-operative dental care: a comparison between an embedded GPT model and ChatGPT.


Journal

BDJ Open
ISSN: 2056-807X
Abbreviated title: BDJ Open
Country: England
NLM ID: 101709456

Publication information

Publication date:
12 Jun 2024
History:
received: 25 Mar 2024
revised: 01 May 2024
accepted: 07 May 2024
entrez: 12 Jun 2024
pubmed: 13 Jun 2024
medline: 13 Jun 2024
Status: epublish

Abstract

This study examines the role of Artificial Intelligence (AI) in healthcare, specifically the application of Large Language Models (LLMs) to the delivery of post-operative dental care. The aim was to evaluate the performance of an embedded GPT model and compare it with ChatGPT-3.5 turbo in terms of response accuracy, clarity, relevance, and up-to-date knowledge when addressing patient concerns and supporting informed decision-making.

An embedded GPT model, built on GPT-3.5-16k, was created via GPT-trainer to answer post-operative questions in four dental specialties: Operative Dentistry & Endodontics, Periodontics, Oral & Maxillofacial Surgery, and Prosthodontics. The generated responses were rated on a Likert scale by thirty-six dental experts, nine from each specialty. Content validity was quantified with the Content Validity Index (CVI), calculated at both the item level (I-CVI) and the scale level (S-CVI/Ave). To adjust the I-CVI for chance agreement, a modified kappa statistic (K*) was computed.

The overall content validity of the responses generated by the embedded GPT model and by ChatGPT was 65.62% and 61.87%, respectively. The embedded GPT model outperformed ChatGPT on accuracy (62.5% versus 52.5%) and on clarity (72.5% versus 67.5%). Both models performed equally well in terms of relevance and up-to-date knowledge.

In conclusion, the embedded GPT model showed better results than ChatGPT in providing post-operative dental care, highlighting the benefits of embedding and prompt engineering and pointing to future advancements in healthcare applications.
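The content validity measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes the commonly used definitions: the I-CVI is the proportion of experts who rate an item as relevant, the S-CVI/Ave is the mean of the I-CVIs, and the modified kappa is K* = (I-CVI - Pc) / (1 - Pc), with the chance-agreement probability Pc = C(N, A) x 0.5^N for N experts of whom A agree. The 4-point scale, the cut-off of 3, and all function names are assumptions made for illustration; the abstract does not state them.

```python
from math import comb


def _count_agreeing(ratings, relevant_min):
    """Number of experts whose rating meets the (assumed) relevance cut-off."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r >= relevant_min)


def item_cvi(ratings, relevant_min=3):
    """I-CVI: proportion of experts rating the item as relevant."""
    return _count_agreeing(ratings, relevant_min) / len(ratings)


def modified_kappa(ratings, relevant_min=3):
    """K*: I-CVI adjusted for the probability of chance agreement."""
    n = len(ratings)
    agree = _count_agreeing(ratings, relevant_min)
    i_cvi = agree / n
    p_chance = comb(n, agree) * 0.5 ** n
    return (i_cvi - p_chance) / (1 - p_chance)


def scale_cvi_ave(items, relevant_min=3):
    """S-CVI/Ave: mean of the I-CVIs across all items on the scale."""
    if not items:
        raise ValueError("at least one item is required")
    return sum(item_cvi(r, relevant_min) for r in items) / len(items)


# Hypothetical ratings from nine experts (made-up numbers, not study data).
example_items = [
    [4, 4, 3, 4, 3, 4, 4, 2, 4],  # eight of nine experts rate 3 or 4
    [4, 3, 3, 4, 4, 3, 4, 4, 3],  # all nine experts rate 3 or 4
]

if __name__ == "__main__":
    for idx, ratings in enumerate(example_items, start=1):
        print(idx, round(item_cvi(ratings), 3), round(modified_kappa(ratings), 3))
    print("S-CVI/Ave:", round(scale_cvi_ave(example_items), 3))
```

With nine raters per specialty, as in the study design, the chance-agreement term is small when nearly all raters agree, so K* stays close to the I-CVI in that case.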

Identifiers

pubmed: 38866751
doi: 10.1038/s41405-024-00226-3
pii: 10.1038/s41405-024-00226-3

Publication types

Journal Article

Languages

English

Pagination

48

Copyright information

© 2024. The Author(s).

Authors

Itrat Batool (I)

Section of Dentistry, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan.

Nighat Naved (N)

Section of Dentistry, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan.

Syed Murtaza Raza Kazmi (SMR)

Section of Dentistry, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan.

Fahad Umer (F)

Section of Dentistry, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan. fahad.umer@aku.edu.

MeSH classifications