[ChatGPT and the German board examination for ophthalmology: an evaluation].
ChatGPT und die deutsche Facharztprüfung für Augenheilkunde: eine Evaluierung.
Artificial intelligence
Large Language Model
Medicine
Open questions
Subspeciality
Journal
Die Ophthalmologie
ISSN: 2731-7218
Abbreviated title: Ophthalmologie
Country: Germany
NLM ID: 9918402288106676
Publication information
Publication date: 27 May 2024
History:
received: 23 December 2023
accepted: 18 April 2024
revised: 18 April 2024
medline: 27 May 2024
pubmed: 27 May 2024
entrez: 27 May 2024
Status: ahead of print
Abstract
In recent years, artificial intelligence (AI), as a new segment of computer science, has become increasingly important in medicine as well. The aim of this project was to investigate whether the current version of ChatGPT (ChatGPT 4.0) is able to answer open questions that could be asked in a German board examination in ophthalmology. After image-based questions were excluded, 10 questions from each of 15 different chapters/topics were selected from the textbook 1000 questions in ophthalmology (1000 Fragen Augenheilkunde, 2nd edition, 2014). Using a so-called prompt, ChatGPT was instructed to assume the role of a board-certified ophthalmologist and to concentrate on the essentials when answering. For each topic, a human expert with many years of experience in the respective subspecialty evaluated the answers for correctness, relevance and internal coherence. The expert also rated the overall performance with a school grade and assessed whether the answers would have been sufficient to pass the ophthalmology board examination. ChatGPT would have passed the board examination in 12 of 15 topics. Overall performance, however, was limited, with only 53.3% of answers completely correct. While correctness varied widely between topics (uveitis and lens/cataract 100%; optics and refraction 20%), the answers consistently showed high thematic fit (70%) and internal coherence (71%). That ChatGPT 4.0 would have passed the specialist examination in 12 of 15 topics is remarkable given that this AI was not specifically trained for medical questions; however, performance varies considerably between topics, with some serious shortcomings that currently rule out its safe use in clinical practice.
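The headline figures in the abstract can be cross-checked with a short worked calculation. This is a sketch under stated assumptions: it assumes that each correctness percentage is a plain share of answers, with 10 questions per topic, since the abstract does not state the denominators explicitly; the symbols $c_t$ (completely correct answers in topic $t$), $p_t$ and $\bar{p}$ are introduced here for illustration only. Because every topic contributes the same number of questions, the mean of the topic-level rates equals the pooled rate, so 53.3% corresponds to roughly 80 of 150 answers.

```latex
\begin{aligned}
N &= 15 \times 10 = 150 \\
p_t &= \frac{c_t}{10}, \quad \text{e.g. } p_{\text{uveitis}} = \tfrac{10}{10} = 100\%,\;
      p_{\text{optics}} = \tfrac{2}{10} = 20\% \\
\bar{p} &= \frac{1}{15}\sum_{t=1}^{15} \frac{c_t}{10}
         = \frac{1}{150}\sum_{t=1}^{15} c_t \\
\bar{p} \approx 0.533 &\;\Rightarrow\; \sum_{t=1}^{15} c_t \approx 0.533 \times 150 \approx 80
\end{aligned}
```

The same equal-weighting argument would not automatically apply to the 70% thematic-fit and 71% coherence figures, because the abstract does not say how those criteria were scored.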
Other abstracts
Type: Publisher
(German)
OBJECTIVE: In recent years, artificial intelligence (AI), as a new segment of computer science, has become increasingly important in medicine as well. The aim of this project was to investigate whether the current version of ChatGPT (ChatGPT 4.0) is able to answer open questions that could be asked in a German board examination in ophthalmology.
Identifiers
pubmed: 38801461
doi: 10.1007/s00347-024-02046-0
pii: 10.1007/s00347-024-02046-0
Publication types
English Abstract
Journal Article
Languages
ger
Citation subsets
IM
Copyright information
© 2024. The Author(s), under exclusive licence to Springer Medizin Verlag GmbH, ein Teil von Springer Nature.
References
Briganti G, Le Moine O (2020) Artificial intelligence in medicine: today and tomorrow. Front Med 7:27
doi: 10.3389/fmed.2020.00027
Bini SA (2018) Artificial intelligence, machine learning, deep learning, and cognitive computing: what do these terms mean and how will they impact health care? J Arthroplasty 33(8):2358–2361
doi: 10.1016/j.arth.2018.02.067
pubmed: 29656964
Van Dis EA, Bollen J, Zuidema W, van Rooij R, Bockting CL (2023) ChatGPT: five priorities for research. Nature 614(7947):224–226
doi: 10.1038/d41586-023-00288-7
pubmed: 36737653
Tan TF, Thirunavukarasu AJ, Campbell JP, Keane PA, Pasquale LR, Abramoff MD et al (2023) Generative artificial intelligence through ChatGPT and other large language models in ophthalmology: clinical applications and challenges. Ophthalmol Sci 3(4):100394
Patel SB, Lam K (2023) ChatGPT: the future of discharge summaries? Lancet Digit Health 5(3):e107–e108
Ali MJ, Singh S (2023) ChatGPT and scientific abstract writing: pitfalls and caution. Graefes Arch Clin Exp Ophthalmol: 1–2
Singh S, Djalilian A, Ali MJ (2023) ChatGPT and ophthalmology: exploring its potential with discharge summaries and operative notes. Semin Ophthalmol 38(5):503–507
Potapenko I, Boberg-Ans LC, Stormly Hansen M, Klefter ON, van Dijk EHC, Subhi Y (2023) Artificial intelligence-based chatbot patient information on common retinal diseases using ChatGPT. Acta Ophthalmol (Copenh) 101(7):829–831
Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C et al (2023) Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health 2(2):e0000198
doi: 10.1371/journal.pdig.0000198
Antaki F, Touma S, Milad D, El-Khoury J, Duval R (2023) Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmol Sci 100324
Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA et al (2023) How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ 9(1):e45312
doi: 10.2196/45312
pubmed: 36753318
pmcid: 9947764
Jung LB, Gudera JA, Wiegand TL, Allmendinger S, Dimitriadis K, Koerte IK (2023) ChatGPT passes German state examination in medicine with picture questions omitted. Dtsch Ärztebl Int 120(21–22):373–374
Takagi S, Watari T, Erabi A, Sakaguchi K (2023) Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: comparison study. JMIR Med Educ 9(1):e48002
Mihalache A, Popovic MM, Muni RH (2023) Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol
Mihalache A, Huang RS, Popovic MM, Muni RH (2023) Performance of an upgraded artificial intelligence chatbot for ophthalmic knowledge assessment. JAMA Ophthalmol
Panthier C, Gatinel D (2023) Success of ChatGPT, an AI language model, in taking the French language version of the European Board of Ophthalmology examination: A novel approach to medical knowledge assessment. J Fr Ophtalmol 46(7):706–711
doi: 10.1016/j.jfo.2023.05.006
pubmed: 37537126
Lin JC, Younessi DN, Kurapati SS, Tang OY, Scott IU (2023) Comparison of GPT-3.5, GPT-4, and human user performance on a practice ophthalmology written examination. Eye. https://doi.org/10.1038/s41433-023-02564-2
Raimondi R, Tzoumas N, Salisbury T, Di Simplicio S, Romano MR (2023) Comparative analysis of large language models in the Royal College of Ophthalmologists fellowship exams. Eye: 1–4
Kampik A, Grehn F, Facharztprüfung Augenheilkunde ME (2014) 1000 kommentierte Prüfungsfragen. Thieme
OpenAI (2024) Prompt engineering (guides). https://platform.openai.com/docs/guides/prompt-engineering (accessed 11 April)
Harris PA, Taylor R, Minor BL, Elliott V, Fernandez M, O’Neal L et al (2019) The REDCap consortium: building an international community of software platform partners. J Biomed Inform 95:103208
doi: 10.1016/j.jbi.2019.103208
pubmed: 31078660
pmcid: 7254481
Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG (2009) Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform 42(2):377–381
doi: 10.1016/j.jbi.2008.08.010
pubmed: 18929686
Dossantos J, An J, Javan R (2023) Eyes on AI: ChatGPT’s Transformative Potential Impact on Ophthalmology. Cureus 15(6)
Lai VD, Ngo NT, Veyseh APB, Man H, Dernoncourt F, Bui T et al (2023) ChatGPT beyond English: towards a comprehensive evaluation of large language models in multilingual learning. arXiv preprint arXiv:2304.05613
ChatGPT is cutting non-English languages out of the AI revolution. Wired. https://www.wired.com/story/chatgpt-non-english-languages-ai-revolution/ (accessed 18 November 2023)
Bang Y, Cahyawijaya S, Lee N, Dai W, Su D, Wilie B et al (2023) A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023
Martinho A, Kroesen M, Chorus C (2021) A healthy debate: Exploring the views of medical doctors on the ethics of artificial intelligence. Artif Intell Med 121:102190
doi: 10.1016/j.artmed.2021.102190
pubmed: 34763805
Beutel G, Geerits E, Kielstein JT (2023) Artificial hallucination: GPT on LSD? Crit Care 27(1):148
doi: 10.1186/s13054-023-04425-6
pubmed: 37072798
pmcid: 10114308
Neues zur Geschichte des Begriffes Pannus (1927) In: Archiv für Geschichte der Medizin. Franz Steiner Verlag, pp 240–252. https://www.jstor.org/stable/20773407
Schmidt-Rimpler H Augenheilkunde und Ophthalmoskopie. In: Werdens Sammlung kurzer medizinischer Lehrbücher, 2. Friedrich Werden, Braunschweig
Hirschberg J (1871) Professor A. von Graefe’s klinische Vorträge über Augenheilkunde, 1st edn. August Hirschwald, Berlin
Stages of trachoma (1960) In: Trachoma Manual and Atlas. Public Health Service Publication No. 541. Available at: https://books.google.de/books?id=KhKedH_sC2UC&pg=PA3&lpg=PA3&dq=%22MacCallan%27s+classification+of+trachoma+is+in+general+use+all+over+the+world%22&source=bl&ots=MjVgZHx7rn&sig=ACfU3U2vL3egFX-Q9Y_Q5kBtkG5xtxjl4A&hl=de&sa=X&ved=2ahUKEwjl5J3Ets6CAxVVg_0HHR
Stades C, Wyman M, Boeve MH, Neumann W, Spiess B (2007) Cornea and sclera (chapter 10). In: Ophthalmology for the Veterinary Practitioner, 2nd edn. Schlütersche, p 272
Nash Squared (2023) Digital Leadership Report 2023. https://www.nashsquared.com/2023-digital-leadership-report
Srivastava R (2023) Applications of Artificial Intelligence in Medicine. Explor Res Hypothesis Med 000:0–0
doi: 10.14218/ERHM.2023.00048
Li J, Dada A, Puladi B, Kleesiek J, Egger J (2024) ChatGPT in healthcare: a taxonomy and systematic review. Comput Methods Programs Biomed 108013
Finger RP (2020) Künstliche Intelligenz in der Augenheilkunde. Ophthalmol 117(10):963–964
Hswen Y, Voelker R (2023) New AI Tools Must Have Health Equity in Their DNA. JAMA
Voelker R (2023) The Promise and Pitfalls of AI in the Complex World of Diagnosis, Treatment, and Disease Management. JAMA
Tan TF, Thirunavukarasu AJ, Jin L, Lim J, Poh S, Teo ZL et al (2023) Artificial intelligence and digital health in global eye health: opportunities and challenges. Lancet Glob Health 11(9):e1432–43
doi: 10.1016/S2214-109X(23)00323-6
pubmed: 37591589
Alexandrou M (2024) Interventional Cardiologists’ Perspectives and Knowledge Towards Artificial Intelligence. In SCAI
van der Zander QE, van der Ende-van Loon MC, Janssen JM, Winkens B, van der Sommen F, Masclee AA et al (2022) Artificial intelligence in (gastrointestinal) healthcare: patients’ and physicians’ perspectives. Sci Rep 12(1):16779
doi: 10.1038/s41598-022-20958-2
pubmed: 36202957
pmcid: 9537305
Holzner D, Apfelbacher T, Rödle W, Schüttler C, Prokosch HU, Mikolajczyk RT et al (2022) Attitudes and acceptance towards artificial intelligence in medical care, pp 68–72
Pedro AR, Dias MB, Laranjo L, Cunha AS, Cordeiro JV (2023) Artificial intelligence in medicine: A comprehensive survey of medical doctor’s perspectives in Portugal. PLoS ONE 18(9):e0290613
doi: 10.1371/journal.pone.0290613
pubmed: 37676884
pmcid: 10484446
Chen M, Zhang B, Cai Z, Seery S, Gonzalez MJ, Ali NM et al (2022) Acceptance of clinical artificial intelligence among physicians and medical students: a systematic review with cross-sectional survey. Front Med 9:990604
doi: 10.3389/fmed.2022.990604