How do large language models answer breast cancer quiz questions? A comparative study of GPT-3.5, GPT-4 and Google Gemini.

Keywords: Breast cancer; ChatGPT; Google Gemini; Large language models

Journal

La Radiologia medica
ISSN: 1826-6983
Abbreviated title: Radiol Med
Country: Italy
NLM ID: 0177625

Publication information

Publication date:
13 Aug 2024
History:
Received: 16 Apr 2024
Accepted: 01 Aug 2024
MEDLINE: 14 Aug 2024
PubMed: 14 Aug 2024
Entrez: 14 Aug 2024
Status: ahead of print

Abstract

Applications of large language models (LLMs) in the healthcare field have shown promising results in processing and summarizing multidisciplinary information. This study evaluated the ability of three publicly available LLMs (GPT-3.5, GPT-4, and Google Gemini, then called Bard) to answer 60 multiple-choice questions (29 sourced from public databases, 31 newly formulated by experienced breast radiologists) about different aspects of breast cancer care: treatment and prognosis, diagnostic and interventional techniques, imaging interpretation, and pathology. Overall, the rate of correct answers differed significantly among LLMs (p = 0.010): GPT-4 performed best (95%, 57/60), followed by GPT-3.5 (90%, 54/60) and Google Gemini (80%, 48/60). For all three LLMs, no significant difference was observed between the rates of correct answers to questions sourced from public databases and to newly formulated ones (p ≥ 0.593). These results highlight the potential benefits of LLMs in breast cancer care, which will need to be further refined through in-context training.
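The headline figures in the abstract are simple proportions of the 60 questions. The short Python sketch below re-derives them from the reported counts as an arithmetic check only; it is not the authors' analysis code, and the names `TOTAL_QUESTIONS`, `correct`, `accuracy_percent`, and `rates` are hypothetical. It does not attempt to reproduce the reported p-values, since the per-question answers and the statistical test used are not given in this record.

```python
# Hypothetical arithmetic check of the accuracy figures reported in the abstract.
# Not the study's code; counts are copied from the abstract.

TOTAL_QUESTIONS = 60  # 29 from public databases + 31 newly formulated

# Number of correct answers per model, as reported.
correct = {"GPT-4": 57, "GPT-3.5": 54, "Google Gemini": 48}


def accuracy_percent(n_correct: int, n_total: int) -> float:
    """Return the share of correct answers as a percentage."""
    return 100 * n_correct / n_total


# Sanity check that the two question sources add up to the full set.
assert 29 + 31 == TOTAL_QUESTIONS

rates = {model: accuracy_percent(n, TOTAL_QUESTIONS) for model, n in correct.items()}

# Print models from best to worst, matching the ordering in the abstract.
for model, pct in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {pct:.0f}% ({correct[model]}/{TOTAL_QUESTIONS})")
```

Running it prints 95%, 90%, and 80% for GPT-4, GPT-3.5, and Google Gemini respectively, matching the abstract.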

Identifiers

pubmed: 39138732
doi: 10.1007/s11547-024-01872-1
pii: 10.1007/s11547-024-01872-1

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© 2024. Italian Society of Medical Radiology.


Authors

Giovanni Irmici

Breast Radiology Department, Fondazione IRCCS Istituto Nazionale dei Tumori, Via Giacomo Venezian 1, 20133, Milano, Italy. irmici.giovanni25@gmail.com.

Andrea Cozzi

Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale (EOC), Lugano, Switzerland.

Gianmarco Della Pepa

Breast Radiology Department, Fondazione IRCCS Istituto Nazionale dei Tumori, Via Giacomo Venezian 1, 20133, Milano, Italy.

Claudia De Berardinis

Breast Radiology Department, Fondazione IRCCS Istituto Nazionale dei Tumori, Via Giacomo Venezian 1, 20133, Milano, Italy.

Elisa D'Ascoli

Breast Radiology Department, Fondazione IRCCS Istituto Nazionale dei Tumori, Via Giacomo Venezian 1, 20133, Milano, Italy.

Michaela Cellina

Radiology Department, ASST Fatebenefratelli Sacco, Milano, Italy.

Maurizio Cè

Postgraduation School in Radiodiagnostics, Università degli Studi di Milano, Milano, Italy.

Catherine Depretto

Breast Radiology Department, Fondazione IRCCS Istituto Nazionale dei Tumori, Via Giacomo Venezian 1, 20133, Milano, Italy.

Gianfranco Scaperrotta

Breast Radiology Department, Fondazione IRCCS Istituto Nazionale dei Tumori, Via Giacomo Venezian 1, 20133, Milano, Italy.

MeSH classifications