Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study.

ChatGPT Google Bard artificial intelligence immuno-oncology large language models

Journal

The oncologist
ISSN: 1549-490X
Titre abrégé: Oncologist
Pays: England
ID NLM: 9607837

Informations de publication

Date de publication:
03 Feb 2024
Historique:
received: 10 11 2023
accepted: 11 01 2024
medline: 4 2 2024
pubmed: 4 2 2024
entrez: 3 2 2024
Statut: aheadofprint

Résumé

The capability of large language models (LLMs) to understand and generate human-readable text has prompted the investigation of their potential as educational and management tools for patients with cancer and healthcare providers. We conducted a cross-sectional study aimed at evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to 4 domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 for each section). Questions were manually submitted to LLMs, and responses were collected on June 30, 2023. Two reviewers evaluated the answers independently. ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (P < .0001). The number of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT3.5 (88.3%) than for Google Bard (50%) (P < .0001). In terms of accuracy, the number of answers deemed fully correct were 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .03). Furthermore, the number of responses deemed highly relevant was 71.9%, 77.4%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .04). Regarding readability, the number of highly readable was higher for ChatGPT-4 and ChatGPT-3.5 (98.1%) and (100%) compared to Google Bard (87.5%) (P = .02). ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness in the responses was evident in all 3 LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.

Sections du résumé

BACKGROUND BACKGROUND
The capability of large language models (LLMs) to understand and generate human-readable text has prompted the investigation of their potential as educational and management tools for patients with cancer and healthcare providers.
MATERIALS AND METHODS METHODS
We conducted a cross-sectional study aimed at evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to 4 domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 for each section). Questions were manually submitted to LLMs, and responses were collected on June 30, 2023. Two reviewers evaluated the answers independently.
RESULTS RESULTS
ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (P < .0001). The number of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT3.5 (88.3%) than for Google Bard (50%) (P < .0001). In terms of accuracy, the number of answers deemed fully correct were 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .03). Furthermore, the number of responses deemed highly relevant was 71.9%, 77.4%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .04). Regarding readability, the number of highly readable was higher for ChatGPT-4 and ChatGPT-3.5 (98.1%) and (100%) compared to Google Bard (87.5%) (P = .02).
CONCLUSION CONCLUSIONS
ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness in the responses was evident in all 3 LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.

Identifiants

pubmed: 38309720
pii: 7600405
doi: 10.1093/oncolo/oyae009
pii:
doi:

Types de publication

Journal Article

Langues

eng

Sous-ensembles de citation

IM

Subventions

Organisme : NIH HHS
Pays : United States
Organisme : NCI NIH HHS
Pays : United States

Informations de copyright

Published by Oxford University Press 2024. This work is written by (a) US Government employee(s) and is in the public domain in the US.

Auteurs

Giovanni Maria Iannantuono (GM)

Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

Dara Bracken-Clarke (D)

Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

Fatima Karzai (F)

Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

Hyoyoung Choo-Wosoba (H)

Biostatistics and Data Management Section, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

James L Gulley (JL)

Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

Charalampos S Floudas (CS)

Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

Classifications MeSH