Performance of Large Language Models on Medical Oncology Examination Questions.


Journal

JAMA Network Open
ISSN: 2574-3805
Abbreviated title: JAMA Netw Open
Country: United States
NLM ID: 101729235

Publication information

Publication date:
03 Jun 2024
History:
medline: 18 Jun 2024
pubmed: 18 Jun 2024
entrez: 18 Jun 2024
Status: epublish

Abstract

Large language models (LLMs) have recently developed an unprecedented ability to answer questions. Studies of LLMs in other fields may not generalize to medical oncology, a high-stakes clinical setting requiring rapid integration of new information.
The objective of this study was to evaluate the accuracy and safety of LLM answers on medical oncology examination questions.
This cross-sectional study was conducted between May 28 and October 11, 2023. The American Society of Clinical Oncology (ASCO) Oncology Self-Assessment Series on ASCO Connection, the European Society of Medical Oncology (ESMO) Examination Trial questions, and an original set of board-style medical oncology multiple-choice questions were presented to 8 LLMs.
The primary outcome was the percentage of correct answers. Medical oncologists evaluated the explanations provided by the best LLM for accuracy, classified the types of errors, and estimated the likelihood and extent of potential clinical harm.
Proprietary LLM 2 correctly answered 125 of 147 questions (85.0%; 95% CI, 78.2%-90.4%; P < .001 vs random answering). Proprietary LLM 2 outperformed an earlier version, proprietary LLM 1, which correctly answered 89 of 147 questions (60.5%; 95% CI, 52.2%-68.5%; P < .001), and the best open-source LLM, Mixtral-8x7B-v0.1, which correctly answered 87 of 147 questions (59.2%; 95% CI, 50.0%-66.4%; P < .001). The explanations provided by proprietary LLM 2 contained no or minor errors for 138 of 147 questions (93.9%; 95% CI, 88.7%-97.2%). Incorrect responses were most commonly associated with errors in information retrieval, particularly with recent publications, followed by erroneous reasoning and reading comprehension. If acted upon in clinical practice, 18 of 22 incorrect answers (81.8%; 95% CI, 59.7%-94.8%) would have a medium or high likelihood of moderate to severe harm.
In this cross-sectional study of the performance of LLMs on medical oncology examination questions, the best LLM answered questions with remarkable performance, although errors raised safety concerns. These results demonstrated an opportunity to develop and evaluate LLMs to improve health care clinician experiences and patient care, considering the potential impact on capabilities and safety.
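
The percentages and 95% CIs reported in the abstract can be approximately reproduced from the raw counts. The sketch below is illustrative only: the abstract does not state which interval method the authors used, so the exact (Clopper-Pearson) binomial interval here is an assumption, and the function names (`binom_cdf`, `exact_ci`) are hypothetical helpers, not part of the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=60):
    """Find the root of a function that decreases on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # root lies to the right
        else:
            hi = mid  # root lies to the left
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials.

    Assumption: the abstract does not name its CI method; this is one
    common choice for binomial proportions.
    """
    if k == 0:
        lower = 0.0
    else:
        # Lower bound: p at which P(X >= k) equals alpha / 2.
        lower = _bisect_decreasing(
            lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p))
        )
    if k == n:
        upper = 1.0
    else:
        # Upper bound: p at which P(X <= k) equals alpha / 2.
        upper = _bisect_decreasing(lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


# Counts taken from the abstract: (correct or qualifying, total).
counts = {
    "Proprietary LLM 2, correct answers": (125, 147),
    "Proprietary LLM 1, correct answers": (89, 147),
    "Mixtral-8x7B-v0.1, correct answers": (87, 147),
    "Incorrect answers with likely harm": (18, 22),
}

for label, (k, n) in counts.items():
    lower, upper = exact_ci(k, n)
    print(f"{label}: {100 * k / n:.1f}% ({100 * lower:.1f}%-{100 * upper:.1f}%)")
```

Small differences from the published intervals would be expected if the authors used a different method (for example, a Wilson or bootstrap interval).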

Identifiers

pubmed: 38888919
pii: 2820094
doi: 10.1001/jamanetworkopen.2024.17641

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e2417641

Authors

Jack B Longwell (JB)

Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.

Ian Hirsch (I)

Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
Department of Medicine, University of Toronto, Toronto, Ontario, Canada.

Fernando Binder (F)

Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
Department of Medicine, University of Toronto, Toronto, Ontario, Canada.

Galileo Arturo Gonzalez Conchas (GA)

Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.

Daniel Mau (D)

Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada.

Raymond Jang (R)

Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
Department of Medicine, University of Toronto, Toronto, Ontario, Canada.

Rahul G Krishnan (RG)

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada.
Vector Institute, Toronto, Ontario, Canada.

Robert C Grant (RC)

Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
Department of Medicine, University of Toronto, Toronto, Ontario, Canada.
Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada.
ICES, Toronto, Ontario, Canada.
