Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial.


Journal

JAMA Network Open
ISSN: 2574-3805
Abbreviated title: JAMA Netw Open
Country: United States
NLM ID: 101729235

Publication information

Publication date:
01 Oct 2024
History:
medline: 28 Oct 2024
pubmed: 28 Oct 2024
entrez: 28 Oct 2024
Status: epublish

Abstract

Importance: Large language models (LLMs) have shown promise in their performance on both multiple-choice and open-ended medical reasoning examinations, but it remains unknown whether the use of such tools improves physician diagnostic reasoning.

Objective: To assess the effect of an LLM on physicians' diagnostic reasoning compared with conventional resources.

Design, Setting, and Participants: A single-blind randomized clinical trial was conducted from November 29 to December 29, 2023. Using remote video conferencing and in-person participation across multiple academic medical institutions, physicians with training in family medicine, internal medicine, or emergency medicine were recruited.

Intervention: Participants were randomized to either access the LLM in addition to conventional diagnostic resources or conventional resources only, stratified by career stage. Participants were allocated 60 minutes to review up to 6 clinical vignettes.

Main Outcomes and Measures: The primary outcome was performance on a standardized rubric of diagnostic performance based on differential diagnosis accuracy, appropriateness of supporting and opposing factors, and next diagnostic evaluation steps, validated and graded via blinded expert consensus. Secondary outcomes included time spent per case (in seconds) and final diagnosis accuracy. All analyses followed the intention-to-treat principle. A secondary exploratory analysis evaluated the standalone performance of the LLM by comparing the primary outcomes between the LLM alone group and the conventional resource group.

Results: Fifty physicians (26 attendings, 24 residents; median years in practice, 3 [IQR, 2-8]) participated virtually as well as at 1 in-person site. The median diagnostic reasoning score per case was 76% (IQR, 66%-87%) for the LLM group and 74% (IQR, 63%-84%) for the conventional resources-only group, with an adjusted difference of 2 percentage points (95% CI, -4 to 8 percentage points; P = .60). The median time spent per case for the LLM group was 519 (IQR, 371-668) seconds, compared with 565 (IQR, 456-788) seconds for the conventional resources group, with a time difference of -82 (95% CI, -195 to 31; P = .20) seconds. The LLM alone scored 16 percentage points (95% CI, 2-30 percentage points; P = .03) higher than the conventional resources group.

Conclusions and Relevance: In this trial, the availability of an LLM to physicians as a diagnostic aid did not significantly improve clinical reasoning compared with conventional resources. The LLM alone demonstrated higher performance than both physician groups, indicating the need for technology and workforce development to realize the potential of physician-artificial intelligence collaboration in clinical practice.

Trial Registration: ClinicalTrials.gov Identifier: NCT06157944.

Identifiers

pubmed: 39466245
pii: 2825395
doi: 10.1001/jamanetworkopen.2024.40969

Databases

ClinicalTrials.gov
NCT06157944

Publication types

Journal Article; Randomized Controlled Trial

Languages

eng

Citation subsets

IM

Pagination

e2440969

Authors

Ethan Goh (E)

Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California.
Stanford Clinical Excellence Research Center, Stanford University, Stanford, California.

Robert Gallo (R)

Center for Innovation to Implementation, VA Palo Alto Health Care System, Palo Alto, California.

Jason Hom (J)

Department of Hospital Medicine, Stanford University School of Medicine, Stanford, California.

Eric Strong (E)

Department of Hospital Medicine, Stanford University School of Medicine, Stanford, California.

Yingjie Weng (Y)

Quantitative Sciences Unit, Stanford University School of Medicine, Stanford, California.

Hannah Kerman (H)

Department of Hospital Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts.
Department of Hospital Medicine, Harvard Medical School, Boston, Massachusetts.

Joséphine A Cool (JA)

Department of Hospital Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts.
Department of Hospital Medicine, Harvard Medical School, Boston, Massachusetts.

Zahir Kanjee (Z)

Department of Hospital Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts.
Department of Hospital Medicine, Harvard Medical School, Boston, Massachusetts.

Andrew S Parsons (AS)

Department of Hospital Medicine, School of Medicine, University of Virginia, Charlottesville.

Neera Ahuja (N)

Department of Hospital Medicine, Stanford University School of Medicine, Stanford, California.

Eric Horvitz (E)

Microsoft Corp, Redmond, Washington.
Stanford Institute for Human-Centered Artificial Intelligence, Stanford, California.

Daniel Yang (D)

Department of Hospital Medicine, Kaiser Permanente, Oakland, California.

Arnold Milstein (A)

Stanford Clinical Excellence Research Center, Stanford University, Stanford, California.

Andrew P J Olson (APJ)

Department of Hospital Medicine, University of Minnesota Medical School, Minneapolis.

Adam Rodman (A)

Department of Hospital Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts.
Department of Hospital Medicine, Harvard Medical School, Boston, Massachusetts.

Jonathan H Chen (JH)

Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California.
Stanford Clinical Excellence Research Center, Stanford University, Stanford, California.
Division of Hospital Medicine, Stanford University, Stanford, California.
