Assessing the Role of the Generative Pretrained Transformer (GPT) in Alzheimer's Disease Management: Comparative Study of Neurologist- and Artificial Intelligence-Generated Responses.

Keywords: AI; Alzheimer's disease; ChatGPT; GPT; Generative Pretrained Transformer; LLM; artificial intelligence; large language model; patient information

Journal

Journal of Medical Internet Research
ISSN: 1438-8871
Abbreviated title: J Med Internet Res
Country: Canada
NLM ID: 100959882

Publication information

Publication date:
31 Oct 2024
History:
received: 23 Jul 2023
accepted: 25 Sep 2024
revised: 6 Oct 2023
medline: 1 Nov 2024
pubmed: 1 Nov 2024
entrez: 31 Oct 2024
Status: epublish

Abstract sections

BACKGROUND
Alzheimer's disease (AD) is a progressive neurodegenerative disorder posing challenges to patients, caregivers, and society. Accessible and accurate information is crucial for effective AD management.
OBJECTIVE
This study aimed to evaluate the accuracy, comprehensibility, clarity, and usefulness of the Generative Pretrained Transformer's (GPT) answers concerning the management and caregiving of patients with AD.
METHODS
In total, 14 questions related to the prevention, treatment, and care of AD were identified and posed to GPT-3.5 and GPT-4, each in Chinese and in English, and 4 respondent neurologists were asked to answer the same questions. This yielded 8 sets of responses (112 in total), which were randomly coded in answer sheets. Next, 5 evaluator neurologists and 5 family members of patients were asked to rate the 112 responses using separate 5-point Likert scales. Response quality was evaluated with a set of 8 questions rated on a 5-point Likert scale; to gauge comprehensibility and participant satisfaction, 3 questions within that set were dedicated to each of these aspects.
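The response count in the Methods can be checked with a short tally. This is only a sketch of the arithmetic, assuming that every source (each GPT model in each language, and each respondent neurologist) answered each of the 14 questions exactly once; the variable names are illustrative and do not come from the study.

```python
from itertools import product

# Design figures quoted from the Methods text.
N_QUESTIONS = 14
GPT_MODELS = ["GPT-3.5", "GPT-4"]
LANGUAGES = ["Chinese", "English"]
N_RESPONDENT_NEUROLOGISTS = 4

# Assumption: every source answered each question exactly once.
gpt_sets = list(product(GPT_MODELS, LANGUAGES))
neurologist_sets = [("Neurologist", i) for i in range(1, N_RESPONDENT_NEUROLOGISTS + 1)]

n_sets = len(gpt_sets) + len(neurologist_sets)
n_gpt_responses = len(gpt_sets) * N_QUESTIONS
n_neurologist_responses = len(neurologist_sets) * N_QUESTIONS
n_total = n_gpt_responses + n_neurologist_responses

print(f"response sets: {n_sets}")
print(f"GPT responses: {n_gpt_responses} ({n_gpt_responses / n_total:.0%})")
print(f"neurologist responses: {n_neurologist_responses} ({n_neurologist_responses / n_total:.0%})")
print(f"total responses: {n_total}")
```

Under that assumption, the tally gives 8 sets and 112 responses, split evenly between GPT and the respondent neurologists.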
RESULTS
As of April 10, 2023, the 5 evaluator neurologists and 5 family members of patients with AD had rated the 112 responses: 28 (25%) from GPT-3.5, 28 (25%) from GPT-4, and 56 (50%) from respondent neurologists. Of the top 5 (4.5%) responses as rated by evaluator neurologists, 4 (80%) were GPT (GPT-3.5 or GPT-4) responses and 1 (20%) was a respondent neurologist's response. Of the top 5 (4.5%) responses as rated by patients' family members, all but the third were GPT responses. Based on the evaluation by neurologists, the neurologist-generated responses achieved a mean score of 3.9 (SD 0.7), while the GPT-generated responses scored significantly higher (mean 4.4, SD 0.6; P<.001). Language and model analyses revealed no significant differences in response quality between the GPT-3.5 and GPT-4 models (GPT-3.5: mean 4.3, SD 0.7; GPT-4: mean 4.4, SD 0.5; P=.51). However, English responses outperformed Chinese responses in terms of comprehensibility (Chinese responses: mean 4.1, SD 0.7; English responses: mean 4.6, SD 0.5; P=.005) and participant satisfaction (Chinese responses: mean 4.2, SD 0.8; English responses: mean 4.5, SD 0.5; P=.04). According to the evaluator neurologists' review, Chinese responses had a mean score of 4.4 (SD 0.6), whereas English responses had a mean score of 4.5 (SD 0.5; P=.002). As for the family members of patients with AD, no significant differences were observed between GPT and neurologists, GPT-3.5 and GPT-4, or Chinese and English responses.
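To give a rough sense of the size of the differences reported above, the means and SDs can be turned into a standardized effect size. The abstract does not report effect sizes or name the statistical test used, so the following is only an illustrative sketch. It assumes equal group sizes (56 vs 56 responses, and 28 vs 28 for the model comparison) and treats the reported SDs as between-response spread. The helper `cohens_d` is a hypothetical name, not part of the study.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two equally sized groups, using the pooled SD."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Means and SDs quoted from the evaluator neurologists' ratings.
# Assumption: equal group sizes, so the pooled SD needs no weighting.
gpt_vs_neurologist = cohens_d(4.4, 0.6, 3.9, 0.7)
gpt4_vs_gpt35 = cohens_d(4.4, 0.5, 4.3, 0.7)

print(f"GPT vs neurologist: d = {gpt_vs_neurologist:.2f}")
print(f"GPT-4 vs GPT-3.5:   d = {gpt4_vs_gpt35:.2f}")
```

These values are descriptive only. They are derived from rounded summary statistics and should not be read as results of the study.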
CONCLUSIONS
GPT can provide patient education materials on AD for patients, their families and caregivers, nurses, and neurologists. This capability can contribute to the effective health care management of patients with AD, leading to enhanced patient outcomes.

Identifiers

pubmed: 39481104
pii: v26i1e51095
doi: 10.2196/51095

Publication types

Journal Article; Comparative Study

Languages

eng

Citation subsets

IM

Pagination

e51095

Copyright information

©Jiaqi Zeng, Xiaoyi Zou, Shirong Li, Yao Tang, Sisi Teng, Huanhuan Li, Changyu Wang, Yuxuan Wu, Luyao Zhang, Yunheng Zhong, Jialin Liu, Siru Liu. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 31.10.2024.

Authors

Jiaqi Zeng (J)

West China Medical School, Sichuan University, Chengdu, China.

Xiaoyi Zou (X)

Department of Neurology, West China Hospital, Sichuan University, Chengdu, China.
Department of Neurology, Chengdu Shangjin Nanfu Hospital, Chengdu, China.

Shirong Li (S)

Department of Neurology, Guizhou Provincial People's Hospital, Guiyang, China.

Yao Tang (Y)

Department of Neurology, Chengdu Shangjin Nanfu Hospital, Chengdu, China.

Sisi Teng (S)

Department of Neurology, Chengdu Shangjin Nanfu Hospital, Chengdu, China.

Huanhuan Li (H)

Mental Health Center, West China Hospital, Sichuan University, Chengdu, China.

Changyu Wang (C)

West China College of Stomatology, Sichuan University, Chengdu, China.

Yuxuan Wu (Y)

Department of Medical Informatics, West China Medical School, Chengdu, China.

Luyao Zhang (L)

West China School of Nursing, Sichuan University, Chengdu, China.

Yunheng Zhong (Y)

West China School of Nursing, Sichuan University, Chengdu, China.

Jialin Liu (J)

Department of Medical Informatics, West China Medical School, Chengdu, China.
Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China.

Siru Liu (S)

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States.
