Evaluating ChatGPT responses on thyroid nodules for patient education.


Journal

Thyroid : official journal of the American Thyroid Association
ISSN: 1557-9077
Abbreviated title: Thyroid
Country: United States
NLM ID: 9104317

Publication information

Publication date:
27 Nov 2023
History:
medline: 27 Nov 2023
pubmed: 27 Nov 2023
entrez: 27 Nov 2023
Status: ahead of print

Abstract

ChatGPT, an artificial intelligence (AI) chatbot, is the fastest growing consumer application in history. Given recent trends identifying increasing patient use of Internet sources for self-education, we sought to evaluate the quality of ChatGPT-generated responses for patient education on thyroid nodules. ChatGPT was queried 4 times with 30 identical questions. Queries differed by initial chatbot prompting: no prompting, patient-friendly prompting, 8th-grade level prompting, and prompting for references. Answers were scored on a hierarchical scale: incorrect, partially correct, correct, or correct with references. Proportions of responses at incremental score thresholds were compared by prompt type using chi-squared analysis. Flesch-Kincaid grade level was calculated for each answer. The relationship between prompt type and grade level was assessed using analysis of variance. References provided within ChatGPT answers were totaled and analyzed for veracity. Across all prompts (n=120 questions), 83 answers (69.2%) were at least correct. Proportions of responses that were at least partially correct (p=0.795) and correct (p=0.402) did not differ by prompt; responses that were correct with references did (p<0.0001). Responses from 8th-grade level prompting had the lowest mean grade level (13.43 ± 2.86), significantly lower than no prompting (14.97 ± 2.01, p=0.01) and prompting for references (16.43 ± 2.05, p<0.0001). Prompting for references generated 80/80 (100%) of referenced publications within answers. Seventy references (87.5%) were legitimate citations, and 58/80 (72.5%) provided accurately reported information from the referenced publications. ChatGPT overall provides appropriate answers to most questions on thyroid nodules regardless of prompting. Despite targeted prompting strategies, ChatGPT reliably generates responses corresponding to grade levels well above accepted recommendations for presenting medical information to patients. Significant rates of AI hallucination may preclude clinicians from recommending the current version of ChatGPT as an educational tool for patients at this time.


Identifiers

pubmed: 38010917
doi: 10.1089/thy.2023.0491

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Authors

Daniel J Campbell (DJ)

Thomas Jefferson University - Center City Campus, 6559, Otolaryngology - Head & Neck Surgery, 925 Chestnut St, Floor 6, Philadelphia, Pennsylvania, United States, 19107-5084; djc024@jefferson.edu.

Leonard E Estephan (LE)

Thomas Jefferson University - Center City Campus, 6559, Otolaryngology - Head & Neck Surgery, Philadelphia, Pennsylvania, United States; Leonard.estephan@jefferson.edu.

Elliott Sina (E)

Thomas Jefferson University - Center City Campus, 6559, Otolaryngology - Head & Neck Surgery, Philadelphia, Pennsylvania, United States; Elliott.sina@students.jefferson.edu.

Eric V Mastrolonardo (EV)

Thomas Jefferson University - Center City Campus, 6559, Otolaryngology - Head & Neck Surgery, Philadelphia, Pennsylvania, United States; Eric.mastrolonardo@jefferson.edu.

Rahul Alapati (R)

Thomas Jefferson University - Center City Campus, 6559, Philadelphia, Pennsylvania, United States; ralapati98@gmail.com.

Dev R Amin (DR)

Thomas Jefferson University - Center City Campus, 6559, Otolaryngology - Head & Neck Surgery, Philadelphia, Pennsylvania, United States; dev.amin@jefferson.edu.

Elizabeth Cottrill (E)

Thomas Jefferson University, Otolaryngology - Head and Neck Surgery, Philadelphia, Pennsylvania, United States; Elizabeth.Cottrill@jefferson.edu.

MeSH classifications