Evaluating the Utility of a Large Language Model in Answering Common Patients' Gastrointestinal Health-Related Questions: Are We There Yet?

Keywords: OpenAI's ChatGPT chatbot; gastroenterology; medical information; natural language processing (NLP); patients' questions

Journal

Diagnostics (Basel, Switzerland)
ISSN: 2075-4418
Abbreviated title: Diagnostics (Basel)
Country: Switzerland
NLM ID: 101658402

Publication information

Publication date:
02 Jun 2023
History:
received: 27 Mar 2023
revised: 28 May 2023
accepted: 01 Jun 2023
medline: 10 Jun 2023
pubmed: 10 Jun 2023
entrez: 10 Jun 2023
Status: epublish

Abstract

Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI's ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim was to evaluate the performance of ChatGPT in answering patients' questions regarding gastrointestinal health. To do so, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists, and their accuracy, clarity, and efficacy were assessed. ChatGPT was able to provide accurate and clear answers to patients' questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For questions about symptoms, the average accuracy, clarity, and efficacy scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For questions about diagnostic tests, the average accuracy, clarity, and efficacy scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively. While ChatGPT has potential as a source of information, further development is needed. The quality of its answers is contingent upon the quality of the online information it was trained on. These findings may be useful for healthcare providers and patients alike in understanding the capabilities and limitations of ChatGPT.
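The per-category scores above are reported as mean ± standard deviation of the consensus ratings. As a minimal sketch of how such a summary is computed (using made-up ratings for illustration, since the study's raw data are not included in this record):

```python
import statistics

# Hypothetical 1-5 consensus ratings for one category (illustration only;
# these are NOT the study's actual data).
treatment_accuracy = [4, 5, 3, 4, 4, 3, 5, 4]

mean = statistics.mean(treatment_accuracy)       # arithmetic mean
sd = statistics.stdev(treatment_accuracy)        # sample standard deviation

print(f"{mean:.1f} \u00b1 {sd:.1f}")  # prints "4.0 ± 0.8"
```

Whether the paper used the sample (`stdev`) or population (`pstdev`) standard deviation is not stated in this record; the sample form is the more common choice for a rated question set treated as a sample.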


Identifiers

pubmed: 37296802
pii: diagnostics13111950
doi: 10.3390/diagnostics13111950
pmc: PMC10252924

Publication types

Journal Article

Languages

eng

References

Clin Mol Hepatol. 2023 Mar 22;:
pubmed: 36946005
Sci Rep. 2023 Mar 13;13(1):4164
pubmed: 36914821
J Med Internet Res. 2020 Oct 22;22(10):e20346
pubmed: 33090118
Hepatol Commun. 2023 Mar 24;7(4):
pubmed: 36972383
Int J Environ Res Public Health. 2023 Feb 15;20(4):
pubmed: 36834073
J Med Internet Res. 2021 May 6;23(5):e27460
pubmed: 33882012
Graefes Arch Clin Exp Ophthalmol. 2023 May 2;:
pubmed: 37129631
JMIR Med Educ. 2023 Mar 6;9:e46885
pubmed: 36863937
JNCI Cancer Spectr. 2023 Mar 1;7(2):
pubmed: 36929393
PLoS Med. 2018 Nov 6;15(11):e1002689
pubmed: 30399149
Aesthetic Plast Surg. 2023 Apr 24;:
pubmed: 37095384
Obes Surg. 2023 Jun;33(6):1790-1796
pubmed: 37106269
J Telemed Telecare. 2023 Feb 9;:1357633X231155520
pubmed: 36760131
Dig Liver Dis. 2008 Aug;40(8):659-66
pubmed: 18406672
J Med Internet Res. 2019 Apr 05;21(4):e12887
pubmed: 30950796
Heliyon. 2017 Jun 22;3(6):e00328
pubmed: 28707001
J Med Internet Res. 2019 Oct 28;21(10):e16222
pubmed: 31661083

Authors

Adi Lahat (A)

Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel.

Eyal Shachar (E)

Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel.

Benjamin Avidan (B)

Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel.

Benjamin Glicksberg (B)

Mount Sinai Clinical Intelligence Center, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Eyal Klang (E)

The Sami Sagol AI Hub, ARC Innovation Center, Chaim Sheba Medical Center, Affiliated to Tel-Aviv University, Tel Aviv 69978, Israel.

MeSH classifications