Accuracy of natural language processors for patients seeking inguinal hernia information.

Artificial intelligence; Inguinal hernia; NLP; Natural language processors; Patient education

Journal

Surgical endoscopy
ISSN: 1432-2218
Abbreviated title: Surg Endosc
Country: Germany
NLM ID: 8806653

Publication information

Publication date:
23 Oct 2024
History:
received: 21 Apr 2024
accepted: 19 Aug 2024
medline: 24 Oct 2024
pubmed: 24 Oct 2024
entrez: 23 Oct 2024
Status: ahead of print

Abstract

BACKGROUND
Natural language processors (NLPs) such as ChatGPT are novel sources of online healthcare information that are readily accessible and integrated into internet search tools. The accuracy of NLP-generated responses to health information questions is unknown.
METHODS
We queried four NLPs (ChatGPT 3.5 and 4, Bard, and Claude 2.0) for responses to simulated patient questions about inguinal hernias and their management. Responses were graded on a Likert scale (1 = poor to 5 = excellent) for relevance, completeness, and accuracy. Responses were compiled and scored collectively for readability using the Flesch-Kincaid score and for educational quality using the DISCERN instrument, a validated tool for evaluating patient information materials. Responses were also compared to two gold-standard educational materials provided by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) and the American College of Surgeons (ACS). Evaluations were performed by six hernia surgeons.
RESULTS
The average NLP response scores for relevance, completeness, and accuracy were 4.76 (95% CI 4.70-4.80), 4.11 (95% CI 4.02-4.20), and 4.14 (95% CI 4.03-4.24), respectively. ChatGPT 4 received higher accuracy scores (mean 4.43 [95% CI 4.37-4.50]) than Bard (mean 4.06 [95% CI 3.88-4.26]) and Claude 2.0 (mean 3.85 [95% CI 3.63-4.08]). The ACS document received the best scores for reading ease (55.2) and grade level (9.2); however, none of the documents achieved the readability thresholds recommended by the American Medical Association. The ACS document also received the highest DISCERN score, 63.5 (95% CI 57.0-70.1), which was significantly higher than the scores for ChatGPT 4 (50.8 [95% CI 46.2-55.4]) and Claude 2.0 (48 [95% CI 41.6-54.4]).
CONCLUSIONS
The evaluated NLPs provided relevant responses of reasonable accuracy to questions about inguinal hernia. Compiled NLP responses received relatively low readability and DISCERN scores, although results may improve as NLPs evolve or with adjustments in question wording. As surgical patients expand their use of NLPs for healthcare information, surgeons should be aware of the benefits and limitations of NLPs as patient education tools.
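
For readers unfamiliar with the readability metric used above, the Flesch reading ease score is 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word), and the Flesch-Kincaid grade level is 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59. The Python sketch below illustrates the calculation only; it is not the tooling used in the study, its syllable counter is a rough heuristic, and the sample passage is hypothetical.

import re

def count_syllables(word):
    # Rough heuristic: count groups of consecutive vowels and drop a trailing
    # silent 'e'. Dedicated readability tools estimate syllables more carefully.
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_scores(text):
    # Return (reading ease, grade level) for a block of text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / max(len(sentences), 1)
    syllables_per_word = syllables / max(len(words), 1)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level

# Hypothetical sample passage, for illustration only.
sample = ("An inguinal hernia occurs when tissue pushes through a weak spot "
          "in the abdominal wall near the groin. Most repairs reinforce the "
          "weak spot with surgical mesh.")
ease, grade = flesch_scores(sample)
print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")

Lower grade levels indicate easier text; the AMA guidance cited in the results generally recommends patient materials written at about a sixth-grade reading level, well below the grade level of 9.2 reported for the best-performing document.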


Identifiers

pubmed: 39443381
doi: 10.1007/s00464-024-11221-y
pii: 10.1007/s00464-024-11221-y

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© 2024. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

References

Finney Rutten LJ, Blake KD, Greenberg-Worisek AJ, Allen SV, Moser RP, Hesse BW (2019) Online health information seeking among US adults: measuring progress toward a healthy people 2020 objective. Public Health Rep 134(6):617–625. https://doi.org/10.1177/0033354919874074
doi: 10.1177/0033354919874074 pubmed: 31513756 pmcid: 6832079
Amante DJ, Hogan TP, Pagoto SL, English TM, Lapane KL (2015) Access to care and use of the Internet to search for health information: results from the US National Health Interview Survey. J Med Internet Res 17(4):e106. https://doi.org/10.2196/jmir.4126
doi: 10.2196/jmir.4126 pubmed: 25925943 pmcid: 4430679
Fox S (2011) Health topics: 80% of internet users look for health information online. Pew Internet & American Life Project, Washington, DC. http://www.pewinternet.org/files/old-media/Files/Reports/2011/PIP_Health_Topics.pdf . Accessed 1 April 2024
Drees J (2019) Google receives more than 1 billion health questions every day, 11 March 2019. https://www.beckershospitalreview.com/healthcare-information-technology/google-receives-more-than-1-billion-health-questions-every-day.html . Accessed 4 April 2024
Suarez-Lledo V, Alvarez-Galvez J (2021) Prevalence of health misinformation on social media: systematic review. J Med Internet Res 23(1):e17187. https://doi.org/10.2196/17187
doi: 10.2196/17187 pubmed: 33470931 pmcid: 7857950
do Nascimento IJB, Pizarro AB, Almeida JM, Azzopardi-Muscat N, Gonçalves MA, Björklund M, Novillo-Ortiz D (2022) Infodemics and health misinformation: a systematic review of reviews. Bull World Health Organ 100(9):544–561. https://doi.org/10.2471/BLT.21.287654
doi: 10.2471/BLT.21.287654
Khullar D (2022) Social media and medical misinformation: confronting new variants of an old problem. JAMA 328(14):1393–1394. https://doi.org/10.1001/jama.2022.17191
doi: 10.1001/jama.2022.17191 pubmed: 36149664
Walker HL, Ghani S, Kuemmerli C, Nebiker CA, Müller BP, Raptis DA, Staubli SM (2023) Reliability of medical information provided by ChatGPT: assessment against clinical guidelines and patient information quality instrument. J Med Internet Res 25:e47479. https://doi.org/10.2196/47479
doi: 10.2196/47479
Wu T, He S, Liu J, Sun S, Liu K, Han Q, Tang Y (2023) A brief overview of ChatGPT: the history, status quo and potential future development. IEEE/CAA J Autom Sinica 10:1122–1136. https://doi.org/10.1109/JAS.2023.123618
doi: 10.1109/JAS.2023.123618
Alkaissi H, McFarlane SI (2023) Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus 15(2):e35179. https://doi.org/10.7759/cureus.35179
doi: 10.7759/cureus.35179 pubmed: 36811129 pmcid: 9939079
Kurzer M, Kark A, Hussain T (2007) Inguinal hernia repair. J Perioper Pract. 17(7):318–321. https://doi.org/10.1177/175045890701700704
doi: 10.1177/175045890701700704 pubmed: 17702204
Pitt SC, Schwartz TA, Chu D (2021) AAPOR reporting guidelines for survey studies. JAMA Surg 156(8):785–786. https://doi.org/10.1001/jamasurg.2021.0543 . (PMID: 33825811)
doi: 10.1001/jamasurg.2021.0543 pubmed: 33825811
SAGES (2021) Inguinal hernia repair surgery patient information from SAGES, 19 April. https://www.sages.org/publications/patient-information/inguinal-hernia-repair-surgery-patient-information-from-sages/ . Accessed 2 Dec 2023.
Feliciano D, Hawn M, Heneghan K, Strand N (2022) Inguinal and femoral groin hernia repair. https://www.facs.org/media/0aihsqg0/groin_hernia.pdf . Accessed 12 Dec 2023.
Weiss BD (2003) Health literacy. American Medical Association, p 253
Charnock D, Shepperd S (2004) Learning to DISCERN online: applying an appraisal tool to health websites in a workshop setting. Health Educ Res 19:440–446
doi: 10.1093/her/cyg046
Emile SH, Horesh N, Freund M, Pellino G, Oliveira L, Wignakumar A, Wexner SD (2023) How appropriate are answers of online chat-based artificial intelligence (ChatGPT) to common questions on colon cancer? Surgery 174(5):1273–1275. https://doi.org/10.1016/j.surg.2023.06.005
doi: 10.1016/j.surg.2023.06.005
Mika AP, Martin JR, Engstrom SM, Polkowski GG, Wilson JM (2023) Assessing ChatGPT responses to common patient questions regarding total hip arthroplasty. J Bone Joint Surg Am 105(19):1519–1526. https://doi.org/10.2106/JBJS.23.00209
doi: 10.2106/JBJS.23.00209 pubmed: 37459402
Johns WL, Kellish A, Farronato D, Ciccotti MG, Hammoud S (2024) ChatGPT can offer satisfactory responses to common patient questions regarding elbow ulnar collateral ligament reconstruction. Arthrosc Sports Med Rehabil 6(2):100893. https://doi.org/10.1016/j.asmr.2024.100893
doi: 10.1016/j.asmr.2024.100893 pubmed: 38375341 pmcid: 10875189
Vargas CR, Chuang DJ, Lee BT (2014) Online patient resources for hernia repair: analysis of readability. J Surg Res 190(1):144–150. https://doi.org/10.1016/j.jss.2014.03.045
doi: 10.1016/j.jss.2014.03.045 pubmed: 24746256
Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, Faix DJ, Goodman AM, Longhurst CA, Hogarth M, Smith DM (2023) Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 183(6):589–596. https://doi.org/10.1001/jamainternmed.2023.1838
doi: 10.1001/jamainternmed.2023.1838 pubmed: 37115527 pmcid: 10148230
Chen S, Kann BH, Foote MB et al (2023) Use of artificial intelligence chatbots for cancer treatment information. JAMA Oncol 9(10):1459–1462. https://doi.org/10.1001/jamaoncol.2023.2954
doi: 10.1001/jamaoncol.2023.2954 pubmed: 37615976 pmcid: 10450584
Khullar D, Casalino LP, Qian Y, Lu Y, Krumholz HM, Aneja S (2022) Perspectives of patients about artificial intelligence in health care. JAMA Netw Open 5(5):e2210309. https://doi.org/10.1001/jamanetworkopen.2022.10309
doi: 10.1001/jamanetworkopen.2022.10309
Sahni NR, Carrus B (2023) Artificial intelligence in U.S. health care delivery. N Engl J Med 389(4):348–358. https://doi.org/10.1056/NEJMra2204673
doi: 10.1056/NEJMra2204673
Elias J (2023) Google launches its largest and ‘most capable’ AI model, Gemini, 6 Dec. https://www.cnbc.com/2023/12/06/google-launches-its-largest-and-most-capable-ai-model-gemini.html . Accessed 1 April 2024.
Anthropic Announcements (2024) Introducing the next generation of Claude, 3 Mar. https://www.anthropic.com/news/claude-3-family . Accessed 1 April 2024.

Authors

Alex Lois (A)

Department of Surgery, University of Chicago, 5841 S. Maryland, MC 5095, Chicago, IL, 60637, USA. Alex.Lois@bsd.uchicago.edu.

Robert Yates (R)

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

Megan Ivy (M)

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

Colette Inaba (C)

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

Roger Tatum (R)

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

Lawrence Cetrulo (L)

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

Zoe Parr (Z)

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

Judy Chen (J)

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

Saurabh Khandelwal (S)

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

Andrew Wright (A)

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

MeSH classifications