Integrating human expertise & automated methods for a dynamic and multi-parametric evaluation of large language models' feasibility in clinical decision-making.

AI routine integration; Clinical decision making; Healthcare innovation; LLM’s feasibility; Methodology; Multi-parametric analysis; Multidisciplinary approach; Nursing informatics; Safety

Journal

International journal of medical informatics
ISSN: 1872-8243
Abbreviated title: Int J Med Inform
Country: Ireland
NLM ID: 9711057

Publication information

Publication date:
26 May 2024
History:
received: 8 April 2024
revised: 9 May 2024
accepted: 22 May 2024
medline: 30 May 2024
pubmed: 30 May 2024
entrez: 29 May 2024
Status: ahead of print

Abstract

Recent enhancements in Large Language Models (LLMs) such as ChatGPT have driven an exponential increase in user adoption. These models are accessible on mobile devices and support multimodal interactions, including conversation, code generation, and the upload of patient images, broadening their utility for providing healthcare professionals with real-time support in clinical decision-making. Nevertheless, many authors have highlighted serious risks that may arise from the adoption of LLMs, principally related to safety and alignment with ethical guidelines. To address these challenges, we introduce a novel methodological approach for assessing the feasibility of adopting LLMs within a specific healthcare area, with a focus on clinical nursing, evaluating their performance in order to guide model selection. Emphasizing LLMs' adherence to scientific advances, the approach prioritizes safety and care personalization in accordance with the Organisation for Economic Co-operation and Development (OECD) frameworks for responsible AI. Moreover, its dynamic design allows it to adapt to future evolutions of LLMs. By integrating advanced multidisciplinary knowledge, including Nursing Informatics, and aided by a prospective literature review, seven key domains and specific evaluation items were identified. A peer review by experts in Nursing and AI was performed to ensure scientific rigor and breadth of insight, yielding an essential, reproducible, and coherent methodological approach. Using a 7-point Likert scale, thresholds were defined to classify LLMs into three categories: "unusable", "usable with high caution", and "recommended". Nine state-of-the-art LLMs were evaluated with this methodology in clinical oncology nursing decision-making, producing preliminary results.
Gemini Advanced, Anthropic Claude 3, and ChatGPT 4 achieved the minimum score in the State of the Art Alignment & Safety domain required for classification as "recommended", and were also endorsed across all domains. LLAMA 3 70B and ChatGPT 3.5 were classified as "usable with high caution", while the others were classified as "unusable" in this domain. The identification of a recommended LLM for a specific healthcare area, combined with its critical, prudent, and integrative use, can support healthcare professionals in decision-making processes.
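The abstract states that thresholds on a 7-point Likert scale map each model's domain score to one of three categories, but it does not give the cut-off values or the aggregation rule. The Python sketch below illustrates that general mechanism under explicit assumptions: the ratings for one domain are averaged with a plain arithmetic mean, and the cut-offs `CAUTION_MIN = 4.0` and `RECOMMENDED_MIN = 6.0` are hypothetical placeholders, not the authors' thresholds. The function names `domain_score` and `classify` are likewise illustrative.

```python
from statistics import mean

# HYPOTHETICAL cut-offs on the 1-7 Likert scale; the paper's actual
# thresholds are not stated in this abstract.
CAUTION_MIN = 4.0      # at or above this: "usable with high caution"
RECOMMENDED_MIN = 6.0  # at or above this: "recommended"


def domain_score(ratings: list[int]) -> float:
    """Average the 1-7 Likert ratings collected for one evaluation domain.

    Assumption: a simple arithmetic mean; the source does not specify
    how item or reviewer ratings are aggregated.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 7 for r in ratings):
        raise ValueError("Likert ratings must lie in the range 1..7")
    return mean(ratings)


def classify(score: float) -> str:
    """Map a domain score to one of the three categories named in the abstract."""
    if score >= RECOMMENDED_MIN:
        return "recommended"
    if score >= CAUTION_MIN:
        return "usable with high caution"
    return "unusable"


if __name__ == "__main__":
    # Illustrative ratings only; not data from the study.
    for ratings in ([6, 7, 6], [4, 5, 4], [2, 3]):
        score = domain_score(ratings)
        print(ratings, round(score, 2), classify(score))
```

For instance, ratings of 6, 7 and 6 average to about 6.33 and fall into "recommended" under these placeholder cut-offs. Keeping the thresholds as named constants means a different scheme, or a separate threshold for a gating domain such as State of the Art Alignment & Safety, only requires changing those values.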


Identifiers

pubmed: 38810498
pii: S1386-5056(24)00164-3
doi: 10.1016/j.ijmedinf.2024.105501

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

105501

Copyright information

Copyright © 2024. Published by Elsevier B.V.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Elena Sblendorio

Azienda Ospedaliero-Universitaria Consorziale Policlinico di Bari, Piazza Giulio Cesare, 11, 70124 Bari, Italy; Department of Biomedicine and Prevention, University of Rome "Tor Vergata", Italy.

Vincenzo Dentamaro

Department of Computer Science, University of Bari "Aldo Moro", Bari, Italy. Electronic address: https://twitter.com/vincenzoden.

Alessio Lo Cascio

Department of Biomedicine and Prevention, University of Rome "Tor Vergata", Italy; La Maddalena Cancer Center, Via San Lorenzo 312, 90146 Palermo, Italy.

Francesco Germini

Department of Biomedicine and Prevention, University of Rome "Tor Vergata", Italy; Director of the Social and Health District, ASL Bari, Bari, Italy.

Michela Piredda

Department of Medicine and Surgery, Research Unit Nursing Science, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo, 21, 00128 Rome, Italy.

Giancarlo Cicolini

Department of Innovative Technologies in Medicine & Dentistry, "G.d'Annunzio" University of Chieti - Pescara, Italy.

MeSH classifications