A Novel Evaluation Framework for Medical LLMs: Combining Fuzzy Logic and MCDM for Medical Relation and Clinical Concept Extraction.

Fuzzy Logic; Humans; Artificial Intelligence; Decision Making

p, q-QROFS; Clinical Concept Extraction; FWZIC; MAIRCA; Medical Relation Extraction

Journal

Journal of Medical Systems

ISSN: 1573-689X

Abbreviated title: J Med Syst

Country: United States

NLM ID: 7806056

Publication information

Publication date:
31 Aug 2024

History:

received: 31 May 2024

accepted: 22 Jul 2024

medline: 31 Aug 2024

pubmed: 31 Aug 2024

entrez: 30 Aug 2024

Status: epublish

Abstract

Artificial intelligence (AI) has become a crucial element of modern technology, especially in the healthcare sector. This is apparent in the continuous development of large language models (LLMs), which are used in various domains, including medicine. However, using these LLMs in the medical domain requires an evaluation platform to determine their suitability and to guide future development efforts. To that end, this study develops a comprehensive Multi-Criteria Decision Making (MCDM) approach specifically designed to evaluate medical LLMs. The success of AI, particularly LLMs, in the healthcare domain depends on efficacy, safety, and ethical compliance, so a robust evaluation framework is essential for their integration into medical contexts. This study proposes the Fuzzy-Weighted Zero-InConsistency (FWZIC) method, extended to the p, q-quasirung orthopair fuzzy set (p, q-QROFS), for weighting the evaluation criteria. This extension enables the handling of uncertainties inherent in medical decision-making processes; by incorporating fuzzy logic principles, the approach accommodates the imprecise and multifaceted nature of real-world medical data and criteria. The Multi-Attributive Ideal-Real Comparative Analysis (MAIRCA) method is then used to assess the medical LLMs in the case study. The results show that the "Medical Relation Extraction" criterion, with its sub-levels, received a higher weight (0.504) than "Clinical Concept Extraction" (0.495). For the LLMs evaluated, out of 6 alternatives, (
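As a rough illustration of the ranking stage named in the abstract, the following is a minimal sketch of the standard crisp MAIRCA procedure, not the authors' implementation. It assumes the criteria weights are already available (here the two rounded weights reported in the abstract); the function name `mairca_gaps`, the three alternatives, and their scores are hypothetical and exist only to show the mechanics. Each alternative starts with an equal prior preference, a theoretical rating is formed as prior times weight, a real rating scales that by how close the alternative is to the best observed value, and the summed gap between the two decides the rank (smaller is better).

```python
from typing import List, Sequence


def mairca_gaps(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    benefit: Sequence[bool],
) -> List[float]:
    """Return the total MAIRCA gap per alternative (lower means better)."""
    m = len(matrix)
    prior = 1.0 / m  # equal preference for every alternative
    gaps = [0.0] * m
    for j, (w, is_benefit) in enumerate(zip(weights, benefit)):
        column = [row[j] for row in matrix]
        lo, hi = min(column), max(column)
        # For benefit criteria the largest value is ideal; for cost, the smallest.
        best, worst = (hi, lo) if is_benefit else (lo, hi)
        theoretical = prior * w
        for i, x in enumerate(column):
            # If all alternatives tie on this criterion, no gap is assigned.
            closeness = 1.0 if best == worst else (x - worst) / (best - worst)
            real = theoretical * closeness
            gaps[i] += theoretical - real
    return gaps


# Weights as rounded in the abstract: Medical Relation Extraction,
# Clinical Concept Extraction.
weights = [0.504, 0.495]

# HYPOTHETICAL scores for three made-up alternatives (rows), one column
# per criterion. These are not results from the study.
scores = [
    [0.9, 0.8],
    [0.5, 0.6],
    [0.7, 0.9],
]

gaps = mairca_gaps(scores, weights, benefit=[True, True])
ranking = sorted(range(len(gaps)), key=gaps.__getitem__)  # best first
```

In the study the weights come from the p, q-QROFS extension of FWZIC, which this sketch does not reproduce; only the final gap-based ranking step is shown.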

Identifiers

DOI: 10.1007/s10916-024-02090-y PMID: 39214943

pubmed: 39214943

doi: 10.1007/s10916-024-02090-y

pii: 10.1007/s10916-024-02090-y

Publication types

Journal Article

Languages

eng

Authors

A H Alamoodi (AH)

Omar Zughoul (O)

Dianese David (D)

Salem Garfan (S)

Dragan Pamucar (D)

O S Albahri (OS)

A S Albahri (AS)

Salman Yussof (S)

Iman Mohamad Sharaf (IM)
