A Novel Evaluation Framework for Medical LLMs: Combining Fuzzy Logic and MCDM for Medical Relation and Clinical Concept Extraction.
p, q-QROFS
Clinical Concept Extraction
FWZIC
MAIRCA
Medical Relation Extraction
Journal
Journal of medical systems
ISSN: 1573-689X
Abbreviated title: J Med Syst
Country: United States
NLM ID: 7806056
Publication information
Publication date:
31 Aug 2024
History:
received: 31 May 2024
accepted: 22 Jul 2024
medline: 31 Aug 2024
pubmed: 31 Aug 2024
entrez: 30 Aug 2024
Status:
epublish
Abstract
Artificial intelligence (AI) has become a crucial element of modern technology, especially in the healthcare sector. This is apparent in the continuous development of large language models (LLMs), which are used in various domains, including medicine. However, using these LLMs in the medical domain requires an evaluation platform to determine their suitability and guide future development. To that end, this study develops a comprehensive Multi-Criteria Decision Making (MCDM) approach designed specifically to evaluate medical LLMs. The success of AI, particularly LLMs, in healthcare depends on efficacy, safety, and ethical compliance, so a robust evaluation framework is essential for their integration into medical contexts. The study proposes using the Fuzzy-Weighted Zero-InConsistency (FWZIC) method, extended to the p, q-quasirung orthopair fuzzy set (p, q-QROFS), to weight the evaluation criteria. This extension enables the handling of uncertainties inherent in medical decision-making. By incorporating fuzzy logic principles, the approach accommodates the imprecise and multifaceted nature of real-world medical data and criteria. The MultiAttributive Ideal-Real Comparative Analysis (MAIRCA) method is used to assess the medical LLMs in the case study. The results show that the "Medical Relation Extraction" criterion, with its sub-levels, carried more weight (0.504) than "Clinical Concept Extraction" (0.495). For the LLMs evaluated, out of 6 alternatives, (
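The MAIRCA ranking step named in the abstract can be illustrated with a minimal sketch. It follows the commonly published crisp formulation of MAIRCA (equal prior preference per alternative, min-max scaling, gap between theoretical and real ratings) and is not necessarily the exact variant used in the study. Only the two criterion weights (0.504 and 0.495) come from the abstract; the function name `mairca_gaps`, the three alternatives, and their scores are hypothetical.

```python
from typing import List, Sequence


def mairca_gaps(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    benefit: Sequence[bool],
) -> List[float]:
    """Return the total gap per alternative; a smaller gap means a better rank."""
    m = len(matrix)          # number of alternatives
    n = len(weights)         # number of criteria
    p_alt = 1.0 / m          # equal prior preference for every alternative

    # Column-wise extremes used for min-max scaling.
    highs = [max(row[j] for row in matrix) for j in range(n)]
    lows = [min(row[j] for row in matrix) for j in range(n)]

    gaps = []
    for row in matrix:
        total = 0.0
        for j in range(n):
            t_p = p_alt * weights[j]              # theoretical rating
            if highs[j] == lows[j]:
                share = 1.0                       # criterion does not discriminate
            elif benefit[j]:
                share = (row[j] - lows[j]) / (highs[j] - lows[j])
            else:
                share = (row[j] - highs[j]) / (lows[j] - highs[j])
            t_r = t_p * share                     # real rating
            total += t_p - t_r                    # gap for this criterion
        gaps.append(total)
    return gaps


# Weights as reported in the abstract: Medical Relation Extraction,
# Clinical Concept Extraction.
weights = [0.504, 0.495]
# Hypothetical scores for three hypothetical LLMs (rows) on the two criteria.
scores = [
    [0.9, 0.8],
    [0.7, 0.9],
    [0.5, 0.5],
]
gaps = mairca_gaps(scores, weights, benefit=[True, True])
ranking = sorted(range(len(gaps)), key=gaps.__getitem__)  # best first
print(ranking, gaps)
```

In this formulation the alternative with the smallest total gap is ranked first, so an alternative that is best on every criterion has a gap of zero.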
Identifiers
pubmed: 39214943
doi: 10.1007/s10916-024-02090-y
pii: 10.1007/s10916-024-02090-y
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
81
Copyright information
© 2024. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.