Evaluating pointwise reliability of machine learning prediction.


Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
03 2022
History:
received: 31 10 2021
revised: 07 01 2022
accepted: 11 01 2022
pubmed: 19 1 2022
medline: 17 3 2022
entrez: 18 1 2022
Status: ppublish

Abstract

Interest in Machine Learning applications to tackle clinical and biological problems is increasing. This is driven by promising results reported in many research papers, the increasing number of AI-based software products, and the general interest in Artificial Intelligence to solve complex problems. It is therefore important to improve the quality of machine learning output and to add safeguards that support its adoption. In addition to regulatory and logistical strategies, a crucial aspect is detecting when a Machine Learning model is not able to generalize to new, unseen instances, which may originate from a population distant from the training population or from an under-represented subpopulation. As a result, the model's predictions for these instances may often be wrong, because the model is applied outside its "reliable" working space, which erodes the trust of end users such as clinicians. For this reason, when a model is deployed in practice, it is important to advise users when the model's predictions may be unreliable, especially in high-stakes applications, including those in healthcare. Yet reliability assessment of individual machine learning predictions is still poorly addressed. Here, we review approaches that can support the identification of unreliable predictions, harmonize the notation and terminology of relevant concepts, and highlight and extend possible interrelationships and overlaps among concepts. We then demonstrate, on simulated and real data for ICU in-hospital death prediction, a possible integrative framework for identifying reliable and unreliable predictions. To do so, our proposed approach implements two complementary principles, namely the density principle and the local fit principle. The density principle verifies that the instance we want to evaluate is similar to the training set. The local fit principle verifies that the trained model performs well on the training subsets that are most similar to the instance under evaluation. Our work can contribute to consolidating work in machine learning, especially in medicine.
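The two principles named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the use of Euclidean k-nearest-neighbour distances, the quantile threshold, the accuracy threshold, and all function and parameter names (`density_check`, `local_fit_check`, `k`, `quantile`, `min_accuracy`) are assumptions chosen only for illustration.

```python
import numpy as np

def mean_knn_distance(X_train, x, k):
    """Mean Euclidean distance from x to its k nearest training points,
    together with the indices of those neighbours."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    return d[idx].mean(), idx

def density_check(X_train, x, k=3, quantile=0.95):
    """Density principle (assumed form): x passes if it is no farther from
    the training set than most training points are from one another."""
    # Leave-one-out reference: each training point's mean distance
    # to its k nearest *other* training points.
    D = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    reference = np.sort(D, axis=1)[:, :k].mean(axis=1)
    threshold = np.quantile(reference, quantile)
    score, _ = mean_knn_distance(X_train, x, k)
    return bool(score <= threshold)

def local_fit_check(X_train, y_train, predict, x, k=3, min_accuracy=0.8):
    """Local fit principle (assumed form): the model must be accurate on
    the k training points most similar to x."""
    _, idx = mean_knn_distance(X_train, x, k)
    accuracy = np.mean(predict(X_train[idx]) == y_train[idx])
    return bool(accuracy >= min_accuracy)

def is_reliable(X_train, y_train, predict, x, k=3):
    """A prediction is flagged reliable only if both checks pass."""
    return density_check(X_train, x, k) and local_fit_check(
        X_train, y_train, predict, x, k
    )

# Toy data: two well-separated clusters, one per class (hypothetical).
cluster = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8]])
X_train = np.vstack([cluster, cluster + 5.0])
y_train = np.array([0] * 6 + [1] * 6)

def toy_model(X):
    """Stand-in classifier: class 1 when the feature sum exceeds 5."""
    return (X.sum(axis=1) > 5).astype(int)

near = np.array([0.4, 0.4])      # inside the class-0 cluster
far = np.array([100.0, 100.0])   # far from every training point

near_reliable = is_reliable(X_train, y_train, toy_model, near)
far_reliable = is_reliable(X_train, y_train, toy_model, far)
```

In this sketch an instance far from all training data fails the density check regardless of how well the model fits locally, while an instance inside a dense region can still be flagged if the model misclassifies its nearest training neighbours.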

Identifiers

pubmed: 35041981
pii: S1532-0464(22)00012-0
doi: 10.1016/j.jbi.2022.103996

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Review

Languages

eng

Citation subsets

IM

Pagination

103996

Copyright information

Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.

Authors

Giovanna Nicora (G)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy. Electronic address: giovanna.nicora01@universitadipavia.it.

Miguel Rios (M)

Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, the Netherlands.

Ameen Abu-Hanna (A)

Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, the Netherlands.

Riccardo Bellazzi (R)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy.
