The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study.
Clinical application
Deep learning
Domain shift
Neuroimaging
Journal
Medical image analysis
ISSN: 1361-8423
Abbreviated title: Med Image Anal
Country: Netherlands
NLM ID: 9713490
Publication information
Publication date:
12 2020
History:
received: 07 11 2019
revised: 17 04 2020
accepted: 24 04 2020
pubmed: 3 10 2020
medline: 24 6 2021
entrez: 2 10 2020
Status:
ppublish
Abstract
Deep learning (DL) methods have in recent years yielded impressive results in medical imaging, with the potential to function as a clinical aid to radiologists. However, DL models in medical imaging are often trained on public research cohorts with images acquired with a single scanner or with strict protocol harmonization, which is not representative of a clinical setting. The aim of this study was to investigate how well a DL model performs in unseen clinical datasets (collected with different scanners, protocols and disease populations) and whether more heterogeneous training data improves generalization. In total, 3117 MRI scans of brains from multiple dementia research cohorts and memory clinics, which had been visually rated by a neuroradiologist according to Scheltens' scale of medial temporal atrophy (MTA), were included in this study. By training multiple versions of a convolutional neural network on different subsets of this data to predict MTA ratings, we assessed the impact that including images from a wider distribution during training had on performance in external memory clinic data. Our results showed that our model generalized well to datasets acquired with protocols similar to those of the training data, but substantially worse in clinical cohorts with visibly different tissue contrasts in the images. This implies that future DL studies investigating performance in out-of-distribution (OOD) MRI data need to assess multiple external cohorts for reliable results. Further, including data from a wider range of scanners and protocols improved performance in OOD data, which suggests that more heterogeneous training data makes the model generalize better. To conclude, this is the most comprehensive study to date investigating the domain shift in deep learning on MRI data, and we advocate rigorous evaluation of DL models on clinical data prior to being certified for deployment.
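The evaluation idea in the abstract (train on some cohorts, then test on a cohort the model has never seen) can be sketched as a leave-one-cohort-out loop. This is only an illustrative sketch, not the study's method: the cohort names, the `mta` field, the majority-class stand-in for the convolutional neural network, and the use of plain accuracy as the metric are all assumptions made for the example.

```python
from collections import Counter


def leave_cohort_out_splits(records):
    """Yield (held_out_cohort, train, test), holding out one cohort at a time."""
    cohorts = sorted({r["cohort"] for r in records})
    for held_out in cohorts:
        train = [r for r in records if r["cohort"] != held_out]
        test = [r for r in records if r["cohort"] == held_out]
        yield held_out, train, test


def fit_majority_baseline(train):
    """Hypothetical stand-in for model training: always predict the
    most common MTA rating seen in the training records."""
    most_common = Counter(r["mta"] for r in train).most_common(1)[0][0]
    return lambda record: most_common


def accuracy(model, test):
    """Fraction of held-out records whose rating the model predicts exactly."""
    return sum(model(r) == r["mta"] for r in test) / len(test)


def evaluate_ood(records):
    """Score a freshly fitted model on each held-out (unseen) cohort."""
    return {
        held_out: accuracy(fit_majority_baseline(train), test)
        for held_out, train, test in leave_cohort_out_splits(records)
    }


# Toy records with hypothetical cohort names and MTA ratings.
records = [
    {"cohort": "research_a", "mta": 1},
    {"cohort": "research_a", "mta": 1},
    {"cohort": "research_a", "mta": 1},
    {"cohort": "research_b", "mta": 1},
    {"cohort": "research_b", "mta": 2},
    {"cohort": "clinic_c", "mta": 3},
    {"cohort": "clinic_c", "mta": 3},
]
scores = evaluate_ood(records)
```

Reporting one score per held-out cohort, rather than a single pooled number, reflects the abstract's point that performance can differ substantially between external cohorts.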
Identifiers
pubmed: 33007638
pii: S1361-8415(20)30078-5
doi: 10.1016/j.media.2020.101714
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
101714
Grants
Agency: NIA NIH HHS
ID: U01 AG024904
Country: United States
Agency: Department of Defense
ID: W81XWH-12-2-0012
Country: International
Copyright information
Copyright © 2020 The Author(s). Published by Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.