Deep Ensembles Are Robust to Occasional Catastrophic Failures of Individual DNNs for Organs Segmentations in CT Images.
Automated organ segmentation
Computed tomography
Deep ensembles
Deep neural networks
Journal
Journal of Digital Imaging
ISSN: 1618-727X
Abbreviated title: J Digit Imaging
Country: United States
NLM ID: 9100529
Publication information
Publication date:
10 2023
History:
received: 2023-01-25
accepted: 2023-05-18
revised: 2023-05-15
medline: 2023-09-18
pubmed: 2023-06-09
entrez: 2023-06-08
Status:
ppublish
Abstract
Deep neural networks (DNNs) have recently shown remarkable performance in various computer vision tasks, including classification and segmentation of medical images. Deep ensembles (aggregated predictions of multiple DNNs) have been shown to improve a DNN's performance in various classification tasks. Here we explore how deep ensembles perform in image segmentation, in particular organ segmentation in computed tomography (CT) images. Ensembles of V-Nets were trained to segment multiple organs using several in-house and publicly available clinical studies. The ensemble segmentations were tested on images from a different set of studies, and the effects of ensemble size and other ensemble parameters were explored for various organs. Compared to single models, deep ensembles significantly improved the average segmentation accuracy, especially for organs where the accuracy was lower. More importantly, deep ensembles strongly reduced both the occasional "catastrophic" segmentation failures characteristic of single models and the image-to-image variability of segmentation accuracy. To quantify this, we defined "high-risk images": images for which at least one model produced an outlier metric (that is, performed in the bottom 5%). These images comprised about 12% of the test images across all organs. Ensembles performed without outliers for 68%-100% of the high-risk images, depending on the performance metric used.
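The abstract's "high-risk image" definition can be illustrated with a minimal sketch. The abstract does not say how the member predictions were aggregated, which metrics were used, or how the 5% cutoff was computed, so everything below is an assumption made for illustration: the ensemble averages per-voxel foreground probabilities and thresholds at 0.5, the metric is the Dice coefficient, and the outlier cutoff is the 5th percentile of all single-model scores pooled together. All function names are hypothetical.

```python
import numpy as np


def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-model foreground probabilities, then threshold.

    prob_maps has shape (n_models, *image_shape). Mean-probability
    aggregation is an assumption, not the paper's confirmed method.
    """
    mean_prob = np.mean(prob_maps, axis=0)
    return mean_prob >= threshold


def dice(pred, truth):
    """Dice coefficient between two binary masks (1.0 if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def high_risk_images(scores, pct=5.0):
    """Flag images on which at least one single model is an outlier.

    scores has shape (n_models, n_images); higher is better.
    The cutoff is the pct-th percentile of all single-model scores
    pooled together (an assumed way to define "outlier").
    """
    cutoff = np.percentile(scores, pct)
    flagged = np.any(scores < cutoff, axis=0)
    return flagged, cutoff


def fraction_rescued(ensemble_scores, flagged, cutoff):
    """Share of high-risk images on which the ensemble is not an outlier."""
    return float(np.mean(ensemble_scores[flagged] >= cutoff))
```

Given a matrix of single-model scores and a vector of ensemble scores for the same test images, `high_risk_images` returns the flagged subset and `fraction_rescued` gives a quantity analogous to the 68%-100% range reported above, under the stated assumptions.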
Identifiers
pubmed: 37291384
doi: 10.1007/s10278-023-00857-2
pii: 10.1007/s10278-023-00857-2
pmc: PMC10502003
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
2060-2074
Copyright information
© 2023. The Author(s).