Deep Ensembles Are Robust to Occasional Catastrophic Failures of Individual DNNs for Organs Segmentations in CT Images.

Humans; Image Processing, Computer-Assisted / methods; Neural Networks, Computer; Tomography, X-Ray Computed / methods

Automated organ segmentation; Computed tomography; Deep ensembles; Deep neural networks

Journal

Journal of Digital Imaging

ISSN: 1618-727X

Abbreviated title: J Digit Imaging

Country: United States

NLM ID: 9100529

Publication information

Publication date:
10 2023

History:

received: 25 01 2023

accepted: 18 05 2023

revised: 15 05 2023

medline: 18 9 2023

pubmed: 9 6 2023

entrez: 8 6 2023

Status: ppublish

Abstract

Deep neural networks (DNNs) have recently shown remarkable performance in various computer vision tasks, including classification and segmentation of medical images. Deep ensembles (aggregated predictions of multiple DNNs) have been shown to improve on a single DNN's performance in various classification tasks. Here we explore how deep ensembles perform in image segmentation, in particular organ segmentation in computed tomography (CT) images. Ensembles of V-Nets were trained to segment multiple organs using several in-house and publicly available clinical studies. The ensemble segmentations were tested on images from a different set of studies, and the effects of ensemble size and other ensemble parameters were explored for various organs. Compared to single models, deep ensembles significantly improved the average segmentation accuracy, especially for organs where the accuracy was lower. More importantly, deep ensembles strongly reduced both the occasional "catastrophic" segmentation failures characteristic of single models and the image-to-image variability of segmentation accuracy. To quantify this, we defined "high risk images": images for which at least one model produced an outlier metric (performed in the lowest 5% of the distribution). These images comprised about 12% of the test images across all organs. Ensembles performed without outliers for 68%-100% of the "high risk images", depending on the performance metric used.
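The "high risk image" definition in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the ensemble aggregates its members, which accuracy metric is used, or over which distribution the 5% cutoff is taken, so everything below is an assumption made for illustration: the ensemble averages per-voxel foreground probabilities and thresholds at 0.5, the metric is the Dice coefficient, and the cutoff is the 5th percentile of all single-model scores pooled across images. The function names (`ensemble_mask`, `dice`, `high_risk_report`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-model foreground probabilities, then threshold.

    prob_maps has shape (n_models, *image_shape). Mean aggregation is an
    assumption; the abstract does not name the aggregation rule.
    """
    return np.mean(prob_maps, axis=0) >= threshold


def dice(pred, truth):
    """Dice coefficient between two binary masks (assumed metric)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def high_risk_report(model_scores, ensemble_scores, pct=5.0):
    """Flag high risk images and those where the ensemble is not an outlier.

    model_scores: shape (n_models, n_images), higher is better.
    ensemble_scores: shape (n_images,).
    The cutoff is the pct-th percentile of the pooled single-model scores
    (an assumed reading of "lower 5% percentile").
    """
    model_scores = np.asarray(model_scores, dtype=float)
    ensemble_scores = np.asarray(ensemble_scores, dtype=float)
    cutoff = np.percentile(model_scores, pct)
    # High risk: at least one single model scored below the cutoff.
    high_risk = (model_scores < cutoff).any(axis=0)
    # Rescued: high risk image on which the ensemble is not an outlier.
    rescued = high_risk & (ensemble_scores >= cutoff)
    return cutoff, high_risk, rescued
```

Under these assumptions, `rescued.sum() / high_risk.sum()` gives the fraction of high risk images on which the ensemble performs without an outlier, which is the kind of quantity the abstract reports as 68%-100%.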

Identifiers

DOI: 10.1007/s10278-023-00857-2 PMID: 37291384 PMC: PMC10502003

pubmed: 37291384

doi: 10.1007/s10278-023-00857-2

pii: 10.1007/s10278-023-00857-2

pmc: PMC10502003

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

2060-2074

Authors

Yury Petrov (Y)

Bilal Malik (B)

Jill Fredrickson (J)

Skander Jemaa (S)

Richard A D Carano (RAD)
