Dermatologist versus artificial intelligence confidence in dermoscopy diagnosis: Complementary information that may affect decision-making.
computer vision
deep learning
neural networks
skin lesion classification
uncertainty
Journal
Experimental dermatology
ISSN: 1600-0625
Abbreviated title: Exp Dermatol
Country: Denmark
NLM ID: 9301549
Publication information
Publication date: 10 2023
History:
received: 2022-10-21
revised: 2023-07-04
accepted: 2023-07-13
pubmed: 2023-08-03
entrez: 2023-08-03
medline: 2023-10-12
Status: ppublish
Abstract
In dermatology, deep learning may be applied to skin lesion classification. However, for a given input image, a neural network outputs only a label derived from the class probabilities, which do not model uncertainty. Our group developed a novel method to quantify uncertainty in stochastic neural networks. In this study, we aimed to train such a network for skin lesion classification, evaluate its diagnostic performance and uncertainty, and compare the results with the assessments of a group of dermatologists. By passing duplicates of an image through the stochastic neural network, we obtained a distribution per class rather than a single probability value. We interpreted the overlap between these distributions as the output uncertainty: high overlap indicated high uncertainty, and vice versa. We had 29 dermatologists diagnose a series of skin lesions and rate their confidence, and we compared these results with those of the network. The network achieved a sensitivity of 50% and a specificity of 88%, comparable to the average dermatologist (68% and 73%, respectively). Higher confidence (lower uncertainty) was associated with better diagnostic performance in both the neural network and the dermatologists. We found no correlation between the uncertainty of the neural network and the confidence of the dermatologists (R = -0.06, p = 0.77). Dermatologists should not blindly trust the output of a neural network, especially when its uncertainty is high. Adding an uncertainty score may stimulate human-computer interaction.
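The overlap-based uncertainty described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stochastic network is replaced by a hypothetical stand-in (`stochastic_forward`, which adds Gaussian noise to fixed logits), the overlap is measured between the two classes with the highest mean probability, and the Bhattacharyya coefficient over histograms is assumed as the overlap measure. The number of passes, the noise level, and the bin count are illustrative choices as well.

```python
import numpy as np


def stochastic_forward(logits, rng, n_passes=100, noise=0.5):
    """Hypothetical stand-in for a stochastic network (assumption).

    Each "pass" perturbs the logits with Gaussian noise and applies a
    softmax, so repeated passes of the same input give varying outputs.
    Returns an array of shape (n_passes, n_classes).
    """
    noisy = logits + rng.normal(0.0, noise, size=(n_passes, logits.size))
    exp = np.exp(noisy - noisy.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def overlap_uncertainty(probs, n_bins=20):
    """Return (predicted class, overlap score in [0, 1]).

    The score is the Bhattacharyya coefficient between the histograms of
    the two classes with the highest mean probability: 0 means the two
    distributions are disjoint, 1 means they coincide.
    """
    mean = probs.mean(axis=0)
    second, first = np.argsort(mean)[-2:]  # two highest mean probabilities
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    h1, _ = np.histogram(probs[:, first], bins=bins)
    h2, _ = np.histogram(probs[:, second], bins=bins)
    p = h1 / h1.sum()
    q = h2 / h2.sum()
    return int(first), float(np.sum(np.sqrt(p * q)))


rng = np.random.default_rng(0)

# One class clearly dominates: the distributions should barely overlap.
label_clear, u_clear = overlap_uncertainty(
    stochastic_forward(np.array([4.0, 0.0, 0.0]), rng)
)

# Two classes are equally likely: the distributions should overlap heavily.
label_ambiguous, u_ambiguous = overlap_uncertainty(
    stochastic_forward(np.array([1.0, 1.0, 0.0]), rng)
)

print(f"clear input:     class {label_clear}, overlap {u_clear:.2f}")
print(f"ambiguous input: class {label_ambiguous}, overlap {u_ambiguous:.2f}")
```

In this sketch a score near 0 corresponds to a confident prediction and a score near 1 to an uncertain one, matching the abstract's reading that high overlap indicates high uncertainty.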
Publication types
Comparative Study
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1744-1751
Copyright information
© 2023 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.