Vocal cord leukoplakia classification using deep learning models in white light and narrow band imaging endoscopy images.
Keywords
NBI images
classification
deep learning
vocal cord leukoplakia
white light images
Journal
Head & Neck
ISSN: 1097-0347
Abbreviated title: Head Neck
Country: United States
NLM ID: 8902541
Publication information
Publication date: December 2023
History:
Received: 2023-05-05
Revised: 2023-09-15
Accepted: 2023-09-29
PubMed: 2023-10-14
Entrez: 2023-10-14
MEDLINE: 2023-11-13
Status: ppublish
Abstract
BACKGROUND
Accurate classification of vocal cord leukoplakia is critical for the individualized treatment and early detection of laryngeal cancer. Numerous deep learning techniques have been proposed, but it is unclear which to select for laryngeal tasks. This article introduces and reliably evaluates existing deep learning models for vocal cord leukoplakia classification.
METHODS
We created white light and narrow band imaging (NBI) image datasets of vocal cord leukoplakia, classified into six classes: normal tissues (NT), inflammatory keratosis (IK), mild dysplasia (MiD), moderate dysplasia (MoD), severe dysplasia (SD), and squamous cell carcinoma (SCC). Classification was performed using six classical deep learning models: AlexNet, VGG, Google Inception, ResNet, DenseNet, and Vision Transformer.
RESULTS
GoogLeNet (i.e., Google Inception V1), DenseNet-121, and ResNet-152 achieved excellent classification performance. The highest overall accuracy was 0.9583 on white light images and 0.9478 on NBI images. All three networks also provided very high sensitivity, specificity, and precision.
CONCLUSION
GoogLeNet, ResNet, and DenseNet can provide accurate pathological classification of vocal cord leukoplakia. This facilitates early diagnosis, informs the choice between conservative and surgical treatment for lesions of different grades, and reduces the burden on endoscopists.
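The sensitivity, specificity, and precision figures reported above are per-class, one-vs-rest quantities. A minimal sketch of how they follow from a multi-class confusion matrix (the matrix below is illustrative, not the paper's data):

```python
def per_class_metrics(cm):
    """One-vs-rest sensitivity, specificity, and precision for each
    class of a square confusion matrix cm, where cm[i][j] counts
    samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class-k samples missed
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes called k
        tn = total - tp - fn - fp
        metrics[k] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp),
        }
    return metrics

# Illustrative 3-class example (rows = true class, columns = predicted).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 1, 9]]
m = per_class_metrics(cm)
accuracy = sum(cm[k][k] for k in range(3)) / 30  # overall accuracy
```

Overall accuracy is the trace of the matrix over the total count; for the example above it is 24/30 = 0.8.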
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
3129-3145
Copyright information
© 2023 Wiley Periodicals LLC.