Classification of imbalanced oral cancer image data from high-risk population.
deep learning
ensemble learning
imbalanced multi-class datasets
mobile screening device
oral cancer
Journal
Journal of biomedical optics
ISSN: 1560-2281
Abbreviated title: J Biomed Opt
Country: United States
NLM ID: 9605853
Publication information
Publication date:
10 2021
History:
received: 2021-08-03
accepted: 2021-09-28
entrez: 2021-10-24
pubmed: 2021-10-25
medline: 2021-11-17
Status:
ppublish
Abstract
Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is ideal for disease screening. However, current datasets collected from high-risk populations are imbalanced, which often degrades classification performance. We aimed to reduce the class bias caused by data imbalance. We collected 3851 polarized white-light cheek mucosa images using our customized oral cancer screening device. We used weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve neural network performance on oral cancer image classification with imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings. Applying both data-level and algorithm-level approaches to the deep learning training process improved performance on the minority classes, which were initially difficult to distinguish. The accuracy of the "premalignancy" class also increased, which is ideal for screening applications. Experimental results show that the class bias induced by imbalanced oral cancer image datasets can be reduced using both data-level and algorithm-level methods. Our study may provide an important basis for understanding the influence of imbalanced datasets on oral cancer deep learning classifiers and how to mitigate it.
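The abstract names weight balancing, undersampling, and focal loss without giving details, so the following is only a minimal sketch of what those three techniques commonly look like, not the authors' implementation. The class names other than "premalignancy", the label counts, the inverse-frequency weighting formula, and the focusing parameter gamma = 2.0 are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight balancing: weight each class by total / (n_classes * class_count).

    With this scaling the weights summed over all samples equal the
    number of samples, so the average per-sample weight is 1.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs maps class name -> predicted probability; target is the true class.
    gamma=2.0 is an assumed default; gamma=0 with alpha=None reduces to
    ordinary cross-entropy.
    """
    p_t = probs[target]
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def undersample(samples, labels, seed=0):
    """Random undersampling: keep as many items per class as the rarest class has."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        balanced.extend((s, label) for s in rng.sample(items, n_min))
    return balanced


# Hypothetical, deliberately imbalanced label list (not the study's data).
labels = ["normal"] * 6 + ["premalignancy"] * 3 + ["malignancy"] * 1
weights = inverse_frequency_weights(labels)

# A confident correct prediction contributes far less loss than an uncertain one.
easy = focal_loss({"normal": 0.9, "premalignancy": 0.1}, "normal", alpha=weights)
hard = focal_loss({"normal": 0.5, "premalignancy": 0.5}, "premalignancy", alpha=weights)
```

Weight balancing and focal loss act at the algorithm level by changing how much each sample contributes to the loss, while undersampling acts at the data level by changing which samples are seen at all. In a real training loop these would typically be applied through the chosen deep learning framework's loss and sampler utilities rather than hand-written functions.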
Identifiers
pubmed: 34689442
pii: JBO-210246R
doi: 10.1117/1.JBO.26.10.105001
pmc: PMC8536945
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Grants
Agency: NIDCR NIH HHS
ID: R01 DE030682
Country: United States
Agency: NIBIB NIH HHS
ID: UH2 EB022623
Country: United States
Agency: NCI NIH HHS
ID: UH3 CA239682
Country: United States