ESFPNet: Efficient Stage-Wise Feature Pyramid on Mix Transformer for Deep Learning-Based Cancer Analysis in Endoscopic Video.
autofluorescence bronchoscopy
colonoscopy
colorectal cancer
deep learning
efficient stage-wise feature pyramid
endoscopic video analysis
lesion analysis
lung cancer
mix transformer
semantic image segmentation
Journal
Journal of imaging
ISSN: 2313-433X
Abbreviated title: J Imaging
Country: Switzerland
NLM ID: 101698819
Publication information
Publication date:
07 Aug 2024
History:
received: 20 Jun 2024
revised: 19 Jul 2024
accepted: 01 Aug 2024
medline: 28 Aug 2024
pubmed: 28 Aug 2024
entrez: 28 Aug 2024
Status:
epublish
Abstract
For patients at risk of developing either lung cancer or colorectal cancer, the identification of suspect lesions in endoscopic video is an important procedure. The physician performs an endoscopic exam by navigating an endoscope through the organ of interest, either the lungs or the intestinal tract, and visually inspects the endoscopic video stream to identify lesions. Unfortunately, this entails a tedious, error-prone search over a lengthy video sequence. We propose a deep learning architecture that enables the real-time detection and segmentation of lesion regions from endoscopic video, with our experiments focused on autofluorescence bronchoscopy (AFB) for the lungs and colonoscopy for the intestinal tract. Our architecture, dubbed ESFPNet, draws on a pretrained Mix Transformer (MiT) encoder and a decoder structure that incorporates a new Efficient Stage-Wise Feature Pyramid (ESFP) to promote accurate lesion segmentation. Compared with existing deep learning models, the ESFPNet model gave superior lesion segmentation performance on an AFB dataset. It also produced superior segmentation results on three widely used public colonoscopy databases and near-best results on two other public colonoscopy databases. In addition, the lightweight ESFPNet architecture requires fewer model parameters and less computation than competing models, enabling the real-time analysis of input video frames. Overall, these studies point to the combined superior analysis performance and architectural efficiency of ESFPNet for endoscopic video analysis. Lastly, additional experiments with the public colonoscopy databases demonstrate the learning ability and generalizability of ESFPNet, implying that the model could be effective for region segmentation in other domains.
Identifiers
pubmed: 39194980
pii: jimaging10080191
doi: 10.3390/jimaging10080191
Publication types
Journal Article
Languages
eng
Grants
Agency: National Institutes of Health - National Cancer Institute
ID: R01-CA151433