ESFPNet: Efficient Stage-Wise Feature Pyramid on Mix Transformer for Deep Learning-Based Cancer Analysis in Endoscopic Video.

Keywords: autofluorescence bronchoscopy; colonoscopy; colorectal cancer; deep learning; efficient stage-wise feature pyramid; endoscopic video analysis; lesion analysis; lung cancer; mix transformer; semantic image segmentation

Journal

Journal of Imaging
ISSN: 2313-433X
Abbreviated title: J Imaging
Country: Switzerland
NLM ID: 101698819

Publication information

Publication date:
07 Aug 2024
History:
received: 20 06 2024
revised: 19 07 2024
accepted: 01 08 2024
medline: 28 08 2024
pubmed: 28 08 2024
entrez: 28 08 2024
Status: epublish

Abstract

For patients at risk of developing either lung cancer or colorectal cancer, the identification of suspect lesions in endoscopic video is an important procedure. The physician performs an endoscopic exam by navigating an endoscope through the organ of interest, be it the lungs or intestinal tract, and performs a visual inspection of the endoscopic video stream to identify lesions. Unfortunately, this entails a tedious, error-prone search over a lengthy video sequence. We propose a deep learning architecture that enables the real-time detection and segmentation of lesion regions from endoscopic video, with our experiments focused on autofluorescence bronchoscopy (AFB) for the lungs and colonoscopy for the intestinal tract. Our architecture, dubbed ESFPNet, draws on a pretrained Mix Transformer (MiT) encoder and a decoder structure that incorporates a new Efficient Stage-Wise Feature Pyramid (ESFP) to promote accurate lesion segmentation. In comparison to existing deep learning models, the ESFPNet model gave superior lesion segmentation performance for an AFB dataset. It also produced superior segmentation results for three widely used public colonoscopy databases and nearly the best results for two other public colonoscopy databases. In addition, the lightweight ESFPNet architecture requires fewer model parameters and less computation than other competing models, enabling the real-time analysis of input video frames. Overall, these studies point to the combined superior analysis performance and architectural efficiency of the ESFPNet for endoscopic video analysis. Lastly, additional experiments with the public colonoscopy databases demonstrate the learning ability and generalizability of ESFPNet, implying that the model could be effective for region segmentation in other domains.
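
The abstract describes a decoder that combines multi-stage encoder features in a feature pyramid to produce a lesion mask. As a rough illustration of that general idea only, the following sketch fuses four multi-scale feature maps from deepest to shallowest and predicts a per-pixel probability. It is not the authors' ESFP design: the helper names (`linear`, `upsample2`, `concat`, `pyramid_segment`), the channel counts, the map sizes, the ReLU fusion step, and the random weights are all assumptions made for this example, and plain Python lists stand in for tensors.

```python
import math
import random

def linear(x, w):
    """Per-pixel linear layer: map each C_in-vector in x through w (C_in x C_out)."""
    return [[[sum(v * wc[o] for v, wc in zip(px, w)) for o in range(len(w[0]))]
             for px in row] for row in x]

def upsample2(x):
    """Nearest-neighbour upsampling of an H x W x C map by a factor of 2."""
    return [[px for px in row for _ in range(2)] for row in x for _ in range(2)]

def concat(a, b):
    """Concatenate two same-sized maps along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def relu(x):
    return [[[max(v, 0.0) for v in px] for px in row] for row in x]

def rand_weights(rng, c_in, c_out):
    """Random placeholder weights; a trained model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(c_out)] for _ in range(c_in)]

def pyramid_segment(features, rng, dim=8):
    """Fuse features (shallowest first, each half the size of the previous)
    from deep to shallow, then return an H x W map of probabilities."""
    # Project every stage to a common channel width.
    proj = [linear(f, rand_weights(rng, len(f[0][0]), dim)) for f in features]
    fused = proj[-1]
    for p in reversed(proj[:-1]):
        # Upsample the deeper result, join it with the shallower stage, re-project.
        joined = concat(p, upsample2(fused))
        fused = relu(linear(joined, rand_weights(rng, 2 * dim, dim)))
    logits = linear(fused, rand_weights(rng, dim, 1))
    return [[1.0 / (1.0 + math.exp(-px[0])) for px in row] for row in logits]

rng = random.Random(0)
# Hypothetical four-stage encoder output (sizes and channels are assumptions).
sizes, channels = [16, 8, 4, 2], [4, 8, 16, 32]
features = [[[[rng.gauss(0.0, 1.0) for _ in range(c)] for _ in range(s)]
             for _ in range(s)] for s, c in zip(sizes, channels)]
mask = pyramid_segment(features, rng)
print(len(mask), len(mask[0]))  # prints: 16 16
```

The output map has the resolution of the shallowest stage; a full model would typically upsample it to the frame size and threshold it to obtain a binary lesion region.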

Identifiers

pubmed: 39194980
pii: jimaging10080191
doi: 10.3390/jimaging10080191

Publication types

Journal Article

Languages

eng

Grants

Agency: National Institutes of Health - National Cancer Institute
ID: R01-CA151433

Authors

Qi Chang (Q)

School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA.

Danish Ahmad (D)

Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA.

Jennifer Toth (J)

Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA.

Rebecca Bascom (R)

Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA.

William E Higgins (WE)

School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA.

MeSH classifications