Optimizing latent graph representations of surgical scenes for unseen domain generalization.

Keywords: Domain adaptation; Graph neural networks; Object-centric learning; Surgical video analysis

Journal

International Journal of Computer Assisted Radiology and Surgery

ISSN: 1861-6429

Abbreviated title: Int J Comput Assist Radiol Surg

Country: Germany

NLM ID: 101499225

Publication information

Publication date:
28 Apr 2024

History:

received: 03 Mar 2024

accepted: 22 Mar 2024

medline: 28 Apr 2024

pubmed: 28 Apr 2024

entrez: 28 Apr 2024

Status: ahead of print

Abstract

Advances in deep learning have resulted in effective models for surgical video analysis; however, these models often fail to generalize across medical centers due to domain shift caused by variations in surgical workflow, camera setups, and patient demographics. Recently, object-centric learning has emerged as a promising approach for improved surgical scene understanding, capturing and disentangling visual and semantic properties of surgical tools and anatomy to improve downstream task performance. In this work, we conduct a multicentric performance benchmark of object-centric approaches, focusing on critical view of safety assessment in laparoscopic cholecystectomy, then propose an improved approach for unseen domain generalization.

We evaluate four object-centric approaches for domain generalization, establishing baseline performance. Next, leveraging the disentangled nature of object-centric representations, we dissect one of these methods through a series of ablations (e.g., ignoring either visual or semantic features for downstream classification). Finally, based on the results of these ablations, we develop an optimized method specifically tailored for domain generalization, LG-DG, that includes a novel disentanglement loss function.

Our optimized approach, LG-DG, achieves an improvement of 9.28% over the best baseline approach. More broadly, we show that object-centric approaches are highly effective for domain generalization thanks to their modular approach to representation learning.

We investigate the use of object-centric methods for unseen domain generalization, identify method-agnostic factors critical for performance, and present an optimized approach that substantially outperforms existing methods.

Identifiers

DOI: 10.1007/s11548-024-03121-2

PMID: 38678488

PII: 10.1007/s11548-024-03121-2

Publication types

Journal Article

Languages

eng

Grants

Agency: Agence Nationale de la Recherche

ID: ANR-20-CHIA-0029-01

Authors

Siddhant Satyanaik

Aditya Murali

Deepak Alapatt

Xin Wang

Pietro Mascagni

Nicolas Padoy
