Optimizing latent graph representations of surgical scenes for unseen domain generalization.

Keywords: Domain adaptation; Graph neural networks; Object-centric learning; Surgical video analysis

Journal

International journal of computer assisted radiology and surgery
ISSN: 1861-6429
Abbreviated title: Int J Comput Assist Radiol Surg
Country: Germany
NLM ID: 101499225

Publication information

Publication date:
28 Apr 2024
History:
received: 03 03 2024
accepted: 22 03 2024
medline: 28 4 2024
pubmed: 28 4 2024
entrez: 28 4 2024
Status: ahead of print

Abstract

Advances in deep learning have resulted in effective models for surgical video analysis; however, these models often fail to generalize across medical centers due to domain shift caused by variations in surgical workflow, camera setups, and patient demographics. Recently, object-centric learning has emerged as a promising approach for improved surgical scene understanding, capturing and disentangling visual and semantic properties of surgical tools and anatomy to improve downstream task performance. In this work, we conduct a multicentric performance benchmark of object-centric approaches, focusing on critical view of safety assessment in laparoscopic cholecystectomy, then propose an improved approach for unseen domain generalization.

We evaluate four object-centric approaches for domain generalization, establishing baseline performance. Next, leveraging the disentangled nature of object-centric representations, we dissect one of these methods through a series of ablations (e.g., ignoring either visual or semantic features for downstream classification). Finally, based on the results of these ablations, we develop an optimized method specifically tailored for domain generalization, LG-DG, that includes a novel disentanglement loss function.

Our optimized approach, LG-DG, achieves an improvement of 9.28% over the best baseline approach. More broadly, we show that object-centric approaches are highly effective for domain generalization thanks to their modular approach to representation learning.

We investigate the use of object-centric methods for unseen domain generalization, identify method-agnostic factors critical for performance, and present an optimized approach that substantially outperforms existing methods.
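The feature ablation mentioned in the abstract (ignoring either visual or semantic features for downstream classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ablate_node_features`, the `mode` values, and the toy feature dimensions are all hypothetical, and the sketch simply assumes each detected object carries a separate visual vector and semantic vector that are concatenated before classification.

```python
import numpy as np


def ablate_node_features(visual, semantic, mode="both"):
    """Concatenate per-object visual and semantic features.

    Hypothetical helper for illustration only. One feature group is
    zeroed out when `mode` is "visual_only" or "semantic_only".
    """
    if mode not in {"both", "visual_only", "semantic_only"}:
        raise ValueError(f"unknown mode: {mode}")
    # Zero the group that the ablation ignores; keep shapes unchanged so
    # the downstream classifier sees the same input dimensionality.
    v = np.zeros_like(visual) if mode == "semantic_only" else visual
    s = np.zeros_like(semantic) if mode == "visual_only" else semantic
    return np.concatenate([v, s], axis=1)


# Toy scene (assumed sizes): 3 objects, 4-d visual and 2-d semantic features.
rng = np.random.default_rng(0)
visual = rng.normal(size=(3, 4))
semantic = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

full = ablate_node_features(visual, semantic, mode="both")
visual_only = ablate_node_features(visual, semantic, mode="visual_only")
semantic_only = ablate_node_features(visual, semantic, mode="semantic_only")
```

Zeroing a group rather than dropping it keeps the input size fixed, so the same downstream classifier architecture can be compared across ablation settings; other masking strategies would be equally plausible.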

Identifiers

pubmed: 38678488
doi: 10.1007/s11548-024-03121-2
pii: 10.1007/s11548-024-03121-2

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Agence Nationale de la Recherche
ID: ANR-20-CHIA-0029-01

Copyright information

© 2024. CARS.

Authors

Siddhant Satyanaik (S)

ICube, University of Strasbourg, CNRS, Strasbourg, France.

Aditya Murali (A)

ICube, University of Strasbourg, CNRS, Strasbourg, France. murali@unistra.fr.

Deepak Alapatt (D)

ICube, University of Strasbourg, CNRS, Strasbourg, France.

Xin Wang (X)

West China Hospital of Sichuan University, Chengdu, China.

Pietro Mascagni (P)

IHU, Strasbourg, France.
Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy.

Nicolas Padoy (N)

ICube, University of Strasbourg, CNRS, Strasbourg, France.
IHU, Strasbourg, France.

MeSH classifications