Learning how to robustly estimate camera pose in endoscopic videos.


Journal

International journal of computer assisted radiology and surgery
ISSN: 1861-6429
Abbreviated title: Int J Comput Assist Radiol Surg
Country: Germany
NLM ID: 101499225

Publication information

Publication date:
Jul 2023
History:
received: 6 March 2023
accepted: 13 April 2023
medline: 10 July 2023
pubmed: 15 May 2023
entrez: 15 May 2023
Status: ppublish

Abstract

Surgical scene understanding plays a critical role in the technology stack of tomorrow's intervention-assisting systems for endoscopic surgery. Tracking the endoscope pose is a key component of this stack, but it remains challenging due to illumination conditions, deforming tissues and the breathing motion of organs. We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation. Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content. To do so, we train a Deep Declarative Network to take advantage of the expressiveness of deep learning and the robustness of a novel geometric-based optimization approach. We validate our approach on the publicly available SCARED dataset and introduce a new in vivo dataset, StereoMIS, which covers a wider spectrum of typically observed surgical settings. Our method outperforms state-of-the-art methods on average and, more importantly, in difficult scenarios where tissue deformations and breathing motion are visible. We observe that the proposed weight mappings attenuate the contribution of pixels in ambiguous regions of the images, such as deforming tissues. We demonstrate the effectiveness of our solution for robustly estimating the camera pose in challenging endoscopic surgical scenes. Our contributions can be used to improve related tasks such as simultaneous localization and mapping (SLAM) or 3D reconstruction, thereby advancing surgical scene understanding in minimally invasive surgery.
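The weighted geometric loss described in the abstract can be sketched in code. The following is a minimal NumPy illustration under assumptions not taken from the paper: shapes, variable names, a zero-flow pixel correspondence, and hand-chosen weights are all hypothetical. The actual method additionally uses estimated optical flow for correspondences, a second (2D) loss, and per-pixel weight maps learned end-to-end with a Deep Declarative Network rather than set by hand.

```python
import numpy as np

def backproject(depth, K_inv):
    """Lift an HxW depth map to 3D camera-frame points of shape (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pix @ K_inv.T            # viewing ray per pixel
    return rays * depth[..., None]  # scale rays by depth

def weighted_3d_loss(depth_t, depth_t1, R, t, K, weights):
    """Per-pixel-weighted 3D alignment residual between two frames.

    For brevity, pixel (u, v) at time t is assumed to correspond to
    (u, v) at time t+1 (zero optical flow); the paper instead warps
    correspondences with an estimated flow field.
    """
    K_inv = np.linalg.inv(K)
    P_t = backproject(depth_t, K_inv)             # points at time t
    P_t1 = backproject(depth_t1, K_inv)           # points at time t+1
    P_pred = P_t @ R.T + t                        # candidate pose applied
    res = np.linalg.norm(P_pred - P_t1, axis=-1)  # per-pixel 3D residual
    # Low weights suppress ambiguous pixels (e.g. deforming tissue).
    return np.sum(weights * res) / np.sum(weights)
```

Down-weighting a region whose depth changes between frames (mimicking tissue deformation) removes its contribution to the pose residual, which is the intuition behind the learned weight mappings.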

Identifiers

pubmed: 37184768
doi: 10.1007/s11548-023-02919-w
pii: 10.1007/s11548-023-02919-w
pmc: PMC10329609

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1185-1192

Grants

Agency: Innosuisse - Schweizerische Agentur für Innovationsförderung
ID: # 50204.1 IP-LS

Copyright information

© 2023. The Author(s).

References

Mur-Artal R, Tardós JD (2017) Orb-slam2: an open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Trans Rob 33(5):1255–1262. https://doi.org/10.1109/TRO.2017.2705103
Whelan T, Salas-Moreno RF, Glocker B, Davison AJ, Leutenegger S (2016) Elasticfusion: real-time dense slam and light source estimation. Int J Rob Res 35(14):1697–1716. https://doi.org/10.1177/0278364916669237
Lamarca J, Montiel JMM (2018) Camera tracking for slam in deformable maps. In: Computer vision-ECCV 2018 workshops, pp 730–737. https://doi.org/10.1007/978-3-030-11009-3_45
Gómez-Rodríguez JJ, Lamarca J, Morlana J, Tardós JD, Montiel JMM (2021) Sd-defslam: semi-direct monocular slam for deformable and intracorporeal scenes. In: 2021 IEEE international conference on robotics and automation (ICRA), pp 5170–5177. https://doi.org/10.1109/ICRA48506.2021.9561512
Liu X, Li Z, Ishii M, Hager GD, Taylor RH, Unberath M (2022) Sage: slam with appearance and geometry prior for endoscopy. In: 2022 International conference on robotics and automation (ICRA), pp 5587–5593. https://doi.org/10.1109/ICRA46639.2022.9812257
Song J, Wang J, Zhao L, Huang S, Dissanayake G (2018) Mis-slam: real-time large-scale dense deformable slam system in minimal invasive surgery based on heterogeneous computing. IEEE Robot Autom Lett 3(4):4068–4075. https://doi.org/10.1109/LRA.2018.2856519
Zhou H, Jayender J (2021) EMDQ-SLAM: real-time high-resolution reconstruction of soft tissue surface from stereo laparoscopy videos. In: Medical image computing and computer assisted intervention—MICCAI 2021, pp 331–340 . https://doi.org/10.1007/978-3-030-87202-1_32
Wei R, Li B, Mo H, Lu B, Long Y, Yang B, Dou Q, Liu Y, Sun D (2023) Stereo dense scene reconstruction and accurate localization for learning-based navigation of laparoscope in minimally invasive surgery. IEEE Trans Biomed Eng 70(2):488–500. https://doi.org/10.1109/TBME.2022.3195027
Gould S, Hartley R, Campbell D (2022) Deep declarative networks. IEEE Trans Pattern Anal Mach Intell 44(8):3988–4004. https://doi.org/10.1109/TPAMI.2021.3059462
Parameshwara CM, Hari G, Fermüller C, Sanket NJ, Aloimonos Y (2022) Diffposenet: direct differentiable camera pose estimation. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6835–6844. https://doi.org/10.1109/CVPR52688.2022.00672
Teed Z, Deng J (2020) RAFT: recurrent all-pairs field transforms for optical flow. In: Computer vision-ECCV 2020, pp 402–419. https://doi.org/10.1007/978-3-030-58536-5_24
Liu DC, Nocedal J (1989) On the limited memory BFGS method for large scale optimization. Math Program 45:503–528. https://doi.org/10.1007/BF01589116
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015, pp 234–241. Springer https://doi.org/10.1007/978-3-319-24574-4_28
Allan M, McLeod AJ, Wang CC, Rosenthal J, Hu Z, Gard N, Eisert P, Fu KX, Zeffiro T, Xia W, Zhu Z, Luo H, Jia F, Zhang X, Li X, Sharan L, Kurmann T, Schmid S, Sznitman R, Psychogyios D, Azizian M, Stoyanov D, Maier-Hein L, Speidel S (2021) Stereo correspondence and reconstruction of endoscopic data challenge. arXiv:2101.01133
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder–decoder with atrous separable convolution for semantic image segmentation. In: Computer vision—ECCV 2018, pp 833–851. https://doi.org/10.1007/978-3-030-01234-2_49
Allan M, Kondo S, Bodenstedt S, Leger S, Kadkhodamohammadi R, Luengo I, Fuentes F, Flouty E, Mohammed A, Pedersen M, et al (2020) 2018 Robotic scene segmentation challenge. arXiv:2001.11190
Ozyoruk KB, Gokceler GI, Bobrow TL, Coskun G, Incetan K, Almalioglu Y, Mahmood F, Curto E, Perdigoto L, Oliveira M, Sahin H, Araujo H, Alexandrino H, Durr NJ, Gilbert HB, Turan M (2021) Endoslam dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Med Image Anal 71:102–112. https://doi.org/10.1016/j.media.2021.102058

Authors

Michel Hayoz (M)

ARTORG Center, University of Bern, Bern, Switzerland. michel.hayoz@unibe.ch.

Christopher Hahne (C)

ARTORG Center, University of Bern, Bern, Switzerland.

Mathias Gallardo (M)

ARTORG Center, University of Bern, Bern, Switzerland.

Daniel Candinas (D)

Department of Visceral Surgery and Medicine, Inselspital, Bern, Switzerland.

Thomas Kurmann (T)

Applied Research, Intuitive Surgical, Sunnyvale, USA.

Maximilian Allan (M)

Applied Research, Intuitive Surgical, Sunnyvale, USA.

Raphael Sznitman (R)

ARTORG Center, University of Bern, Bern, Switzerland.
