Learning how to robustly estimate camera pose in endoscopic videos.


Journal

International journal of computer assisted radiology and surgery
ISSN: 1861-6429
Abbreviated title: Int J Comput Assist Radiol Surg
Country: Germany
NLM ID: 101499225

Publication information

Publication date:
Jul 2023
History:
received: 6 March 2023
accepted: 13 April 2023
medline: 10 July 2023
pubmed: 15 May 2023
entrez: 15 May 2023
Status: ppublish

Abstract

Surgical scene understanding plays a critical role in the technology stack of tomorrow's intervention-assisting systems for endoscopic surgery. Tracking the endoscope pose is a key component of this stack, but it remains challenging due to illumination conditions, deforming tissues and the breathing motion of organs. We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation. Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content. To do so, we train a Deep Declarative Network to take advantage of the expressiveness of deep learning and the robustness of a novel geometric-based optimization approach. We validate our approach on the publicly available SCARED dataset and introduce a new in vivo dataset, StereoMIS, which covers a wider spectrum of typically observed surgical settings. Our method outperforms state-of-the-art methods on average and, more importantly, in difficult scenarios where tissue deformations and breathing motion are visible. We observe that the proposed weight mappings attenuate the contribution of pixels in ambiguous regions of the images, such as deforming tissues. We demonstrate the effectiveness of our solution for robustly estimating the camera pose in challenging endoscopic surgical scenes. Our contributions can be used to improve related tasks such as simultaneous localization and mapping (SLAM) or 3D reconstruction, thereby advancing surgical scene understanding in minimally invasive surgery.
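The weighted geometric loss described in the abstract can be sketched in code. The following is a minimal NumPy illustration under assumptions not taken from the paper: shapes, variable names, a zero-flow pixel correspondence, and hand-chosen weights are all hypothetical. The actual method additionally uses estimated optical flow for correspondences, a second (2D) loss, and per-pixel weight maps learned end-to-end with a Deep Declarative Network rather than set by hand.

```python
import numpy as np

def backproject(depth, K_inv):
    """Lift an HxW depth map to 3D camera-frame points of shape (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pix @ K_inv.T            # viewing ray per pixel
    return rays * depth[..., None]  # scale rays by depth

def weighted_3d_loss(depth_t, depth_t1, R, t, K, weights):
    """Per-pixel-weighted 3D alignment residual between two frames.

    For brevity, pixel (u, v) at time t is assumed to correspond to
    (u, v) at time t+1 (zero optical flow); the paper instead warps
    correspondences with an estimated flow field.
    """
    K_inv = np.linalg.inv(K)
    P_t = backproject(depth_t, K_inv)             # points at time t
    P_t1 = backproject(depth_t1, K_inv)           # points at time t+1
    P_pred = P_t @ R.T + t                        # candidate pose applied
    res = np.linalg.norm(P_pred - P_t1, axis=-1)  # per-pixel 3D residual
    # Low weights suppress ambiguous pixels (e.g. deforming tissue).
    return np.sum(weights * res) / np.sum(weights)
```

Down-weighting a region whose depth changes between frames (mimicking tissue deformation) removes its contribution to the pose residual, which is the intuition behind the learned weight mappings.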

Identifiers

pubmed: 37184768
doi: 10.1007/s11548-023-02919-w
pii: 10.1007/s11548-023-02919-w
pmc: PMC10329609

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1185-1192

Grants

Agency: Innosuisse - Schweizerische Agentur für Innovationsförderung
ID: # 50204.1 IP-LS

Copyright information

© 2023. The Author(s).

References

Mur-Artal R, Tardós JD (2017) Orb-slam2: an open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Trans Rob 33(5):1255–1262. https://doi.org/10.1109/TRO.2017.2705103
Whelan T, Salas-Moreno RF, Glocker B, Davison AJ, Leutenegger S (2016) Elasticfusion: real-time dense slam and light source estimation. Int J Rob Res 35(14):1697–1716. https://doi.org/10.1177/0278364916669237
Lamarca J, Montiel JMM (2018) Camera tracking for slam in deformable maps. In: Computer vision-ECCV 2018 workshops, pp 730–737. https://doi.org/10.1007/978-3-030-11009-3_45
Gómez-Rodríguez JJ, Lamarca J, Morlana J, Tardós JD, Montiel JMM (2021) Sd-defslam: semi-direct monocular slam for deformable and intracorporeal scenes. In: 2021 IEEE international conference on robotics and automation (ICRA), pp 5170–5177. https://doi.org/10.1109/ICRA48506.2021.9561512
Liu X, Li Z, Ishii M, Hager GD, Taylor RH, Unberath M (2022) Sage: slam with appearance and geometry prior for endoscopy. In: 2022 International conference on robotics and automation (ICRA), pp 5587–5593. https://doi.org/10.1109/ICRA46639.2022.9812257
Song J, Wang J, Zhao L, Huang S, Dissanayake G (2018) Mis-slam: real-time large-scale dense deformable slam system in minimal invasive surgery based on heterogeneous computing. IEEE Robot Autom Lett 3(4):4068–4075. https://doi.org/10.1109/LRA.2018.2856519
Zhou H, Jayender J (2021) EMDQ-SLAM: real-time high-resolution reconstruction of soft tissue surface from stereo laparoscopy videos. In: Medical image computing and computer assisted intervention—MICCAI 2021, pp 331–340 . https://doi.org/10.1007/978-3-030-87202-1_32
Wei R, Li B, Mo H, Lu B, Long Y, Yang B, Dou Q, Liu Y, Sun D (2023) Stereo dense scene reconstruction and accurate localization for learning-based navigation of laparoscope in minimally invasive surgery. IEEE Trans Biomed Eng 70(2):488–500. https://doi.org/10.1109/TBME.2022.3195027
Gould S, Hartley R, Campbell D (2022) Deep declarative networks. IEEE Trans Pattern Anal Mach Intell 44(8):3988–4004. https://doi.org/10.1109/TPAMI.2021.3059462
Parameshwara CM, Hari G, Fermüller C, Sanket NJ, Aloimonos Y (2022) Diffposenet: direct differentiable camera pose estimation. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6835–6844. https://doi.org/10.1109/CVPR52688.2022.00672
Teed Z, Deng J (2020) RAFT: recurrent all-pairs field transforms for optical flow. In: Computer vision-ECCV 2020, pp 402–419. https://doi.org/10.1007/978-3-030-58536-5_24
Liu DC, Nocedal J (1989) On the limited memory BFGS method for large scale optimization. Math Program 45:503–528. https://doi.org/10.1007/BF01589116
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015, pp 234–241. Springer https://doi.org/10.1007/978-3-319-24574-4_28
Allan M, McLeod AJ, Wang CC, Rosenthal J, Hu Z, Gard N, Eisert P, Fu KX, Zeffiro T, Xia W, Zhu Z, Luo H, Jia F, Zhang X, Li X, Sharan L, Kurmann T, Schmid S, Sznitman R, Psychogyios D, Azizian M, Stoyanov D, Maier-Hein L, Speidel S (2021) Stereo correspondence and reconstruction of endoscopic data challenge. arXiv:2101.01133
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder–decoder with atrous separable convolution for semantic image segmentation. In: Computer vision—ECCV 2018, pp 833–851. https://doi.org/10.1007/978-3-030-01234-2_49
Allan M, Kondo S, Bodenstedt S, Leger S, Kadkhodamohammadi R, Luengo I, Fuentes F, Flouty E, Mohammed A, Pedersen M, et al (2020) 2018 Robotic scene segmentation challenge. arXiv:2001.11190
Ozyoruk KB, Gokceler GI, Bobrow TL, Coskun G, Incetan K, Almalioglu Y, Mahmood F, Curto E, Perdigoto L, Oliveira M, Sahin H, Araujo H, Alexandrino H, Durr NJ, Gilbert HB, Turan M (2021) Endoslam dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Med Image Anal 71:102–112. https://doi.org/10.1016/j.media.2021.102058

Authors

Michel Hayoz (M)

ARTORG Center, University of Bern, Bern, Switzerland. michel.hayoz@unibe.ch.

Christopher Hahne (C)

ARTORG Center, University of Bern, Bern, Switzerland.

Mathias Gallardo (M)

ARTORG Center, University of Bern, Bern, Switzerland.

Daniel Candinas (D)

Department of Visceral Surgery and Medicine, Inselspital, Bern, Switzerland.

Thomas Kurmann (T)

Applied Research, Intuitive Surgical, Sunnyvale, USA.

Maximilian Allan (M)

Applied Research, Intuitive Surgical, Sunnyvale, USA.

Raphael Sznitman (R)

ARTORG Center, University of Bern, Bern, Switzerland.
