Deep learning prediction of error and skill in robotic prostatectomy suturing.

Keywords: Deep learning; Errors; Robotic; Technical skill

Journal

Surgical endoscopy
ISSN: 1432-2218
Abbreviated title: Surg Endosc
Country: Germany
NLM ID: 8806653

Publication information

Publication date:
21 Oct 2024
History:
received: 27 06 2024
accepted: 02 10 2024
medline: 22 10 2024
pubmed: 22 10 2024
entrez: 21 10 2024
Status: aheadofprint

Abstract

BACKGROUND
Manual objective assessment of skill and errors in minimally invasive surgery has been validated through correlation with surgical expertise and patient outcomes. However, assessment and error annotation can be subjective and are time-consuming, which often precludes their use. Recent years have seen the development of artificial intelligence (AI) models that work towards automating the process, to allow reduction of errors and truly objective assessment. This study aimed to validate surgical skill rating and error annotations in suturing gestures to inform the development and evaluation of AI models.
METHODS
The SAR-RARP50 open data set was blindly and independently annotated at the gesture level in Robotic-Assisted Radical Prostatectomy (RARP) suturing. Manual objective assessment tools and an error annotation methodology, Objective Clinical Human Reliability Analysis (OCHRA), were used as ground truth to train and test vision-based deep learning methods to estimate skill and errors. Analysis included descriptive statistics plus tool validity and reliability.
RESULTS
Fifty-four RARP videos (266 min) were analysed. Strong/excellent inter-rater reliability (range r = 0.70-0.89, p < 0.001) and very strong correlation (r = 0.92, p < 0.001) between objective assessment tools were demonstrated. Skill estimation of OSATS and M-GEARS achieved Spearman's correlation coefficients of 0.37 and 0.36, respectively, with normalised mean absolute error representing a prediction error of 17.92% (inverted "accuracy" 82.08%) and 20.6% (inverted "accuracy" 79.4%), respectively. The best performing models in error prediction achieved mean absolute precision of 37.14%, area under the curve of 65.10% and Macro-F1 of 58.97%.
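The reported figures relate to one another by simple arithmetic, which a minimal Python sketch can make concrete. It is not the authors' evaluation code. It assumes that "normalised" means the mean absolute error divided by the range of the rating scale, and that inverted "accuracy" is 100% minus that value, which is consistent with 17.92% and 82.08% summing to 100. The function names and the 5 to 25 scale in the demo are hypothetical.

```python
from typing import Hashable, Sequence


def normalised_mae_percent(y_true: Sequence[float], y_pred: Sequence[float],
                           score_min: float, score_max: float) -> float:
    """Mean absolute error as a percentage of the score range.

    Assumption: normalisation by (score_max - score_min); the abstract
    does not state which normalisation was used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if score_max <= score_min:
        raise ValueError("score_max must exceed score_min")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return 100.0 * mae / (score_max - score_min)


def inverted_accuracy_percent(nmae_percent: float) -> float:
    """Inverted "accuracy": 100% minus the normalised prediction error."""
    return 100.0 - nmae_percent


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores (standard Macro-F1)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes F1 = 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Hypothetical ratings on a hypothetical 5-25 scale.
    nmae = normalised_mae_percent([10, 20], [12, 16], score_min=5, score_max=25)
    print(nmae, inverted_accuracy_percent(nmae))
    # Hypothetical per-gesture error labels (1 = error, 0 = no error).
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

Macro-F1 weights each class equally, so a rare error class counts as much as the common no-error class. That makes it a stricter summary than plain accuracy when errors are infrequent.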
CONCLUSIONS
This is the first study to employ detailed error detection methodology and deep learning models within real robotic surgical video. This benchmark evaluation of AI models provides a foundation and a promising approach for future advancements in automated technical skill assessment.

Identifiers

pubmed: 39433583
doi: 10.1007/s00464-024-11341-5
pii: 10.1007/s00464-024-11341-5

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© 2024. Crown.

Authors

N Sirajudeen (N)

Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK.

M Boal (M)

Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK.
The Griffin Institute, Northwick Park and St Marks Hospital, London, UK.
Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK.

D Anastasiou (D)

Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK.
Medical Physics and Biomedical Engineering, UCL, London, UK.

J Xu (J)

Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK.
Medical Physics and Biomedical Engineering, UCL, London, UK.

D Stoyanov (D)

Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK.
Computer Vision, UCL, London, UK.

J Kelly (J)

Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK.
Computer Vision, UCL, London, UK.

J W Collins (JW)

Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK.
University College London Hospitals NHS Foundation Trust, London, UK.

A Sridhar (A)

Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK.
University College London Hospitals NHS Foundation Trust, London, UK.

E Mazomenos (E)

Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK.
Medical Physics and Biomedical Engineering, UCL, London, UK.

N K Francis (NK)

The Griffin Institute, Northwick Park and St Marks Hospital, London, UK. n.francis@griffininstitute.org.uk.
Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK.
University College London Hospitals NHS Foundation Trust, London, UK.
Yeovil District Hospital, Somerset Foundation NHS Trust, Yeovil, UK.

MeSH classifications