Gross failure rates and failure modes for a commercial AI-based auto-segmentation algorithm in head and neck cancer patients.

Keywords: auto-segmentation; deep learning; failure modes

Journal

Journal of applied clinical medical physics

ISSN: 1526-9914

Abbreviated title: J Appl Clin Med Phys

Country: United States

NLM ID: 101089176

Publication information

Publication date:
23 Jan 2024

History:

received: 04 Oct 2023

revised: 15 Dec 2023

accepted: 20 Dec 2023

medline: 24 Jan 2024

pubmed: 24 Jan 2024

entrez: 24 Jan 2024

Status: aheadofprint

Abstract

Artificial intelligence (AI)-based commercial software can be used to automatically delineate organs at risk (OARs), with potential for efficiency savings in the radiotherapy treatment planning pathway and for reduction of inter- and intra-observer variability. There has been little research investigating the gross failure rates and failure modes of such systems. Fifty head and neck (H&N) patient datasets with "gold standard" contours were compared to AI-generated contours to produce expected mean and standard deviation values of the Dice Similarity Coefficient (DSC) for four common H&N OARs (brainstem, mandible, left parotid and right parotid). An AI-based commercial system was then applied to 500 H&N patients. AI-generated contours were compared to manual contours outlined by an expert human, and the gross failure threshold was set at three standard deviations below the expected mean DSC. Failures were inspected to determine why the AI-based system had failed; failures attributable to suboptimal manual contouring were censored. True failures were classified into four subtypes (setup position, anatomy, image artefacts and unknown). There were 24 true failures of the AI-based commercial software, a gross failure rate of 1.2% (24 of 2000 OAR contours, i.e. 500 patients × 4 OARs). Fifteen failures were due to patient anatomy, four to dental image artefacts, three to patient position and two had an unknown cause. True failure rates by OAR were 0.4% (brainstem), 2.2% (mandible), 1.4% (left parotid) and 0.8% (right parotid). True failures of the AI-based system were predominantly associated with a non-standard element within the CT scan. These non-standard elements were likely the reason for the gross failures, which suggests that the patient datasets used to train the AI model were not sufficiently heterogeneous. Regardless of the reasons for failure, the true failure rate of the AI-based system in the H&N region for the OARs investigated was low (∼1%).
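The failure criterion described in the abstract (DSC, a threshold three standard deviations below the expected mean, and a failure rate over all OAR contours) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: masks are assumed to be sets of voxel indices, the sample standard deviation is assumed, and the function names and toy DSC values are hypothetical. The per-OAR failure counts are back-calculated from the reported rates (rate × 500 patients).

```python
from statistics import mean, stdev

def dice(mask_a, mask_b):
    """Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|).

    Hypothetical helper: masks are collections of voxel indices, a
    simplification of the rasterised structures used clinically.
    """
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # assumption: two empty masks count as full agreement
    return 2 * len(a & b) / (len(a) + len(b))

def gross_failure_threshold(reference_dscs, n_sd=3.0):
    """Expected mean DSC minus n_sd standard deviations.

    Assumption: sample standard deviation (statistics.stdev); the
    abstract does not state which estimator was used.
    """
    return mean(reference_dscs) - n_sd * stdev(reference_dscs)

def is_gross_failure(dsc, threshold):
    """Flag a contour whose DSC falls below the threshold."""
    return dsc < threshold

# Toy masks with hypothetical voxel indices (not study data).
auto = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}
dsc = dice(auto, manual)  # 2 * 3 / (4 + 4) = 0.75

# Hypothetical reference DSCs for one OAR (not the study's values).
reference = [0.88, 0.90, 0.86, 0.92, 0.89]
threshold = gross_failure_threshold(reference)  # about 0.823
print(f"DSC={dsc:.2f}, threshold={threshold:.3f}, "
      f"gross failure={is_gross_failure(dsc, threshold)}")

# True-failure counts implied by the reported per-OAR rates over 500 patients.
true_failures = {"brainstem": 2, "mandible": 11,
                 "left parotid": 7, "right parotid": 4}
n_patients = 500
for oar, n in true_failures.items():
    print(f"{oar}: {n / n_patients:.1%}")
overall = sum(true_failures.values()) / (n_patients * len(true_failures))
print(f"overall: {overall:.1%}")  # 24 / 2000 = 1.2%
```

The overall rate divides by every evaluated contour (patients × OARs) rather than by patients, which is why 24 failures correspond to 1.2% and not 4.8%.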

Identifiers

DOI: 10.1002/acm2.14273 PMID: 38263866

Publication types

Journal Article

Languages

eng

Pagination

e14273
