Gross failure rates and failure modes for a commercial AI-based auto-segmentation algorithm in head and neck cancer patients.
auto-segmentation
deep learning
failure modes
Journal
Journal of applied clinical medical physics
ISSN: 1526-9914
Abbreviated title: J Appl Clin Med Phys
Country: United States
NLM ID: 101089176
Publication information
Publication date: 23 Jan 2024
History:
revised: 15 Dec 2023
received: 04 Oct 2023
accepted: 20 Dec 2023
medline: 24 Jan 2024
pubmed: 24 Jan 2024
entrez: 24 Jan 2024
Status: ahead of print
Abstract
Artificial intelligence (AI)-based commercial software can automatically delineate organs at risk (OARs), with the potential to save time in the radiotherapy treatment planning pathway and to reduce inter- and intra-observer variability. Little research has investigated the gross failure rates and failure modes of such systems. Fifty head and neck (H&N) patient data sets with "gold standard" contours were compared to AI-generated contours to produce expected mean and standard deviation values of the Dice Similarity Coefficient (DSC) for four common H&N OARs (brainstem, mandible, left parotid and right parotid). An AI-based commercial system was then applied to 500 H&N patients. The AI-generated contours were compared to manual contours outlined by an expert human, and a gross failure was defined as a DSC more than three standard deviations below the expected mean. Failures were inspected to assess the reason for failure of the AI-based system, and failures relating to suboptimal manual contouring were censored. True failures were classified into four sub-types (setup position, anatomy, image artefacts and unknown). There were 24 true failures of the AI-based commercial software, a gross failure rate of 1.2%. Fifteen failures were due to patient anatomy, four to dental image artefacts, three to patient position, and two were of unknown cause. True failure rates by OAR were 0.4% (brainstem), 2.2% (mandible), 1.4% (left parotid) and 0.8% (right parotid). True failures of the AI-based system were predominantly associated with a non-standard element within the CT scan. These non-standard elements were likely the reason for the gross failures, which suggests that the patient datasets used to train the AI model did not contain sufficiently heterogeneous data. Regardless of the reasons for failure, the true failure rate of the AI-based system in the H&N region for the OARs investigated was low (∼1%).
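The thresholding rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the use of the sample standard deviation (`ddof=1`) is an assumption, and the per-OAR failure counts are inferred from the reported percentages on the assumption that each OAR was assessed in all 500 patients (2,000 contours in total).

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total

def gross_failure_threshold(reference_dsc):
    """Expected mean DSC minus three standard deviations.

    ddof=1 (sample standard deviation) is an assumption; the abstract
    does not say which estimator was used.
    """
    ref = np.asarray(reference_dsc, dtype=float)
    return ref.mean() - 3.0 * ref.std(ddof=1)

def flag_gross_failures(dsc_values, threshold):
    """Boolean array marking DSC values that fall below the threshold."""
    return np.asarray(dsc_values, dtype=float) < threshold

# Consistency check on the figures reported in the abstract.
n_patients = 500
reported_rates = {          # true failure rate per OAR, as reported
    "brainstem": 0.004,
    "mandible": 0.022,
    "left parotid": 0.014,
    "right parotid": 0.008,
}
# Inferred counts (not stated in the abstract): rate x 500 patients.
implied_counts = {oar: round(rate * n_patients)
                  for oar, rate in reported_rates.items()}
failure_modes = {"anatomy": 15, "dental artefacts": 4,
                 "patient position": 3, "unknown": 2}

total_failures = sum(implied_counts.values())
overall_rate = total_failures / (n_patients * len(reported_rates))
```

Under these assumptions the implied per-OAR counts and the failure-mode counts both sum to the 24 true failures reported, and 24 failures over 2,000 contours reproduces the stated 1.2% gross failure rate.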
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e14273
Copyright information
© 2024 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, LLC on behalf of The American Association of Physicists in Medicine.