Localization-adjusted diagnostic performance and assistance effect of a computer-aided detection system for pneumothorax and consolidation.


Journal

NPJ digital medicine
ISSN: 2398-6352
Abbreviated title: NPJ Digit Med
Country: England
NLM ID: 101731738

Publication information

Publication date:
30 Jul 2022
History:
received: 24 01 2022
accepted: 11 07 2022
entrez: 30 7 2022
pubmed: 31 7 2022
medline: 31 7 2022
Status: epublish

Abstract

Many deep-learning-based computer-aided detection (CAD) systems have been developed and commercialized for detecting abnormalities in chest radiographs (CXRs), but their ability to localize a target abnormality is rarely reported. Localization accuracy matters for model interpretability, which is crucial in clinical settings. Moreover, diagnostic performance is likely to vary with the threshold that defines an accurate localization. In a multi-center, stand-alone clinical trial using temporal and external validation datasets of 1,050 CXRs, we evaluated the localization accuracy, localization-adjusted discrimination, and calibration of a commercially available deep-learning-based CAD for detecting consolidation and pneumothorax. For consolidation, the CAD achieved an image-level AUROC (95% CI) of 0.960 (0.945, 0.975), sensitivity of 0.933 (0.899, 0.959), specificity of 0.948 (0.930, 0.963), Dice of 0.691 (0.664, 0.718), and moderate calibration. For pneumothorax, it achieved an image-level AUROC of 0.978 (0.965, 0.991), sensitivity of 0.956 (0.923, 0.978), specificity of 0.996 (0.989, 0.999), Dice of 0.798 (0.770, 0.826), and moderate calibration. Diagnostic performance varied substantially when localization accuracy was accounted for but remained high at the minimum threshold of clinical relevance. In a separate trial of diagnostic impact using 461 CXRs, we estimated the causal effect of CAD assistance on clinicians' diagnostic performance. After adjustment for age, sex, dataset, and abnormality type, the CAD improved clinicians' diagnostic performance on average (OR [95% CI] = 1.73 [1.30, 2.32]; p < 0.001), although the effect varied substantially by clinical background. The CAD showed high stand-alone diagnostic performance and may improve clinicians' diagnostic performance when used in clinical settings.
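
To make the idea of localization-adjusted performance concrete, the following minimal Python sketch computes the Dice overlap between a predicted and a reference mask, then a sensitivity in which a flagged image counts as a true positive only if its Dice reaches a chosen threshold. This is one plausible reading of the abstract, not the authors' published procedure: the function names, the score cutoff of 0.5, the toy 3x3 masks, and the rule for combining score and overlap are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: 1 = abnormal pixel, 0 = background


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|); taken as 1.0 if both masks are empty."""
    inter = sum(1 for pr, tr in zip(pred, truth) for p, t in zip(pr, tr) if p and t)
    size_pred = sum(1 for row in pred for p in row if p)
    size_truth = sum(1 for row in truth for t in row if t)
    total = size_pred + size_truth
    return 1.0 if total == 0 else 2.0 * inter / total


def localization_adjusted_sensitivity(
    positives: List[Tuple[float, Mask, Mask]],
    score_cutoff: float,
    dice_threshold: float,
) -> float:
    """Share of truly abnormal images that are flagged AND localized well enough.

    Each item is (CAD score, predicted mask, reference mask) for an image that
    truly contains the abnormality. Assumed rule: a hit requires
    score >= score_cutoff and Dice >= dice_threshold.
    """
    hits = sum(
        1
        for score, pred, truth in positives
        if score >= score_cutoff and dice(pred, truth) >= dice_threshold
    )
    return hits / len(positives)


# Toy data (hypothetical): one reference lesion and two candidate predictions.
truth = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
pred_good = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]  # overlaps 3 of 4 lesion pixels
pred_poor = [[1, 0, 0], [1, 1, 0], [0, 0, 0]]  # overlaps 1 of 4 lesion pixels

cases = [
    (0.9, pred_good, truth),  # flagged, well localized
    (0.8, pred_poor, truth),  # flagged, poorly localized
    (0.2, pred_good, truth),  # missed at the image level
]

for threshold in (0.0, 0.5, 0.9):
    sens = localization_adjusted_sensitivity(cases, 0.5, threshold)
    print(f"Dice threshold {threshold:.1f}: sensitivity {sens:.3f}")
```

With a threshold of 0.0 the measure reduces to ordinary image-level sensitivity. Raising the threshold can only keep it the same or lower it, which is the kind of threshold dependence the abstract describes.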

Identifiers

pubmed: 35908091
doi: 10.1038/s41746-022-00658-x
pii: 10.1038/s41746-022-00658-x
pmc: PMC9339006

Publication types

Journal Article

Languages

eng

Pagination

107

Copyright information

© 2022. The Author(s).

Authors

Sun Yeop Lee (SY)

Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea.

Sangwoo Ha (S)

Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea.

Min Gyeong Jeon (MG)

Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea.

Hao Li (H)

Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea.

Hyunju Choi (H)

Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea.

Hwa Pyung Kim (HP)

Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea.

Ye Ra Choi (YR)

Department of Radiology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea.
Department of Radiology, Seoul National University College of Medicine, Seoul, Republic of Korea.

Hoseok I (H)

Department of Thoracic and Cardiovascular Surgery, Pusan National University School of Medicine, Busan, Republic of Korea.
Convergence Medical Institute of Technology, Biomedical Research Institute, Pusan National University Hospital, Busan, Republic of Korea.

Yeon Joo Jeong (YJ)

Department of Radiology and Biomedical Research Institute, Pusan National University Hospital, Busan, Republic of Korea.

Yoon Ha Park (YH)

Department of Internal Medicine, Jawol Health Center, Incheon, Republic of Korea.

Hyemin Ahn (H)

Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.

Sang Hyup Hong (SH)

Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.

Hyun Jung Koo (HJ)

Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.

Choong Wook Lee (CW)

Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.

Min Jae Kim (MJ)

Department of Infectious Disease, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.

Yeon Joo Kim (YJ)

Department of Respiratory Allergy Medicine, Nowon Eulji Medical Center, Seoul, Republic of Korea.

Kyung Won Kim (KW)

Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.

Jong Mun Choi (JM)

Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea. finaldoor2@gmail.com.

MeSH classifications