Deep Learning Pitfall: Impact of Novel Ultrasound Equipment Introduction on Algorithm Performance and the Realities of Domain Adaptation.
artificial intelligence
deep learning
domain shift
inferior vena cava
pediatrics
point of care ultrasound
Journal
Journal of ultrasound in medicine : official journal of the American Institute of Ultrasound in Medicine
ISSN: 1550-9613
Abbreviated title: J Ultrasound Med
Country: England
NLM ID: 8211547
Publication information
Publication date: Apr 2022
History:
received: 2021-04-11
revised: 2021-05-03
accepted: 2021-05-17
entrez: 2021-06-16
pubmed: 2021-06-17
medline: 2022-03-16
Status:
ppublish
Abstract
To test how introducing novel ultrasound equipment into a clinical setting affects deep learning (DL) algorithm performance.

Researchers challenged a previously validated DL algorithm for assessing inferior vena cava (IVC) collapse, which had been trained on a common point of care ultrasound (POCUS) machine, with prospectively obtained IVC videos from a similar patient population acquired on novel ultrasound equipment. Twenty-one new videos were obtained for each novel ultrasound machine. The videos were analyzed for complete collapse by the algorithm and by 2 blinded POCUS experts. Cohen's kappa was calculated for agreement between the 2 POCUS experts and the DL algorithm.

Previous testing on new patient data using the same ultrasound equipment showed substantial agreement between the algorithm and the experts, with Cohen's kappa of 0.78 (95% CI 0.49-1.0) and 0.66 (95% CI 0.31-1.0). Challenged with higher image quality (IQ) POCUS cart ultrasound videos, algorithm performance declined, with kappa values of 0.31 (95% CI 0.19-0.81) and 0.39 (95% CI 0.11-0.89), showing fair agreement. Algorithm performance plummeted on a lower IQ smartphone device, with kappa values of -0.09 (95% CI -0.95 to 0.76) and 0.09 (95% CI -0.65 to 0.82), respectively, showing less agreement than would be expected by chance. The 2 POCUS experts had near perfect agreement with each other regarding IVC collapse, with a kappa value of 0.88 (95% CI 0.64-1.0).

Performance of this previously validated DL algorithm worsened when it was faced with ultrasound studies from 2 novel ultrasound machines. Performance was much worse on images from a lower IQ hand-held device than on images from a superior cart-based device.
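The agreement statistic used above, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. The abstract does not state how the authors computed it, so the following is only a minimal sketch of the standard formula. The function name `cohens_kappa` and the 21 labels are hypothetical illustrations (1 = complete collapse, 0 = no complete collapse), not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined
        # here, and this sketch (an assumption) reports it as 1.0.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for 21 videos (NOT data from the study).
expert = [1] * 8 + [0] * 13
algorithm = [1] * 6 + [0] * 2 + [1] * 3 + [0] * 10

kappa = cohens_kappa(expert, algorithm)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean agreement below chance, which is how the negative smartphone result in the abstract should be read.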
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
855-863
Copyright information
© 2021 American Institute of Ultrasound in Medicine.