Human-computer collaboration for skin cancer recognition.

Artificial Intelligence; Clinical Decision-Making; Humans; Neural Networks, Computer; Physicians; Skin Neoplasms / diagnostic imaging; Telemedicine; User-Computer Interface

Journal

Nature medicine

ISSN: 1546-170X

Abbreviated title: Nat Med

Country: United States

NLM ID: 9502015

Publication information

Publication date:
08 2020

History:

received: 26 09 2019

accepted: 15 05 2020

pubmed: 24 6 2020

medline: 29 10 2020

entrez: 24 6 2020

Status: ppublish

Abstract

The rapid increase in telemedicine coupled with recent advances in diagnostic artificial intelligence (AI) creates the imperative to consider the opportunities and risks of inserting AI-based support into new paradigms of care. Here we build on recent achievements in the accuracy of image-based AI for skin cancer diagnosis to address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows. We find that good quality AI-based support of clinical decision-making improves diagnostic accuracy over that of either AI or physicians alone, and that the least experienced clinicians gain the most from AI-based support. We further find that AI-based multiclass probabilities outperformed content-based image retrieval (CBIR) representations of AI in the mobile technology environment, and AI-based support had utility in simulations of second opinions and of telemedicine triage. In addition to demonstrating the potential benefits associated with good quality AI in the hands of non-expert clinicians, we find that faulty AI can mislead the entire spectrum of clinicians, including experts. Lastly, we show that insights derived from AI class-activation maps can inform improvements in human diagnosis. Together, our approach and findings offer a framework for future studies across the spectrum of image-based diagnostics to improve human-computer collaboration in clinical practice.

Identifiers

DOI: 10.1038/s41591-020-0942-0 PMID: 32572267

pubmed: 32572267

doi: 10.1038/s41591-020-0942-0

pii: 10.1038/s41591-020-0942-0

Publication types

Journal Article

Languages

eng

Pagination

1229-1234

References

Webster, P. Virtual health care in the era of COVID-19. Lancet 395, 1180–1181 (2020).

doi: 10.1016/S0140-6736(20)30818-7

He, J. et al. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 25, 30–36 (2019).

doi: 10.1038/s41591-018-0307-0

McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020).

doi: 10.1038/s41586-019-1799-6

Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).

doi: 10.1001/jama.2016.17216

Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).

doi: 10.1038/nature21056

Haenssle, H. A. et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann. Oncol. 29, 1836–1842 (2018).

doi: 10.1093/annonc/mdy166

Han, S. S. et al. Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. J. Invest. Dermatol. 138, 1529–1538 (2018).

doi: 10.1016/j.jid.2018.01.028

Marchetti, M. A. et al. Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images. J. Am. Acad. Dermatol. 78, 270–277 (2018).

doi: 10.1016/j.jaad.2017.08.016

Tschandl, P. et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. Lancet Oncol. 20, 938–947 (2019).

doi: 10.1016/S1470-2045(19)30333-X

Garg, A. X. et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA 293, 1223–1238 (2005).

doi: 10.1001/jama.293.10.1223

Codella, N. C. F. et al. Collaborative human–AI (CHAI): evidence-based interpretable melanoma classification in dermoscopic images. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications (eds., Kenji Suzuki, Mauricio Reyes, Tanveer Syeda-Mahmood, ETH Zurich, Ben Glocker, Roland Wiest, Yaniv Gur, Hayit Greenspan, Anant Madabhushi) 97–105 (Springer International Publishing, 2018).

Bien, N. et al. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: development and retrospective validation of MRNet. PLoS Med. 15, e1002699 (2018).

doi: 10.1371/journal.pmed.1002699

Mobiny, A., Singh, A. & Van Nguyen, H. Risk-aware machine learning classifier for skin lesion diagnosis. J. Clin. Med. 8, 1241 (2019).

Han, S. S. et al. Augment intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders. J. Invest. Dermatol. https://doi.org/10.1016/j.jid.2020.01.019 (2020).

Hekler, A. et al. Superior skin cancer classification by the combination of human and artificial intelligence. Eur. J. Cancer 120, 114–121 (2019).

doi: 10.1016/j.ejca.2019.07.019

Lakhani, P. & Sundaram, B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284, 574–582 (2017).

doi: 10.1148/radiol.2017162326

Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 (2018).

doi: 10.1038/sdata.2018.161

Codella, N. et al. Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the International Skin Imaging Collaboration (ISIC). Preprint at https://arxiv.org/abs/1902.03368 (2019).

Sadeghi, M., Chilana, P. K. & Atkins, M. S. How users perceive content-based image retrieval for identifying skin images. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications (eds., Kenji Suzuki, Mauricio Reyes, Tanveer Syeda-Mahmood, ETH Zurich, Ben Glocker, Roland Wiest, Yaniv Gur, Hayit Greenspan, Anant Madabhushi) 141–148 (Springer International Publishing, 2018).

Tschandl, P., Argenziano, G., Razmara, M. & Yap, J. Diagnostic accuracy of content-based dermatoscopic image retrieval with deep classification features. Br. J. Dermatol. 181, 155–165 (2019).

Cai, C. J. et al. Human-centered tools for coping with imperfect algorithms during medical decision-making. In Proc. 2019 CHI Conference on Human Factors in Computing Systems 1–14 (Association for Computing Machinery, 2019).

Wang, M. & Deng, W. Deep visual domain adaptation: a survey. Neurocomputing 312, 135–153 (2018).

doi: 10.1016/j.neucom.2018.05.083

Finlayson, S.G. et al. Adversarial attacks on medical machine learning. Science 363, 1287–1289 (2019).

doi: 10.1126/science.aaw4399

Navarrete-Dechent, C. et al. Automated dermatological diagnosis: hype or reality? J. Invest. Dermatol. 138, 2277–2279 (2018).

doi: 10.1016/j.jid.2018.04.040

Winkler, J. K. et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 155, 1135–1141 (2019).

doi: 10.1001/jamadermatol.2019.1735

Cai, C. J., Winter, S., Steiner, D., Wilcox, L. & Terry, M. ‘Hello AI’: uncovering the onboarding needs of medical practitioners for human–AI collaborative decision-making. In Proc. ACM on Human–Computer Interaction (Association for Computing Machinery, 2019).

Janda, M. et al. Accuracy of mobile digital teledermoscopy for skin self-examinations in adults at high risk of skin cancer: an open-label, randomised controlled trial. Lancet Digit. Health 2, e129–e137 (2020).

doi: 10.1016/S2589-7500(20)30001-7

Gessert, N., Nielsen, M., Shaikh, M., Werner, R. & Schlaefer, A. Skin lesion classification using ensembles of multi-resolution EfficientNets with meta data. MethodsX 7, 100864 (2020).

doi: 10.1016/j.mex.2020.100864

Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359 (2020).

Li, X., Wu, J., Chen, E. Z. & Jiang, H. From deep learning towards finding skin lesion biomarkers. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2019, 2797–2800 (2019).

pubmed: 31946474

Bissoto, A., Fornaciali, M., Valle, E. & Avila, S. (De)constructing bias on skin lesion datasets. Preprint at https://arxiv.org/abs/1904.08818 (2019).

Lapuschkin, S. et al. Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10, 1096 (2019).

doi: 10.1038/s41467-019-08987-4

Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Springer Nature, 2019).

Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (eds. Wallach, H. et al.) 8026–8037 (Curran Associates, 2019).

He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (2016).

Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).

doi: 10.1007/s11263-015-0816-y

Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. 3rd International Conference for Learning Representations (eds., Bengio, Y., LeCun, Y.) (2015).

Barata, C., Celebi, M. E. & Marques, J. S. Improving dermoscopy image classification using color constancy. IEEE J. Biomed. Health Inform. 19, 1146–1152 (2015).

pubmed: 25073179

Rinner, C., Kittler, H., Rosendahl, C. & Tschandl, P. Analysis of collective human intelligence for diagnosis of pigmented skin lesions harnessed by gamification via a web-based training platform: simulation reader study. J. Med. Internet Res. 22, e15597 (2020).

doi: 10.2196/15597

Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6, 65–70 (1979).

R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2019).

Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, 2016).

Authors

Philipp Tschandl (P)

Christoph Rinner (C)

Zoe Apalla (Z)

Giuseppe Argenziano (G)

Noel Codella (N)

Allan Halpern (A)

Monika Janda (M)

Aimilios Lallas (A)

Caterina Longo (C)

Josep Malvehy (J)

John Paoli (J)

Susana Puig (S)

Cliff Rosendahl (C)

H Peter Soyer (HP)

Iris Zalaudek (I)

Harald Kittler (H)
