Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study.


Journal

The Lancet. Oncology
ISSN: 1474-5488
Abbreviated title: Lancet Oncol
Country: England
NLM ID: 100957246

Publication information

Publication date:
07 2019
History:
received: 06 02 2019
revised: 17 04 2019
accepted: 17 04 2019
pubmed: 16 6 2019
medline: 23 6 2020
entrez: 16 6 2019
Status: ppublish

Abstract

BACKGROUND
Whether machine-learning algorithms can diagnose all pigmented skin lesions as accurately as human experts is unclear. The aim of this study was to compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions.
METHODS
For this open, web-based, international, diagnostic study, human readers were asked to diagnose dermatoscopic images selected randomly in 30-image batches from a test set of 1511 images. The diagnoses from human readers were compared with those of 139 algorithms created by 77 machine-learning labs, who participated in the International Skin Imaging Collaboration 2018 challenge and received a training set of 10 015 images in advance. The ground truth of each lesion fell into one of seven predefined disease categories: intraepithelial carcinoma including actinic keratoses and Bowen's disease; basal cell carcinoma; benign keratinocytic lesions including solar lentigo, seborrheic keratosis and lichen planus-like keratosis; dermatofibroma; melanoma; melanocytic nevus; and vascular lesions. The two main outcomes were the differences in the number of correct specific diagnoses per batch between all human readers and the top three algorithms, and between human experts and the top three algorithms.
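The batch-based scoring described above can be illustrated with a minimal sketch. It assumes that a batch contains no repeated images and that a diagnosis counts as correct only when it matches the ground-truth category exactly; the abstract does not spell out either detail. The function names and the randomly generated labels are hypothetical and stand in for the real study data.

```python
import random

# The seven disease categories named in the abstract.
CATEGORIES = [
    "intraepithelial carcinoma",
    "basal cell carcinoma",
    "benign keratinocytic lesion",
    "dermatofibroma",
    "melanoma",
    "melanocytic nevus",
    "vascular lesion",
]

TEST_SET_SIZE = 1511  # images in the test set (from the abstract)
BATCH_SIZE = 30       # images per batch (from the abstract)


def draw_batch(rng, n_images=TEST_SET_SIZE, batch_size=BATCH_SIZE):
    """Return the indices of one randomly selected batch (assumed: no repeats)."""
    return rng.sample(range(n_images), batch_size)


def correct_in_batch(batch, ground_truth, answers):
    """Count batch images whose answer equals the ground-truth category."""
    return sum(1 for i in batch if answers[i] == ground_truth[i])


# Hypothetical data: random labels stand in for the real ground truth.
rng = random.Random(0)
ground_truth = [rng.choice(CATEGORIES) for _ in range(TEST_SET_SIZE)]
perfect_reader = list(ground_truth)  # a reader who is always right

batch = draw_batch(rng)
score = correct_in_batch(batch, ground_truth, perfect_reader)
```

Under these assumptions, each attempt yields a score between 0 and 30, which is the per-batch quantity whose means are compared between readers and algorithms.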
FINDINGS
Between Aug 4, 2018, and Sept 30, 2018, 511 human readers from 63 countries had at least one attempt in the reader study. 283 (55·4%) of 511 human readers were board-certified dermatologists, 118 (23·1%) were dermatology residents, and 83 (16·2%) were general practitioners. When comparing all human readers with all machine-learning algorithms, the algorithms achieved a mean of 2·01 (95% CI 1·97 to 2·04; p<0·0001) more correct diagnoses per batch (human readers 17·91 [SD 3·42] vs algorithms 19·92 [4·27]). 27 human experts with more than 10 years of experience achieved a mean of 18·78 (SD 3·15) correct answers, compared with 25·43 (1·95) correct answers for the top three machine algorithms (mean difference 6·65, 95% CI 6·06-7·25; p<0·0001). The difference between human experts and the top three algorithms was significantly lower for images in the test set that were collected from sources not included in the training set than for images from sources that were included (human underperformance of 3·6%, 95% CI 0·8-6·3 vs 11·4%, 9·9-12·9; p<0·0001).
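The percentages and mean differences reported above can be re-derived from the counts and means given in the same paragraph. The short check below uses only figures from the abstract; the helper name `pct` is introduced here for illustration, and rounding to one or two decimals is assumed to match the reporting convention.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


READERS = 511  # human readers with at least one attempt

reader_shares = {
    "board-certified dermatologists": pct(283, READERS),
    "dermatology residents": pct(118, READERS),
    "general practitioners": pct(83, READERS),
}

# Mean correct diagnoses per 30-image batch, as reported.
diff_all = round(19.92 - 17.91, 2)      # all algorithms minus all human readers
diff_experts = round(25.43 - 18.78, 2)  # top three algorithms minus human experts

print(reader_shares)
print(diff_all, diff_experts)
```

The recomputed values agree with the reported 55·4%, 23·1%, 16·2%, 2·01, and 6·65.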
INTERPRETATION
State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice. However, a possible limitation of these algorithms is their decreased performance for out-of-distribution images, which should be addressed in future research.
FUNDING
None.

Identifiers

pubmed: 31201137
pii: S1470-2045(19)30333-X
doi: 10.1016/S1470-2045(19)30333-X
pmc: PMC8237239
mid: NIHMS1716706

Publication types

Comparative Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

938-947

Grants

Agency: NCI NIH HHS
ID: P30 CA008748
Country: United States

Comments and corrections

Type: CommentIn

Copyright information

Copyright © 2019 Elsevier Ltd. All rights reserved.

Authors

Philipp Tschandl (P)

ViDIR Group, Department of Dermatology, Medical University of Vienna, Vienna, Austria.

Noel Codella (N)

IBM Research AI, T J Watson Research Center, Yorktown Heights, NY, USA.

Bengü Nisa Akay (BN)

Department of Dermatology, Medicine Faculty, Ankara University, Ankara, Turkey.

Giuseppe Argenziano (G)

Dermatology Unit, University of Campania, Naples, Italy.

Ralph P Braun (RP)

Skin Cancer Center, Department of Dermatology, University Hospital Zürich, Zürich, Switzerland.

Horacio Cabo (H)

Department of Dermatology, Instituto de Investigaciones Médicas, Buenos Aires, Argentina.

David Gutman (D)

Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA.

Allan Halpern (A)

Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA.

Brian Helba (B)

Kitware, Clifton Park, NY, USA.

Rainer Hofmann-Wellenhof (R)

Department of Dermatology, Medical University Graz, Graz, Austria.

Aimilios Lallas (A)

First Department of Dermatology, Aristotle University, Thessaloniki, Greece.

Jan Lapins (J)

Department of Dermatology, Karolinska University Hospital and Karolinska Institutet, Stockholm, Sweden.

Caterina Longo (C)

Department of Dermatology, University of Modena and Reggio Emilia, Modena, Italy; Azienda Unità Sanitaria Locale-IRCCS di Reggio Emilia, Centro Oncologico ad Alta Tecnologia Diagnostica-Dermatologia, Reggio Emilia, Italy.

Josep Malvehy (J)

Melanoma Unit, Dermatology Department, Hospital Clínic Barcelona, Universitat de Barcelona, IDIBAPS, Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER ER), Instituto de Salud Carlos III, Barcelona, Spain.

Michael A Marchetti (MA)

Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA.

Ashfaq Marghoob (A)

Memorial Sloan Kettering Cancer Center, Hauppauge, NY, USA.

Scott Menzies (S)

Sydney Melanoma Diagnostic Centre & Sydney Medical School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia.

Amanda Oakley (A)

Department of Dermatology, Waikato District Health Board and Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand.

John Paoli (J)

Department of Dermatology and Venereology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.

Susana Puig (S)

Melanoma Unit, Dermatology Department, Hospital Clínic Barcelona, Universitat de Barcelona, IDIBAPS, Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER ER), Instituto de Salud Carlos III, Barcelona, Spain.

Christoph Rinner (C)

Center for Medical Statistics, Informatics and Intelligent Systems (CeMSIIS), Medical University of Vienna, Vienna, Austria.

Cliff Rosendahl (C)

School of Clinical Medicine, University of Queensland, Brisbane, QLD, Australia.

Alon Scope (A)

Medical Screening Institute, Sheba Medical Center and Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.

Christoph Sinz (C)

ViDIR Group, Department of Dermatology, Medical University of Vienna, Vienna, Austria.

H Peter Soyer (HP)

Dermatology Research Centre, The University of Queensland Diamantina Institute, University of Queensland, Brisbane, QLD, Australia.

Luc Thomas (L)

Department of Dermatology, Hospitalier Lyon Sud, Lyon, France; Lyon Cancer Research Center INSERM U1052-CNRS UMR5286, Lyon, France; Lyon 1 University, Lyon, France.

Iris Zalaudek (I)

Dermatology Clinic, Maggiore Hospital, University of Trieste, Trieste, Italy.

Harald Kittler (H)

ViDIR Group, Department of Dermatology, Medical University of Vienna, Vienna, Austria. Electronic address: harald.kittler@meduniwien.ac.at.
