Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study.

Adult; Algorithms; Dermoscopy; Female; Humans; Internet; Machine Learning; Male; Pigmentation Disorders / pathology; Reproducibility of Results; Retrospective Studies; Skin Neoplasms / pathology

Journal

The Lancet. Oncology

ISSN: 1474-5488

Abbreviated title: Lancet Oncol

Country: England

NLM ID: 100957246

Publication information

Publication date:
Jul 2019

History:

received: 6 Feb 2019

revised: 17 Apr 2019

accepted: 17 Apr 2019

pubmed: 16 Jun 2019

medline: 23 Jun 2020

entrez: 16 Jun 2019

Status: ppublish

Abstract

BACKGROUND

Whether machine-learning algorithms can diagnose all pigmented skin lesions as accurately as human experts is unclear. The aim of this study was to compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions.

METHODS

For this open, web-based, international, diagnostic study, human readers were asked to diagnose dermatoscopic images selected randomly in 30-image batches from a test set of 1511 images. The diagnoses from human readers were compared with those of 139 algorithms created by 77 machine-learning labs, who participated in the International Skin Imaging Collaboration 2018 challenge and received a training set of 10 015 images in advance. The ground truth of each lesion fell into one of seven predefined disease categories: intraepithelial carcinoma including actinic keratoses and Bowen's disease; basal cell carcinoma; benign keratinocytic lesions including solar lentigo, seborrheic keratosis and lichen planus-like keratosis; dermatofibroma; melanoma; melanocytic nevus; and vascular lesions. The two main outcomes were the differences in the number of correct specific diagnoses per batch between all human readers and the top three algorithms, and between human experts and the top three algorithms.
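The per-batch outcome described above (number of correct specific diagnoses in a random 30-image batch, then a difference in means between groups) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the short category codes follow the HAM10000 label convention and are assumed here, the helper names (`draw_batch`, `correct_per_batch`, `mean_difference`, `noisy_predictor`) are hypothetical, scoring both groups on the same batches is a simplification, and all data are synthetic.

```python
import random
from statistics import mean

# The seven diagnostic categories from the study. The short codes are an
# illustrative assumption (they follow the HAM10000 label convention).
CATEGORIES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]
BATCH_SIZE = 30       # images per reader batch, as described in the study
TEST_SET_SIZE = 1511  # test-set size reported in the study


def draw_batch(image_ids, rng, size=BATCH_SIZE):
    """Randomly select `size` distinct image IDs from the test set."""
    return rng.sample(image_ids, size)


def correct_per_batch(batch, ground_truth, predictions):
    """Count correct specific diagnoses (exact category match) in one batch."""
    return sum(predictions[i] == ground_truth[i] for i in batch)


def mean_difference(scores_a, scores_b):
    """Mean of scores_b minus mean of scores_a (positive means b did better)."""
    return mean(scores_b) - mean(scores_a)


def noisy_predictor(truth, accuracy, rng):
    """Hypothetical predictor: copies the truth with probability `accuracy`,
    otherwise guesses a random category (which may still be correct)."""
    return {
        i: label if rng.random() < accuracy else rng.choice(CATEGORIES)
        for i, label in truth.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    image_ids = list(range(TEST_SET_SIZE))
    # Synthetic ground truth and predictions; NOT data from the study.
    truth = {i: rng.choice(CATEGORIES) for i in image_ids}
    human = noisy_predictor(truth, 0.55, rng)
    machine = noisy_predictor(truth, 0.80, rng)
    batches = [draw_batch(image_ids, rng) for _ in range(200)]
    human_scores = [correct_per_batch(b, truth, human) for b in batches]
    machine_scores = [correct_per_batch(b, truth, machine) for b in batches]
    print(f"mean correct per batch (human):   {mean(human_scores):.2f}")
    print(f"mean correct per batch (machine): {mean(machine_scores):.2f}")
    print(f"mean difference: {mean_difference(human_scores, machine_scores):.2f}")
```

Only an exact match to the ground-truth category counts as correct, which mirrors the "correct specific diagnosis" outcome; the accuracies passed to `noisy_predictor` are arbitrary placeholders.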

FINDINGS

Between Aug 4, 2018, and Sept 30, 2018, 511 human readers from 63 countries had at least one attempt in the reader study. 283 (55·4%) of 511 human readers were board-certified dermatologists, 118 (23·1%) were dermatology residents, and 83 (16·2%) were general practitioners. When comparing all human readers with all machine-learning algorithms, the algorithms achieved a mean of 2·01 (95% CI 1·97 to 2·04; p<0·0001) more correct diagnoses (17·91 [SD 3·42] vs 19·92 [4·27]). 27 human experts with more than 10 years of experience achieved a mean of 18·78 (SD 3·15) correct answers, compared with 25·43 (1·95) correct answers for the top three machine algorithms (mean difference 6·65, 95% CI 6·06-7·25; p<0·0001). The difference between human experts and the top three algorithms was significantly lower for images in the test set that were collected from sources not included in the training set (human underperformance of 11·4%, 95% CI 9·9-12·9 vs 3·6%, 0·8-6·3; p<0·0001).
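The proportions and mean differences reported above can be checked with simple arithmetic. The sketch below uses only figures stated in the Findings; the constant and function names are illustrative.

```python
# Figures as reported in the Findings; only the arithmetic is recomputed here.
TOTAL_READERS = 511
READER_GROUPS = {
    "board-certified dermatologists": 283,
    "dermatology residents": 118,
    "general practitioners": 83,
}


def percent(part, whole, ndigits=1):
    """Share of `whole` as a percentage, rounded as in the abstract."""
    return round(100 * part / whole, ndigits)


# Mean number of correct diagnoses per 30-image batch (reported values).
ALL_READERS, ALL_ALGORITHMS = 17.91, 19.92
EXPERTS, TOP_THREE = 18.78, 25.43

if __name__ == "__main__":
    for group, n in READER_GROUPS.items():
        print(f"{group}: {n}/{TOTAL_READERS} = {percent(n, TOTAL_READERS)}%")
    print(f"all algorithms minus all readers: {ALL_ALGORITHMS - ALL_READERS:.2f}")
    print(f"top three minus experts: {TOP_THREE - EXPERTS:.2f}")
```

Running it reproduces the reported shares (55.4%, 23.1%, 16.2%) and mean differences (2.01 and 6.65). The 95% CIs cannot be recovered this way, because they depend on per-batch data that the abstract does not give.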

INTERPRETATION

State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice. However, a possible limitation of these algorithms is their decreased performance for out-of-distribution images, which should be addressed in future research.

FUNDING

None.

Identifiers

DOI: 10.1016/S1470-2045(19)30333-X

PMID: 31201137

PMC: PMC8237239

PII: S1470-2045(19)30333-X

mid (manuscript ID): NIHMS1716706

Publication types

Comparative Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

Pagination

938-947

Grants

Agency: NCI NIH HHS

ID: P30 CA008748

Country: United States

Comments and corrections

Type: CommentIn

Copyright information
