Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks

Philipp Tschandl; Cliff Rosendahl; Bengu Nisa Akay; Giuseppe Argenziano; Andreas Blum; Ralph P. Braun; Horacio Cabo; Jean Yves Gourhant; Jürgen Kreusch; Aimilios Lallas; Jan Lapins; Ashfaq Marghoob; Scott Menzies; Nina Maria Neuber; John Paoli; Harold S. Rabinovitz; Christoph Rinner; Alon Scope; H. Peter Soyer; Christoph Sinz; Luc Thomas; Iris Zalaudek; Harald Kittler

doi:10.1001/jamadermatol.2018.4378

Research output: Contribution to journal › Article › peer-review

188 Scopus citations

Abstract

Importance: Convolutional neural networks (CNNs) achieve expert-level accuracy in the diagnosis of pigmented melanocytic lesions. However, the most common types of skin cancer are nonpigmented and nonmelanocytic, and are more difficult to diagnose.

Objective: To compare the accuracy of a CNN-based classifier with that of physicians with different levels of experience.

Design, Setting, and Participants: A CNN-based classification model was trained on 7895 dermoscopic and 5829 close-up images of lesions excised at a primary skin cancer clinic between January 1, 2008, and July 13, 2017, for a combined evaluation of both imaging methods. The combined CNN (cCNN) was tested on a set of 2072 unknown cases and compared with results from 95 human raters who were medical personnel, including 62 board-certified dermatologists, with different experience in dermoscopy.

Main Outcomes and Measures: The proportions of correct specific diagnoses and the accuracy to differentiate between benign and malignant lesions measured as an area under the receiver operating characteristic curve served as main outcome measures.

Results: Among 95 human raters (51.6% female; mean age, 43.4 years; 95% CI, 41.0-45.7 years), the participants were divided into 3 groups (according to years of experience with dermoscopy): beginner raters (<3 years), intermediate raters (3-10 years), or expert raters (>10 years). The area under the receiver operating characteristic curve of the trained cCNN was higher than human ratings (0.742; 95% CI, 0.729-0.755 vs 0.695; 95% CI, 0.676-0.713; P <.001). The specificity was fixed at the mean level of human raters (51.3%), and therefore the sensitivity of the cCNN (80.5%; 95% CI, 79.0%-82.1%) was higher than that of human raters (77.6%; 95% CI, 74.7%-80.5%). The cCNN achieved a higher percentage of correct specific diagnoses compared with human raters (37.6%; 95% CI, 36.6%-38.4% vs 33.5%; 95% CI, 31.5%-35.6%; P =.001) but not compared with experts (37.3%; 95% CI, 35.7%-38.8% vs 40.0%; 95% CI, 37.0%-43.0%; P =.18).

Conclusions and Relevance: Neural networks are able to classify dermoscopic and close-up images of nonpigmented lesions as accurately as human experts in an experimental setting.
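The abstract compares sensitivities after fixing the classifier's specificity at the mean level of the human raters. As a rough illustration of that idea only, the sketch below picks the most sensitive score threshold whose specificity still meets a target. The function name, the scores, and the labels are hypothetical toy values invented for this example; they are not the study's data, and this is not presented as the authors' implementation.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (threshold, sensitivity, specificity) for the most sensitive
    threshold whose specificity is at least target_specificity.

    scores: classifier scores, higher = more suspicious (hypothetical output)
    labels: 1 = malignant, 0 = benign
    A case is called malignant when its score is >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign case")

    best = None
    # Candidate thresholds: every observed score, plus +inf (call nothing malignant),
    # so at least one candidate always reaches specificity 1.0.
    for t in sorted(set(scores)) + [float("inf")]:
        specificity = sum(s < t for s in negatives) / len(negatives)
        sensitivity = sum(s >= t for s in positives) / len(positives)
        if specificity >= target_specificity and (
            best is None or sensitivity > best[1]
        ):
            best = (t, sensitivity, specificity)
    return best


# Toy data (invented): three benign and three malignant cases.
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
toy_labels = [0, 0, 1, 1, 1, 0]
threshold, sens, spec = sensitivity_at_specificity(toy_scores, toy_labels, 0.5)
print(threshold, sens, spec)
```

Matching the operating point this way lets two raters (or a rater and a model) be compared on sensitivity alone, because both are held to the same false-positive rate.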

Original language	English (US)
Pages (from-to)	58-65
Number of pages	8
Journal	JAMA Dermatology
Volume	155
Issue number	1
DOIs	https://doi.org/10.1001/jamadermatol.2018.4378
State	Published - Jan 2019
Externally published	Yes

ASJC Scopus subject areas

Dermatology

Access to Document

10.1001/jamadermatol.2018.4378

Cite this

Tschandl, P., Rosendahl, C., Akay, B. N., Argenziano, G., Blum, A., Braun, R. P., Cabo, H., Gourhant, J. Y., Kreusch, J., Lallas, A., Lapins, J., Marghoob, A., Menzies, S., Neuber, N. M., Paoli, J., Rabinovitz, H. S., Rinner, C., Scope, A., Soyer, H. P., ... Kittler, H. (2019). Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks. JAMA Dermatology, 155(1), 58-65. https://doi.org/10.1001/jamadermatol.2018.4378

Tschandl, P, Rosendahl, C, Akay, BN, Argenziano, G, Blum, A, Braun, RP, Cabo, H, Gourhant, JY, Kreusch, J, Lallas, A, Lapins, J, Marghoob, A, Menzies, S, Neuber, NM, Paoli, J, Rabinovitz, HS, Rinner, C, Scope, A, Soyer, HP, Sinz, C, Thomas, L, Zalaudek, I & Kittler, H 2019, 'Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks', JAMA Dermatology, vol. 155, no. 1, pp. 58-65. https://doi.org/10.1001/jamadermatol.2018.4378

@article{19739da4256e4415964c3c42fca24a56,

title = "Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks",

abstract = "Importance: Convolutional neural networks (CNNs) achieve expert-level accuracy in the diagnosis of pigmented melanocytic lesions. However, the most common types of skin cancer are nonpigmented and nonmelanocytic, and are more difficult to diagnose. Objective: To compare the accuracy of a CNN-based classifier with that of physicians with different levels of experience. Design, Setting, and Participants: A CNN-based classification model was trained on 7895 dermoscopic and 5829 close-up images of lesions excised at a primary skin cancer clinic between January 1, 2008, and July 13, 2017, for a combined evaluation of both imaging methods. The combined CNN (cCNN) was tested on a set of 2072 unknown cases and compared with results from 95 human raters who were medical personnel, including 62 board-certified dermatologists, with different experience in dermoscopy. Main Outcomes and Measures: The proportions of correct specific diagnoses and the accuracy to differentiate between benign and malignant lesions measured as an area under the receiver operating characteristic curve served as main outcome measures. Results: Among 95 human raters (51.6% female; mean age, 43.4 years; 95% CI, 41.0-45.7 years), the participants were divided into 3 groups (according to years of experience with dermoscopy): beginner raters (<3 years), intermediate raters (3-10 years), or expert raters (>10 years). The area under the receiver operating characteristic curve of the trained cCNN was higher than human ratings (0.742; 95% CI, 0.729-0.755 vs 0.695; 95% CI, 0.676-0.713; P <.001). The specificity was fixed at the mean level of human raters (51.3%), and therefore the sensitivity of the cCNN (80.5%; 95% CI, 79.0%-82.1%) was higher than that of human raters (77.6%; 95% CI, 74.7%-80.5%). 
The cCNN achieved a higher percentage of correct specific diagnoses compared with human raters (37.6%; 95% CI, 36.6%-38.4% vs 33.5%; 95% CI, 31.5%-35.6%; P =.001) but not compared with experts (37.3%; 95% CI, 35.7%-38.8% vs 40.0%; 95% CI, 37.0%-43.0%; P =.18). Conclusions and Relevance: Neural networks are able to classify dermoscopic and close-up images of nonpigmented lesions as accurately as human experts in an experimental setting.",

author = "Philipp Tschandl and Cliff Rosendahl and Akay, {Bengu Nisa} and Giuseppe Argenziano and Andreas Blum and Braun, {Ralph P.} and Horacio Cabo and Gourhant, {Jean Yves} and J{\"u}rgen Kreusch and Aimilios Lallas and Jan Lapins and Ashfaq Marghoob and Scott Menzies and Neuber, {Nina Maria} and John Paoli and Rabinovitz, {Harold S.} and Christoph Rinner and Alon Scope and Soyer, {H. Peter} and Christoph Sinz and Luc Thomas and Iris Zalaudek and Harald Kittler",

year = "2019",

month = jan,

doi = "10.1001/jamadermatol.2018.4378",

language = "English (US)",

volume = "155",

pages = "58--65",

journal = "JAMA Dermatology",

issn = "2168-6068",

publisher = "American Medical Association",

number = "1",

}

TY - JOUR

T1 - Expert-Level Diagnosis of Nonpigmented Skin Cancer by Combined Convolutional Neural Networks

AU - Tschandl, Philipp

AU - Rosendahl, Cliff

AU - Akay, Bengu Nisa

AU - Argenziano, Giuseppe

AU - Blum, Andreas

AU - Braun, Ralph P.

AU - Cabo, Horacio

AU - Gourhant, Jean Yves

AU - Kreusch, Jürgen

AU - Lallas, Aimilios

AU - Lapins, Jan

AU - Marghoob, Ashfaq

AU - Menzies, Scott

AU - Neuber, Nina Maria

AU - Paoli, John

AU - Rabinovitz, Harold S.

AU - Rinner, Christoph

AU - Scope, Alon

AU - Soyer, H. Peter

AU - Sinz, Christoph

AU - Thomas, Luc

AU - Zalaudek, Iris

AU - Kittler, Harald

PY - 2019/1

Y1 - 2019/1

AB - Importance: Convolutional neural networks (CNNs) achieve expert-level accuracy in the diagnosis of pigmented melanocytic lesions. However, the most common types of skin cancer are nonpigmented and nonmelanocytic, and are more difficult to diagnose. Objective: To compare the accuracy of a CNN-based classifier with that of physicians with different levels of experience. Design, Setting, and Participants: A CNN-based classification model was trained on 7895 dermoscopic and 5829 close-up images of lesions excised at a primary skin cancer clinic between January 1, 2008, and July 13, 2017, for a combined evaluation of both imaging methods. The combined CNN (cCNN) was tested on a set of 2072 unknown cases and compared with results from 95 human raters who were medical personnel, including 62 board-certified dermatologists, with different experience in dermoscopy. Main Outcomes and Measures: The proportions of correct specific diagnoses and the accuracy to differentiate between benign and malignant lesions measured as an area under the receiver operating characteristic curve served as main outcome measures. Results: Among 95 human raters (51.6% female; mean age, 43.4 years; 95% CI, 41.0-45.7 years), the participants were divided into 3 groups (according to years of experience with dermoscopy): beginner raters (<3 years), intermediate raters (3-10 years), or expert raters (>10 years). The area under the receiver operating characteristic curve of the trained cCNN was higher than human ratings (0.742; 95% CI, 0.729-0.755 vs 0.695; 95% CI, 0.676-0.713; P <.001). The specificity was fixed at the mean level of human raters (51.3%), and therefore the sensitivity of the cCNN (80.5%; 95% CI, 79.0%-82.1%) was higher than that of human raters (77.6%; 95% CI, 74.7%-80.5%). 
The cCNN achieved a higher percentage of correct specific diagnoses compared with human raters (37.6%; 95% CI, 36.6%-38.4% vs 33.5%; 95% CI, 31.5%-35.6%; P =.001) but not compared with experts (37.3%; 95% CI, 35.7%-38.8% vs 40.0%; 95% CI, 37.0%-43.0%; P =.18). Conclusions and Relevance: Neural networks are able to classify dermoscopic and close-up images of nonpigmented lesions as accurately as human experts in an experimental setting.

UR - http://www.scopus.com/inward/record.url?scp=85057833474&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057833474&partnerID=8YFLogxK

U2 - 10.1001/jamadermatol.2018.4378

DO - 10.1001/jamadermatol.2018.4378

M3 - Article

C2 - 30484822

AN - SCOPUS:85057833474

SN - 2168-6068

VL - 155

SP - 58

EP - 65

JO - JAMA Dermatology

JF - JAMA Dermatology

IS - 1

ER -
