Submitted: 05 September 2025
Posted: 08 September 2025
Abstract
Keywords:
1. Summary
2. Data Description
2.1. Dataset Structure
2.1.1. Lesion Entity
2.1.2. Subject Entity
2.3. Image Entities
2.4. Diagnostic Entities: Dermatology, Histopathology, and Unified Diagnosis
1) Dermatology Diagnosis: A diagnosis provided by a panel of dermatologists, assigned to each lesion.
2) Histopathology Diagnosis: A diagnosis derived from histopathology reports, available for a subset of 29 excised lesions (out of 240). This report also contains tumor thickness information when applicable.
3) Unified Diagnosis: The definitive label for this dataset, derived by synthesizing the dermatology and histopathology diagnoses. The methodology for generating this label is detailed in the Methods section.
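
The relationship between the three diagnostic entities can be sketched in a few lines. This is a minimal illustration, assuming the simplest rule consistent with the attribute descriptions in this article (histopathology serves as ground truth when available, otherwise the dermatology diagnosis stands). The function name and lesion identifiers are hypothetical, and the full procedure in the Methods section may involve further steps.

```python
from typing import Optional


def unify_diagnosis(dermatology: str, histopathology: Optional[str]) -> str:
    """Prefer the histopathology label when one exists (assumed rule)."""
    if histopathology:  # None or "" means no histopathology report
        return histopathology
    return dermatology


# Hypothetical lesions: (dermatology label, histopathology label or None)
lesions = {
    "L001": ("NEV", None),
    "L002": ("NEV", "MEL"),
    "L003": ("BCC", "BCC"),
}

unified = {lid: unify_diagnosis(d, h) for lid, (d, h) in lesions.items()}
print(unified)
```

Under this assumed rule, only the 29 lesions with a histopathology report could receive a unified label that differs from the dermatology diagnosis.
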
3. Methods
3.1. Ethics Declaration
3.2. Participants and Selection Criteria
3.3. Data Acquisition Workflow
1. Informed consent: On arrival, the subject is informed about the study and given the informed consent form to read and, if willing to participate, sign (Figure 3a). Estimated time: 5 min.
2. Clinical data collection: Once the informed consent form is signed, the subject fills out a questionnaire in situ, so that the data collector can answer any questions the subject may have (Figure 3b). Estimated time: 10 min.
3. Clinical and dermoscopic image acquisition: The data collector captures the images with a smartphone-based digital camera, with and without the dermoscope attached to the device (Figure 3c). Estimated time: 30 s per lesion.
4. Diameter measurement of the skin lesion: The data collector measures the lesion with a caliper gauge (Figure 3d). Estimated time: 20 s per lesion.
5. Data storage: All acquired data are verified and stored in a secure, encrypted storage system (Figure 3e). Estimated time: 5 min per lesion.
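
The per-step estimates above can be combined into a rough session length. The sketch below assumes that steps 1 and 2 occur once per subject and steps 3 to 5 once per lesion; the constant and function names are hypothetical.

```python
# Once per subject: informed consent (5 min) + questionnaire (10 min)
FIXED_SECONDS = 5 * 60 + 10 * 60
# Once per lesion: imaging (30 s) + caliper measurement (20 s) + storage (5 min)
PER_LESION_SECONDS = 30 + 20 + 5 * 60


def session_minutes(n_lesions: int) -> float:
    """Estimated session length in minutes for a subject with n_lesions."""
    return (FIXED_SECONDS + n_lesions * PER_LESION_SECONDS) / 60


for n in (1, 4, 8):
    print(f"{n} lesion(s): {session_minutes(n):.1f} min")
```

With 240 lesions from 60 subjects, the average is four lesions per subject, which corresponds to roughly 38 minutes per session under these assumptions.
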
3.4. Diagnosis Consolidation and Ground Truth Determination
3.5. Data Curation and Validation
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| MCR-SL | Multimodal, Context-Rich Skin Lesion |
| UNN | University Hospital of North Norway |
| WARIFA | Watching the Risk Factors |
| NEV | Nevus |
| SK | Seborrheic Keratosis |
| BCC | Basal Cell Carcinoma |
| AK | Actinic Keratosis |
| ATY | Atypical nevus |
| MEL | Melanoma |
| SCC | Squamous Cell Carcinoma |
| ANG | Angioma |
| DF | Dermatofibroma |
| UNK | Unknown |
| NM | Non-malignant |
| M | Malignant |








| Lesion type | Malignancy | Diagnosed by histopathology (subjects) | Diagnosed by histopathology (lesions) | Diagnosed by dermatologists (subjects) | Diagnosed by dermatologists (lesions) |
|---|---|---|---|---|---|
| BCC | Malignant | 18 (30.0 %) | 20 (8.3 %) | 18 (30.0 %) | 26 (10.8 %) |
| MEL | Malignant | 3 (5.0 %) | 3 (1.3 %) | 7 (11.7 %) | 8 (3.3 %) |
| SCC | Malignant | 0 (0.0 %) | 0 (0.0 %) | 5 (8.3 %) | 5 (2.1 %) |
| NEV | Non-Malignant | 3 (5.0 %) | 3 (1.3 %) | 37 (61.7 %) | 85 (35.4 %) |
| SK | Non-Malignant | 1 (1.7 %) | 1 (0.4 %) | 34 (56.6 %) | 84 (35.0 %) |
| AK | Non-Malignant | 0 (0.0 %) | 0 (0.0 %) | 10 (16.7 %) | 12 (5.0 %) |
| ATY | Non-Malignant | 2 (3.3 %) | 2 (0.8 %) | 6 (10.0 %) | 7 (2.9 %) |
| ANG | Non-Malignant | 0 (0.0 %) | 0 (0.0 %) | 2 (3.3 %) | 4 (1.7 %) |
| DF | Non-Malignant | 0 (0.0 %) | 0 (0.0 %) | 2 (3.3 %) | 2 (0.8 %) |
| UNK | - | 0 (0.0 %) | 0 (0.0 %) | 6 (10.0 %) | 7 (2.9 %) |
| Total | | 27 (45.0 %) | 29 (12.1 %) | 60 (100.0 %) | 240 (100.0 %) |

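
The lesion percentages in the dermatologist column follow directly from the counts. As a quick check, the sketch below recomputes each share of the 240 lesions and aggregates the counts by malignancy. The counts are copied from the table above; the variable names are hypothetical.

```python
# Lesions per dermatologist diagnosis, copied from the table above
derm_lesions = {
    "BCC": 26, "MEL": 8, "SCC": 5,                            # malignant
    "NEV": 85, "SK": 84, "AK": 12, "ATY": 7, "ANG": 4, "DF": 2,  # non-malignant
    "UNK": 7,                                                 # no malignancy label
}
malignant_types = {"BCC", "MEL", "SCC"}

total = sum(derm_lesions.values())
n_malignant = sum(n for k, n in derm_lesions.items() if k in malignant_types)
n_unknown = derm_lesions["UNK"]
n_non_malignant = total - n_malignant - n_unknown

# Share of all lesions per diagnosis, in percent with one decimal
shares = {k: round(100 * n / total, 1) for k, n in derm_lesions.items()}

print(total, n_malignant, n_non_malignant, n_unknown)
print(shares)
```

Per-class counts of this kind are a natural starting point for choosing class weights or a stratified split, since the two largest classes (NEV and SK) together account for about 70 % of the lesions.
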
| Attribute | Values | # | % | # NM | % NM | # M | % M | p-value |
|---|---|---|---|---|---|---|---|---|
| Age | 14.9-40.7 | 8 | 13% | 0 | 0% | 8 | 100% | 0.582 |
| | 40.7-66.3 | 23 | 38% | 13 | 57% | 10 | 43% | |
| | 66.3-92.0 | 29 | 48% | 19 | 66% | 10 | 34% | |
| Sex | Female | 33 | 55% | 12 | 36% | 21 | 64% | 0.008 |
| | Male | 27 | 45% | 20 | 74% | 7 | 26% | |
| Height (cm) | 145.9-162.3 | 14 | 23% | 4 | 29% | 10 | 71% | 0.053 |
| | 162.3-178.7 | 27 | 45% | 17 | 63% | 10 | 37% | |
| | 178.7-195.0 | 19 | 32% | 11 | 58% | 8 | 42% | |
| Weight (kg) | 38.9-66.0 | 19 | 32% | 6 | 32% | 13 | 68% | 0.496 |
| | 66.0-93.0 | 32 | 53% | 20 | 62% | 13 | 41% | |
| | 93.0-120.0 | 9 | 15% | 6 | 67% | 4 | 44% | |
| Natural hair color (≤ 18 years old) | Brown | 25 | 42% | 12 | 48% | 13 | 52% | 0.382 |
| | Fair blonde | 19 | 32% | 10 | 53% | 9 | 47% | |
| | Dark brown, black | 12 | 20% | 9 | 75% | 3 | 25% | |
| | Red or auburn | 3 | 5% | 1 | 33% | 2 | 67% | |
| | Blonde | 1 | 2% | 0 | 0% | 1 | 100% | |
| Skin reaction to sun exposure | Red | 29 | 48% | 16 | 55% | 13 | 45% | 0.844 |
| | Brown without 1st becoming red | 22 | 37% | 12 | 55% | 10 | 45% | |
| | Red with pain | 9 | 15% | 4 | 44% | 5 | 56% | |
| Number of moles (≤ 18 years old) | Few | 21 | 35% | 14 | 67% | 7 | 33% | 0.065 |
| | Some | 18 | 30% | 5 | 28% | 13 | 72% | |
| | Many | 14 | 23% | 8 | 57% | 6 | 43% | |
| | Unknown | 7 | 12% | 5 | 71% | 2 | 29% | |
| Moles > 5 mm | Yes | 30 | 50% | 14 | 47% | 16 | 53% | 0.361 |
| | No | 25 | 42% | 16 | 64% | 9 | 36% | |
| | Unknown | 5 | 8% | 2 | 40% | 3 | 60% | |
| Moles > 20 cm | No | 60 | 100% | 32 | 53% | 28 | 47% | 1.000 |
| Number of moles (now) | Some | 24 | 40% | 9 | 38% | 15 | 62% | 0.133 |
| | Few | 22 | 37% | 15 | 68% | 7 | 32% | |
| | Many | 7 | 12% | 3 | 43% | 4 | 57% | |
| | Unknown | 7 | 12% | 5 | 71% | 2 | 29% | |
| Number of severe sunburns | 0 | 28 | 47% | 14 | 50% | 14 | 50% | 0.617 |
| | 1-2 | 13 | 22% | 7 | 54% | 6 | 46% | |
| | 3-5 | 8 | 13% | 3 | 38% | 5 | 62% | |
| | >5 | 3 | 5% | 2 | 67% | 1 | 33% | |
| | Unknown | 8 | 13% | 6 | 75% | 2 | 25% | |
| Sunbed use | No | 54 | 90% | 29 | 54% | 25 | 46% | 0.218 |
| | Yes | 4 | 7% | 1 | 25% | 3 | 75% | |
| | Unknown | 2 | 3% | 2 | 100% | 0 | 0% | |
| History of cancer | No | 39 | 65% | 17 | 44% | 22 | 56% | 0.073 |
| | Yes | 21 | 35% | 15 | 71% | 6 | 29% | |
| History of skin cancer | No | 41 | 68% | 19 | 46% | 22 | 54% | 0.102 |
| | Yes | 15 | 25% | 9 | 60% | 6 | 40% | |
| | Unknown | 4 | 7% | 4 | 100% | 0 | 0% | |
| History of skin cancer (close relatives) | No | 50 | 83% | 25 | 50% | 25 | 50% | 0.418 |
| | Yes | 10 | 17% | 7 | 70% | 3 | 30% | |
| Organ transplant | No | 57 | 95% | 30 | 53% | 27 | 47% | 0.234 |
| | Yes | 2 | 3% | 2 | 100% | 0 | 0% | |
| | Unknown | 1 | 2% | 0 | 0% | 1 | 100% | |
| Immunosuppression | No | 54 | 90% | 30 | 56% | 24 | 44% | 0.448 |
| | Yes | 5 | 8% | 2 | 40% | 3 | 60% | |
| | Unknown | 1 | 2% | 0 | 0% | 1 | 100% | |
| Patients derived from | Plastic surgery | 35 | 58% | 20 | 57% | 15 | 43% | 0.040 |
| | Dermatology | 17 | 28% | 11 | 65% | 6 | 35% | |
| | Volunteer | 8 | 13% | 1 | 12% | 7 | 88% | |
| Malignant lesions | yes | 32 | 53% | | | | | |
| | no | 28 | 47% | | | | | |

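
The statistical test behind the p-value column is not named in this part of the article. As an illustration only, the sketch below assumes a chi-square test of independence with Yates' continuity correction, which applies to attributes with exactly two categories (a two-by-two table of NM and M counts). The function name is hypothetical, and attributes with more categories or very small counts would call for a different test.

```python
import math
from typing import Tuple


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Yates-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns the test statistic and the p-value (1 degree of freedom).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each deviation by 0.5, floor at 0
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected

    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Sex row of the table above: Female (12 NM, 21 M), Male (20 NM, 7 M)
stat, p = chi2_yates_2x2(12, 21, 20, 7)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

For the Sex row, this assumed test lands close to the tabulated value of 0.008, but that agreement does not confirm which test the authors used.
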

| Attribute | Data type | Description |
|---|---|---|
| lesion_id | string | A unique identifier for the lesion. |
| referral_diagnosis | text | The initial diagnosis provided during the subject's referral. |
| lesion_status_when_captured | categorical | The status of the lesion at the time of imaging. |
| location | categorical | The anatomical location of the lesion on the subject's body. |
| location_group | categorical | A broader classification of the lesion's location. |
| diameter | numerical | The measured diameter of the lesion in millimeters. |
| malignancy | categorical | The malignancy status of the lesion (i.e., malignant, non-malignant). |
| lesion_diagnosis | text | The unified diagnosis assigned to the lesion. |
| diagnosis_image_id | string | The unique identifier of the specific image used by the dermatologists to make their diagnoses. |

| Attribute | Data type | Description |
|---|---|---|
| subject_id | string | A unique identifier for the subject. |
| derived_from | categorical | The hospital department that referred the subject. |
| year_of_birth | integer | The subject's year of birth. |
| age | integer | The subject's age. |
| sex | categorical | The subject's sex. |
| height | numerical | Subject height in centimeters. |
| weight | numerical | Subject weight in kilograms. |
| natural_hair_color | categorical | The subject's natural hair color at 18 years old. |
| skin_reaction_to_sun | categorical | How the subject's skin reacts to sun exposure without sun protection. |
| number_of_moles | integer | The total number of moles on the subject at 18 years old. |
| moles_bigger_5mm | integer | Current number of moles larger than 5 mm. |
| moles_bigger_20cm | integer | Current number of moles larger than 20 cm. |
| moles_body | integer | Current number of moles on the body. |
| sunburn_number | integer | The number of severe sunburns the subject has experienced. |
| sunburn_age | text | The age at which the subject experienced severe sunburns. |
| sunburn_number_group | categorical | A categorized group for the number of sunburns. |
| sunbed | boolean | Whether the subject has used a sunbed. |
| h_cancer | boolean | History of hereditary cancer. |
| h_skin_cancer | boolean | History of hereditary skin cancer. |
| h_skin_cancer_relatives | boolean | History of skin cancer in close relatives. |
| organ_transplant | boolean | Whether the subject has had an organ transplant. |
| immunosuppresion | boolean | Whether the subject is on immunosuppressive medication. |

| Attribute | Data type | Description |
|---|---|---|
| image_id | string | A unique identifier for each image. |
| lesion_id | string | A unique identifier for the lesion depicted in the image. |
| modality | categorical | The modality of the image (clinical or dermoscopic). |
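
Because every image record carries a `lesion_id` and a `modality`, images can be grouped per lesion to find clinical and dermoscopic pairs. The sketch below is a minimal illustration with hypothetical records; only the three attribute names come from the table above, and the identifier formats are assumptions.

```python
from collections import defaultdict

# Hypothetical image records following the attributes listed above
images = [
    {"image_id": "IMG001", "lesion_id": "L001", "modality": "clinical"},
    {"image_id": "IMG002", "lesion_id": "L001", "modality": "dermoscopic"},
    {"image_id": "IMG003", "lesion_id": "L002", "modality": "dermoscopic"},
]

# lesion_id -> modality -> list of image_ids
by_lesion = defaultdict(lambda: defaultdict(list))
for img in images:
    by_lesion[img["lesion_id"]][img["modality"]].append(img["image_id"])

# Lesions for which both modalities are available
required = {"clinical", "dermoscopic"}
paired = sorted(lid for lid, mods in by_lesion.items() if required <= set(mods))
print(paired)
```

Grouping by `lesion_id` rather than by image also helps keep all images of one lesion in the same split when the data are divided for training and evaluation.
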

| Attribute | Data type | Description |
|---|---|---|
| diagnosis_id | string | A unique identifier for each diagnosis. |
| lesion_id | string | The identifier of the lesion the diagnosis refers to. |
| image_id | string | The identifier of the image that was diagnosed. |
| expert_id | string | The identifier of the dermatologist who provided the diagnosis. |
| diagnosis | string | The primary diagnosis provided by the expert (e.g., NEV, MEL). |
| 2nd_option | string | An optional second choice or differential diagnosis. |
| certainty | categorical | A numerical rating of the expert's confidence in their diagnosis. Potential values are 0%, 25%, 50%, 75%, and 100%. |
| image_rating | integer | The expert's rating of the image quality, ranging from 1 to 10. |
| time | datetime | The time taken by the expert to provide the diagnosis. |
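
Each lesion can have several dermatology diagnosis records, one per expert. One way to summarize them is a certainty-weighted vote, sketched below. This is an illustrative aggregation only and is not presented as the consolidation procedure used by the authors; the records and the function name are hypothetical, and the certainty values are assumed to be stored as percentage strings.

```python
from collections import defaultdict

# Hypothetical panel records using attribute names from the table above
records = [
    {"lesion_id": "L001", "expert_id": "E1", "diagnosis": "NEV", "certainty": "75%"},
    {"lesion_id": "L001", "expert_id": "E2", "diagnosis": "MEL", "certainty": "50%"},
    {"lesion_id": "L001", "expert_id": "E3", "diagnosis": "NEV", "certainty": "100%"},
    {"lesion_id": "L002", "expert_id": "E1", "diagnosis": "BCC", "certainty": "100%"},
]


def weighted_vote(rows):
    """Pick, per lesion, the diagnosis with the highest summed certainty."""
    scores = defaultdict(lambda: defaultdict(float))
    for row in rows:
        weight = float(row["certainty"].rstrip("%")) / 100  # "75%" -> 0.75
        scores[row["lesion_id"]][row["diagnosis"]] += weight
    return {lid: max(s, key=s.get) for lid, s in scores.items()}


consensus = weighted_vote(records)
print(consensus)
```

A vote of this kind ignores the `2nd_option` and `image_rating` attributes; both could be folded into the weights if a richer summary is needed.
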

| Attribute | Data type | Description |
|---|---|---|
| diagnosis_id | string | A unique identifier for each histopathology diagnosis. |
| lesion_id | string | The identifier of the lesion the diagnosis refers to. |
| procedure | string | The type of procedure described in the report (e.g., biopsy, excision). |
| tumor_thickness | float | The Breslow thickness of the tumor, if applicable. |
| diagnosis | string | The final diagnosis from the histopathology report (e.g., NEV, MEL). |

| Attribute | Data type | Description |
|---|---|---|
| diagnosis_id | string | A unique identifier for the unified diagnosis. |
| lesion_id | string | The identifier of the lesion the diagnosis refers to. |
| dermatology_diagnosis | string | The final diagnosis selected by the dermatology experts. |
| histopathology_diagnosis | string | The diagnosis from the histopathology report, used as the ground truth when available. |
| diagnosis_id_histopath | string | The unique identifier of the histopathological diagnosis of the lesion. |
| unified_diagnosis | string | The final ground truth diagnosis for the lesion. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).