About Dataset

This dataset contains dermatoscopic images of skin lesions organized into six classes:

Melanoma
Nevus (Mole)
Basal Cell Carcinoma
Actinic Keratosis
Benign Keratosis
Vascular Lesion

The dataset has been preprocessed to remove duplicate images and to keep the training and test sets consistent. It is organized into train and test folders, each with one subfolder per class, so it can be loaded directly by most machine learning and deep learning tools.
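
As a quick sanity check on the folder layout described above, the following is a minimal sketch that counts images per split and class. The function name `count_images`, the root path, and the accepted file extensions are assumptions for illustration; the actual folder names and image formats in the download may differ.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the downloaded files actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {split: {class_name: image_count}} for a root/split/class/file layout."""
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue
        counts[split_dir.name] = {
            class_dir.name: sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts
```

Calling `count_images("path/to/dataset")` (a placeholder path) and summing the per-class counts in each split gives totals that can be compared with the figures listed under Key Features.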

Key Features:

Total Images: 1888 (1820 train, 68 test)
Classes: 6
Image Size: Variable (can be resized during preprocessing)
Preprocessing: Duplicate images removed using perceptual hashing.
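
The card does not say which perceptual hash was used, so the following is only an illustrative sketch of one common variant, the average hash, and not necessarily the method applied to this dataset. To stay dependency-free it works on a grayscale image given as a 2D list of pixel values; in practice the pixels would come from an image library. The names `average_hash`, `hamming_distance`, and `find_duplicates` are hypothetical.

```python
from itertools import combinations


def average_hash(pixels, hash_size=8):
    """Average hash of a grayscale image given as a 2D list of pixel rows.

    Assumes the image is at least hash_size pixels in each dimension.
    Returns an int holding hash_size * hash_size bits.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Downsample to hash_size x hash_size by averaging rectangular blocks.
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # Each bit records whether a cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_duplicates(hashes, max_distance=0):
    """Return name pairs whose hashes differ by at most max_distance bits."""
    return [
        (n1, n2)
        for (n1, h1), (n2, h2) in combinations(sorted(hashes.items()), 2)
        if hamming_distance(h1, h2) <= max_distance
    ]
```

Because the hash depends on brightness relative to the image mean, a uniformly brightened copy of an image produces the same hash, while a structurally different image differs in many bits. Raising `max_distance` trades fewer missed near-duplicates for more false matches.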

Dataset Description

Title: Cleaned Skin Cancer Dataset (6 Classes)

Use Case:

This dataset is intended for training and evaluating models that classify skin lesions, including skin cancers. It can be used with frameworks such as TensorFlow, PyTorch, or scikit-learn. Because duplicates have already been removed and the folders follow a one-subfolder-per-class layout, no further cleanup is needed before loading.
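
Whichever framework is used, the first step is usually to turn the folder layout into a list of (file path, label) pairs. The sketch below does this with the standard library only. The function name `list_samples` and the file extensions are assumptions, and labels are assigned by sorting class folder names alphabetically, which mirrors the convention of loaders such as torchvision's `ImageFolder`.

```python
from pathlib import Path


def list_samples(split_dir, suffixes=(".jpg", ".jpeg", ".png")):
    """Return (classes, samples) for one split folder.

    classes: class folder names, sorted alphabetically.
    samples: list of (file_path, label_index) pairs, ordered by class then file name.
    """
    split_dir = Path(split_dir)
    classes = sorted(d.name for d in split_dir.iterdir() if d.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [
        (str(f), class_to_idx[name])
        for name in classes
        for f in sorted((split_dir / name).iterdir())
        if f.is_file() and f.suffix.lower() in suffixes
    ]
    return classes, samples
```

The resulting paths and integer labels can then be passed to an image loader and to the model or estimator of choice; images would still need to be resized to a common shape, since the card lists image size as variable.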

Acknowledgments:

The original dataset was sourced from the International Skin Imaging Collaboration (ISIC). Cleaning and preprocessing were performed to remove duplicates and prepare the dataset for modeling. Please refer to the ISIC website for more information about the original dataset: ISIC Archive.

License:

This dataset is derived from the ISIC dataset and is made available under the CC BY-NC-SA license. Any use must comply with the original licensing terms: attribution, non-commercial use only, and share-alike distribution of derivative works.