The DERMAHAIR dataset is a benchmark for automatic skin hair removal, a task aimed at improving skin diagnosis and analysis. It is partially derived from the HAM10000 dataset [1], a widely used collection of 10,015 dermatoscopic images of common pigmented skin lesions, including melanomas and nevi.
To create DERMAHAIR, we selected 100 images from each category in HAM10000 and added artificial hair patterns from the Digital Hair Dataset [2] to each one. The hair patterns are binary masks in which white pixels indicate hair and black pixels represent the background. Adding hair involved normalizing these masks, merging them with the dermatoscopic images, and then post-processing the result to adjust the pixel values. This superimposes a hair structure on each skin lesion image, producing a copy with hair artifacts. Each resulting pair, the original image and its hair-augmented copy, can be used to train automatic hair removal models.
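The three steps described above (normalize the mask, merge it with the image, post-process the pixel values) can be sketched as follows. This is a minimal illustration, not the exact routine used to build DERMAHAIR: the function name `add_hair`, the fixed `hair_color`, and the alpha-blend merge are assumptions made for the example.

```python
import numpy as np


def add_hair(image, mask, hair_color=(30, 20, 15)):
    """Overlay hair from a binary mask onto an RGB image (illustrative sketch).

    image: uint8 array of shape (H, W, 3), the hair-free dermatoscopic image.
    mask:  array of shape (H, W); white (high) pixels mark hair, black is background.
    hair_color: assumed RGB colour for the synthetic hair (hypothetical default).
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")

    # Step 1: normalize the mask to [0, 1] so it can act as a blending weight.
    alpha = mask.astype(np.float32)
    peak = alpha.max()
    if peak > 0:
        alpha /= peak
    alpha = alpha[..., None]  # shape (H, W, 1), broadcasts over the RGB channels

    # Step 2: merge. Hair pixels move toward hair_color, background stays unchanged.
    color = np.asarray(hair_color, dtype=np.float32)
    blended = (1.0 - alpha) * image.astype(np.float32) + alpha * color

    # Step 3: post-process. Round, clip to the valid range, and return uint8.
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

Calling `add_hair(clean_image, hair_mask)` returns the hair-augmented copy, which together with `clean_image` forms one training pair. Because the mask is used as a blending weight, a mask with soft (anti-aliased) edges would produce soft hair edges as well.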
The resulting DERMAHAIR dataset is organized into two primary directories: 'Data_Skin_with_Hair' and 'Data_Skin_without_Hair'. Each directory contains seven subdirectories corresponding to the seven lesion types: Melanocytic nevi (common benign melanocyte tumors), Melanoma (the deadliest form of skin cancer), Benign keratosis (encompassing benign growths from keratinocytes), Basal cell carcinoma (a frequent but less aggressive skin cancer), Actinic keratoses (sun-induced precancerous lesions), Vascular lesions (such as benign blood vessel growths), and Dermatofibroma (benign fibrous skin lesions).
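Given this layout, training pairs can be collected by walking the two directory trees in parallel. The sketch below assumes that each image keeps the same filename and lesion-type subdirectory name in both trees; that naming convention and the helper name `list_pairs` are assumptions for illustration and should be checked against the downloaded data.

```python
from pathlib import Path


def list_pairs(root):
    """Return (with_hair_path, without_hair_path, lesion_type) tuples.

    Assumes matching subdirectory and file names under both top-level
    directories; files without a counterpart are skipped.
    """
    root = Path(root)
    with_hair = root / "Data_Skin_with_Hair"
    without_hair = root / "Data_Skin_without_Hair"

    pairs = []
    # One subdirectory per lesion type (seven in total).
    for class_dir in sorted(p for p in with_hair.iterdir() if p.is_dir()):
        for hairy in sorted(class_dir.iterdir()):
            clean = without_hair / class_dir.name / hairy.name
            # Keep only images that exist in both trees.
            if hairy.is_file() and clean.is_file():
                pairs.append((hairy, clean, class_dir.name))
    return pairs
```

Each tuple gives the model input (image with hair), the target (image without hair), and the lesion type, which is useful for stratified train and validation splits.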
The dataset, together with its detailed description and the artificial hair addition process, is open source and available on Kaggle for researchers in the field.
[1] Tschandl, P., Rosendahl, C., & Kittler, H. (2018). "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions." Scientific Data, 5, 180161.
[2] https://www.kaggle.com/datasets/weilizai/digital-hair-dataset