About Dataset

📂 Dataset Description:

The Skin Disease Dataset is a synthetic dataset generated to support machine learning and data analysis tasks related to dermatological conditions. It contains 34,000 rows and 10 columns covering patient demographics, disease type and severity, affected body area, and treatment history.

🌟 Why This Dataset?

Skin diseases are a prevalent health issue affecting millions of people globally. Accurate diagnosis and effective treatment planning are crucial for improving patient outcomes. This dataset covers the six skin conditions listed under Disease_Type, together with demographic and treatment fields, making it suitable for:

  • Classification tasks: Predicting disease type or severity.
  • Predictive modeling: Estimating treatment effectiveness.
  • Data visualization: Analyzing demographic patterns.
  • Exploratory Data Analysis (EDA): Understanding distribution and correlations.
  • Healthcare analytics: Gaining insights into treatment efficacy and disease prevalence.

🗃️ Dataset Content:

The dataset contains the following 10 columns:

  1. Patient_ID: Unique identifier for each patient (e.g., P00001).
  2. Age: Age of the patient (range: 18 to 90).
  3. Gender: Gender of the patient (Male/Female).
  4. Skin_Color: The skin tone of the patient (Fair/Medium/Dark).
  5. Disease_Type: The diagnosed skin disease (Eczema, Psoriasis, Acne, Rosacea, Vitiligo, Melanoma).
  6. Severity: The severity level of the disease (Mild, Moderate, Severe).
  7. Duration: Duration of the disease in months (range: 1 to 120).
  8. Affected_Area: The body part affected by the disease (Face, Arms, Legs, Back, Chest, Scalp).
  9. Previous_Treatment: Indicates whether the patient has received prior treatment (Yes/No).
  10. Treatment_Effectiveness: The effectiveness of previous treatments (High, Moderate, Low).
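The column list above can be turned into a small validation check that is worth running before any analysis. The sketch below uses only the Python standard library. The allowed values and ranges are copied from the list; the ID pattern (a `P` followed by five digits) is an assumption based on the single example `P00001`, as is the idea that Age and Duration are stored as whole numbers. The name `validate_row` is illustrative only.

```python
import re

# Allowed values copied from the column list above.
ALLOWED = {
    "Gender": {"Male", "Female"},
    "Skin_Color": {"Fair", "Medium", "Dark"},
    "Disease_Type": {"Eczema", "Psoriasis", "Acne", "Rosacea", "Vitiligo", "Melanoma"},
    "Severity": {"Mild", "Moderate", "Severe"},
    "Affected_Area": {"Face", "Arms", "Legs", "Back", "Chest", "Scalp"},
    "Previous_Treatment": {"Yes", "No"},
    "Treatment_Effectiveness": {"High", "Moderate", "Low"},
}
# Inclusive numeric ranges from the column list above.
RANGES = {"Age": (18, 90), "Duration": (1, 120)}
# Assumption: every ID looks like the one example given (P00001).
ID_PATTERN = re.compile(r"P\d{5}")


def validate_row(row):
    """Return a list of problems found in one row; an empty list means it passes."""
    problems = []
    if not ID_PATTERN.fullmatch(str(row.get("Patient_ID", ""))):
        problems.append("Patient_ID: unexpected format")
    for column, allowed in ALLOWED.items():
        if row.get(column) not in allowed:
            problems.append(f"{column}: unexpected value {row.get(column)!r}")
    for column, (low, high) in RANGES.items():
        try:
            # Assumption: Age and Duration are whole numbers (possibly as text).
            value = int(row[column])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{column}: missing or not an integer")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside {low} to {high}")
    return problems


# A hand-written row for illustration, not taken from the dataset.
sample = {
    "Patient_ID": "P00001", "Age": "45", "Gender": "Female",
    "Skin_Color": "Medium", "Disease_Type": "Eczema", "Severity": "Mild",
    "Duration": "12", "Affected_Area": "Arms",
    "Previous_Treatment": "Yes", "Treatment_Effectiveness": "High",
}
print(validate_row(sample))  # prints []
```

Rows produced by `csv.DictReader` are dictionaries of strings, so they can be passed to this function directly.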

🔥 Key Features:

  • Balanced Distribution: The dataset is synthetically generated to ensure a balanced distribution of disease types and severity levels.
  • Comprehensive Coverage: Multiple features capture patient demographics, disease characteristics, and treatment outcomes.
  • Versatile Applications: Suitable for classification, prediction, clustering, and data visualization tasks.
  • Privacy: The data is synthetic and contains no real patient records, which avoids patient-privacy concerns while keeping the tabular structure of a clinical dataset.
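The balanced-distribution claim above is easy to check once a column has been loaded. The following is a minimal sketch using the standard library. The function names and the `tolerance` parameter are hypothetical, and the 2-percentage-point default is an arbitrary choice rather than a property stated for this dataset.

```python
from collections import Counter


def class_shares(labels):
    """Map each distinct label to its fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def is_balanced(labels, tolerance=0.02):
    """True if every class share is within `tolerance` of a uniform share."""
    shares = class_shares(labels)
    if not shares:
        raise ValueError("labels is empty")
    uniform = 1 / len(shares)
    return all(abs(share - uniform) <= tolerance for share in shares.values())


# Made-up label lists for illustration, not taken from the dataset.
even = ["Mild"] * 34 + ["Moderate"] * 33 + ["Severe"] * 33
skewed = ["Mild"] * 80 + ["Severe"] * 20
print(is_balanced(even))    # prints True
print(is_balanced(skewed))  # prints False
```

Running the same check on the Disease_Type and Severity columns shows how close the generated data actually is to uniform.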

🚀 Potential Use Cases:

  • Disease Classification: Using machine learning to classify skin disease types.
  • Severity Prediction: Predicting the severity level based on demographic and disease characteristics.
  • Treatment Effectiveness Analysis: Analyzing how previous treatments correlate with disease severity and affected areas.
  • Health Insights: Gaining insights into how skin color and demographics impact disease prevalence and severity.
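For the treatment and demographic analyses listed above, a row-normalized cross-tabulation is a simple starting point: for each value of one column, it gives the share of rows falling into each value of another. The sketch below uses only the standard library; `row_normalized_crosstab` is a hypothetical helper name, and the four demo rows are made up for illustration.

```python
from collections import Counter, defaultdict


def row_normalized_crosstab(rows, row_key, col_key):
    """For each value of row_key, return the share of each col_key value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[row_key]][row[col_key]] += 1
    table = {}
    for row_value, col_counts in counts.items():
        total = sum(col_counts.values())
        table[row_value] = {k: v / total for k, v in col_counts.items()}
    return table


# Made-up rows for illustration, not taken from the dataset.
rows = [
    {"Previous_Treatment": "Yes", "Severity": "Mild"},
    {"Previous_Treatment": "Yes", "Severity": "Severe"},
    {"Previous_Treatment": "No", "Severity": "Severe"},
    {"Previous_Treatment": "No", "Severity": "Severe"},
]
table = row_normalized_crosstab(rows, "Previous_Treatment", "Severity")
print(table["Yes"])  # prints {'Mild': 0.5, 'Severe': 0.5}
print(table["No"])   # prints {'Severe': 1.0}
```

The same helper works for any pair of categorical columns, for example Skin_Color against Disease_Type.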

🛠️ Recommended Techniques:

  • Exploratory Data Analysis (EDA) for initial data inspection and visualization.
  • Machine Learning Algorithms such as Decision Trees, Random Forest, SVM, and Neural Networks for classification tasks.
  • Data Preprocessing Techniques like handling missing values, encoding categorical data, and scaling numerical values.
  • Model Evaluation Metrics including accuracy, precision, recall, F1-score, and ROC-AUC.
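Two of the techniques above, encoding and scaling features and computing per-class evaluation metrics, can be sketched in plain Python to show what they do. This is an illustration under stated assumptions, not a recommended pipeline: the function names are hypothetical, Severity is treated as ordinal in the order given on this card, scaling uses the documented Age and Duration ranges, and a metric with a zero denominator is reported as 0.0 by choice.

```python
SEVERITY_ORDER = ["Mild", "Moderate", "Severe"]  # ordinal levels from this card
AREAS = ["Face", "Arms", "Legs", "Back", "Chest", "Scalp"]


def one_hot(value, categories):
    """Encode one categorical value as a 0/1 list, one slot per category."""
    return [1 if value == category else 0 for category in categories]


def min_max(value, low, high):
    """Scale a number from the range [low, high] into [0, 1]."""
    return (value - low) / (high - low)


def encode_row(row):
    """Turn a few columns of one row into a numeric feature list."""
    return [
        min_max(int(row["Age"]), 18, 90),        # documented Age range
        min_max(int(row["Duration"]), 1, 120),   # documented Duration range
        SEVERITY_ORDER.index(row["Severity"]),   # ordinal code 0, 1, or 2
    ] + one_hot(row["Affected_Area"], AREAS)


def per_class_metrics(y_true, y_pred):
    """Return precision, recall, and F1 for every label seen in either list."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Made-up labels for illustration, not model output on this dataset.
y_true = ["Acne", "Acne", "Eczema", "Eczema"]
y_pred = ["Acne", "Eczema", "Eczema", "Eczema"]
print(per_class_metrics(y_true, y_pred)["Acne"]["recall"])  # prints 0.5
```

In practice, libraries such as scikit-learn provide equivalents (`OneHotEncoder`, `MinMaxScaler`, `classification_report`); the hand-written versions here only make the arithmetic visible.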

📈 License:

This dataset is licensed under the CC BY 4.0 License. You are free to use, share, and modify the dataset with proper attribution.


💬 Inspiration:

  • Can machine learning accurately classify skin disease types based on demographic and clinical features?
  • How effective are various treatments for different skin conditions?
  • Can we predict the severity of skin diseases using patient attributes?

📬 Acknowledgments:

This dataset is synthetically generated and does not represent real patient data. It is designed purely for educational and research purposes in machine learning and data analysis.