Diff of /README.md [000000] .. [c7ae97]

<h1>About Dataset</h1>

<h2>📌 Overview</h2>
<p>
This dataset is curated to support research in <strong>stroke risk prediction</strong>, enabling the development of models that estimate:
</p>
<ul>
  <li><strong>Binary Classification:</strong> Whether a person is at risk of stroke.</li>
  <li><strong>Regression Analysis:</strong> The percentage likelihood of stroke occurrence.</li>
</ul>
<p>
It is designed for use in machine learning and deep learning applications in medical AI and predictive healthcare. The dataset is balanced, with 50% of records for individuals at risk and 50% not at risk.
</p>

<h2>📜 Dataset Generation Process</h2>
<p>
The dataset was created through a combination of <strong>medical literature review</strong>, expert consultation, and <strong>statistical modeling</strong>. Feature distributions and relationships reflect real-world clinical patterns.
</p>

<h2>📖 Medical References & Sources</h2>
<p>
The dataset is grounded in established risk factors from trusted medical sources, including:
</p>
<ul>
  <li>American Stroke Association (ASA): Guidelines on stroke risk and early symptoms.</li>
  <li>Mayo Clinic & Cleveland Clinic: Literature on cardiovascular and stroke risk.</li>
  <li><em>Harrison’s Principles of Internal Medicine</em> (20th Ed.)</li>
  <li><em>Stroke Prevention, Treatment, and Rehabilitation</em> (Oxford University Press, 2021)</li>
  <li><em>The Stroke Book</em> (Cambridge Medicine, 2nd Ed.)</li>
  <li>World Health Organization (WHO) reports on stroke risk and prevention</li>
</ul>

<h2>🔬 Features of the Dataset</h2>

<h3>1️⃣ Symptoms (Primary Predictors)</h3>
<p>All symptom features are binary (1 = present, 0 = absent):</p>
<ul>
  <li>Chest Pain</li>
  <li>Shortness of Breath</li>
  <li>Irregular Heartbeat</li>
  <li>Fatigue & Weakness</li>
  <li>Dizziness</li>
  <li>Swelling (Edema)</li>
  <li>Pain in Neck/Jaw/Shoulder/Back</li>
  <li>Excessive Sweating</li>
  <li>Persistent Cough</li>
  <li>Nausea/Vomiting</li>
  <li>High Blood Pressure</li>
  <li>Chest Discomfort (Activity)</li>
  <li>Cold Hands/Feet</li>
  <li>Snoring/Sleep Apnea</li>
  <li>Anxiety/Feeling of Doom</li>
</ul>
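<p>
Because every symptom is encoded as 0 or 1, a quick sanity check can catch malformed records before modeling. The sketch below is illustrative only: the column names in <code>SYMPTOM_COLUMNS</code> and the helper <code>invalid_symptom_values</code> are assumptions, since this README lists the symptoms but not the exact column headers.
</p>

```python
# Hypothetical column names: the README names the symptoms, not the headers.
SYMPTOM_COLUMNS = [
    "chest_pain",
    "shortness_of_breath",
    "irregular_heartbeat",
    "fatigue_weakness",
    "dizziness",
    "swelling_edema",
    "pain_neck_jaw_shoulder_back",
    "excessive_sweating",
    "persistent_cough",
    "nausea_vomiting",
    "high_blood_pressure",
    "chest_discomfort_activity",
    "cold_hands_feet",
    "snoring_sleep_apnea",
    "anxiety_feeling_of_doom",
]


def invalid_symptom_values(record):
    """Return the symptom columns that are missing or hold a value other than 0/1."""
    return [name for name in SYMPTOM_COLUMNS if record.get(name) not in (0, 1)]


# Usage: a record with every symptom absent passes the check.
record = {name: 0 for name in SYMPTOM_COLUMNS}
print(invalid_symptom_values(record))
```

<p>
An empty list means the record is well formed. Adjust the names to match the actual file before relying on the check.
</p>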

<h3>2️⃣ Target Variables (Predicted Outcomes)</h3>
<ul>
  <li><strong>At Risk (Binary):</strong> 1 if the person is at risk of stroke, 0 otherwise</li>
  <li><strong>Stroke Risk (%):</strong> Estimated probability of stroke (0–100)</li>
</ul>
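<p>
The two targets support two separate tasks, so it can help to pull them apart from the predictors up front. The following is a minimal sketch, assuming each record is a dictionary and that the targets are stored under the hypothetical keys <code>at_risk</code> and <code>stroke_risk_pct</code>; neither name is confirmed by this README.
</p>

```python
# Hypothetical target column names; replace with the real headers.
CLASS_TARGET = "at_risk"
REGRESSION_TARGET = "stroke_risk_pct"


def split_features_and_targets(rows):
    """Separate each row into predictors, the binary label, and the risk percentage."""
    features, labels, risks = [], [], []
    for row in rows:
        label = int(row[CLASS_TARGET])
        risk = float(row[REGRESSION_TARGET])
        # Enforce the ranges stated in the README: 0/1 and 0-100.
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        if not 0.0 <= risk <= 100.0:
            raise ValueError(f"risk must be within 0-100, got {risk}")
        features.append(
            {k: v for k, v in row.items() if k not in (CLASS_TARGET, REGRESSION_TARGET)}
        )
        labels.append(label)
        risks.append(risk)
    return features, labels, risks


# Usage with two made-up records (illustrative values, not real data).
rows = [
    {"age": 67, "dizziness": 1, "at_risk": 1, "stroke_risk_pct": 72.5},
    {"age": 31, "dizziness": 0, "at_risk": 0, "stroke_risk_pct": 12.0},
]
X, y_class, y_reg = split_features_and_targets(rows)
```

<p>
<code>y_class</code> then feeds a classifier and <code>y_reg</code> a regressor, while <code>X</code> is shared by both.
</p>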

<h3>3️⃣ Demographic Feature</h3>
<ul>
  <li><strong>Age:</strong> Stroke risk increases significantly with age</li>
</ul>

<h2>⚡ Why Is This Dataset Accurate and Useful?</h2>

<h4>✅ Balanced Data Distribution:</h4>
<ul>
  <li>50% at risk, 50% not at risk</li>
  <li>Reduces the chance that a model becomes biased toward one class</li>
</ul>
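<p>
The stated 50/50 balance is easy to verify once the labels are loaded. The sketch below assumes the binary target is available as a plain list of 0/1 values; <code>class_shares</code> and <code>is_balanced</code> are hypothetical helper names, not part of the dataset.
</p>

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of records belonging to each class label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def is_balanced(labels, tolerance=0.01):
    """True if both classes are present and each share is within tolerance of 50%."""
    shares = class_shares(labels)
    return set(shares) == {0, 1} and all(
        abs(share - 0.5) <= tolerance for share in shares.values()
    )


# Usage with a toy label list (not real data).
print(is_balanced([1, 0, 1, 0]))
```

<p>
The <code>tolerance</code> default is an arbitrary choice for illustration; tighten or loosen it as needed.
</p>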

<h4>✅ Medically-Inspired Feature Engineering:</h4>
<ul>
  <li>Features validated via clinical guidelines and expert opinion</li>
  <li>Age is a weighted predictor</li>
  <li>Symptom severity is implicitly encoded</li>
</ul>

<h4>✅ Diverse Risk Factors Included:</h4>
<ul>
  <li>Cardiovascular: chest pain, high BP, heartbeat irregularity</li>
  <li>Neurological: dizziness, fatigue, anxiety</li>
  <li>Sleep-related: snoring, sleep apnea</li>
</ul>

<h4>✅ Scalable and ML-Ready:</h4>
<ul>
  <li>Supports both classification and regression tasks</li>
  <li>Works with classical ML models (XGBoost, SVM, random forest) and deep learning frameworks (PyTorch, TensorFlow)</li>
  <li>Suitable for Explainable AI (XAI)</li>
</ul>
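<p>
Whichever library is used, the data first needs a train/test split, and a stratified split keeps the class ratio roughly equal in both parts. The following is a dependency-free sketch of that idea, not a replacement for a library routine; <code>stratified_split</code> and its parameters are hypothetical names introduced here.
</p>

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), sampling each class separately."""
    rng = random.Random(seed)  # local generator, so the split is reproducible
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Rounding means the ratio is only approximately preserved.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Usage with toy labels (not real data): 50 records per class.
labels = [0] * 50 + [1] * 50
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```

<p>
The returned indices can select rows for both the classification and the regression target, so the two tasks are evaluated on the same held-out records.
</p>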

<h2>📂 Dataset Usage & Applications</h2>
<ul>
  <li>✅ <strong>Predictive Analytics:</strong> Early detection and prevention of stroke</li>
  <li>✅ <strong>Healthcare Chatbots:</strong> Real-time triage and risk advice</li>
  <li>✅ <strong>Medical Research:</strong> Studying patterns in stroke risk</li>
  <li>✅ <strong>Explainable AI:</strong> Understanding how models assess stroke likelihood</li>
</ul>