Datasets: gilberto-topp/StrokeRisk

Diff of /README.md [000000] .. [c7ae97]

 b/README.md
+<h1>About Dataset</h1>
+<h2>📌 Overview</h2>
+<p>
+This dataset is curated to support research in <strong>stroke risk prediction</strong>, enabling the development of models that estimate:
+</p>
+<ul>
+  <li><strong>Binary Classification:</strong> Whether a person is at risk of stroke.</li>
+  <li><strong>Regression Analysis:</strong> The percentage likelihood of stroke occurrence.</li>
+</ul>
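+<p>
+Both tasks can be trained from the same table by separating the two target columns from the predictors. The sketch below is a minimal, standard-library example. It assumes the data is a CSV whose headers match the names used in this README (<code>At Risk (Binary)</code>, <code>Stroke Risk (%)</code>). The file name is not given here, so the example parses two made-up rows from a string. <code>load_split</code> is a hypothetical helper, not part of the dataset.
+</p>

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical column names, assumed to match the labels in this README.
BINARY_TARGET = "At Risk (Binary)"
REGRESSION_TARGET = "Stroke Risk (%)"


def load_split(csv_text: str) -> Tuple[List[Dict[str, float]], List[int], List[float]]:
    """Parse CSV text into (feature dicts, binary labels, risk percentages)."""
    features: List[Dict[str, float]] = []
    y_class: List[int] = []
    y_reg: List[float] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove both targets; every remaining column is treated as a predictor.
        y_class.append(int(float(row.pop(BINARY_TARGET))))
        y_reg.append(float(row.pop(REGRESSION_TARGET)))
        features.append({name: float(value) for name, value in row.items()})
    return features, y_class, y_reg


# Two made-up rows for illustration only (not taken from the dataset).
sample = (
    "Chest Pain,Dizziness,Age,Stroke Risk (%),At Risk (Binary)\n"
    "1,0,67,72.5,1\n"
    "0,0,34,18.0,0\n"
)
X, y_class, y_reg = load_split(sample)
print(y_class)  # [1, 0]
print(y_reg)    # [72.5, 18.0]
```
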
+<p>
+It is designed for use in machine learning and deep learning applications in medical AI and predictive healthcare. The dataset is balanced, with 50% of records for individuals at risk and 50% not at risk.
+</p>
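+<p>
+The 50/50 balance can be verified before training. This is a minimal sketch that assumes the binary labels have already been read into a list of 0/1 integers. <code>class_shares</code> and <code>is_balanced</code> are illustrative helpers, and the 1% tolerance is an arbitrary choice.
+</p>

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def class_shares(labels: Sequence[int]) -> Dict[int, float]:
    """Fraction of records carrying each label, ordered by label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}


def is_balanced(labels: Sequence[int], classes: Tuple[int, ...] = (0, 1),
                tolerance: float = 0.01) -> bool:
    """True if each expected class makes up 1/len(classes) of the data, within tolerance."""
    if not labels:
        return False
    counts = Counter(labels)  # missing classes count as 0
    expected = 1 / len(classes)
    return all(abs(counts[c] / len(labels) - expected) <= tolerance for c in classes)


labels = [1, 0, 1, 0, 0, 1]  # made-up labels for illustration
print(class_shares(labels))  # {0: 0.5, 1: 0.5}
print(is_balanced(labels))   # True
```
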
+<h2>📜 Dataset Generation Process</h2>
+<p>
+The dataset was created through a combination of <strong>medical literature review</strong>, expert consultation, and <strong>statistical modeling</strong>. Feature distributions and relationships are designed to reflect real-world clinical patterns.
+</p>
+<h2>📖 Medical References & Sources</h2>
+<p>
+The dataset is grounded in established risk factors from trusted medical sources, including:
+</p>
+<ul>
+  <li>American Stroke Association (ASA): Guidelines on stroke risk and early symptoms</li>
+  <li>Mayo Clinic & Cleveland Clinic: Literature on cardiovascular and stroke risk</li>
+  <li><em>Harrison’s Principles of Internal Medicine</em> (20th Ed.)</li>
+  <li><em>Stroke Prevention, Treatment, and Rehabilitation</em> (Oxford University Press, 2021)</li>
+  <li><em>The Stroke Book</em> (Cambridge Medicine, 2nd Ed.)</li>
+  <li>World Health Organization (WHO): Reports on stroke risk and prevention</li>
+</ul>
+<h2>🔬 Features of the Dataset</h2>
+<h3>1️⃣ Symptoms (Primary Predictors)</h3>
+<p>All features are binary (1 = present, 0 = absent):</p>
+<ul>
+  <li>Chest Pain</li>
+  <li>Shortness of Breath</li>
+  <li>Irregular Heartbeat</li>
+  <li>Fatigue & Weakness</li>
+  <li>Dizziness</li>
+  <li>Swelling (Edema)</li>
+  <li>Pain in Neck/Jaw/Shoulder/Back</li>
+  <li>Excessive Sweating</li>
+  <li>Persistent Cough</li>
+  <li>Nausea/Vomiting</li>
+  <li>High Blood Pressure</li>
+  <li>Chest Discomfort (Activity)</li>
+  <li>Cold Hands/Feet</li>
+  <li>Snoring/Sleep Apnea</li>
+  <li>Anxiety/Feeling of Doom</li>
+</ul>
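+<p>
+Because every symptom column should contain only 0 or 1, a quick validation pass can flag missing or out-of-range cells. This sketch assumes each record is a dict keyed by the feature names exactly as listed above. Whether the CSV headers use these exact strings is an assumption, and <code>non_binary_cells</code> is a hypothetical helper.
+</p>

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Feature names as listed in this README; assumed (not confirmed) to be the CSV headers.
SYMPTOM_COLUMNS = [
    "Chest Pain", "Shortness of Breath", "Irregular Heartbeat",
    "Fatigue & Weakness", "Dizziness", "Swelling (Edema)",
    "Pain in Neck/Jaw/Shoulder/Back", "Excessive Sweating",
    "Persistent Cough", "Nausea/Vomiting", "High Blood Pressure",
    "Chest Discomfort (Activity)", "Cold Hands/Feet",
    "Snoring/Sleep Apnea", "Anxiety/Feeling of Doom",
]


def non_binary_cells(rows: Sequence[Dict[str, float]],
                     columns: List[str] = SYMPTOM_COLUMNS
                     ) -> List[Tuple[int, str, Optional[float]]]:
    """Return (row_index, column, value) for each symptom cell that is not 0 or 1."""
    problems = []
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)  # None means the column is missing
            if value not in (0, 1):
                problems.append((i, col, value))
    return problems


row = {name: 0 for name in SYMPTOM_COLUMNS}
row["Dizziness"] = 2  # deliberately invalid value
print(non_binary_cells([row]))  # [(0, 'Dizziness', 2)]
```
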
+<h3>2️⃣ Target Variables (Predicted Outcomes)</h3>
+<ul>
+  <li><strong>At Risk (Binary):</strong> 1 if the person is at risk of stroke, 0 otherwise</li>
+  <li><strong>Stroke Risk (%):</strong> Estimated stroke risk expressed as a percentage (0–100)</li>
+</ul>
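+<p>
+A small sketch that checks both targets against their documented ranges and rescales the percentage to a 0 to 1 probability, which is convenient for losses that expect probabilities. The helper names are hypothetical, and the targets are assumed to be already loaded as two equal-length lists.
+</p>

```python
from typing import List, Sequence


def target_issues(at_risk: Sequence[float], risk_pct: Sequence[float]) -> List[str]:
    """List records whose targets fall outside the documented ranges."""
    if len(at_risk) != len(risk_pct):
        raise ValueError("target columns have different lengths")
    issues = []
    for i, (label, pct) in enumerate(zip(at_risk, risk_pct)):
        if label not in (0, 1):
            issues.append(f"row {i}: At Risk (Binary) = {label!r}, expected 0 or 1")
        if not 0 <= pct <= 100:  # also catches NaN
            issues.append(f"row {i}: Stroke Risk (%) = {pct!r}, expected 0 to 100")
    return issues


def to_probability(risk_pct: float) -> float:
    """Rescale a 0-100 percentage to a 0-1 probability."""
    return risk_pct / 100


print(target_issues([1, 0], [72.5, 18.0]))  # []
print(to_probability(50.0))                 # 0.5
```
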
+<h3>3️⃣ Demographic Feature</h3>
+<ul>
+  <li><strong>Age:</strong> The individual's age in years; stroke risk increases significantly with age</li>
+</ul>
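+<p>
+One way to inspect the age effect in the data itself is to average <code>Stroke Risk (%)</code> within fixed-width age bands. This is a minimal sketch. The 10-year band width and the helper name are assumptions, and the example values are made up, not dataset results.
+</p>

```python
from collections import defaultdict
from typing import Dict, Sequence


def mean_risk_by_age_band(ages: Sequence[float], risk_pct: Sequence[float],
                          width: int = 10) -> Dict[int, float]:
    """Average risk per age band; keys are band lower bounds (60 means 60 to 69)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for age, pct in zip(ages, risk_pct):
        band = int(age // width) * width  # e.g. age 67 -> band 60
        sums[band] += pct
        counts[band] += 1
    return {band: sums[band] / counts[band] for band in sorted(sums)}


# Made-up values for illustration only.
print(mean_risk_by_age_band([34, 38, 61, 67], [10.0, 20.0, 60.0, 80.0]))
# {30: 15.0, 60: 70.0}
```
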
+<h2>⚡ Why Is This Dataset Accurate and Useful?</h2>
+<h4>✅ Balanced Data Distribution:</h4>
+<ul>
+  <li>50% at risk, 50% not at risk</li>
+  <li>Avoids bias toward a majority class during training</li>
+</ul>
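+<p>
+Keeping this balance intact when holding out evaluation data calls for a stratified split, which samples the test set separately within each class. A standard-library sketch; the 20% test fraction and the seed are arbitrary assumptions. In practice, scikit-learn's <code>train_test_split(..., stratify=y)</code> does the same job.
+</p>

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split row indices into (train, test), preserving each class's share."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random order within this class only
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = [0] * 50 + [1] * 50  # a perfectly balanced toy label vector
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```
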
+<h4>✅ Medically Inspired Feature Engineering:</h4>
+<ul>
+  <li>Features validated via clinical guidelines and expert opinion</li>
+  <li>Age is a weighted predictor</li>
+  <li>Symptom severity is implicitly encoded</li>
+</ul>
+<h4>✅ Diverse Risk Factors Included:</h4>
+<ul>
+  <li>Cardiovascular: chest pain, high blood pressure, irregular heartbeat</li>
+  <li>Neurological: dizziness, fatigue, anxiety</li>
+  <li>Sleep-related: snoring, sleep apnea</li>
+</ul>
+<h4>✅ Scalable and ML-Ready:</h4>
+<ul>
+  <li>Supports classification and regression</li>
+  <li>Works with classical ML models (e.g., XGBoost, SVM, Random Forest) and deep learning frameworks (PyTorch, TensorFlow)</li>
+  <li>Suitable for Explainable AI (XAI)</li>
+</ul>
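+<p>
+As a dependency-free illustration of the classification task, the sketch below fits a logistic regression by full-batch gradient descent. Logistic regression is swapped in here for simplicity; it is not one of the models named above, which would normally come from their own libraries. The learning rate and epoch count are untuned assumptions. <code>Age</code> should be rescaled (for example, divided by 100) before training so it does not dominate the gradient.
+</p>

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(at risk = 1 | x) for weights w = [bias, w1, ..., wd]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Minimise mean log-loss with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float], threshold: float = 0.5) -> int:
    """Binary prediction by thresholding the predicted probability."""
    return int(predict_proba(w, x) >= threshold)


# Toy, made-up data: the label simply follows the first feature.
X_toy = [[1, 0], [1, 1], [0, 0], [0, 1]]
y_toy = [1, 1, 0, 0]
weights = fit_logistic(X_toy, y_toy)
print([predict(weights, x) for x in X_toy])  # [1, 1, 0, 0]
```

+<p>
+Each learned weight is the change in log-odds per unit of its feature, which also makes this model a convenient, interpretable baseline.
+</p>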
+<h2>📂 Dataset Usage & Applications</h2>
+<ul>
+  <li>✅ <strong>Predictive Analytics:</strong> Early detection and prevention of stroke</li>
+  <li>✅ <strong>Healthcare Chatbots:</strong> Real-time triage and risk advice</li>
+  <li>✅ <strong>Medical Research:</strong> Studying patterns in stroke risk</li>
+  <li>✅ <strong>Explainable AI:</strong> Understanding how models assess stroke likelihood</li>
+</ul>
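+<p>
+For the Explainable AI use case, permutation importance is one model-agnostic way to ask which features a trained classifier relies on: shuffle one feature column at a time and measure how much accuracy drops. This is a sketch of that general technique, not a method prescribed by the dataset. <code>predict</code> can be any function mapping a feature vector to a 0/1 label, and the repeat count and seed are assumptions.
+</p>

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]


def accuracy(predict: Predictor, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of rows where predict(x) equals the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(predict: Predictor, X: Sequence[Sequence[float]],
                           y: Sequence[int], n_repeats: int = 5,
                           seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [list(x[:j]) + [v] + list(x[j + 1:]) for x, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def rule(x: Sequence[float]) -> int:
    """Toy classifier that looks only at feature 0."""
    return int(x[0] >= 0.5)


# Made-up data: feature 1 is ignored by `rule`, so its importance should be 0.
X_toy = [[1, 0], [1, 1], [0, 0], [0, 1]] * 5
y_toy = [1, 1, 0, 0] * 5
print(permutation_importance(rule, X_toy, y_toy))
```
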