# Patient Readmission Risk Prediction

## Problem Statement

Patient readmissions within 30 days of discharge pose a significant challenge in healthcare. High readmission rates are not only indicators of suboptimal care but also lead to financial penalties under programs such as the Hospital Readmissions Reduction Program (HRRP). This project aims to predict the likelihood of patient readmission by leveraging advanced SQL for data preparation, Python for predictive modeling, and Tableau for data visualization.

## Table of Contents

- [Problem Statement](#problem-statement)
- [Project Overview](#project-overview)
- [Folder Structure](#folder-structure)
- [Data Collection and Preparation](#data-collection-and-preparation)
- [Predictive Analytics](#predictive-analytics)
- [Data Visualization](#data-visualization)
- [Technical Details](#technical-details)
- [Key Features](#key-features)
- [Outcome and Impact](#outcome-and-impact)

## Project Overview

The project is structured into the following phases:

1. **Data Collection and Preparation**: SQL-based extraction, transformation, and loading (ETL) of patient data.
2. **Predictive Analytics**: Developing a logistic regression model to predict 30-day readmission risks.
3. **Data Visualization**: Using Tableau and Python libraries to create interactive dashboards and visual representations of key metrics.

## Folder Structure

- **Data_loading_in_MySQL.ipynb**: MySQL-based data loading script to manage and insert patient data into a relational database.
- **generating_data.ipynb**: Python script to generate synthetic patient data, simulating real-world scenarios.
- **readmission_risk_prediction.ipynb**: Jupyter notebook for running SQL queries and implementing predictive models using Python.
- **Visualisation_script.ipynb**: Notebook to create visualizations using seaborn and matplotlib, along with Tableau for interactive dashboards.

## Data Collection and Preparation

### 1. Data Loading
- **MySQL Operations**: The `Data_loading_in_MySQL.ipynb` script performs the following SQL operations:
  - **Database Connection**: Utilizes the `mysql.connector` Python library to establish a connection to the MySQL server.
  - **Data Insertion**: Reads a CSV file into a pandas DataFrame and inserts the data using appropriate SQL commands. The script includes error handling for data types, particularly for date fields, using the `datetime.strptime` function for proper formatting. The generated data was checked for integrity so that every table holds related records, forming a consistent relational database.

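
The loading step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table name `admissions`, the column names, and the `%Y-%m-%d` date format are hypothetical, and the sketch reads the CSV with the standard-library `csv` module rather than pandas so that the date handling can be tried without extra dependencies. The `mysql.connector` calls (`connect`, `cursor`, `executemany`, `commit`) are the connector's standard ones.

```python
import csv
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumption: date columns in the CSV use ISO format

# Hypothetical table and columns; adjust to the actual schema.
INSERT_SQL = (
    "INSERT INTO admissions (patient_id, age, admission_date, discharge_date) "
    "VALUES (%s, %s, %s, %s)"
)


def parse_date(value, fmt=DATE_FORMAT):
    """Return a date for a well-formed string, or None if it cannot be parsed."""
    try:
        return datetime.strptime(value.strip(), fmt).date()
    except (ValueError, AttributeError):
        # Malformed or missing dates become NULL instead of aborting the load.
        return None


def rows_from_csv(handle):
    """Convert CSV records into tuples ordered like the INSERT placeholders."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append((
            int(record["patient_id"]),
            int(record["age"]),
            parse_date(record["admission_date"]),
            parse_date(record["discharge_date"]),
        ))
    return rows


def load_rows(rows, **connect_kwargs):
    """Insert the prepared rows; connection settings are passed by the caller."""
    import mysql.connector  # imported here so parsing works without MySQL installed

    conn = mysql.connector.connect(**connect_kwargs)
    try:
        cursor = conn.cursor()
        cursor.executemany(INSERT_SQL, rows)  # parameterized, one tuple per row
        conn.commit()
    finally:
        conn.close()
```

A call such as `load_rows(rows_from_csv(open("patients.csv", newline="")), host="localhost", user="...", password="...", database="...")` would tie the two halves together; the file name and credentials here are placeholders.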
### 2. Data Generation
- **Synthetic Data Creation**: The `generating_data.ipynb` script:
  - Creates synthetic patient data using numpy and pandas, simulating various patient attributes such as age, comorbidity count, medication count, and lab results.
  - **Feature Engineering**: Derives additional fields like `readmission_risk` using a combination of boolean logic and statistical methods, so that the synthetic data resembles real-world scenarios.

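
As a rough illustration of the generation step, the sketch below draws patient attributes with numpy and derives `readmission_risk` from a boolean rule combined with a random draw. The distributions, thresholds, and coefficients are all assumptions made for the example; they are not taken from `generating_data.ipynb`.

```python
import numpy as np
import pandas as pd


def generate_patients(n_patients=1000, seed=42):
    """Build a synthetic patient table; every distribution here is illustrative."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "patient_id": np.arange(1, n_patients + 1),
        "age": rng.integers(18, 95, size=n_patients),          # ages 18..94
        "comorbidity_count": rng.poisson(1.5, size=n_patients),
        "medication_count": rng.poisson(4.0, size=n_patients),
        "lab_result": rng.normal(100.0, 15.0, size=n_patients).round(1),
    })

    # Boolean logic: older patients with several comorbidities are always flagged.
    rule_flag = (df["age"] >= 65) & (df["comorbidity_count"] >= 2)

    # Statistical part: a logistic score turned into a probability, then sampled.
    score = (
        0.03 * (df["age"] - 50)
        + 0.5 * df["comorbidity_count"]
        + 0.1 * df["medication_count"]
        - 2.0
    )
    probability = 1.0 / (1.0 + np.exp(-score))
    random_flag = rng.random(n_patients) < probability

    df["readmission_risk"] = (rule_flag | random_flag).astype(int)
    return df
```

Fixing the seed keeps the generated table reproducible, which helps when the same data is later loaded into MySQL and used for modeling.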
## Predictive Analytics

### 3. Risk Prediction
- **Data Extraction**: In `readmission_risk_prediction.ipynb`, patient data is extracted from the MySQL database using SQL queries.
- **Feature Encoding**: Uses pandas to encode categorical variables into numerical formats suitable for modeling, utilizing techniques such as one-hot encoding.
- **Correlation Matrix**: Computes correlations between features using the pandas `.corr()` method, helping identify key predictors of readmission.
- **Predictive Modeling**: Implements logistic regression using `sklearn.linear_model.LogisticRegression` to predict the probability of a patient being readmitted within 30 days.

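
The encoding, correlation, and modeling steps listed above might look roughly like this. It is a minimal sketch, assuming the data is already in a DataFrame with a binary target column; the column name `readmitted`, the 75/25 split, and `max_iter=1000` are choices made for the example and are not confirmed by the notebook.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def prepare_features(df, target="readmitted"):
    """One-hot encode categorical columns and return the encoded frame."""
    # drop_first avoids a redundant dummy column for each categorical variable.
    return pd.get_dummies(df, drop_first=True, dtype=float)


def fit_readmission_model(df, target="readmitted", seed=0):
    """Fit logistic regression and return model, correlations, and test probabilities."""
    encoded = prepare_features(df, target)

    # Correlation of every encoded feature with the target, strongest first.
    correlations = (
        encoded.corr()[target].drop(target).sort_values(ascending=False)
    )

    X = encoded.drop(columns=[target])
    y = encoded[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Column 1 of predict_proba is the estimated probability of readmission.
    probabilities = model.predict_proba(X_test)[:, 1]
    return model, correlations, probabilities
```

Returning probabilities rather than hard labels leaves the decision threshold open, which matters when the cost of missing a readmission differs from the cost of an unnecessary follow-up.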
## Data Visualization

### 4. Visualization
- **Boxplots and Heatmaps**: `Visualisation_script.ipynb` includes:
  - **Age Distribution Analysis**: Boxplot visualization using seaborn’s `sns.boxplot()` to compare age across different readmission risk levels.
  - **Correlation Heatmap**: Utilizes seaborn’s `sns.heatmap()` to display a correlation matrix, identifying relationships between variables such as length of stay, comorbidity count, and medication count.
  - **Additional Visualizations**: Multiple boxplots and scatter plots that provide insights into how various factors like medication duration and lab results impact readmission risks.

- **Tableau Dashboards**:
  - Interactive dashboards created in Tableau, providing healthcare professionals with tools to explore and filter data dynamically. Dashboards include trend analyses, patient segmentation, and real-time risk scoring.

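
For the seaborn plots described in this section, a minimal sketch could look like the following. It assumes a DataFrame with an `age` column and a risk column named `readmission_risk`; the figure size, color map, and titles are arbitrary choices for the example.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_overview(df, risk_col="readmission_risk"):
    """Draw an age boxplot per risk level next to a correlation heatmap."""
    fig, (ax_box, ax_heat) = plt.subplots(1, 2, figsize=(12, 5))

    # Age distribution for each readmission risk level.
    sns.boxplot(data=df, x=risk_col, y="age", ax=ax_box)
    ax_box.set_title("Age by readmission risk")

    # Pairwise correlations between the numeric columns only.
    corr = df.select_dtypes("number").corr()
    sns.heatmap(
        corr, annot=True, fmt=".2f", cmap="coolwarm",
        vmin=-1, vmax=1, ax=ax_heat,
    )
    ax_heat.set_title("Feature correlations")

    fig.tight_layout()
    return fig
```

The returned figure can be shown inline in a notebook or written to disk with `fig.savefig("overview.png")`; the file name is a placeholder.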
## Technical Details

- **SQL Expertise**: Proficient use of SQL for data extraction, manipulation, and integration within a Python environment.
- **Data Processing**: Extensive use of pandas for data cleaning, transformation, and feature engineering.
- **Machine Learning**: Implementation of logistic regression for classification tasks, utilizing scikit-learn.
- **Visualization Tools**: Expertise in seaborn and matplotlib for static plots; proficiency in Tableau for interactive dashboards.

## Key Features

- **ETL Process**: Comprehensive ETL pipeline that extracts, transforms, and loads data into a MySQL database.
- **Predictive Analytics**: Develops a logistic regression model that accurately predicts patient readmission, offering actionable insights.
- **Interactive Dashboards**: Tableau dashboards that allow for dynamic exploration of patient data.

## Outcome and Impact

### Outcome
- **Predictive Insights**: The model predicts patient readmission risk, allowing healthcare providers to take preemptive actions, such as closer monitoring or additional follow-up care.

### Impact
- **Reduction in Readmission Rates**: Targeted interventions can lower the number of 30-day readmissions, improving patient outcomes.
- **Cost Efficiency**: Avoidance of financial penalties under HRRP and reduced overall healthcare costs through better resource allocation.
- **Enhanced Patient Care**: Improved discharge planning and patient management based on data-driven insights.

---