Patient Risk Profiling using Machine Learning

Overview

This repository contains a Jupyter Notebook that trains three machine learning models to build patient risk profiles from healthcare and clinical datasets. It is intended only as a sample implementation. The models included are:

Logistic Regression - A simple baseline model for binary classification.
Random Forest - An ensemble-based model for improved performance.
XGBoost - A gradient boosting model optimized for structured data.

Dataset

The notebook expects a healthcare dataset in CSV format. The dataset must include a Risk column as the binary target variable (0: Low Risk, 1: High Risk) and a PatientID column, which is dropped during preprocessing. All remaining numerical features are used to train the models.
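The expected layout can be illustrated with a minimal sketch. The helper name load_features_and_target and the sample columns Age and BMI are assumptions made for illustration only; the notebook's own preprocessing code may differ.

```python
import io

import pandas as pd


def load_features_and_target(csv_source):
    """Split a risk dataset into numeric features X and the Risk target y.

    Hypothetical helper: it mirrors the README's description (drop PatientID,
    use Risk as the target, keep the remaining numeric columns).
    """
    df = pd.read_csv(csv_source)
    y = df["Risk"]
    # Drop the identifier and the target, then keep only numeric columns.
    X = df.drop(columns=["Risk", "PatientID"]).select_dtypes(include="number")
    return X, y


# Made-up rows, used only to show the expected column layout.
sample_csv = io.StringIO(
    "PatientID,Age,BMI,Risk\n"
    "1,54,31.2,1\n"
    "2,37,22.8,0\n"
    "3,61,27.5,1\n"
)
X, y = load_features_and_target(sample_csv)
print(list(X.columns))
```

With a real file, the same helper could be called with a path such as healthcare_data.csv instead of the in-memory sample.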

Prerequisites

Ensure you have the following dependencies installed before running the notebook:

pip install pandas numpy scikit-learn xgboost notebook

Usage

Clone the repository:

git clone https://github.com/rkumar1010/patient-risk-profiling.git
cd patient-risk-profiling

Place your dataset in the project directory. If its filename is not healthcare_data.csv, update the filename in the notebook.
Run the Jupyter Notebook:

jupyter notebook patient_risk_models.ipynb

The notebook will:
Load and preprocess the dataset.
Train and evaluate three different machine learning models.
Print performance metrics including accuracy and classification reports.
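The steps above can be sketched roughly as follows. This is not the notebook's actual code: the synthetic data from make_classification, the 80/20 split, and the hyperparameters are all assumptions chosen to keep the example self-contained. The XGBoost model is skipped if the package is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier
except ImportError:  # xgboost is optional in this sketch
    XGBClassifier = None

# Synthetic stand-in for the healthcare dataset (assumption, not real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
}
if XGBClassifier is not None:
    models["XGBoost"] = XGBClassifier(n_estimators=50, random_state=42)

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracies[name] = accuracy_score(y_test, predictions)
    print(f"{name}: accuracy = {accuracies[name]:.3f}")
    # Per-class precision, recall, and F1 for the two risk labels.
    print(classification_report(
        y_test, predictions, target_names=["Low Risk", "High Risk"]
    ))
```

Fixing random_state in the split and in the tree-based models keeps the reported numbers reproducible between runs.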

Model Performance

The notebook compares the models on accuracy and on the per-class metrics in each classification report (precision, recall, and F1-score). The best-performing model can then be selected for deployment.
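One simple way to make that selection is to pick the model with the highest test accuracy. The sketch below assumes a hypothetical helper, pick_best_model, and uses synthetic data with two scikit-learn models for brevity. It is an illustration, not the notebook's method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def pick_best_model(fitted_models, X_test, y_test):
    """Return (name, accuracy) of the fitted model with the highest test accuracy."""
    scores = {
        name: accuracy_score(y_test, model.predict(X_test))
        for name, model in fitted_models.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# Synthetic stand-in for the healthcare dataset (assumption, not real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

fitted_models = {
    "Logistic Regression": LogisticRegression(max_iter=1000).fit(X_train, y_train),
    "Random Forest": RandomForestClassifier(
        n_estimators=50, random_state=0
    ).fit(X_train, y_train),
}
best_name, best_accuracy = pick_best_model(fitted_models, X_test, y_test)
print(f"Best model: {best_name} (accuracy = {best_accuracy:.3f})")
```

Accuracy is used here only because it is the metric the notebook reports. A different metric could be substituted by swapping accuracy_score for another scikit-learn scoring function.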

Contributing

Feel free to fork this repository and submit pull requests for improvements, additional models, or dataset enhancements.

License

This project is licensed under the MIT License.

For any questions or suggestions, please open an issue in the repository or contact the maintainers.