|
a/README.md |
|
b/README.md |
1 |
# Multi-Class-Prediction-of-Obesity-Risk |
1 |
# Multi-Class-Prediction-of-Obesity-Risk |
2 |
|
2 |
|
3 |
#### This project extends our earlier work for the Kaggle competition "Multi Class Prediction of Obesity Risk", where we placed within the top 5%. It improves the models and productionizes the project using best practices learned in class MGSC-695-076. For security reasons, no access keys are shared. |
3 |
#### This project extends our earlier work for the Kaggle competition "Multi Class Prediction of Obesity Risk", where we placed within the top 5%. It improves the models and productionizes the project using best practices learned in class MGSC-695-076. For security reasons, no access keys are shared. |
4 |
|
4 |
|
5 |
Tech Stack: Apache Kafka, MLflow, Azure ML, VS Code, Poetry, AutoGluon, H2O, PyCaret, FLAML, PandasAI, Docker, Streamlit, Postman, FastAPI, SHAP |
5 |
Tech Stack: Apache Kafka, MLflow, Azure ML, VS Code, Poetry, AutoGluon, H2O, PyCaret, FLAML, PandasAI, Docker, Streamlit, Postman, FastAPI, SHAP |
6 |
|
6 |
|
7 |
## Project Overview |
7 |
## Project Overview |
8 |
|
8 |
|
9 |
#### 1. Data Preparation and Simulation |
9 |
#### 1. Data Preparation and Simulation |
10 |
|
10 |
|
11 |
- **Data Source:** Original Kaggle CSV data split into Model Development and Hold-Off datasets. |
11 |
- **Data Source:** Original Kaggle CSV data split into Model Development and Hold-Off datasets.
|
12 |
- **Live Data Simulation:** Used Apache Kafka for simulating real-time data feeds. |
12 |
- **Live Data Simulation:** Used Apache Kafka for simulating real-time data feeds. |
13 |
|
13 |
|
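As a rough illustration of the two bullets above, the sketch below splits records into development and hold-off sets while preserving class proportions, then replays the hold-off rows one message at a time. The function names, the 20% hold-off fraction, and the toy records are assumptions made for this example; the actual split and Kafka setup used in the project are not shown in this README.

```python
import json
import random
from collections import defaultdict


def stratified_holdoff_split(rows, label_key, holdoff_fraction=0.2, seed=42):
    """Split rows into (development, hold-off) lists, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    development, holdoff = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        n_holdoff = round(len(group) * holdoff_fraction)
        holdoff.extend(group[:n_holdoff])
        development.extend(group[n_holdoff:])
    return development, holdoff


def simulate_feed(rows):
    """Yield one JSON-encoded message per row, as a stand-in for a live feed."""
    for row in rows:
        yield json.dumps(row).encode("utf-8")


# Hypothetical toy records; the real data comes from the Kaggle CSV.
records = [{"id": i, "label": "A" if i % 2 == 0 else "B"} for i in range(10)]
development, holdoff = stratified_holdoff_split(records, label_key="label")
messages = list(simulate_feed(holdoff))
```

In a real setup, each message would presumably be handed to a Kafka producer (for example `KafkaProducer.send(topic, value=message)` in kafka-python, with a topic name of your choosing) instead of being collected in a list.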
14 |
|
14 |
|
15 |
|
15 |
|
16 |
<!-- Slide 6 --> |
16 |
<!-- Slide 6 -->
|
17 |
<p align="center"> |
17 |
<p align="center">
|
18 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide6.png"> |
18 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide6.png?raw=true">
|
19 |
</p> |
19 |
</p> |
20 |
|
20 |
|
21 |
<!-- Slide 7 --> |
21 |
<!-- Slide 7 -->
|
22 |
<p align="center"> |
22 |
<p align="center">
|
23 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide7.png"> |
23 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide7.png?raw=true">
|
24 |
</p> |
24 |
</p> |
25 |
|
25 |
|
26 |
<!-- Slide 8 --> |
26 |
<!-- Slide 8 -->
|
27 |
<p align="center"> |
27 |
<p align="center">
|
28 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide8.png"> |
28 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide8.png?raw=true">
|
29 |
</p> |
29 |
</p> |
30 |
|
30 |
|
31 |
#### 2. Azure Machine Learning Setup |
31 |
#### 2. Azure Machine Learning Setup |
32 |
|
32 |
|
33 |
- **Workspace Configuration:** Established Azure ML Workspace with RBAC. |
33 |
- **Workspace Configuration:** Established Azure ML Workspace with RBAC.
|
34 |
- **Team Roles:** Assigned roles for Data Science, Data Engineering, ML Engineering, and Governance. |
34 |
- **Team Roles:** Assigned roles for Data Science, Data Engineering, ML Engineering, and Governance. |
35 |
|
35 |
|
36 |
#### 3. Exploratory Data Analysis (EDA) |
36 |
#### 3. Exploratory Data Analysis (EDA) |
37 |
|
37 |
|
38 |
- **Comprehensive Analysis:** |
38 |
- **Comprehensive Analysis:**
|
39 |
- **Univariate Analysis:** Leveraged PandasAI for detailed insights. |
39 |
- **Univariate Analysis:** Leveraged PandasAI for detailed insights.
|
40 |
- **Bivariate Analysis:** Used pairplots and interaction plots. |
40 |
- **Bivariate Analysis:** Used pairplots and interaction plots.
|
41 |
- **Dimensionality Reduction:** Applied PCA with KMediansClustering. |
41 |
- **Dimensionality Reduction:** Applied PCA with KMediansClustering. |
42 |
|
42 |
|
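For readers unfamiliar with k-medians, here is a minimal pure-Python sketch of the clustering step, assuming the points have already been reduced to a few dimensions (for example by PCA). It is not the project's implementation: the function names are hypothetical, and the initial centers are simply taken at evenly spaced positions in the input to keep the example deterministic.

```python
from statistics import median


def manhattan(a, b):
    """L1 distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medians(points, k, n_iter=20):
    """Cluster points with k-medians: L1 assignment, per-coordinate median update."""
    # Simplified initialization: evenly spaced points from the input.
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center under Manhattan distance.
        labels = [min(range(k), key=lambda c: manhattan(p, centers[c])) for p in points]
        # Update step: coordinate-wise median of each cluster's members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(median(col) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical 2-D points standing in for PCA-reduced features.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, labels = k_medians(points, k=2)
```

Compared with k-means, the median update makes the centers less sensitive to outliers, which is one common reason to prefer it on skewed features.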
43 |
<!-- Slide 9 --> |
43 |
<!-- Slide 9 -->
|
44 |
<p align="center"> |
44 |
<p align="center">
|
45 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide9.png"> |
45 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide9.png?raw=true">
|
46 |
</p> |
46 |
</p> |
47 |
|
47 |
|
48 |
<!-- Slide 10 --> |
48 |
<!-- Slide 10 -->
|
49 |
<p align="center"> |
49 |
<p align="center">
|
50 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide10.png"> |
50 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide10.png?raw=true">
|
51 |
</p> |
51 |
</p> |
52 |
|
52 |
|
53 |
<!-- Slide 11 --> |
53 |
<!-- Slide 11 -->
|
54 |
<p align="center"> |
54 |
<p align="center">
|
55 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide11.png"> |
55 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide11.png?raw=true">
|
56 |
</p> |
56 |
</p> |
57 |
|
57 |
|
58 |
<!-- Slide 12 --> |
58 |
<!-- Slide 12 -->
|
59 |
<p align="center"> |
59 |
<p align="center">
|
60 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide12.png"> |
60 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide12.png?raw=true">
|
61 |
</p> |
61 |
</p> |
62 |
|
62 |
|
63 |
|
63 |
|
64 |
<!-- Slide 13 --> |
64 |
<!-- Slide 13 -->
|
65 |
<p align="center"> |
65 |
<p align="center">
|
66 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide13.png"> |
66 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide13.png?raw=true">
|
67 |
</p> |
67 |
</p> |
68 |
|
68 |
|
69 |
#### 4. Data Preprocessing |
69 |
#### 4. Data Preprocessing |
70 |
|
70 |
|
71 |
- **Feature Engineering:** Enhanced performance based on EDA insights. |
71 |
- **Feature Engineering:** Enhanced performance based on EDA insights.
|
72 |
- **Normalization and Scaling:** Ensured optimal feature scaling. |
72 |
- **Normalization and Scaling:** Ensured optimal feature scaling.
|
73 |
- **Missing Data Handling:** Applied appropriate strategies for missing data. |
73 |
- **Missing Data Handling:** Applied appropriate strategies for missing data. |
74 |
|
74 |
|
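The README does not state which imputation or scaling strategy was chosen, so the following is only a sketch of one common combination: median imputation followed by z-score standardization. The function names and the toy weight column are assumptions for illustration.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Scale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


# Hypothetical weight column (kg) with one missing entry.
weights = [60.0, None, 80.0, 100.0]
filled = impute_median(weights)
scaled = standardize(filled)
```

In practice the fill value and the scaling statistics should be computed on the development data only and then reused on the hold-off data, so that no information leaks from the evaluation set.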
75 |
#### Step 9: EDA [Owner to Update Step] |
75 |
#### Step 9: EDA [Owner to Update Step]
|
76 |
<!-- Slide 14 --> |
76 |
<!-- Slide 14 -->
|
77 |
<p align="center"> |
77 |
<p align="center">
|
78 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide14.png"> |
78 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide14.png?raw=true">
|
79 |
</p> |
79 |
</p> |
80 |
|
80 |
|
81 |
|
81 |
|
82 |
|
82 |
|
83 |
#### 5. Dependency Management |
83 |
#### 5. Dependency Management |
84 |
|
84 |
|
85 |
- **Poetry Integration:** Managed dependencies for reproducibility. |
85 |
- **Poetry Integration:** Managed dependencies for reproducibility. |
86 |
|
86 |
|
87 |
|
87 |
|
88 |
<!-- Slide 15 --> |
88 |
<!-- Slide 15 -->
|
89 |
<p align="center"> |
89 |
<p align="center">
|
90 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide15.png"> |
90 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide15.png?raw=true">
|
91 |
</p> |
91 |
</p> |
92 |
|
92 |
|
93 |
|
93 |
|
94 |
|
94 |
|
95 |
#### 6. Model Development and Optimization |
95 |
#### 6. Model Development and Optimization |
96 |
|
96 |
|
97 |
- **State-of-the-Art Models:** |
97 |
- **State-of-the-Art Models:**
|
98 |
- Custom models like XGBoost, LightGBM, CatBoost. |
98 |
- Custom models like XGBoost, LightGBM, CatBoost.
|
99 |
- **Hyperparameter Tuning:** Used Optuna for optimization. |
99 |
- **Hyperparameter Tuning:** Used Optuna for optimization. |
100 |
|
100 |
|
101 |
- **AutoML Exploration:** |
101 |
- **AutoML Exploration:**
|
102 |
- Explored PyCaret, AutoGluon, and H2O for benchmarking. |
102 |
- Explored PyCaret, AutoGluon, and H2O for benchmarking.
|
103 |
- **Advanced Techniques:** Stacked models, Isolation Forest, custom loss functions. |
103 |
- **Advanced Techniques:** Stacked models, Isolation Forest, custom loss functions. |
104 |
|
104 |
|
105 |
#### 7. Experiment Tracking and Management |
105 |
#### 7. Experiment Tracking and Management |
106 |
|
106 |
|
107 |
- **MLflow & Azure MLflow Integration:** |
107 |
- **MLflow & Azure MLflow Integration:**
|
108 |
- Tracked global and local metrics, target distribution. |
108 |
- Tracked global and local metrics, target distribution.
|
109 |
- **SHAP Analysis:** Utilized SHAP values for explainability and error analysis. |
109 |
- **SHAP Analysis:** Utilized SHAP values for explainability and error analysis. |
110 |
|
110 |
|
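To make the tracked quantities concrete, the sketch below computes an overall accuracy, per-class recall, and the target distribution as one flat dictionary. Reading "global" as overall metrics and "local" as per-class metrics is an assumption here, and the labels and predictions are invented for illustration.

```python
from collections import Counter


def target_distribution(labels):
    """Return the share of each class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each true class: correct predictions / class size."""
    hits, totals = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels and predictions for illustration only.
y_true = ["Normal", "Normal", "Obese", "Obese", "Overweight"]
y_pred = ["Normal", "Obese", "Obese", "Obese", "Overweight"]

metrics = {"accuracy": accuracy(y_true, y_pred)}
for label, recall in per_class_recall(y_true, y_pred).items():
    metrics[f"recall_{label}"] = recall
for label, share in target_distribution(y_true).items():
    metrics[f"target_share_{label}"] = share
```

A flat dictionary of floats like `metrics` is the shape that `mlflow.log_metrics` accepts, so it could be logged from inside an MLflow run.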
111 |
|
111 |
|
112 |
<!-- Slide 16 --> |
112 |
<!-- Slide 16 -->
|
113 |
<p align="center"> |
113 |
<p align="center">
|
114 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide16.png"> |
114 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide16.png?raw=true">
|
115 |
</p> |
115 |
</p> |
116 |
|
116 |
|
117 |
|
117 |
|
118 |
|
118 |
|
119 |
<!-- Slide 17 --> |
119 |
<!-- Slide 17 -->
|
120 |
<p align="center"> |
120 |
<p align="center">
|
121 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide17.png"> |
121 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide17.png?raw=true">
|
122 |
</p> |
122 |
</p> |
123 |
|
123 |
|
124 |
|
124 |
|
125 |
|
125 |
|
126 |
<!-- Slide 18 --> |
126 |
<!-- Slide 18 -->
|
127 |
<p align="center"> |
127 |
<p align="center">
|
128 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide18.png"> |
128 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide18.png?raw=true">
|
129 |
</p> |
129 |
</p> |
130 |
|
130 |
|
131 |
|
131 |
|
132 |
|
132 |
|
133 |
<!-- Slide 19 --> |
133 |
<!-- Slide 19 -->
|
134 |
<p align="center"> |
134 |
<p align="center">
|
135 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide19.png"> |
135 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide19.png?raw=true">
|
136 |
</p> |
136 |
</p> |
137 |
|
137 |
|
138 |
|
138 |
|
139 |
|
139 |
|
140 |
<!-- Slide 20 --> |
140 |
<!-- Slide 20 -->
|
141 |
<p align="center"> |
141 |
<p align="center">
|
142 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide20.png"> |
142 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide20.png?raw=true">
|
143 |
</p> |
143 |
</p> |
144 |
|
144 |
|
145 |
|
145 |
|
146 |
|
146 |
|
147 |
<!-- Slide 21 --> |
147 |
<!-- Slide 21 -->
|
148 |
<p align="center"> |
148 |
<p align="center">
|
149 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide21.png"> |
149 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide21.png?raw=true">
|
150 |
</p> |
150 |
</p> |
151 |
|
151 |
|
152 |
|
152 |
|
153 |
#### 8. Deployment Strategies |
153 |
#### 8. Deployment Strategies |
154 |
|
154 |
|
155 |
- **Containerization:** Used FastAPI and Docker. |
155 |
- **Containerization:** Used FastAPI and Docker.
|
156 |
- **Azure Deployment:** Deployed to Azure Container Instances; a Kubernetes deployment is planned. |
156 |
- **Azure Deployment:** Deployed to Azure Container Instances; a Kubernetes deployment is planned. |
157 |
|
157 |
|
158 |
- **Conversion to Azure Scripts:** |
158 |
- **Conversion to Azure Scripts:**
|
159 |
- Converted Jupyter notebooks to Python scripts for Azure jobs. |
159 |
- Converted Jupyter notebooks to Python scripts for Azure jobs.
|
160 |
- **Azure Pipelines:** CI/CD with GitHub Actions and Azure Container Registry. |
160 |
- **Azure Pipelines:** CI/CD with GitHub Actions and Azure Container Registry. |
161 |
|
161 |
|
162 |
#### 9. User Interface and Interaction |
162 |
#### 9. User Interface and Interaction |
163 |
|
163 |
|
164 |
- **Streamlit Application:** User-friendly interface integrated with APIs. |
164 |
- **Streamlit Application:** User-friendly interface integrated with APIs. |
165 |
|
165 |
|
166 |
#### 10. Model Monitoring and Drift Management |
166 |
#### 10. Model Monitoring and Drift Management |
167 |
|
167 |
|
168 |
- **Monitoring Strategy:** Drift detection, automated endpoint management. |
168 |
- **Monitoring Strategy:** Drift detection, automated endpoint management. |
169 |
|
169 |
|
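The README does not say which drift test is used. As one possible approach, the sketch below computes the Population Stability Index (PSI) for a single numeric feature, comparing a reference sample (for example the development data) with a more recent sample. The function name, bin count, and sample data are assumptions for illustration.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Hypothetical feature values: a reference sample and a shifted copy.
reference = [i / 10 for i in range(100)]
shifted = [v + 3.0 for v in reference]
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

Identical samples give a PSI of zero, and the value grows as the distributions diverge. Cut-offs for raising an alert are usually chosen per project; rules of thumb such as 0.1 and 0.25 are often quoted but are not something this README specifies.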
170 |
#### 11. Azure ML Designer Integration |
170 |
#### 11. Azure ML Designer Integration |
171 |
|
171 |
|
172 |
- **UI-Based Experiments:** Additionally ran experiments in Azure ML Designer for learning purposes, using both SDK v2 and the UI. |
172 |
- **UI-Based Experiments:** Additionally ran experiments in Azure ML Designer for learning purposes, using both SDK v2 and the UI. |
173 |
|
173 |
|
174 |
|
174 |
|
175 |
<!-- Slide 22 --> |
175 |
<!-- Slide 22 -->
|
176 |
<p align="center"> |
176 |
<p align="center">
|
177 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide22.png"> |
177 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide22.png?raw=true">
|
178 |
</p> |
178 |
</p> |
179 |
|
179 |
|
180 |
|
180 |
|
181 |
|
181 |
|
182 |
<!-- Slide 23 --> |
182 |
<!-- Slide 23 -->
|
183 |
<p align="center"> |
183 |
<p align="center">
|
184 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide23.png"> |
184 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide23.png?raw=true">
|
185 |
</p> |
185 |
</p> |
186 |
|
186 |
|
187 |
|
187 |
|
188 |
|
188 |
|
189 |
<!-- Slide 24 --> |
189 |
<!-- Slide 24 -->
|
190 |
<p align="center"> |
190 |
<p align="center">
|
191 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide24.png"> |
191 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide24.png?raw=true">
|
192 |
</p> |
192 |
</p> |
193 |
|
193 |
|
194 |
|
194 |
|
195 |
|
195 |
|
196 |
<!-- Slide 25 --> |
196 |
<!-- Slide 25 -->
|
197 |
<p align="center"> |
197 |
<p align="center">
|
198 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide25.png"> |
198 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide25.png?raw=true">
|
199 |
</p> |
199 |
</p> |
200 |
|
200 |
|
201 |
|
201 |
|
202 |
|
202 |
|
203 |
<!-- Slide 26 --> |
203 |
<!-- Slide 26 -->
|
204 |
<p align="center"> |
204 |
<p align="center">
|
205 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide26.png"> |
205 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide26.png?raw=true">
|
206 |
</p> |
206 |
</p> |
207 |
|
207 |
|
208 |
|
208 |
|
209 |
|
209 |
|
210 |
<!-- Slide 27 --> |
210 |
<!-- Slide 27 -->
|
211 |
<p align="center"> |
211 |
<p align="center">
|
212 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide27.png"> |
212 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide27.png?raw=true">
|
213 |
</p> |
213 |
</p> |
214 |
|
214 |
|
215 |
|
215 |
|
216 |
|
216 |
|
217 |
<!-- Slide 28 --> |
217 |
<!-- Slide 28 -->
|
218 |
<p align="center"> |
218 |
<p align="center">
|
219 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide28.png"> |
219 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide28.png?raw=true">
|
220 |
</p> |
220 |
</p> |
221 |
|
221 |
|
222 |
|
222 |
|
223 |
|
223 |
|
224 |
<!-- Slide 29 --> |
224 |
<!-- Slide 29 -->
|
225 |
<p align="center"> |
225 |
<p align="center">
|
226 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide29.png"> |
226 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide29.png?raw=true">
|
227 |
</p> |
227 |
</p> |
228 |
|
228 |
|
229 |
|
229 |
|
230 |
|
230 |
|
231 |
<!-- Slide 30 --> |
231 |
<!-- Slide 30 -->
|
232 |
<p align="center"> |
232 |
<p align="center">
|
233 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide30.png"> |
233 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide30.png?raw=true">
|
234 |
</p> |
234 |
</p> |
235 |
|
235 |
|
236 |
|
236 |
|
237 |
|
237 |
|
238 |
<!-- Slide 31 --> |
238 |
<!-- Slide 31 -->
|
239 |
<p align="center"> |
239 |
<p align="center">
|
240 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide31.png"> |
240 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide31.png?raw=true">
|
241 |
</p> |
241 |
</p> |
242 |
|
242 |
|
243 |
|
243 |
|
244 |
|
244 |
|
245 |
<!-- Slide 32 --> |
245 |
<!-- Slide 32 -->
|
246 |
<p align="center"> |
246 |
<p align="center">
|
247 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide32.png"> |
247 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide32.png?raw=true">
|
248 |
</p> |
248 |
</p> |
249 |
|
249 |
|
250 |
|
250 |
|
251 |
|
251 |
|
252 |
<!-- Slide 33 --> |
252 |
<!-- Slide 33 -->
|
253 |
<p align="center"> |
253 |
<p align="center">
|
254 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide33.png"> |
254 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide33.png?raw=true">
|
255 |
</p> |
255 |
</p> |
256 |
|
256 |
|
257 |
|
257 |
|
258 |
|
258 |
|
259 |
<!-- Slide 34 --> |
259 |
<!-- Slide 34 -->
|
260 |
<p align="center"> |
260 |
<p align="center">
|
261 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide34.png"> |
261 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide34.png?raw=true">
|
262 |
</p> |
262 |
</p> |
263 |
|
263 |
|
264 |
|
264 |
|
265 |
|
265 |
|
266 |
<!-- Slide 35 --> |
266 |
<!-- Slide 35 -->
|
267 |
<p align="center"> |
267 |
<p align="center">
|
268 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide35.png"> |
268 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide35.png?raw=true">
|
269 |
</p> |
269 |
</p> |
270 |
|
270 |
|
271 |
|
271 |
|
272 |
|
272 |
|
273 |
#### 12. Additional Expert Considerations |
273 |
#### 12. Additional Expert Considerations |
274 |
|
274 |
|
275 |
- **Cross-Validation:** Ensured model generalizability. |
275 |
- **Cross-Validation:** Ensured model generalizability.
|
276 |
- **Model Governance:** Versioning, lineage tracking, compliance. |
276 |
- **Model Governance:** Versioning, lineage tracking, compliance.
|
277 |
- **Scalability and Optimization:** Performance tests, scalability checks. |
277 |
- **Scalability and Optimization:** Performance tests, scalability checks.
|
278 |
- **Feedback Loop:** Integrated feedback for continuous improvement. |
278 |
- **Feedback Loop:** Integrated feedback for continuous improvement. |
279 |
|
279 |
|
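As an illustration of the cross-validation point above, here is a minimal sketch of stratified k-fold index generation in pure Python. It assumes a multi-class label list and is not the project's code; in practice a library routine would likely be used, and the function name and fold count are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, n_splits=5, seed=42):
    """Yield (train_indices, validation_indices) pairs with balanced class shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets a similar share.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    for i in range(n_splits):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, validation


# Hypothetical imbalanced labels: six of class "A", three of class "B".
labels = ["A"] * 6 + ["B"] * 3
splits = list(stratified_k_fold(labels, n_splits=3))
```

Each validation fold here holds two "A" rows and one "B" row, so every fold sees the same class mix as the full dataset.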
280 |
|
280 |
|
281 |
|
281 |
|
282 |
#### 13. Branches: |
282 |
#### 13. Branches:
|
283 |
1. Main: For Final Product [Owner - Team] |
283 |
1. Main: For Final Product [Owner - Team]
|
284 |
2. Experiments: For ML Experiments and tracking [Owners - Arham, Krishan] |
284 |
2. Experiments: For ML Experiments and tracking [Owners - Arham, Krishan]
|
285 |
3. ArchDevelopment: For CI/CD [Owner - Nandani] |
285 |
3. ArchDevelopment: For CI/CD [Owner - Nandani]
|
286 |
4. Streamlit: For front end [Owner - Nandani] |
286 |
4. Streamlit: For front end [Owner - Nandani]
|
287 |
5. Data Engineering: For Kafka Streaming [Owner - Yash] |
287 |
5. Data Engineering: For Kafka Streaming [Owner - Yash]
|
288 |
6. Backup: For Backup [Owners - Aasna, Mahrukh] |
288 |
6. Backup: For Backup [Owners - Aasna, Mahrukh] |
289 |
|
289 |
|
290 |
|
290 |
|
291 |
### Technologies Used |
291 |
### Technologies Used |
292 |
|
292 |
|
293 |
- **Data Analysis/Model Training:** Python, Jupyter Notebooks |
293 |
- **Data Analysis/Model Training:** Python, Jupyter Notebooks
|
294 |
- **Experiment Tracking:** MLflow |
294 |
- **Experiment Tracking:** MLflow
|
295 |
- **Model Building:** PyCaret, LightGBM, XGBoost, CatBoost |
295 |
- **Model Building:** PyCaret, LightGBM, XGBoost, CatBoost
|
296 |
- **Hyperparameter Optimization:** Optuna |
296 |
- **Hyperparameter Optimization:** Optuna
|
297 |
- **Containerization:** Docker |
297 |
- **Containerization:** Docker
|
298 |
- **Realtime Data Streaming:** Kafka |
298 |
- **Realtime Data Streaming:** Kafka
|
299 |
- **Version Control and CI/CD:** Git, GitHub Actions |
299 |
- **Version Control and CI/CD:** Git, GitHub Actions
|
300 |
- **Cloud Deployment:** Azure Machine Learning, Azure Blob Storage |
300 |
- **Cloud Deployment:** Azure Machine Learning, Azure Blob Storage
|
301 |
- **User Interface:** Streamlit |
301 |
- **User Interface:** Streamlit
|
302 |
- **Dependency and Environment Management:** Poetry |
302 |
- **Dependency and Environment Management:** Poetry |
303 |
|
303 |
|
304 |
## How to Run the Code |
304 |
## How to Run the Code |
305 |
|
305 |
|
306 |
### Prerequisites |
306 |
### Prerequisites |
307 |
|
307 |
|
308 |
- **Python 3.8+** |
308 |
- **Python 3.8+**
|
309 |
- **Poetry** |
309 |
- **Poetry**
|
310 |
- **Docker** |
310 |
- **Docker**
|
311 |
- **Azure Account** |
311 |
- **Azure Account**
|
312 |
- **Kafka** |
312 |
- **Kafka** |
313 |
|
313 |
|
314 |
### Setup |
314 |
### Setup |
315 |
|
315 |
|
316 |
1. **Clone the Repository** |
316 |
1. **Clone the Repository** |
317 |
|
317 |
|
318 |
```bash |
318 |
```bash
|
319 |
git clone https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk.git |
319 |
git clone https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk.git
|
320 |
cd Multi-Class-Prediction-of-Obesity-Risk |
320 |
cd Multi-Class-Prediction-of-Obesity-Risk
|
321 |
``` |
321 |
``` |
322 |
|
322 |
|
323 |
2. **Install Dependencies** |
323 |
2. **Install Dependencies** |
324 |
|
324 |
|
325 |
```bash |
325 |
```bash
|
326 |
poetry install |
326 |
poetry install
|
327 |
``` |
327 |
``` |
328 |
|
328 |
|
329 |
3. **Set Up Environment Variables** |
329 |
3. **Set Up Environment Variables** |
330 |
|
330 |
|
331 |
Create a `.env` file in the root directory and add the necessary environment variables. Example: |
331 |
Create a `.env` file in the root directory and add the necessary environment variables. Example: |
332 |
|
332 |
|
333 |
```env |
333 |
```env
|
334 |
AZURE_SUBSCRIPTION_ID=your_subscription_id |
334 |
AZURE_SUBSCRIPTION_ID=your_subscription_id
|
335 |
AZURE_RESOURCE_GROUP=your_resource_group |
335 |
AZURE_RESOURCE_GROUP=your_resource_group
|
336 |
AZURE_WORKSPACE_NAME=your_workspace_name |
336 |
AZURE_WORKSPACE_NAME=your_workspace_name
|
337 |
``` |
337 |
``` |
338 |
|
338 |
|
339 |
4. **Start Docker** |
339 |
4. **Start Docker** |
340 |
|
340 |
|
341 |
Ensure Docker is running on your machine. Build and run the Docker containers: |
341 |
Ensure Docker is running on your machine. Build and run the Docker containers: |
342 |
|
342 |
|
343 |
```bash |
343 |
```bash
|
344 |
docker-compose up --build |
344 |
docker-compose up --build
|
345 |
``` |
345 |
``` |
346 |
|
346 |
|
347 |
5. **Run Streamlit Application** |
347 |
5. **Run Streamlit Application** |
348 |
|
348 |
|
349 |
```bash |
349 |
```bash
|
350 |
streamlit run Streamlit/app.py |
350 |
streamlit run Streamlit/app.py
|
351 |
``` |
351 |
``` |
352 |
|
352 |
|
353 |
6. **Run Jupyter Notebooks** |
353 |
6. **Run Jupyter Notebooks** |
354 |
|
354 |
|
355 |
Start Jupyter Lab to run and explore notebooks: |
355 |
Start Jupyter Lab to run and explore notebooks: |
356 |
|
356 |
|
357 |
```bash |
357 |
```bash
|
358 |
poetry run jupyter lab |
358 |
poetry run jupyter lab
|
359 |
``` |
359 |
``` |
360 |
|
360 |
|
361 |
### Deployment |
361 |
### Deployment |
362 |
|
362 |
|
363 |
1. **Azure ML Deployment** |
363 |
1. **Azure ML Deployment** |
364 |
|
364 |
|
365 |
- Configure your Azure workspace by setting up the necessary resources. |
365 |
- Configure your Azure workspace by setting up the necessary resources.
|
366 |
- Use the provided Azure scripts to deploy models and services. |
366 |
- Use the provided Azure scripts to deploy models and services. |
367 |
|
367 |
|
368 |
```bash |
368 |
```bash
|
369 |
poetry run python deploy/deploy_to_azure.py |
369 |
poetry run python deploy/deploy_to_azure.py
|
370 |
``` |
370 |
``` |
371 |
|
371 |
|
372 |
2. **CI/CD Setup** |
372 |
2. **CI/CD Setup** |
373 |
|
373 |
|
374 |
- Ensure GitHub Actions are configured correctly. |
374 |
- Ensure GitHub Actions are configured correctly.
|
375 |
- Push changes to the repository to trigger CI/CD pipelines. |
375 |
- Push changes to the repository to trigger CI/CD pipelines. |
376 |
|
376 |
|
377 |
```bash |
377 |
```bash
|
378 |
git add . |
378 |
git add .
|
379 |
git commit -m "Your commit message" |
379 |
git commit -m "Your commit message"
|
380 |
git push origin main |
380 |
git push origin main
|
381 |
``` |
381 |
``` |
382 |
|
382 |
|
383 |
### Monitoring and Maintenance |
383 |
### Monitoring and Maintenance |
384 |
|
384 |
|
385 |
- **Model Monitoring:** Utilize integrated monitoring tools to track model performance and detect drift. |
385 |
- **Model Monitoring:** Utilize integrated monitoring tools to track model performance and detect drift.
|
386 |
- **Endpoint Management:** Automated endpoint management to ensure availability and performance. |
386 |
- **Endpoint Management:** Automated endpoint management to ensure availability and performance. |
387 |
|
387 |
|
388 |
|
388 |
|
389 |
|
389 |
|
390 |
### Business Case |
390 |
### Business Case |
391 |
|
391 |
|
392 |
Our solution targets healthcare providers for early identification of at-risk patients, public health officials for data-driven policy making, and insurance companies for premium adjustment based on individual risk. The economic impact includes significant healthcare cost savings and revenue generation from tailored wellness programs. |
392 |
Our solution targets healthcare providers for early identification of at-risk patients, public health officials for data-driven policy making, and insurance companies for premium adjustment based on individual risk. The economic impact includes significant healthcare cost savings and revenue generation from tailored wellness programs. |
393 |
|
393 |
|
394 |
### Acknowledgements |
394 |
### Acknowledgements |
395 |
|
395 |
|
396 |
This project is an effort by the team to tackle the global health crisis of obesity by employing advanced data science and machine learning techniques, aiming to make a significant impact in the healthcare sector. |
396 |
This project is an effort by the team to tackle the global health crisis of obesity by employing advanced data science and machine learning techniques, aiming to make a significant impact in the healthcare sector. |
397 |
|
397 |
|
398 |
|
398 |
|
399 |
### Meet the Team |
399 |
### Meet the Team
|
400 |
1. Product Manager - Aasna |
400 |
1. Product Manager - Aasna
|
401 |
2. Machine Learning Engineer - Arham |
401 |
2. Machine Learning Engineer - Arham
|
402 |
3. ML Ops - Krishan |
402 |
3. ML Ops - Krishan
|
403 |
4. Data Engineer - Yash |
403 |
4. Data Engineer - Yash
|
404 |
5. Cloud SME - Nandani |
404 |
5. Cloud SME - Nandani
|
405 |
6. Business Analyst - Mahrukh |
405 |
6. Business Analyst - Mahrukh |
406 |
|
406 |
|
407 |
|
407 |
|
408 |
---- |
408 |
---- |
409 |
|
409 |
|
410 |
<!-- Slide 36 --> |
410 |
<!-- Slide 36 -->
|
411 |
<p align="center"> |
411 |
<p align="center">
|
412 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide36.png"> |
412 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide36.png?raw=true">
|
413 |
</p> |
413 |
</p> |
414 |
|
414 |
|
415 |
|
415 |
|
416 |
|
416 |
|
417 |
<!-- Slide 37 --> |
417 |
<!-- Slide 37 -->
|
418 |
<p align="center"> |
418 |
<p align="center">
|
419 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide37.png"> |
419 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide37.png?raw=true">
|
420 |
</p> |
420 |
</p> |
421 |
|
421 |
|
422 |
|
422 |
|
423 |
|
423 |
|
424 |
<!-- Slide 38 --> |
424 |
<!-- Slide 38 -->
|
425 |
<p align="center"> |
425 |
<p align="center">
|
426 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide38.png"> |
426 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide38.png?raw=true">
|
427 |
</p> |
427 |
</p> |
428 |
|
428 |
|
429 |
|
429 |
|
430 |
|
430 |
|
431 |
|
431 |
|
432 |
<!-- Slide 2 --> |
432 |
<!-- Slide 2 -->
|
433 |
<p align="center"> |
433 |
<p align="center">
|
434 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide2.png"> |
434 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide2.png?raw=true">
|
435 |
</p> |
435 |
</p> |
436 |
|
436 |
|
437 |
|
437 |
|
438 |
|
438 |
|
439 |
<!-- Slide 3 --> |
439 |
<!-- Slide 3 -->
|
440 |
<p align="center"> |
440 |
<p align="center">
|
441 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide3.png"> |
441 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide3.png?raw=true">
|
442 |
</p> |
442 |
</p> |
443 |
|
443 |
|
444 |
|
444 |
|
445 |
|
445 |
|
446 |
<!-- Slide 4 --> |
446 |
<!-- Slide 4 -->
|
447 |
<p align="center"> |
447 |
<p align="center">
|
448 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide4.png"> |
448 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide4.png?raw=true">
|
449 |
</p> |
449 |
</p> |
450 |
|
450 |
|
451 |
|
451 |
|
452 |
|
452 |
|
453 |
<!-- Slide 5 --> |
453 |
<!-- Slide 5 -->
|
454 |
<p align="center"> |
454 |
<p align="center">
|
455 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide5.png"> |
455 |
<img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide5.png?raw=true">
|
456 |
</p> |
456 |
</p>
|