# Multi-Class-Prediction-of-Obesity-Risk

#### This project extends our earlier work on the Kaggle competition "Multi-Class Prediction of Obesity Risk", in which we placed in the top 5%. It improves the models and productionizes the pipeline using best practices learned in the course MGSC-695-076. For security reasons, no access keys are shared in this repository.

Tech Stack: Apache Kafka, MLflow, Azure ML, VS Code, Poetry, AutoGluon, H2O, PyCaret, FLAML, PandasAI, Docker, Streamlit, Postman, FastAPI, SHAP

## Project Overview

#### 1. Data Preparation and Simulation

- **Data Source:** The original Kaggle CSV data were split into Model Development and Hold-Off datasets.
- **Live Data Simulation:** Used Apache Kafka to simulate real-time data feeds.
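
A minimal, standard-library sketch of the two data-preparation steps above, assuming the Kaggle CSV has been read into a list of row dictionaries. It carves off a reproducible hold-off set and serializes each hold-off row as a JSON message, the kind of payload a Kafka producer would publish to simulate a live feed. The function names, the 20% hold-off fraction, and the seed are illustrative assumptions, not the project's actual code.

```python
import csv
import io
import json
import random
import time
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]


def split_dev_holdoff(
    rows: List[Row], holdoff_frac: float = 0.2, seed: int = 42
) -> Tuple[List[Row], List[Row]]:
    """Shuffle rows reproducibly and split them into (development, hold-off)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdoff = int(len(shuffled) * holdoff_frac)
    return shuffled[n_holdoff:], shuffled[:n_holdoff]


def simulate_stream(rows: List[Row], delay_s: float = 0.0) -> Iterator[bytes]:
    """Yield each hold-off row as a UTF-8 JSON message, optionally paced by delay_s."""
    for row in rows:
        yield json.dumps(row).encode("utf-8")
        if delay_s > 0:
            time.sleep(delay_s)  # mimic the arrival rate of a live feed


if __name__ == "__main__":
    # Tiny in-memory stand-in for the Kaggle CSV (hypothetical columns).
    sample_csv = "id,Age,Weight\n" + "".join(f"{i},{20 + i},{60 + i}\n" for i in range(10))
    rows = list(csv.DictReader(io.StringIO(sample_csv)))
    dev, holdoff = split_dev_holdoff(rows)
    print(len(dev), len(holdoff))  # 8 2
    for message in simulate_stream(holdoff):
        print(message)
```

With the kafka-python client, each message could then be published via `KafkaProducer(bootstrap_servers=...).send(topic, value=message)`; the broker address and topic name are deployment-specific.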

<!-- Slide 6 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide6.png?raw=true">
</p>

<!-- Slide 7 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide7.png?raw=true">
</p>

<!-- Slide 8 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide8.png?raw=true">
</p>

#### 2. Azure Machine Learning Setup

- **Workspace Configuration:** Set up an Azure ML workspace with role-based access control (RBAC).
- **Team Roles:** Assigned roles for Data Science, Data Engineering, ML Engineering, and Governance.

#### 3. Exploratory Data Analysis (EDA)

- **Comprehensive Analysis:**
  - **Univariate Analysis:** Leveraged PandasAI for detailed insights.
  - **Bivariate Analysis:** Used pair plots and interaction plots.
  - **Dimensionality Reduction:** Applied PCA together with k-medians clustering (KMediansClustering).
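
The PCA-plus-k-medians step can be illustrated with a small pure-Python k-medians sketch. It assumes the rows have already been projected onto principal components (the points below stand in for PCA scores) and is not the project's KMediansClustering implementation. Unlike k-means, k-medians assigns points by Manhattan (L1) distance and moves each center to the coordinate-wise median of its cluster, which makes the centers less sensitive to outliers.

```python
import random
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance, the metric k-medians minimizes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center (L1 distance) for every point."""
    return [min(range(len(centers)), key=lambda c: manhattan(p, centers[c])) for p in points]


def k_medians(
    points: List[Point], k: int, n_iter: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    centers: List[Point] = rng.sample(points, k)  # initialize from random data points
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
                continue
            # Update step: coordinate-wise median of the cluster's members
            new_centers.append(tuple(median(m[d] for m in members) for d in range(len(members[0]))))
        if new_centers == centers:
            break  # converged: labels already match the final centers
        centers = new_centers
    else:
        labels = assign(points, centers)  # hit n_iter: relabel against the last centers
    return centers, labels
```

The number of clusters `k` is a modelling choice; with PCA scores as input, the resulting labels can be used to color a 2-D component plot.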

<!-- Slide 9 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide9.png?raw=true">
</p>

<!-- Slide 10 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide10.png?raw=true">
</p>

<!-- Slide 11 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide11.png?raw=true">
</p>

<!-- Slide 12 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide12.png?raw=true">
</p>

<!-- Slide 13 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide13.png?raw=true">
</p>

#### 4. Data Preprocessing

- **Feature Engineering:** Engineered features informed by EDA insights to improve model performance.
- **Normalization and Scaling:** Normalized and scaled numerical features.
- **Missing Data Handling:** Applied appropriate strategies for missing data.
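
A minimal sketch of missing-value handling and scaling, assuming median imputation followed by z-score standardization (the README does not say which strategies the project used). The key design point is that fill values, means, and standard deviations are fitted on the training split only and then reused unchanged on validation, hold-off, or streamed data, so no information leaks from evaluation data. Column names such as `Age` are illustrative.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Optional

Column = List[Optional[float]]
Params = Dict[str, Dict[str, float]]


def fit_preprocessor(train: Dict[str, Column]) -> Params:
    """Learn per-column median (for imputation), mean, and std from training data only."""
    params: Params = {}
    for name, values in train.items():
        observed = [v for v in values if v is not None]
        fill = median(observed)
        filled = [fill if v is None else v for v in values]
        std = pstdev(filled) or 1.0  # avoid division by zero on constant columns
        params[name] = {"fill": fill, "mean": mean(filled), "std": std}
    return params


def transform(data: Dict[str, Column], params: Params) -> Dict[str, List[float]]:
    """Impute missing values with the fitted median, then standardize."""
    out: Dict[str, List[float]] = {}
    for name, values in data.items():
        p = params[name]
        out[name] = [((p["fill"] if v is None else v) - p["mean"]) / p["std"] for v in values]
    return out


if __name__ == "__main__":
    params = fit_preprocessor({"Age": [20.0, None, 30.0, 40.0]})
    print(transform({"Age": [None, 40.0]}, params))
```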

#### Step 9: EDA [Owner to Update Step]

<!-- Slide 14 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide14.png?raw=true">
</p>

#### 5. Dependency Management

- **Poetry Integration:** Managed dependencies with Poetry for reproducibility.

<!-- Slide 15 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide15.png?raw=true">
</p>

#### 6. Model Development and Optimization

- **State-of-the-Art Models:**
  - Custom-built models such as XGBoost, LightGBM, and CatBoost.
  - **Hyperparameter Tuning:** Used Optuna for optimization.

- **AutoML Exploration:**
  - Benchmarked PyCaret, AutoGluon, and H2O.
  - **Advanced Techniques:** Stacked models, Isolation Forest, and custom loss functions.
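
Stacking trains a meta-model on out-of-fold predictions from the base models. A simpler relative, shown here, is weighted soft voting: average the class-probability vectors from models such as XGBoost, LightGBM, and CatBoost, then take the most probable class. This is an illustrative sketch rather than the project's stacking pipeline; the model names, weights, and probabilities are made up, and the class labels are example obesity categories.

```python
from typing import Dict, List, Sequence

ProbMatrix = List[List[float]]  # one row per sample, one column per class


def soft_vote(
    model_probs: Dict[str, ProbMatrix],
    weights: Dict[str, float],
    classes: Sequence[str],
) -> List[str]:
    """Blend per-model class probabilities with normalized weights and return argmax labels."""
    total = sum(weights[name] for name in model_probs)
    n_rows = len(next(iter(model_probs.values())))
    predictions = []
    for i in range(n_rows):
        blended = [0.0] * len(classes)
        for name, probs in model_probs.items():
            w = weights[name] / total  # normalize so blended scores still sum to 1
            for j, p in enumerate(probs[i]):
                blended[j] += w * p
        best = max(range(len(classes)), key=blended.__getitem__)
        predictions.append(classes[best])
    return predictions


if __name__ == "__main__":
    classes = ["Normal_Weight", "Overweight_Level_I", "Obesity_Type_I"]
    probs = {
        "xgboost": [[0.6, 0.3, 0.1]],
        "lightgbm": [[0.2, 0.7, 0.1]],
    }
    print(soft_vote(probs, {"xgboost": 1.0, "lightgbm": 1.0}, classes))  # ['Overweight_Level_I']
```

Weights are typically tuned on validation data; a full stacker would instead fit a meta-model (for example, a logistic regression) on the out-of-fold probabilities.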

#### 7. Experiment Tracking and Management

- **MLflow and Azure ML MLflow Integration:**
  - Tracked global and local metrics as well as the target distribution.
  - **SHAP Analysis:** Used SHAP values for explainability and error analysis.
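
One way to compute global and local metrics and the target distribution before logging them is sketched below. It assumes "global" means overall accuracy and "local" means per-class recall, which is our reading rather than something this README specifies. The returned flat dictionary is the shape that `mlflow.log_metrics()` accepts inside an active `mlflow.start_run()` block; the MLflow call itself is left out so the sketch runs with the standard library alone.

```python
from collections import Counter
from typing import Dict, Sequence


def evaluation_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Overall accuracy plus per-class recall and target share, as a flat metric dict."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    correct = [t for t, p in zip(y_true, y_pred) if t == p]
    metrics = {"accuracy": len(correct) / n}
    support = Counter(y_true)  # how often each class occurs in the ground truth
    hits = Counter(correct)  # correctly predicted samples per class
    for cls in sorted(support):
        metrics[f"recall_{cls}"] = hits[cls] / support[cls]
        metrics[f"target_share_{cls}"] = support[cls] / n
    return metrics


if __name__ == "__main__":
    y_true = ["Normal_Weight", "Normal_Weight", "Obesity_Type_I", "Overweight_Level_I"]
    y_pred = ["Normal_Weight", "Obesity_Type_I", "Obesity_Type_I", "Overweight_Level_I"]
    print(evaluation_metrics(y_true, y_pred))
```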

<!-- Slide 16 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide16.png?raw=true">
</p>

<!-- Slide 17 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide17.png?raw=true">
</p>

<!-- Slide 18 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide18.png?raw=true">
</p>

<!-- Slide 19 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide19.png?raw=true">
</p>

<!-- Slide 20 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide20.png?raw=true">
</p>

<!-- Slide 21 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide21.png?raw=true">
</p>

#### 8. Deployment Strategies

- **Containerization:** Served the model through a FastAPI service packaged with Docker.
- **Azure Deployment:** Deployed to Azure Container Instances; a Kubernetes deployment is planned.

- **Conversion to Azure Scripts:**
  - Converted Jupyter notebooks to Python scripts for Azure jobs.
  - **CI/CD Pipelines:** GitHub Actions with Azure Container Registry.
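
To smoke-test a deployed FastAPI container, a client only needs to POST a JSON record and read back the JSON response. The sketch below uses only the standard library; the `/predict` route, port 8000 (Uvicorn's default), and the feature names and response schema are assumptions, so adjust them to the service's actual request model. Postman, listed in the tech stack, can send the same request interactively.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical endpoint; the real route and port depend on how the container is run.
PREDICT_URL = "http://localhost:8000/predict"

# Example record using column names from the Kaggle obesity dataset (assumed schema).
SAMPLE_RECORD: Dict[str, Any] = {
    "Gender": "Male", "Age": 24.0, "Height": 1.75, "Weight": 82.0,
    "family_history_with_overweight": "yes", "FAVC": "yes", "FCVC": 2.0,
    "NCP": 3.0, "CAEC": "Sometimes", "SMOKE": "no", "CH2O": 2.0, "SCC": "no",
    "FAF": 1.0, "TUE": 1.0, "CALC": "Sometimes", "MTRANS": "Public_Transportation",
}


def build_predict_request(record: Dict[str, Any], url: str = PREDICT_URL) -> urllib.request.Request:
    """Wrap one feature record as a JSON POST request."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(record: Dict[str, Any], url: str = PREDICT_URL, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response (requires the service to be running)."""
    with urllib.request.urlopen(build_predict_request(record, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `predict(SAMPLE_RECORD)` returns whatever JSON the endpoint sends back; separating request construction from sending keeps the payload easy to inspect without a live service.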

#### 9. User Interface and Interaction

- **Streamlit Application:** User-friendly interface integrated with the model APIs.

#### 10. Model Monitoring and Drift Management

- **Monitoring Strategy:** Drift detection and automated endpoint management.
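
The README does not name the drift metric used. As one common option, the Population Stability Index (PSI) compares a feature's live distribution against the training (reference) distribution over shared bins: `PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ)`, where `eᵢ` and `aᵢ` are the reference and live shares of bin `i`. This sketch uses equal-width bins over the reference range and floors empty bins at a small epsilon to avoid `ln(0)`. A frequently quoted rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift, but thresholds should be tuned per feature.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], live: Sequence[float], n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `live` relative to `reference`."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to width 1 for a constant reference

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clip out-of-range values into edge bins
        # Floor empty bins at eps so the log term stays finite
        return [max(c / len(values), eps) for c in counts]

    expected, actual = bin_shares(reference), bin_shares(live)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


if __name__ == "__main__":
    reference = [float(v) for v in range(100)]
    shifted = [v + 50 for v in reference]
    print(round(psi(reference, reference), 6), psi(reference, shifted) > 0.25)  # 0.0 True
```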

#### 11. Azure ML Designer Integration

- **UI-Based Experiments:** For learning purposes, also ran experiments through Azure ML Designer (UI) and the Azure ML Python SDK v2.

<!-- Slide 22 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide22.png?raw=true">
</p>

<!-- Slide 23 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide23.png?raw=true">
</p>

<!-- Slide 24 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide24.png?raw=true">
</p>

<!-- Slide 25 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide25.png?raw=true">
</p>

<!-- Slide 26 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide26.png?raw=true">
</p>

<!-- Slide 27 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide27.png?raw=true">
</p>

<!-- Slide 28 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide28.png?raw=true">
</p>

<!-- Slide 29 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide29.png?raw=true">
</p>

<!-- Slide 30 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide30.png?raw=true">
</p>

<!-- Slide 31 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide31.png?raw=true">
</p>

<!-- Slide 32 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide32.png?raw=true">
</p>

<!-- Slide 33 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide33.png?raw=true">
</p>

<!-- Slide 34 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide34.png?raw=true">
</p>

<!-- Slide 35 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide35.png?raw=true">
</p>

#### 12. Additional Expert Considerations

- **Cross-Validation:** Used cross-validation to check that models generalize beyond the training data.
- **Model Governance:** Versioning, lineage tracking, and compliance.
- **Scalability and Optimization:** Performance tests and scalability checks.
- **Feedback Loop:** Integrated feedback for continuous improvement.
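
When class frequencies differ, cross-validation folds are usually stratified so that each fold keeps roughly the overall class mix. A minimal standard-library sketch of stratified k-fold index generation follows; in practice scikit-learn's `StratifiedKFold` does the same job, and this is not the project's CV code.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[Hashable], k: int = 5, seed: int = 42
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    slot = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin; the running slot keeps fold sizes balanced
        for idx in indices:
            folds[slot % k].append(idx)
            slot += 1

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


if __name__ == "__main__":
    y = ["Normal_Weight"] * 10 + ["Obesity_Type_I"] * 5
    for train_idx, test_idx in stratified_kfold(y, k=5):
        print(len(train_idx), [y[i] for i in test_idx])
```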

#### 13. Branches
1. Main: For the final product [Owner - Team]
2. Experiments: For ML experiments and tracking [Owners - Arham, Krishan]
3. ArchDevelopment: For CI/CD [Owner - Nandani]
4. Streamlit: For the front end [Owner - Nandani]
5. Data Engineering: For Kafka streaming [Owner - Yash]
6. Backup: For backups [Owners - Aasna, Mahrukh]

### Technologies Used

- **Data Analysis/Model Training:** Python, Jupyter Notebooks
- **Experiment Tracking:** MLflow
- **Model Building:** PyCaret, LightGBM, XGBoost, CatBoost
- **Hyperparameter Optimization:** Optuna
- **Containerization:** Docker
- **Real-time Data Streaming:** Kafka
- **Version Control and CI/CD:** Git, GitHub Actions
- **Cloud Deployment:** Azure Machine Learning, Azure Blob Storage
- **User Interface:** Streamlit
- **Dependency and Environment Management:** Poetry

## How to Run the Code

### Prerequisites

- **Python 3.8+**
- **Poetry**
- **Docker**
- **Azure Account**
- **Kafka**

### Setup

1. **Clone the Repository**

    ```bash
    git clone https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk.git
    cd Multi-Class-Prediction-of-Obesity-Risk
    ```

2. **Install Dependencies**

    ```bash
    poetry install
    ```

3. **Set Up Environment Variables**

    Create a `.env` file in the repository root and add the required environment variables. Example:

    ```env
    AZURE_SUBSCRIPTION_ID=your_subscription_id
    AZURE_RESOURCE_GROUP=your_resource_group
    AZURE_WORKSPACE_NAME=your_workspace_name
    ```

4. **Start Docker**

    Ensure Docker is running on your machine, then build and start the containers:

    ```bash
    docker-compose up --build
    ```

5. **Run the Streamlit Application**

    ```bash
    poetry run streamlit run Streamlit/app.py
    ```

6. **Run Jupyter Notebooks**

    Start Jupyter Lab to run and explore the notebooks:

    ```bash
    poetry run jupyter lab
    ```
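
The Azure settings from the `.env` step of the setup can be loaded and validated in Python before any workspace call is made. The helper names and parsing rules below (simple `KEY=value` lines, `#` comments, optional quotes) are assumptions for this sketch; the `python-dotenv` package's `load_dotenv()` is the usual library alternative. Values already set in the process environment take precedence over the file, mirroring `load_dotenv()`'s default of not overriding existing variables.

```python
import os
from pathlib import Path
from typing import Dict, Sequence

REQUIRED_VARS = ("AZURE_SUBSCRIPTION_ID", "AZURE_RESOURCE_GROUP", "AZURE_WORKSPACE_NAME")


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional surrounding quotes
    return values


def load_env(path: str = ".env", required: Sequence[str] = REQUIRED_VARS) -> Dict[str, str]:
    """Merge the .env file with the process environment and fail fast on missing keys."""
    env_file = Path(path)
    values = parse_env_file(env_file.read_text()) if env_file.exists() else {}
    # Existing environment variables win over the file (no override).
    for key in set(values) | set(required):
        if key in os.environ:
            values[key] = os.environ[key]
    missing = [key for key in required if not values.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values
```

Calling `load_env()` at the top of a deployment script turns a missing variable into an immediate, readable error instead of a failure deep inside an Azure SDK call.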

### Deployment

1. **Azure ML Deployment**

    - Configure your Azure workspace by setting up the necessary resources.
    - Use the provided Azure scripts to deploy models and services.

    ```bash
    poetry run python deploy/deploy_to_azure.py
    ```

2. **CI/CD Setup**

    - Ensure the GitHub Actions workflows are configured correctly.
    - Push changes to the repository to trigger the CI/CD pipelines.

    ```bash
    git add .
    git commit -m "Your commit message"
    git push origin main
    ```

### Monitoring and Maintenance

- **Model Monitoring:** Use the integrated monitoring tools to track model performance and detect drift.
- **Endpoint Management:** Automated endpoint management keeps the service available and performant.

### Business Case

Our solution targets healthcare providers for early identification of at-risk patients, public health officials for data-driven policymaking, and insurance companies for premium adjustment based on individual risk. The economic impact includes significant healthcare cost savings and revenue generation from tailored wellness programs.

### Acknowledgements

This project is the team's effort to tackle the global health crisis of obesity with advanced data science and machine learning techniques, aiming to make a significant impact in the healthcare sector.

### Meet the Team
1. Product Manager - Aasna
2. Machine Learning Engineer - Arham
3. MLOps - Krishan
4. Data Engineer - Yash
5. Cloud SME - Nandani
6. Business Analyst - Mahrukh

----

<!-- Slide 36 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide36.png?raw=true">
</p>

<!-- Slide 37 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide37.png?raw=true">
</p>

<!-- Slide 38 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide38.png?raw=true">
</p>

<!-- Slide 2 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide2.png?raw=true">
</p>

<!-- Slide 3 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide3.png?raw=true">
</p>

<!-- Slide 4 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide4.png?raw=true">
</p>

<!-- Slide 5 -->
<p align="center">
  <img src="https://github.com/McGill-MMA-EnterpriseAnalytics/Multi-Class-Prediction-of-Obesity-Risk/blob/main/16-README-Support-Files/Slide5.png?raw=true">
</p>