# ML-Project-Cancer-Prediction

My Kaggle project (based on medical treatment)

Kaggle problem link: https://www.kaggle.com/c/msk-redefining-cancer-treatment

Some libraries/subpackages used:

1. nltk
2. sklearn.calibration
3. sklearn.naive_bayes
4. mlxtend.classifier
5. sklearn.linear_model
6. seaborn
7. sklearn.metrics
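As a rough sketch (not the repository's actual code) of how these subpackages fit together — nltk for text preprocessing, seaborn for plots, mlxtend.classifier for stacking, and the sklearn pieces for modelling and scoring — a calibrated Naive Bayes model can be trained and scored on a synthetic stand-in for the competition data like this:

```python
from sklearn.calibration import CalibratedClassifierCV   # probability calibration
from sklearn.naive_bayes import MultinomialNB            # Naive Bayes model
from sklearn.metrics import log_loss                     # evaluation metric
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy stand-in for the competition's engineered features (9 classes,
# as in the MSK competition).
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=9, random_state=0)
X = abs(X)  # MultinomialNB requires non-negative features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Wrap Naive Bayes in a calibrator so predict_proba is better suited
# to a log-loss metric, which punishes over-confident wrong predictions.
model = CalibratedClassifierCV(MultinomialNB(), method="sigmoid", cv=3)
model.fit(X_tr, y_tr)
print("log-loss:", log_loss(y_te, model.predict_proba(X_te),
                            labels=model.classes_))
```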

Algorithms applied:

1. Naive Bayes
2. Random Forest (using one-hot encoding)
3. Random Forest (using response encoding)
4. Logistic Regression
5. Linear Support Vector Machine (Linear SVM)
6. K-Nearest Neighbours
7. Stacking model
8. Maximum Voting classifier
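To illustrate the response-encoding variant above — assuming (this is a sketch, not the project's code) that response encoding replaces each categorical value with the vector of class probabilities P(class | category) estimated from the training data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one categorical feature ("Gene") and a class label.
df = pd.DataFrame({
    "Gene":  ["BRCA1", "TP53", "BRCA1", "EGFR", "TP53", "BRCA1"],
    "Class": [0, 1, 0, 2, 1, 2],
})

# Response encoding: for each category, estimate P(class | category)
# from the class counts, then map every sample to its category's vector.
counts = pd.crosstab(df["Gene"], df["Class"])
probs = counts.div(counts.sum(axis=1), axis=0)   # each row sums to 1
encoded = probs.reindex(df["Gene"]).to_numpy()   # one row per sample
print(encoded.shape)  # (6, 3): 6 samples, 3 classes
```

Unlike one-hot encoding, the width of the encoded matrix grows with the number of classes rather than the number of categories, which keeps tree-based models compact when a feature has many levels.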

Evaluation is done on the basis of multiclass log-loss:

1. Log-loss for the Naive Bayes model: 1.2174351082980228
2. Log-loss for Logistic Regression: 1.0139465030649317
3. Log-loss for KNN: 0.9686092822627863
4. Log-loss for Linear SVM: 1.0518829636631724
5. Log-loss for the Random Forest classifier: 1.1440820641479814 (using one-hot encoding) and 1.220569827205813 (using response encoding)
6. Log-loss for the Stacking model: [training set: 0.497983218669304, test set: 1.1751619600947567, cross-validation set: 1.09245098514981767]
7. Log-loss for the Maximum Voting classifier: [training set: 0.8677287779975493, test set: 1.2148355813599823, cross-validation set: 1.142669504]
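For reference, multiclass log-loss is the average negative log-probability assigned to the true class. A minimal check against `sklearn.metrics.log_loss`, using made-up predicted probabilities:

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical predicted probabilities for 3 samples over 3 classes.
y_true = [0, 2, 1]
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])

# Manual computation: mean of -log(probability of the true class).
manual = -np.mean(np.log([proba[i, c] for i, c in enumerate(y_true)]))
print(manual)                   # matches sklearn's value below
print(log_loss(y_true, proba))
```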

As per the evaluation, Logistic Regression is the best model to fit for this problem.