# NeoLung: Lung cancer prediction using machine learning

## Aim:

The purpose of this project is to compare classification algorithms on a lung cancer dataset.

## Dataset:

The lung cancer dataset used in this project comes from data.world:

https://data.world/sta427ceyin/survey-lung-cancer

## Working:

We selected the following **10 classification algorithms** for this project:
1. Logistic Regression
2. K-Nearest Neighbors (KNN)
3. Decision Tree
4. Support Vector Machines (SVM)
5. Naive Bayes
6. Random Forest
7. Gradient Boosting
8. Neural Networks
9. AdaBoost
10. XGBoost

We then build a model for each of the algorithms above and compare them using the following **evaluation metrics**:
1. Accuracy
2. Precision
3. F1 Score
4. Recall
5. Confusion Matrix

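The comparison described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the data is a synthetic stand-in generated with `make_classification`, and the split ratio, random seeds, and hyperparameters are all assumptions. XGBoost is left out because it lives in a separate package (`xgboost`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data (assumption: binary target, 15 features).
X, y = make_classification(n_samples=309, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y
)

# One estimator per algorithm; hyperparameters are illustrative defaults.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit each model and collect the five evaluation metrics on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

# Print the models from highest to lowest accuracy.
for name, m in sorted(results.items(), key=lambda kv: kv[1]["accuracy"], reverse=True):
    print(f"{name}: accuracy={m['accuracy']:.4f}, f1={m['f1']:.4f}")
```

Keeping the estimators in a dictionary means every model is trained and scored on the same split, so the metrics are directly comparable.
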
These are the accuracies of the algorithms:
1. Logistic Regression: **90.29%**
2. K-Nearest Neighbors (KNN): **87.37%**
3. Decision Tree: **87.37%**
4. Support Vector Machines (SVM): **84.46%**
5. Naive Bayes: **86.4%**
6. Random Forest: **89.32%**
7. Gradient Boosting: **89.32%**
8. Neural Networks: **84.46%**
9. AdaBoost: **84.46%**
10. XGBoost: **84.46%**

## Results:

Of all the algorithms implemented, **Logistic Regression** performed best. Its evaluation metrics are as follows:

**Accuracy: 0.9029126213592233**

**Precision: 0.9052631578947369**

**Recall: 0.9885057471264368**

**F1 score: 0.945054945054945**

**Confusion Matrix:**

![LR](https://github.com/sayande01/Kaggle_Notebooks/assets/58242664/1a9f40f7-3c88-440c-9fda-6d6c3d02a0a3)