--- a
+++ b/README.md
@@ -0,0 +1,60 @@
+# NeoLung: Lung cancer prediction using machine learning
+
+## Aim:
+
+The purpose of this project is to compare classification algorithms on a lung cancer dataset.
+
+## Dataset:
+
+The lung cancer dataset used in this project comes from data.world:
+
+https://data.world/sta427ceyin/survey-lung-cancer
+
+## Working:
+
+We selected the following **10 classification algorithms** for this project:
+1. Logistic Regression
+2. K-Nearest Neighbors (KNN)
+3. Decision Tree
+4. Support Vector Machines (SVM)
+5. Naive Bayes
+6. Random Forest
+7. Gradient Boosting
+8. Neural Networks
+9. AdaBoost
+10. XGBoost
+
+We then built a model for each of these algorithms and compared them using the following **evaluation metrics**:
+1. Accuracy
+2. Precision
+3. F1 Score
+4. Recall Score
+5. Confusion Matrix
+
+These are the accuracies of the algorithms:
+1. Logistic Regression: **90.29%**
+2. K-Nearest Neighbors (KNN): **87.37%**
+3. Decision Tree: **87.37%**
+4. Support Vector Machines (SVM): **84.46%**
+5. Naive Bayes: **86.4%**
+6. Random Forest: **89.32%**
+7. Gradient Boosting: **89.32%**
+8. Neural Networks: **84.46%**
+9. AdaBoost: **84.46%**
+10. XGBoost: **84.46%**
+
+## Results:
+
+Of all the algorithms implemented, **Logistic Regression** achieved the highest accuracy. Its evaluation metrics are as follows:
+
+**Accuracy: 0.9029126213592233**
+
+**Precision: 0.9052631578947369**
+
+**Recall: 0.9885057471264368**
+
+**F1 score: 0.945054945054945**
+
+**Confusion Matrix:**
+
+
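
The accuracy, precision, recall, and F1 score reported above for Logistic Regression all follow from the same four confusion-matrix counts. As a minimal sketch, the snippet below computes them in plain Python. The counts it uses (7 true negatives, 9 false positives, 1 false negative, 86 true positives, for 103 test samples) are an assumption: they were inferred from the reported scores and are not taken from the project's own confusion matrix. The function name `classification_metrics` is likewise only illustrative.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total        # share of all predictions that are correct
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Assumed counts, inferred from the reported scores (not from the project itself).
tn, fp, fn, tp = 7, 9, 1, 86

metrics = classification_metrics(tn, fp, fn, tp)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts, the computed values agree with the reported scores up to floating-point rounding. The low number of true negatives relative to false positives suggests that accuracy alone may overstate performance on the negative class, which is one reason to compare the algorithms on precision, recall, and F1 as well.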