
Among the models tested, XGBoost and Random Forest (RF) achieved the highest accuracy, at 92% and 91%, respectively. Logistic regression (LR) and the support vector machine (SVM) performed worse, at 79% and 73%, respectively. The confusion matrices show that LR misclassified one individual with cancer as healthy and SVM misclassified three. These errors are reported as false positives (FP = 1 and FP = 3), which implies that healthy was treated as the positive class; under the usual convention of cancer as the positive class, they are false negatives. According to their confusion matrices, XGBoost and RF made fewer such errors. Misclassifying individuals with cancer as healthy is concerning because it may delay or prevent treatment, which is most effective in the early stages of the disease. An important feature for XGBoost's diagnosis was REG1A, whereas RF assigned high importance to the diagnosis feature.
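The sketch below shows why the same error can be labelled FP or FN depending on which class is treated as positive. It assumes binary labels `"cancer"` and `"healthy"` and uses hypothetical toy labels, not the study's data. The helper names `confusion_counts`, `accuracy`, and `sensitivity` are illustrative and do not come from the original analysis.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            # Predicted positive: correct if the true label is also positive.
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            # Predicted negative: a missed positive case is a false negative.
            if t == positive:
                fn += 1
            else:
                tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def accuracy(counts):
    """Fraction of correct predictions; identical for either choice of positive class."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


def sensitivity(counts):
    """Recall for the positive class: TP / (TP + FN)."""
    return counts["TP"] / (counts["TP"] + counts["FN"])


# Hypothetical toy labels (NOT the study's data): one cancer case is predicted healthy.
y_true = ["cancer", "cancer", "cancer", "healthy", "healthy"]
y_pred = ["cancer", "cancer", "healthy", "healthy", "healthy"]

as_cancer = confusion_counts(y_true, y_pred, positive="cancer")
as_healthy = confusion_counts(y_true, y_pred, positive="healthy")
print(as_cancer)    # {'TP': 2, 'FP': 0, 'FN': 1, 'TN': 2}
print(as_healthy)   # {'TP': 2, 'FP': 1, 'FN': 0, 'TN': 2}
print(accuracy(as_cancer), accuracy(as_healthy))  # 0.8 0.8
```

The single missed cancer case counts as FN = 1 when cancer is the positive class and as FP = 1 when healthy is. Accuracy is unchanged either way, which is why sensitivity for the cancer class is the more direct measure of missed diagnoses.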