A simple framework for detecting Covid-19 by analyzing lung CT scans.
The goal of this research is to train a classifier that recognizes Covid-19 positive patients from their lung CT scans, in order to support the physician’s decision process with a quantitative approach.
The general pipeline is:
We obtain three 4x13 matrices, which we reshape into a single vector of 156 features (3 x 4 x 13). We first attempt feature selection by eliminating the least contributing features, but the resulting loss of information is excessive. Given the high dimensionality, we instead opt for a feature synthesis approach and apply PCA, retaining only the first 2 principal components, which together explain ~89% of the total variability.
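The reshaping and PCA step can be sketched as follows. This is a minimal illustration using scikit-learn, not the project's actual code: the `matrices` array is random placeholder data standing in for the real per-scan features, and the number of scans (`n_scans`) is an assumption. The text does not say whether features are scaled before PCA, so no scaling is applied here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 100 scans, each described by three 4x13 matrices.
n_scans = 100
matrices = rng.normal(size=(n_scans, 3, 4, 13))

# Flatten the three 4x13 matrices of each scan into one 156-feature vector.
features = matrices.reshape(n_scans, -1)

# Keep only the first 2 principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(features)

# Fraction of the total variance captured by the retained components.
explained = pca.explained_variance_ratio_.sum()
print(features.shape, components.shape, round(float(explained), 3))
```

On random placeholder data the explained variance will be far lower than the ~89% reported for the real features; `explained_variance_ratio_` is simply the quantity one would inspect to check that figure.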
Finally, the processed data is fed into several classifiers (SVM, Logistic Regression, Random Forest, ensemble methods). All methods were tested with 5-fold cross-validation and an 80/20 train/test split stratified on the labels. SVM with a linear kernel obtained the best results across all metrics: AUC (85%), accuracy (81%), precision (81%), and recall (80%). The confusion matrix and the AUC plot are reported below:
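For reference, the evaluation protocol for the linear SVM could be sketched along these lines. This is an illustrative sketch only: the data comes from scikit-learn's `make_classification` as a placeholder for the two principal components and the labels, the random seeds are arbitrary, and running cross-validation on the training portion of the 80/20 split is an assumption, since the text does not specify how the two are combined.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.svm import SVC

# Placeholder data: 200 samples with 2 features (standing in for the 2 PCs).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# 80/20 train/test split, stratified on the labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = SVC(kernel="linear")

# 5-fold stratified cross-validation on the training portion (assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(clf, X_train, y_train, cv=cv, scoring="roc_auc")

# Fit on the full training set and evaluate on the held-out 20%.
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
scores = clf.decision_function(X_test)  # continuous scores for the AUC

metrics = {
    "auc": roc_auc_score(y_test, scores),
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)
print(cv_auc.mean(), metrics)
print(cm)
```

The numbers printed by this sketch describe the synthetic data only and say nothing about the reported results.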