# Lung cancer detection

This project aims to detect lung cancer from CT scan images using deep learning. The dataset contains CT scan images of adenocarcinoma, large cell carcinoma, squamous cell carcinoma, and normal (cancer-free) cases. The dataset can be found [here](https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images?resource=download).

## Dataset Description

The CT scan images are in JPG or PNG format so that they can be fed to the model. The dataset contains four main folders:

- `Adenocarcinoma`: contains CT scan images of adenocarcinoma of the lung. Adenocarcinoma is the most common form of lung cancer, accounting for 30% of all cases and about 40% of all non-small cell lung cancer (NSCLC) cases.

- `Large cell carcinoma`: contains CT scan images of large-cell undifferentiated carcinoma of the lung. This type of lung cancer usually accounts for 10 to 15% of all NSCLC cases.

- `Squamous cell carcinoma`: contains CT scan images of squamous cell carcinoma of the lung. This type of lung cancer is responsible for about 30% of all NSCLC cases and is generally linked to smoking.

- `Normal`: contains CT scan images of normal, cancer-free lungs.

The dataset is divided into three sets: the `training` set contains 70% of the data, the `testing` set contains 20%, and the `validation` set contains 10%.
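
The dataset is described as already divided, so no splitting step is needed to use it. For readers who want to reproduce a 70/20/10 split on their own files, the following is a minimal sketch. The function name, the seed, and the file names are hypothetical and are not taken from the project.

```python
import random


def split_dataset(items, train=0.7, test=0.2, seed=0):
    """Shuffle items and split them into training/testing/validation lists.

    Whatever is left after the training and testing shares goes to validation.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return {
        "training": shuffled[:n_train],
        "testing": shuffled[n_train:n_train + n_test],
        "validation": shuffled[n_train + n_test:],
    }


# Hypothetical file names, for illustration only.
files = [f"scan_{i:04d}.png" for i in range(1000)]
splits = split_dataset(files)
sizes = {name: len(part) for name, part in splits.items()}
print(sizes)  # {'training': 700, 'testing': 200, 'validation': 100}
```

In practice the split would usually be done per class folder so that each set keeps the same class proportions.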

## Technologies Used

This project was implemented using the following technologies:

- `TensorFlow` and `Keras`: for building and training the deep learning model.
- `ImageDataGenerator`: for data augmentation.
- `ResNet50`, `VGG16`, `ResNet101`, `VGG19`, `DenseNet201`, `EfficientNetB4`, `MobileNetV2`: pre-trained models used for transfer learning.
- `PIL`, `OpenCV`: for image processing.
- `Matplotlib`: for visualizing the training and validation results.
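
As an illustration of how `ImageDataGenerator` is typically used for augmentation, here is a minimal sketch. The augmentation settings, the 224x224 image size, and the function name are assumptions; the README does not state which values the project used. Only the batch size of 16 comes from the project's training description.

```python
# Illustrative augmentation settings (assumptions, not the project's values).
AUGMENTATION = dict(
    rescale=1.0 / 255,        # scale pixel values to [0, 1]
    rotation_range=10,        # random rotations of up to 10 degrees
    width_shift_range=0.1,    # random horizontal shifts of up to 10%
    height_shift_range=0.1,   # random vertical shifts of up to 10%
    zoom_range=0.1,           # random zoom of up to 10%
)


def make_train_generator(train_dir, image_size=(224, 224), batch_size=16):
    """Return a Keras iterator that yields augmented, one-hot-labelled batches.

    `train_dir` is expected to contain one sub-folder per class.
    """
    # Imported here so the settings above can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(**AUGMENTATION)
    return datagen.flow_from_directory(
        train_dir,
        target_size=image_size,
        batch_size=batch_size,
        class_mode="categorical",
    )
```

Validation and test images would normally get only the `rescale` step, so that evaluation runs on unaltered scans.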

## Model Architecture

The deep learning model used in this project is a convolutional neural network (CNN) made up of convolutional, max pooling, batch normalization, and dense layers. Transfer learning is applied by initializing the model with pre-trained weights from one of the pre-trained models listed under Technologies Used.

The model was trained with the `Adam` optimizer, a learning rate of 0.001, and a batch size of 16, for a total of 50 epochs.
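
A transfer-learning setup along these lines could be sketched as follows. The learning rate, batch size, epoch count, and number of classes come from this README. The choice of `ResNet50` as the backbone, the frozen base, the input shape, and the size of the dense head are assumptions made for illustration and may differ from the actual notebook.

```python
NUM_CLASSES = 4              # adenocarcinoma, large cell, squamous cell, normal
LEARNING_RATE = 0.001        # from the README
BATCH_SIZE = 16              # from the README
EPOCHS = 50                  # from the README
IMAGE_SHAPE = (224, 224, 3)  # assumption: common ImageNet input size


def build_model(weights="imagenet"):
    """Build a classifier on top of a pre-trained ResNet50 backbone."""
    # Imported here so the constants above can be read without TensorFlow.
    import tensorflow as tf
    from tensorflow.keras import layers

    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=IMAGE_SHAPE
    )
    base.trainable = False  # assumption: keep the pre-trained weights frozen

    model = tf.keras.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.BatchNormalization(),
        layers.Dense(256, activation="relu"),  # hypothetical head size
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call such as `build_model().fit(train_data, validation_data=val_data, epochs=EPOCHS)`, where `train_data` and `val_data` are placeholders for batched image iterators.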

## Results

The model achieved an accuracy of 95.2% on the testing set. The training and validation accuracies and losses for each epoch are visualized below:


|
|
42 |
|
|
|
43 |
## Checkpoints |
|
|
44 |
|
|
|
45 |
Two checkpoints were saved during the training process. These checkpoints can be used to resume training or to evaluate the model on new data. |

## Credit

This notebook was created by the Lovelace team for the AI_NIGHT_CHALLENGE competition.