# Lung Cancer Prediction using CNN and Transfer Learning

This project aims to build a Lung Cancer Prediction System using Convolutional Neural Networks (CNN) and transfer learning. The model classifies lung cancer images into four categories: Normal, Adenocarcinoma, Large Cell Carcinoma, and Squamous Cell Carcinoma.

## Table of Contents
- [Introduction](#introduction)
- [Dataset](#dataset)
- [Dependencies](#dependencies)
- [Project Structure](#project-structure)
- [Training the Model](#training-the-model)
- [Using the Model](#using-the-model)
- [Results](#results)
- [Acknowledgements](#acknowledgements)
- [License](#license)

## Introduction

Lung cancer is one of the leading causes of cancer-related deaths worldwide. Early detection and accurate classification are crucial for effective treatment and patient survival. This project leverages deep learning techniques to develop a robust lung cancer classification model using chest CT scan images.

## Dataset

The dataset used in this project consists of lung cancer images categorized into four classes:
1. Normal
2. Adenocarcinoma
3. Large Cell Carcinoma
4. Squamous Cell Carcinoma

The dataset should be organized into training (`train`), validation (`valid`), and testing (`test`) folders with the following subfolders for each class:

- `train/`
  - `normal/`
  - `adenocarcinoma/`
  - `large_cell_carcinoma/`
  - `squamous_cell_carcinoma/`

- `valid/`
  - `normal/`
  - `adenocarcinoma/`
  - `large_cell_carcinoma/`
  - `squamous_cell_carcinoma/`

- `test/`
  - `normal/`
  - `adenocarcinoma/`
  - `large_cell_carcinoma/`
  - `squamous_cell_carcinoma/`

Alternatively, you can download a similar dataset of chest CT scan images from [Kaggle](https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images).

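
Before training, it can help to confirm that the folders match the layout described above. The following is a minimal sketch, not part of the notebook: the function names `count_images` and `check_layout` and the set of accepted image extensions are assumptions made for illustration.

```python
from pathlib import Path

SPLITS = ("train", "valid", "test")
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed image formats; adjust if needed


def count_images(dataset_root):
    """Return {split: {class_folder: image_count}} for the splits that exist."""
    counts = {}
    for split in SPLITS:
        split_dir = Path(dataset_root) / split
        if not split_dir.is_dir():
            continue  # a missing split simply does not appear in the result
        counts[split] = {
            class_dir.name: sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


def check_layout(counts):
    """Return a list of problems; an empty list means the layout looks consistent."""
    problems = [f"missing split folder: {s}" for s in SPLITS if s not in counts]
    class_sets = {frozenset(classes) for classes in counts.values()}
    if len(class_sets) > 1:
        problems.append("class folders differ between splits")
    for split, classes in counts.items():
        problems += [f"no images in {split}/{name}" for name, n in classes.items() if n == 0]
    return problems
```

Calling `check_layout(count_images("dataset"))` on a correctly organized dataset should return an empty list. The check compares folder names only, so it works with whichever class folder names your copy of the dataset uses.
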
### Google Colab Link
To replicate and run the project in Google Colab, use the following link: [Lung Cancer Prediction System on Colab](https://colab.research.google.com/drive/1kMTghEwVoJaFmlKydxuhhoyzHluIUjoV?usp=sharing)

### Usage

- **Direct Download**: You can download the dataset directly from this repository and store it on your local system.
- **Google Drive**: Alternatively, you can store the dataset in your Google Drive and mount it using the provided code to replicate the environment used in this project.

## Dependencies

The project requires the following libraries:
- Python 3.x
- pandas
- numpy
- seaborn
- matplotlib
- scikit-learn
- tensorflow
- keras

You can install the required libraries using the following command:

```bash
pip install pandas numpy seaborn matplotlib scikit-learn tensorflow keras
```

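
After installing, a quick way to confirm that the libraries are visible to your interpreter is to look each one up without importing it. This is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "seaborn", "matplotlib", "sklearn", "tensorflow", "keras"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]
```

An empty list from `missing_modules()` means every listed library can be located.
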
## Project Structure

```
.
├── Lung_Cancer_Prediction.ipynb
├── README.md
├── dataset
│   ├── train
│   │   ├── adenocarcinoma_left.lower.lobe_T2_N0_M0_Ib
│   │   ├── large.cell.carcinoma_left.hilum_T2_N2_M0_IIIa
│   │   ├── normal
│   │   └── squamous.cell.carcinoma_left.hilum_T1_N2_M0_IIIa
│   ├── test
│   │   ├── adenocarcinoma_left.lower.lobe_T2_N0_M0_Ib
│   │   ├── large.cell.carcinoma_left.hilum_T2_N2_M0_IIIa
│   │   ├── normal
│   │   └── squamous.cell.carcinoma_left.hilum_T1_N2_M0_IIIa
│   └── valid
│       ├── adenocarcinoma_left.lower.lobe_T2_N0_M0_Ib
│       ├── large.cell.carcinoma_left.hilum_T2_N2_M0_IIIa
│       ├── normal
│       └── squamous.cell.carcinoma_left.hilum_T1_N2_M0_IIIa
└── best_model.hdf5
```

The files and directories in the project are:

- **Lung_Cancer_Prediction.ipynb**: Jupyter Notebook containing the code for training and evaluating the lung cancer prediction model.
- **README.md**: Markdown file providing an overview of the project, usage instructions, and other relevant information.
- **dataset/**: Directory containing the dataset used for training and testing.
  - **train/**: Subdirectory containing training images, with one folder per class.
  - **test/**: Subdirectory containing testing images, organized in the same way as the training set.
  - **valid/**: Subdirectory containing validation images, organized in the same way as the training set.
- **best_model.hdf5**: File where the weights of the best model found during training are saved.

## Training the Model

The Jupyter Notebook `Lung_Cancer_Prediction.ipynb` contains the code for training the model. Below are the steps involved:

1. **Mount Google Drive**: To access the dataset stored in Google Drive.
2. **Load and Preprocess Data**: Use `ImageDataGenerator` for data augmentation and normalization.
3. **Define the Model**: Use the Xception model pre-trained on ImageNet as the base model and add custom layers on top.
4. **Compile the Model**: Specify the optimizer, loss function, and metrics.
5. **Train the Model**: Fit the model on the training data and validate it on the validation data. Callbacks like learning rate reduction, early stopping, and model checkpointing are used.
6. **Save the Model**: Save the trained model for future use.

### Example Usage

```python
import tensorflow as tf
from google.colab import drive
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Mount Google Drive
drive.mount('/content/drive', force_remount=True)

# Dataset folders (assumed locations; adjust to where your dataset is stored)
train_folder = '/content/drive/MyDrive/dataset/train'
validate_folder = '/content/drive/MyDrive/dataset/valid'

# Load and preprocess data: rescale pixels to [0, 1]; augment training images only
IMAGE_SIZE = (350, 350)
train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_folder,
    target_size=IMAGE_SIZE,
    batch_size=8,
    class_mode='categorical'
)

validation_generator = test_datagen.flow_from_directory(
    validate_folder,
    target_size=IMAGE_SIZE,
    batch_size=8,
    class_mode='categorical'
)

# Define the model: frozen Xception base with a 4-class softmax head
pretrained_model = tf.keras.applications.Xception(weights='imagenet', include_top=False, input_shape=[*IMAGE_SIZE, 3])
pretrained_model.trainable = False

model = Sequential([
    pretrained_model,
    GlobalAveragePooling2D(),
    Dense(4, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(
    train_generator,
    steps_per_epoch=25,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=20
)

# Save the model
model.save('/content/drive/MyDrive/dataset/trained_lung_cancer_model.h5')
```

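
The training steps mention learning rate reduction, early stopping, and model checkpointing. In Keras these behaviours are usually provided by the `ReduceLROnPlateau`, `EarlyStopping`, and `ModelCheckpoint` callbacks in `tf.keras.callbacks`. To illustrate how they interact, the sketch below replays a list of validation losses in plain Python. It is a simplified illustration, not the Keras implementation, and the function name and default values (`lr_patience`, `stop_patience`, `factor`) are hypothetical rather than the settings used in the notebook.

```python
def simulate_callbacks(val_losses, lr=1e-3, lr_patience=2, stop_patience=4, factor=0.5):
    """Replay a validation-loss curve and report what each callback would do.

    Simplification: an epoch counts as an improvement only when its loss is
    strictly lower than the best loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = None          # epoch a checkpoint would have saved
    epochs_since_best = 0      # counter for early stopping
    lr_wait = 0                # counter for learning rate reduction
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_since_best = 0
            lr_wait = 0
            continue
        epochs_since_best += 1
        lr_wait += 1
        if lr_wait >= lr_patience:
            lr *= factor       # reduce the learning rate after a plateau
            lr_wait = 0
        if epochs_since_best >= stop_patience:
            return {"stopped_epoch": epoch, "best_epoch": best_epoch, "learning_rate": lr}
    return {"stopped_epoch": None, "best_epoch": best_epoch, "learning_rate": lr}
```

With these defaults, a run whose validation loss stops improving after epoch 3 would have its learning rate halved twice and stop at epoch 7, with the checkpoint holding the weights from epoch 3.
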
## Using the Model

To use the trained model for predictions, follow these steps:

1. **Load the Trained Model**: Load the saved `.h5` model file using TensorFlow/Keras.
2. **Preprocess the Input Image**: Load and preprocess the input image using `image.load_img()` and `image.img_to_array()`.
3. **Make Predictions**: Use the loaded model to predict the class of the input image.
4. **Display Results**: Display the input image along with the predicted class label.

### Example Code

```python
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image
import numpy as np
import matplotlib.pyplot as plt

# Load the trained model
model = load_model('/content/drive/MyDrive/dataset/trained_lung_cancer_model.h5')

def load_and_preprocess_image(img_path, target_size):
    # Load and preprocess the image
    img = image.load_img(img_path, target_size=target_size)
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array /= 255.0  # Rescale the image like the training images
    return img_array

# Example usage with an image path
img_path = '/content/test_image.png'
target_size = (350, 350)

# Load and preprocess the image
img = load_and_preprocess_image(img_path, target_size)

# Make predictions
predictions = model.predict(img)
predicted_class = np.argmax(predictions[0])

# Map the predicted class to the class label
class_labels = list(train_generator.class_indices.keys())  # Assuming `train_generator` is defined
predicted_label = class_labels[predicted_class]

# Print the predicted class
print(f"The image belongs to class: {predicted_label}")

# Display the image with the predicted class
plt.imshow(image.load_img(img_path, target_size=target_size))
plt.title(f"Predicted: {predicted_label}")
plt.axis('off')
plt.show()
```

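
The example above reads the class labels from `train_generator`, which is only available in the training session. One possible way around this, sketched below under stated assumptions, is to rebuild the label list from the training folder names: `flow_from_directory` is documented to assign class indices in alphanumeric order of the subfolder names when `classes` is not given, so sorting the folder names should give the same order. The helper names and the JSON file are hypothetical and not part of the notebook.

```python
import json
from pathlib import Path


def class_labels_from_folder(train_dir):
    """List class folder names in sorted order, mirroring the index order
    documented for flow_from_directory when `classes` is not given."""
    return sorted(p.name for p in Path(train_dir).iterdir() if p.is_dir())


def save_labels(labels, path):
    """Store the label list next to the model so inference can reload it."""
    Path(path).write_text(json.dumps(labels))


def load_labels(path):
    """Read back a label list written by save_labels."""
    return json.loads(Path(path).read_text())


def top_predictions(probabilities, labels, k=2):
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(range(len(labels)), key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], float(probabilities[i])) for i in ranked[:k]]
```

At inference time, `top_predictions(predictions[0], load_labels(...))` would then show the runner-up class alongside the top one, which gives a rough sense of how confident the model is.
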
## Results

After training and evaluating the lung cancer prediction model, the following results were obtained:

- Final training accuracy: available as `history.history['accuracy'][-1]` once training finishes
- Final validation accuracy: available as `history.history['val_accuracy'][-1]` once training finishes
- Model accuracy: 93%

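
A single accuracy figure does not show which classes are confused with each other. The sketch below computes a confusion matrix, overall accuracy, and per-class recall from integer class indices in plain Python. It is an illustration rather than the evaluation code from the notebook. It assumes you already have the true labels (for example `test_generator.classes` from a generator created with `shuffle=False`) and the predicted labels (the `argmax` of `model.predict` over the same data). scikit-learn's `confusion_matrix` and `classification_report` offer the same information if you prefer a library call.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples of true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correct predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """For each class: correct predictions divided by true samples of that class."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]
```

A class with low recall is one the model often misses, even when the overall accuracy looks high.
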
### Example Predictions

Include images and their predicted classes here, demonstrating the model's performance on new data.

## Acknowledgements

We acknowledge and thank the contributors to the [Chest CT Scan Images Dataset](https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images) on Kaggle for providing the dataset used in this project.

## License

This project is licensed under the [MIT License](LICENSE).

Feel free to use, modify, or distribute this code under the terms of that license. Refer to the LICENSE file for more details.