# Peter Moss Acute Myeloid & Lymphoblastic Leukemia AI Research Project

## Acute Lymphoblastic Leukemia Classifiers 2019

### Data Augmentation

![Peter Moss Acute Myeloid & Lymphoblastic Leukemia Research Project](Media/Images/ALL_IDB1-Augmentation.png)

 
# Table Of Contents

- [Introduction](#introduction)
- [Projects](#projects)
- [Research Papers Followed](#research-papers-followed)
- [Dataset Used](#dataset-used)
- [Data augmentation](#data-augmentation)
    - [Resizing](#resizing)
    - [Grayscaling](#grayscaling)
    - [Histogram Equalization](#histogram-equalization)
    - [Reflection](#reflection)
    - [Gaussian Blur](#gaussian-blur)
    - [Translation](#translation)
    - [Rotation](#rotation)
- [System Requirements](#system-requirements)
- [Setup](#setup)
    - [Clone the repository](#clone-the-repository)
        - [Developer Forks](#developer-forks)
    - [Install Requirements](#install-requirements)
    - [Sort your dataset](#sort-your-dataset)
- [Run locally](#run-locally)
- [Run using Jupyter Notebook](#run-using-jupyter-notebook)
- [Your augmented dataset](#your-augmented-dataset)
- [Contributing](#contributing)
    - [Contributors](#contributors)
- [Versioning](#versioning)
- [License](#license)
- [Bugs/Issues](#bugs-issues)

 
# Introduction

The Acute Lymphoblastic Leukemia Detection System 2019 Data Augmentation program applies augmentations/filters to datasets, increasing the amount of training/test data available. The program is part of the computer vision research and development for the Peter Moss Acute Myeloid & Lymphoblastic Leukemia AI Research Project. This page provides general information, as well as a guide for installing and setting up the augmentation script.

 
# Projects

| Project | Description | Author |
| ------- | ----------- | ------ |
| [Data Augmentation Using Python](Augmentation.py "Data Augmentation Using Python") | A Python program for applying filters to datasets to increase the amount of training/test data. | [Adam Milton-Barker](https://www.leukemiaresearchassociation.ai/team/adam-milton-barker "Adam Milton-Barker") |
| [Data Augmentation Using Jupyter Notebook](Augmentation.ipynb "Data Augmentation Using Jupyter Notebook") | A Python tutorial and Jupyter Notebook for applying filters to datasets to increase the amount of training/test data. | [Adam Milton-Barker](https://www.leukemiaresearchassociation.ai/team/adam-milton-barker "Adam Milton-Barker") |

 
# Research Papers Followed

The Acute Lymphoblastic Leukemia Detection System 2019 uses the data augmentation methods proposed in [Leukemia Blood Cell Image Classification Using Convolutional Neural Network](http://www.ijcte.org/vol10/1198-H0012.pdf "Leukemia Blood Cell Image Classification Using Convolutional Neural Network") by T. T. P. Thanh, Caleb Vununu, Sukhrob Atoev, Suk-Hwan Lee, and Ki-Ryong Kwon.

| Paper | Authors | Link |
| ----- | ------- | ---- |
| Leukemia Blood Cell Image Classification Using Convolutional Neural Network | T. T. P. Thanh, Caleb Vununu, Sukhrob Atoev, Suk-Hwan Lee, and Ki-Ryong Kwon | [Paper](http://www.ijcte.org/vol10/1198-H0012.pdf "Paper") |

 
# Dataset Used

The [Acute Lymphoblastic Leukemia Image Database for Image Processing](https://homes.di.unimi.it/scotti/all/) dataset is used for this project. The dataset was created by [Fabio Scotti, Associate Professor, Dipartimento di Informatica, Università degli Studi di Milano](https://homes.di.unimi.it/scotti/). Big thanks to Fabio for the research and time he put into creating the dataset and its documentation; it is one of his personal projects. You will need to follow the steps outlined [here](https://homes.di.unimi.it/scotti/all/#download) to gain access to the dataset.

| Dataset | Description | Link |
| ------- | ----------- | ---- |
| Acute Lymphoblastic Leukemia Image Database for Image Processing | Created by [Fabio Scotti, Associate Professor, Dipartimento di Informatica, Università degli Studi di Milano](https://homes.di.unimi.it/scotti/). | [Dataset](https://homes.di.unimi.it/scotti/all/#download "Dataset") |

 
# Data augmentation

![AML & ALL Data Augmentation](Media/Images/ALL_IDB1-Augmented-Slides.png)

In this dataset there were 59 negative and 49 positive images. To even the classes I removed 10 images from the negative set. From there I removed a further 10 images per class for testing later in the tutorial and for the purpose of demos.

In my case that left 20 test images (10 positive / 10 negative) and 39 images per class ready for augmentation. Place the original images that you wish to augment into the **Model/Data/0** & **Model/Data/1** directories. Using this program I was able to create a dataset of **1053** positive and **1053** negative augmented images.
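The figure of 1053 per class can be checked against the augmentations described below: each original image yields 27 outputs (1 resized, 1 grayscale, 1 histogram-equalized, 2 reflections, 1 Gaussian blur, 1 translation and 20 rotations), and the per-technique counts here are simply read off the functions in this README:

```python
# Outputs produced per original image by each augmentation function below.
outputs_per_image = {
    "resize": 1,
    "grayscale": 1,
    "histogram_equalization": 1,
    "reflection": 2,       # horizontal + vertical
    "gaussian_blur": 1,
    "translation": 1,
    "rotation": 20,        # 20 randomly rotated copies
}

originals_per_class = 39
augmented_per_class = originals_per_class * sum(outputs_per_image.values())
print(augmented_per_class)  # 39 * 27 = 1053
```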

The full Python class that holds the functions mentioned below can be found in [Classes/Data.py](Classes/Data.py). The Data class is a wrapper around related functions provided in popular computer vision libraries such as OpenCV and SciPy.

## Resizing

The first step is to resize the image; this is done with the following function:

```
    def resize(self, filePath, savePath, show = False):

        """
        Resizes the image at filePath and writes it to savePath.
        """

        image = cv2.resize(cv2.imread(filePath), self.fixed)
        self.writeImage(savePath, image)
        self.filesMade += 1
        print("Resized image written to: " + savePath)

        if show is True:
            plt.imshow(image)
            plt.show()

        return image
```
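cv2.resize maps each output pixel back to a source pixel (by default with bilinear interpolation; nearest-neighbor is the simplest scheme). A minimal pure-Python sketch of nearest-neighbor resizing, independent of the project code, just to show the idea:

```python
def resize_nearest(pixels, new_w, new_h):
    """Resize a 2D grid of pixels with nearest-neighbor sampling."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[int(y * old_h / new_h)][int(x * old_w / new_w)]
         for x in range(new_w)]
        for y in range(new_h)
    ]

image = [[1, 2],
         [3, 4]]
print(resize_nearest(image, 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```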

## Grayscaling

In general, grayscale images are less complex than color images and result in a less complex model. In the paper the authors described using grayscaling to create more data easily. To create a grayscale copy of each image I wrapped the built-in OpenCV function, [cv2.cvtColor()](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html). The created images will be saved to the relevant directories in the default configuration.

```
    def grayScale(self, image, grayPath, show = False):

        """
        Writes a grayscale copy of the image to the filepath provided.
        """

        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        self.writeImage(grayPath, gray)
        self.filesMade += 1
        print("Grayscaled image written to: " + grayPath)

        if show is True:
            plt.imshow(gray)
            plt.show()

        return image, gray
```
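cv2.cvtColor with COLOR_BGR2GRAY uses the standard luma weighting Y = 0.299 R + 0.587 G + 0.114 B. The same conversion for a single BGR pixel, as an illustrative sketch:

```python
def bgr_to_gray(b, g, r):
    """Standard luma conversion, as used by cv2.COLOR_BGR2GRAY."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(bgr_to_gray(255, 255, 255))  # 255: white stays white
print(bgr_to_gray(0, 0, 255))      # 76: pure red becomes fairly dark
```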

## Histogram Equalization

Histogram equalization stretches an image's intensity histogram across the full available range, increasing contrast. The paper describes using histogram equalization to enhance the contrast.

In the case of this dataset, it makes both the white and red blood cells more distinguishable. The created images will be saved to the relevant directories in the default configuration.

```
    def equalizeHist(self, gray, histPath, show = False):

        """
        Writes a histogram equalized copy of the image to the filepath provided.
        """

        hist = cv2.equalizeHist(gray)
        self.writeImage(histPath, hist)
        self.filesMade += 1
        print("Histogram equalized image written to: " + histPath)

        if show is True:
            plt.imshow(hist)
            plt.show()

        return hist
```
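Under the hood, equalization remaps each gray level through the normalized cumulative histogram (CDF), so values that were bunched together get spread apart. A simplified pure-Python sketch of that remapping (OpenCV's exact scaling and rounding differ slightly):

```python
def equalize(levels, num_levels=256):
    """Remap gray levels through the normalized cumulative histogram."""
    hist = [0] * num_levels
    for v in levels:
        hist[v] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    n = len(levels)
    # Map each value so the output histogram is approximately flat.
    return [round((cdf[v] - 1) / (n - 1) * (num_levels - 1)) for v in levels]

# A low-contrast strip of mid-gray values spreads across the full range.
print(equalize([100, 101, 102, 103]))  # [0, 85, 170, 255]
```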

## Reflection

Reflection is a way of increasing your dataset by creating a copy of each image flipped on its X axis, and a copy flipped on its Y axis. The reflection function below uses the built-in OpenCV function, cv2.flip, to flip the image on the mentioned axes. The created images will be saved to the relevant directories in the default configuration.

```
    def reflection(self, image, horPath, verPath, show = False):

        """
        Writes reflected copies of the image to the filepaths provided.
        """

        horImg = cv2.flip(image, 0)
        self.writeImage(horPath, horImg)
        self.filesMade += 1
        print("Horizontally reflected image written to: " + horPath)

        if show is True:
            plt.imshow(horImg)
            plt.show()

        verImg = cv2.flip(image, 1)
        self.writeImage(verPath, verImg)
        self.filesMade += 1
        print("Vertically reflected image written to: " + verPath)

        if show is True:
            plt.imshow(verImg)
            plt.show()

        return horImg, verImg
```
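cv2.flip(image, 0) flips around the X axis (reversing the row order) and cv2.flip(image, 1) flips around the Y axis (reversing each row). The same two operations on a plain 2D list, as an illustrative sketch:

```python
def flip_x(image):
    """Mirror around the X axis: reverse the row order."""
    return image[::-1]

def flip_y(image):
    """Mirror around the Y axis: reverse each row."""
    return [row[::-1] for row in image]

image = [[1, 2],
         [3, 4]]
print(flip_x(image))  # [[3, 4], [1, 2]]
print(flip_y(image))  # [[2, 1], [4, 3]]
```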

## Gaussian Blur

Gaussian blur smooths an image by filtering it with a Gaussian kernel and is especially popular in the computer vision world. The function below uses SciPy's ndimage.gaussian_filter function. The created images will be saved to the relevant directories in the default configuration.

```
    def gaussian(self, filePath, gaussianPath, show = False):

        """
        Writes a Gaussian blurred copy of the image to the filepath provided.
        """

        gaussianBlur = ndimage.gaussian_filter(plt.imread(filePath), sigma=5.11)
        self.writeImage(gaussianPath, gaussianBlur)
        self.filesMade += 1
        print("Gaussian image written to: " + gaussianPath)

        if show is True:
            plt.imshow(gaussianBlur)
            plt.show()

        return gaussianBlur
```
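A Gaussian filter weights each neighbor by exp(-d²/2σ²) and normalizes the weights to sum to 1; ndimage.gaussian_filter applies the same principle in N dimensions with proper boundary handling. A minimal 1D sketch of the idea, independent of the project code:

```python
import math

def gaussian_blur_1d(signal, sigma, radius):
    """Smooth a 1D signal with a normalized Gaussian kernel."""
    kernel = [math.exp(-(d * d) / (2 * sigma * sigma))
              for d in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]

    blurred = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), len(signal) - 1)  # clamp at edges
            acc += w * signal[j]
        blurred.append(acc)
    return blurred

# A single spike spreads out into a smooth bump.
spike = [0, 0, 10, 0, 0]
print(gaussian_blur_1d(spike, sigma=1.0, radius=2))
```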

## Translation

Translation is a type of affine transformation that shifts the image within its frame. The function below uses the cv2.warpAffine function with the translation matrix [[1, 0, 84], [0, 1, 56]], moving each pixel 84 pixels right and 56 pixels down. The created images will be saved to the relevant directories in the default configuration.

```
    def translate(self, image, translatedPath, show = False):

        """
        Writes a translated copy of the image to the filepath provided.
        """

        cols, rows, chs = image.shape

        translated = cv2.warpAffine(image, np.float32([[1, 0, 84], [0, 1, 56]]), (rows, cols),
                                    borderMode=cv2.BORDER_CONSTANT, borderValue=(144, 159, 162))

        self.writeImage(translatedPath, translated)
        self.filesMade += 1
        print("Translated image written to: " + translatedPath)

        if show is True:
            plt.imshow(translated)
            plt.show()

        return translated
```
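The 2×3 matrix passed to cv2.warpAffine maps each coordinate as [x', y'] = M · [x, y, 1]ᵀ. Applied by hand to a couple of points, using the same matrix as the function above:

```python
def apply_affine(matrix, x, y):
    """Apply a 2x3 affine matrix to a point: [x', y'] = M . [x, y, 1]."""
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)

M = [[1, 0, 84], [0, 1, 56]]  # the translation used in translate() above
print(apply_affine(M, 0, 0))    # (84, 56)
print(apply_affine(M, 10, 20))  # (94, 76)
```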

## Rotation

Rotation creates multiple rotated copies of each image, considerably increasing the size of the dataset. The function below uses the cv2.getRotationMatrix2D and cv2.warpAffine functions to create 20 copies of each image, each rotated about the image center by a random angle between -180 and 180 degrees with a scale factor of 0.70. The created images will be saved to the relevant directories in the default configuration.

```
    def rotation(self, path, filePath, filename, show = False):

        """
        Writes rotated copies of the image to the filepath provided.
        """

        image = cv2.imread(filePath)
        cols, rows, chs = image.shape

        # Seed once so the sequence of random angles is reproducible
        random.seed(self.seed)

        for i in range(0, 20):
            randDeg = random.randint(-180, 180)
            matrix = cv2.getRotationMatrix2D((cols/2, rows/2), randDeg, 0.70)
            rotated = cv2.warpAffine(image, matrix, (rows, cols), borderMode=cv2.BORDER_CONSTANT,
                                     borderValue=(144, 159, 162))
            fullPath = os.path.join(
                path, str(randDeg) + '-' + str(i) + '-' + filename)

            self.writeImage(fullPath, rotated)
            self.filesMade += 1
            print("Rotated image written to: " + fullPath)

            if show is True:
                plt.imshow(rotated)
                plt.show()
```
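cv2.getRotationMatrix2D((cx, cy), angle, scale) builds the 2×3 matrix [[α, β, (1−α)·cx − β·cy], [−β, α, β·cx + (1−α)·cy]] with α = scale·cos(angle) and β = scale·sin(angle). A quick pure-Python check of that geometry, rotating a point 90 degrees about an image center:

```python
import math

def rotation_matrix_2d(center, angle_deg, scale):
    """Re-derivation of the matrix built by cv2.getRotationMatrix2D."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]

def apply_affine(matrix, x, y):
    """Apply a 2x3 affine matrix to a point."""
    (m00, m01, m02), (m10, m11, m12) = matrix
    return (m00 * x + m01 * y + m02, m10 * x + m11 * y + m12)

M = rotation_matrix_2d((50, 50), 90, 1.0)
# Rotating 90 degrees about (50, 50) maps (50, 0) to (0, 50).
x, y = apply_affine(M, 50, 0)
print(round(x), round(y))  # 0 50
```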

 

# System Requirements

- Tested on Ubuntu 20.04, 18.04 & 16.04
- [Tested with Python 3.0 and above](https://www.python.org/download/releases/3.0/ "Tested with Python 3.0 and above")
- Requires PIP3
- Jupyter Notebook (Optional)

 

# Setup

Below is a guide on how to set up the augmentation program on your device. As mentioned above, the program has been tested with Ubuntu 20.04, 18.04 & 16.04, but may work on other versions of Linux and possibly Windows.

## Clone the repository

Clone the [ALL Classifiers 2019](https://github.com/AMLResearchProject/ALL-Classifiers-2019 "ALL Classifiers 2019") repository from the [Peter Moss Acute Myeloid & Lymphoblastic AI Research Project](https://github.com/AMLResearchProject "Peter Moss Acute Myeloid & Lymphoblastic AI Research Project") Github Organization.

To clone the repository and install the ALL Classifiers 2019, make sure you have Git installed. Now navigate to the home directory on your device using terminal/commandline, and then use the following command.

```
  git clone https://github.com/AMLResearchProject/ALL-Classifiers-2019.git
```

Once you have used the command above you will see a directory called **ALL-Classifiers-2019** in your home directory.

```
ls
```

Using the ls command in your home directory should show you the following.

```
ALL-Classifiers-2019
```

Navigate to the **ALL-Classifiers-2019/Augmentation** directory; this is your project root directory for this tutorial.

### Developer Forks

Developers from the Github community that would like to contribute to the development of this project should first create a fork, and clone that repository. For detailed information please view the [CONTRIBUTING](../CONTRIBUTING.md "CONTRIBUTING") guide. You should pull the latest code from the development branch.

```
  $ git clone -b "0.2.0" https://github.com/AMLResearchProject/ALL-Classifiers-2019.git
```

The **-b "0.2.0"** parameter ensures you get the code from the 0.2.0 release branch rather than master. Before using the above command, please check the latest release branch shown at the top of the project README.

## Install Requirements

Once you have used the command above you will see a directory called **ALL-Classifiers-2019** in the location you chose to download the repo to. In terminal, navigate to **ALL-Classifiers-2019/Augmentation** and use the following commands to install the required software for this program.

```
  sed -i 's/\r//' Setup.sh
  sh Setup.sh
```

## Sort your dataset

The **ALL_IDB_1** dataset is the one used in this tutorial. In this dataset there were 59 negative and 49 positive images. To even the classes I removed 10 images from the negative set. From there I removed a further 10 images per class for testing later in the tutorial and for the purpose of demos. In my case I ended up with 20 test images (10 positive / 10 negative) and 39 images per class ready for augmentation. Place the original images that you wish to augment into the **Model/Data/0** & **Model/Data/1** directories. Using this program I was able to create a dataset of **1053** positive and **1053** negative augmented images.

You are now ready to move onto starting your Jupyter Notebook server or run the data augmentation locally.

 

# Run locally

If you would like to run the program locally you can navigate to the Augmentation directory and use the following command:

```
  python3 Augmentation.py
```

# Run using Jupyter Notebook

You need to make sure you have Jupyter Notebook installed; you can use the following commands to install it. If you are unsure whether you already have it installed, you can run the commands and they will tell you if it is already present and exit the download.

```
pip3 install --upgrade --force-reinstall --no-cache-dir jupyter
sudo apt install jupyter-notebook
```

Once you have completed the above, make sure you are in the **ALL-Classifiers-2019/Augmentation** directory and use the following command to start your server. A URL will be shown in your terminal which will point to your Jupyter Notebook server with the required authentication details in the URL parameters.

Below, replace **###.###.#.##** with the local IP address of the device you are going to be running the augmentation on.

```
  sudo jupyter notebook --ip ###.###.#.##
```

Using the URL provided to you in the above step, you should be able to access a copy of this directory hosted on your own device. From here you can navigate the project files and source code. You need to navigate to the **ALL-Classifiers-2019/Augmentation/Augmentation.ipynb** file on your own device, which will take you to the second part of this tutorial. If you get stuck with anything in the above or following tutorial, please use the repository [issues](../issues "issues") and fill out the request information.

# Your augmented dataset

If you head to your **Model/Data/** directory you will notice the **Augmented** directory. Inside the Augmented directory you will find **0** (negative) and **1** (positive) directories containing resized copies of the originals along with the augmented copies.

Using data augmentation I was able to increase the dataset from **39** images per class to **1053** per class.

 

# Contributing

The Peter Moss Acute Myeloid & Lymphoblastic Leukemia AI Research project encourages and welcomes code contributions, bug fixes and enhancements from the Github community.

Please read the [CONTRIBUTING](../CONTRIBUTING.md "CONTRIBUTING") document for a full guide to forking our repositories and submitting your pull requests. You will also find information about our code of conduct on this page.

## Contributors

- [Adam Milton-Barker](https://www.leukemiaresearchassociation.ai/team/adam-milton-barker "Adam Milton-Barker") - [Asociacion De Investigacion En Inteligencia Artificial Para La Leucemia Peter Moss](https://www.leukemiaresearchassociation.ai "Asociacion De Investigacion En Inteligencia Artificial Para La Leucemia Peter Moss") President & Lead Developer, Sabadell, Spain

 

# Versioning

We use SemVer for versioning.

 

# License

This project is licensed under the **MIT License** - see the [LICENSE](../LICENSE "LICENSE") file for details.

 

# Bugs/Issues

We use the [repo issues](../issues "repo issues") to track bugs and general requests related to using this project.