a/README.md b/README.md

# MediAug

## Overview

MediAug is a set of tools for data augmentation of histology
slides. It is primarily developed for cervical cancer by
augmenting Pap smear slides. However, it can be extended to
any cell data that has an image and mask of different types of
cells. It currently supports general image augmentation techniques
as well as specialized ones like cell insertion and blending.

![example_cell](docs/project_writeup/images/augment/example_cell.png)

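The README does not spell out how cell insertion and blending are implemented. As a minimal sketch, assuming NumPy is available and that the mask holds per-pixel weights in [0, 1], one common approach is mask-weighted alpha compositing. The function name `insert_cell` and its parameters are hypothetical and are not part of MediAug's API; the library's own blending may differ.

```python
import numpy as np


def insert_cell(slide, cell, mask, top, left):
    """Blend `cell` into a copy of `slide` at (top, left), weighted by `mask`.

    Hypothetical helper for illustration only (not MediAug's API).
    slide: (H, W, C) array, cell: (h, w, C) array, mask: (h, w) weights in [0, 1].
    """
    h, w = mask.shape
    if cell.shape[:2] != (h, w):
        raise ValueError("cell and mask must have the same height and width")
    if top < 0 or left < 0 or top + h > slide.shape[0] or left + w > slide.shape[1]:
        raise ValueError("cell does not fit inside the slide at this position")

    out = slide.astype(np.float64)  # astype returns a copy, so `slide` is untouched
    alpha = mask.astype(np.float64)[..., None]  # add a channel axis for broadcasting
    region = out[top:top + h, left:left + w]
    # Where the mask is 1 the cell replaces the slide; where it is 0 the slide shows through.
    out[top:top + h, left:left + w] = alpha * cell + (1.0 - alpha) * region
    return out
```

A soft-edged mask (for example, a binary mask that has been blurred) gives a gradual transition at the cell boundary, while a hard binary mask reduces this to a plain paste.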
## Installation

To install:

```bash
$ git clone https://github.com/smwade/MediAug
$ cd MediAug
$ python setup.py install
```

## Datasets

There are two main open datasets for Pap smear images, and MediAug supports both.

### SMEAR

The SMEAR dataset contains 917 images of individual cells, each segmented into nucleus and cytoplasm.

<https://mde-lab.aegean.gr/downloads>

### SIPaKMeD

The SIPaKMeD Database consists of 4049 images of isolated cells that have been manually cropped from 966 cluster cell images of Pap smear slides. These images were acquired through a CCD camera adapted to an optical microscope. The cell images are divided into five categories containing normal, abnormal and benign cells.

<http://cs.uoi.gr/~marina/sipakmed.html>

## Custom Dataset

The data pipeline can work with datasets other than SIPaKMeD and SMEAR. To use
another dataset, you must convert it to the directory layout below. For slides:

```
slides/
  metaplastic/
    image/
    mask/
  parabasal/
    image/
    mask/
  ...
```

And for cells:

```
cells/
  metaplastic/
    image/
    mask/
  parabasal/
    image/
    mask/
  ...
```

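Before running the pipeline on a converted dataset, it can help to check that the directory tree matches the layout above. The following is a minimal sketch using only the Python standard library; `find_layout_problems` is a hypothetical helper, not a MediAug function, and it checks only what the layout shows (one directory per cell type, each with `image/` and `mask/`).

```python
from pathlib import Path


def find_layout_problems(root):
    """Return a list of layout problems; an empty list means the tree looks right.

    Hypothetical helper for illustration only (not part of MediAug).
    """
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    # Each immediate subdirectory is treated as one cell type (e.g. metaplastic/).
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    problems = []
    if not class_dirs:
        problems.append(f"{root} has no class subdirectories")
    for class_dir in class_dirs:
        for sub in ("image", "mask"):
            if not (class_dir / sub).is_dir():
                problems.append(f"{class_dir.name}/ is missing {sub}/")
    return problems
```

Calling `find_layout_problems("slides")` and `find_layout_problems("cells")` and printing any returned messages gives a quick check before starting a long augmentation run.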
## Notebooks

Several notebooks show the library in action and cover its key aspects, such as what a dataset is, how to use Operations, and how to create a Pipeline. They are found in `notebooks/`.

## CLI

MediAug comes with a CLI that bundles several useful scripts, including:

* generate-augment-dataset
* prepare-pix2pix-images
* resize-images

To list all available commands, run:

```bash
$ mediaug --help
```

### Generate cell augmented dataset

```bash
$ mediaug generate-augment-dataset --slide_dir <slide_dir> --cell_dir <cell_dir> --out_dir <out_dir> --num 1000 --max_cells <10>
```

### Prepare images for Pix2Pix

```bash
$ mediaug prepare-pix2pix-images --image_dir <image_dir> --mask_dir <mask_dir> --out_dir <out_dir> --split_ratio <split_ratio>
```

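The README does not define `--split_ratio`. As a rough sketch, assuming it is the fraction of image/mask pairs assigned to the training set (with the remainder held out), a reproducible split could look like the following. `split_pairs` and its `seed` parameter are hypothetical and only illustrate the idea; the actual command may behave differently.

```python
import random


def split_pairs(names, split_ratio, seed=0):
    """Split file names into (train, held_out) lists.

    Hypothetical helper; assumes split_ratio is the training fraction in [0, 1].
    """
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be between 0 and 1")
    shuffled = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible
    cut = round(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Splitting by file name rather than by image or mask path keeps each image together with its mask, as long as the two share a name.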
### Recursively resize all images in a directory

```bash
$ mediaug resize-images --input_dir <input_dir> --out_dir <out_dir> --w 256 --height 256
```
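
To illustrate what a recursive resize involves, the sketch below walks an input tree and pairs every image with a destination path that mirrors the same subdirectory structure under the output directory. It uses only the standard library and does not resize anything itself; `iter_resize_jobs` and the suffix list are assumptions for illustration, not MediAug's implementation.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def iter_resize_jobs(input_dir, out_dir):
    """Yield (source, destination) path pairs for every image under input_dir.

    Hypothetical helper: destinations keep the source's relative path under out_dir.
    """
    input_dir, out_dir = Path(input_dir), Path(out_dir)
    for src in sorted(input_dir.rglob("*")):  # rglob descends into all subdirectories
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            yield src, out_dir / src.relative_to(input_dir)
```

Each pair could then be handed to an imaging library; with Pillow installed, for example, `Image.open(src).resize((256, 256)).save(dst)` would write the resized copy once `dst.parent` has been created.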