MediAug is a set of tools for data augmentation of histology
slides. It is primarily developed for cervical cancer, where it
augments Pap smear slides. However, it can be extended to
any cell data that provides images and masks for different types of
cells. It currently supports general image augmentation techniques
as well as specialized ones such as cell insertion and blending.
To install:
$ git clone https://github.com/smwade/MediAug
$ cd MediAug
$ python setup.py install
There are two main open datasets for Pap smear images, and MediAug supports both.
The SMEAR dataset contains 917 individual cells, each segmented into nucleus and cytoplasm.
https://mde-lab.aegean.gr/downloads
The SIPaKMeD Database consists of 4049 images of isolated cells that have been manually cropped from 966 cluster cell images of Pap smear slides. These images were acquired through a CCD camera adapted to an optical microscope. The cell images are divided into five categories containing normal, abnormal and benign cells.
http://cs.uoi.gr/~marina/sipakmed.html
The data pipeline can work with other datasets besides SIPaKMeD and SMEAR. To
use another dataset, you must convert the data to the following directory layout. For slides:
slides/
    metaplastic/
        image/
        mask/
    parabasal/
        image/
        mask/
    ...
And for cells:
cells/
    metaplastic/
        image/
        mask/
    parabasal/
        image/
        mask/
    ...
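As a rough illustration of the conversion step, the sketch below copies image/mask pairs into the layout above. It is only a sketch under stated assumptions: the helper name `build_layout`, the `(cell_type, image_path, mask_path)` input records, and the choice to keep the original file names are all hypothetical and not part of MediAug. Only the `<root>/<cell_type>/image` and `<root>/<cell_type>/mask` folder structure comes from the layout described here.

```python
import shutil
from pathlib import Path


def build_layout(records, root):
    """Copy (cell_type, image_path, mask_path) records into
    <root>/<cell_type>/image/ and <root>/<cell_type>/mask/.

    Hypothetical helper for illustration; not part of MediAug.
    """
    root = Path(root)
    for cell_type, image_path, mask_path in records:
        image_dir = root / cell_type / "image"
        mask_dir = root / cell_type / "mask"
        # Create the per-class folders if they do not exist yet.
        image_dir.mkdir(parents=True, exist_ok=True)
        mask_dir.mkdir(parents=True, exist_ok=True)
        # Assumption: files keep their original names after copying.
        shutil.copy2(image_path, image_dir / Path(image_path).name)
        shutil.copy2(mask_path, mask_dir / Path(mask_path).name)
    return root
```

A call such as `build_layout([("metaplastic", "raw/001.bmp", "raw/001_mask.bmp")], "cells")` would populate `cells/metaplastic/image/` and `cells/metaplastic/mask/`. How MediAug pairs an image with its mask is not stated here, so check the notebooks before settling on a naming scheme.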
To show the library in action, several notebooks cover its key concepts: what a dataset is, how to use Operations, and how to create a Pipeline. They are found in notebooks/
MediAug comes with a CLI that provides several useful scripts. The full list can be seen with the command
$ mediaug --help
The main commands include:
$ mediaug generate-augment-dataset --slide_dir <slide_dir> --cell_dir <cell_dir> --out_dir <out_dir> --num 1000 --max_cells <10>
$ mediaug prepare-pix2pix-images --image_dir <image_dir> --mask_dir <mask_dir> --out_dir <out_dir> --split_ratio <split_ratio>
$ mediaug resize-images --input_dir <input_dir> --out_dir <out_dir> --w 256 --height 256
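When several folders need the same treatment, the commands above can be driven from a short Python script. The sketch below only assembles the argument list for `resize-images` using the flags shown above. The helper name `resize_command` and the example paths are hypothetical, and the `subprocess.run` call is left commented out because it requires MediAug to be installed.

```python
import shlex
import subprocess


def resize_command(input_dir, out_dir, width=256, height=256):
    """Build the argv list for `mediaug resize-images`.

    Hypothetical helper; it only reuses the flags shown in the README.
    """
    return [
        "mediaug", "resize-images",
        "--input_dir", str(input_dir),
        "--out_dir", str(out_dir),
        "--w", str(width),
        "--height", str(height),
    ]


# Example paths are placeholders, not real MediAug defaults.
cmd = resize_command("cells/metaplastic/image", "resized/metaplastic/image")
print(shlex.join(cmd))  # show the command that would be run
# subprocess.run(cmd, check=True)  # uncomment once MediAug is installed
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when directory names contain spaces.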