#### Data-set
[KVASIR version 2](https://datasets.simula.no/kvasir/data/kvasir-dataset-v2.zip) is used. It contains 8000 images, 1000 for each of its 8 classes.
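One quick way to confirm the 8 × 1000 image counts after downloading and extracting the archive is to count the files in each class folder. This Python snippet is a minimal sketch. It assumes the archive extracts to one sub-folder per class containing `.jpg` files. `count_images_per_class` is a hypothetical helper and is not part of the dataset distribution.

```python
from pathlib import Path


def count_images_per_class(root: str) -> dict[str, int]:
    """Count .jpg files in each class sub-folder of `root`.

    Assumption (not stated in this README): the extracted archive has one
    sub-folder per class, e.g. <root>/polyps/*.jpg.
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            # Only count image files; ignore stray files such as notes or metadata.
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() == ".jpg"
            )
    return counts
```

For example, if the archive extracts to a folder named `kvasir-dataset-v2`, then `count_images_per_class("kvasir-dataset-v2")` should report 1000 for each of the 8 class folders.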
###### 8 Classes of the data-set
Anatomical Landmarks:
- Z-line
- Pylorus
- Cecum

Pathological Findings:
- Esophagitis
- Polyps
- Ulcerative Colitis

Polyp Removals:
- Dyed and Lifted Polyps
- Dyed Resection Margins

The dataset is shuffled using the Linux `shuf` command and split into a train set and a test set. The split is 80/20 within each class, so the class distribution is preserved:
- Train set: 6400 images (800 images from each class)
- Test set: 1600 images (200 images from each class)

The dataset is then stored back in Google Drive so that it can be loaded into Google Colab when needed.
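The train/test split above was done with `shuf`. The following Python snippet sketches an equivalent per-class (stratified) shuffle-and-split, but it is not the exact script that was used. It assumes one sub-folder per class under `src` and copies files into `dst/train/<class>/` and `dst/test/<class>/`. The names `stratified_split`, `train_fraction` and `seed` are hypothetical.

```python
import random
import shutil
from pathlib import Path


def stratified_split(src: str, dst: str, train_fraction: float = 0.8,
                     seed: int = 42) -> dict[str, tuple[int, int]]:
    """Shuffle each class folder of `src` and copy it into dst/train and dst/test.

    Splitting inside each class keeps the class distribution identical in
    both sets (800/200 per class for 1000 images and train_fraction=0.8).
    Returns {class_name: (n_train, n_test)}.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    sizes = {}
    for class_dir in sorted(p for p in Path(src).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)  # plays the role of `shuf` for this class only
        n_train = round(len(files) * train_fraction)
        for subset, chunk in (("train", files[:n_train]),
                              ("test", files[n_train:])):
            out_dir = Path(dst) / subset / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in chunk:
                shutil.copy2(f, out_dir / f.name)
        sizes[class_dir.name] = (n_train, len(files) - n_train)
    return sizes
```

Keep `dst` outside `src` so that the output folders are not treated as classes on a later run. In Colab, you can mount Drive first with `from google.colab import drive` and `drive.mount('/content/drive')`, and then point `dst` at a folder under `/content/drive/MyDrive/`. The folder name is your choice.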