# wsi_preprocessing

## Processing and tiling of histological slides

openslide-based processing and filtering of histological slides (only tissue filtering right now, more will follow).
The process can be configured using a JSON config file.

Tissue detection runs on a higher openslide level, i.e. a downscaled version of the slide, to speed up the process. Coarse tiles are sampled at that level and
discarded if they don't have enough tissue coverage. The remaining tiles are then divided into patches for training etc.
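
The coverage filter described above can be sketched roughly as follows. This is a minimal illustration and not the code used in `tile_generator.py`: the function names, the list-of-lists mask and the non-overlapping tile grid are assumptions made for the example, and only the `tissue_coverage` threshold comes from the config.

```python
def tissue_fraction(mask, row, col, size):
    """Fraction of tissue pixels inside one square tile of a binary mask."""
    window = [mask[r][col:col + size] for r in range(row, row + size)]
    return sum(sum(line) for line in window) / (size * size)


def select_tissue_tiles(mask, tile_size, tissue_coverage=0.75):
    """Return (row, col) origins of tiles whose tissue fraction reaches the threshold.

    mask:      2D list of 0/1 values at the low-resolution level (1 = tissue).
               How the mask is computed is out of scope for this sketch.
    tile_size: edge length of a coarse tile in mask pixels (hypothetical parameter).
    """
    height, width = len(mask), len(mask[0])
    kept = []
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            if tissue_fraction(mask, row, col, tile_size) >= tissue_coverage:
                kept.append((row, col))
    return kept


# Toy 8x8 mask: only the left half contains tissue.
mask = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(8)]
print(select_tissue_tiles(mask, tile_size=4))  # [(0, 0), (4, 0)]
```

Only the tiles that pass this check would then be split into patches at full resolution, which is where the time saving comes from.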

Supported annotation types are .xml (Camelyon17 and some other public datasets) and .geojson (QuPath).
Right now only binary annotations are supported (tumor vs. non-tumor).

Supported slide formats are .tif and .svs right now.

### Usage:

This script is designed to be used together with QuPath in case there are no annotations.
The main file is `tile_generator.py`: configure the process via the config file, then execute this file to start the process.
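
As a rough starting point, a config could be generated like this. The keys are the ones listed under "Config Explanation", but every value, the paths, the nesting of `label_dict` and the set of keys the script actually requires are assumptions for illustration, so compare the result with the config file shipped in the repository.

```python
import json

# All values are placeholders; the paths are hypothetical.
config = {
    "tissue_coverage": 0.75,
    "keep_annotated_tiles_despite_too_little_tissue_coverage": False,
    "processing_level": 5,
    "blocked_threads": 1,
    "patches_per_tile": 12,
    "overlap": 0.0,
    "annotation_overlap": 0.5,
    "patch_size": 256,
    "slides_dir": "/path/to/slides",
    "annotation_dir": "/path/to/annotations",
    "annotation_file_format": "geojson",
    "output_path": "/path/to/output",
    "skip_unlabeled_slides": False,
    "save_annotated_only": False,
    "output_format": "png",
    "show_mode": False,
    # Assumed layout: one entry per class, the unannotated class first.
    "label_dict": {
        "non-tumor": {"type": "==", "threshold": 0.0},
        "tumor": {"type": ">=", "threshold": 0.5},
    },
}

# Serialise to JSON; key order is preserved, so "non-tumor" stays first.
config_text = json.dumps(config, indent=4)


def save_config(path):
    """Write the JSON text to the given path (the file name is up to you)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(config_text)
```

Calling `save_config("config.json")` would write the file; the name `config.json` is only an example.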

### Additional information:

NOTE:
Right now there is a bug on Unix systems regarding openslide where image data isn't properly loaded. To fix this, follow:
https://github.com/openslide/openslide-python/issues/58#issuecomment-883446558

### Config Explanation:

| Dictionary Entry                                        | Explanation |
|---------------------------------------------------------|-------------|
| tissue_coverage                                         | Threshold in [0, 1] for how much tissue coverage is necessary to keep a tile; default is 0.75 |
| keep_annotated_tiles_despite_too_little_tissue_coverage | Legacy option. Old behaviour: keep annotated tiles even if not covered by tissue. New behaviour (to allow easier tile clean-up around the edges): discard tiles with too little tissue coverage regardless of annotation status. |
| processing_level                                        | Level of downscaling by openslide. Lowering the level increases precision but takes more time; default is 5 |
| blocked_threads                                         | Number of threads that won't be used by the program |
| patches_per_tile                                        | Number of patches per tile; tiles are used for lower-resolution operations like tissue detection |
| overlap                                                 | Value in [0, 1) to set the overlap between neighbouring unannotated patches |
| annotation_overlap                                      | Value in [0, 1) to set the overlap between neighbouring annotated patches |
| patch_size                                              | Output edge length in pixels of the square patches |
| slides_dir                                              | Directory where the different slides and subdirs are located |
| slides_file                                             | Text file (.txt) containing the absolute paths of all slides to process |
| annotation_dir                                          | Directory where the annotations are located |
| annotation_file_format                                  | File format of the input annotations ("xml" or "geojson") |
| output_path                                             | Output directory where the resulting images will be stored |
| skip_unlabeled_slides                                   | Boolean to skip slides without an annotation file |
| save_annotated_only                                     | Boolean to only save annotated patches |
| output_format                                           | Image output format; default is "png" |
| show_mode                                               | Boolean to enable plotting of some intermediate results/visualizations |
| label_dict                                              | Structure to set up the operator and the threshold for checking the coverage of a certain class. Up to one unannotated tissue type (e.g. non-tumor) is possible and must go first for implementation reasons. |
| type                                                    | Operator type ("==", ">=" or "<=") |
| threshold                                               | Coverage threshold for the individual class |
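
To make `overlap` and the `type`/`threshold` pair more concrete, here is a small sketch of how such values could be interpreted. It is not taken from the implementation: the helper names are hypothetical, and it assumes that the overlap is the fraction of `patch_size` shared by neighbouring patches and that each class rule is a dict with the keys `type` and `threshold`.

```python
import operator

# Map the documented operator strings onto comparison functions.
COMPARATORS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def meets_threshold(coverage, rule):
    """Compare a class coverage in [0, 1] with a rule such as
    {"type": ">=", "threshold": 0.5} (assumed layout)."""
    return COMPARATORS[rule["type"]](coverage, rule["threshold"])


def patch_origins(tile_size, patch_size, overlap):
    """Start offsets along one axis of the patches inside a tile.

    Assumes overlap is the shared fraction of patch_size, so the step
    between neighbouring patches is patch_size * (1 - overlap).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, round(patch_size * (1 - overlap)))
    return list(range(0, tile_size - patch_size + 1, stride))


print(meets_threshold(0.6, {"type": ">=", "threshold": 0.5}))  # True
print(patch_origins(1024, 256, 0.0))  # [0, 256, 512, 768]
print(len(patch_origins(1024, 256, 0.5)))  # 7
```

Under these assumptions, a larger `annotation_overlap` than `overlap` yields more patches from annotated regions than from unannotated ones.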