# Preprocessing

The goal of these steps is to end up with a collection of neural-network-ready images, each with associated measurements (e.g. size and variance) that can be used in a structural causal model.

1. Download the data from the official repository: http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX

Note that in order for the pylidc package to work, it needs to know where the original data are stored; see https://pylidc.github.io/install.html for instructions.
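
As described in the linked instructions, pylidc reads the data location from a small configuration file in your home directory (`~/.pylidcrc` on Linux/Mac, `pylidc.conf` on Windows); the path below is a placeholder for wherever you extracted LIDC-IDRI:

```
[dicom]
path = /path/to/LIDC-IDRI
warn = True
```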

2. Run lidc-preprocessing.py

This step extracts the individual nodules from the CT scans and generates the 2D images from the 3D nodules. On my machine (12 threads) the nodule extraction takes about 5-10 minutes.
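
For orientation, a minimal sketch of this kind of extraction with the pylidc API is shown below; the 50% consensus level, the padding, the choice of the middle axial slice, and the output filenames are illustrative assumptions, not necessarily what lidc-preprocessing.py does:

```python
import numpy as np
import pylidc as pl
from pylidc.utils import consensus

# Each scan can contain several nodules, each annotated by up to four readers.
for scan in pl.query(pl.Scan):
    vol = scan.to_volume()  # full CT volume as a numpy array
    for i, anns in enumerate(scan.cluster_annotations()):
        # Merge the annotators' segmentations into one mask at 50% agreement.
        cmask, cbbox, _ = consensus(anns, clevel=0.5,
                                    pad=[(20, 20), (20, 20), (0, 0)])
        nodule = vol[cbbox]          # 3D crop around the nodule
        k = nodule.shape[2] // 2     # middle axial slice -> 2D image
        np.save(f"{scan.patient_id}_nodule{i}_img.npy", nodule[:, :, k])
        np.save(f"{scan.patient_id}_nodule{i}_mask.npy", cmask[:, :, k])
```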

3. Run measure_slices.py

Measure the size (area) and the variance of the pixel intensities based on the segmentations. These measurements will form the basis of the simulations.
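
As a sketch of what such a measurement can look like (the function name and the pixel-spacing handling are assumptions for illustration):

```python
import numpy as np

def measure_slice(image: np.ndarray, mask: np.ndarray, spacing_mm: float) -> dict:
    """Size and intensity statistics for one 2D slice and its binary segmentation."""
    area_mm2 = mask.sum() * spacing_mm ** 2  # pixel count scaled by pixel area
    inside = image[mask.astype(bool)]        # intensities inside the segmentation
    return {"area": area_mm2, "variance": float(inside.var())}
```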

4. Run prepare-data-2d.py

Split the data into train/valid sets, move them to a new folder, filter out slices that are too small (<20mm) or that the annotators don't agree on, and normalize the measurements to approximately normal distributions.
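
One way to implement the filtering and normalization is sketched below; the file names, column names, agreement flag, and the log transform are assumptions for illustration, not necessarily what prepare-data-2d.py does:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("measurements.csv")  # hypothetical output of measure_slices.py

# Keep slices that are large enough and on which the annotators agree.
df = df[(df["diameter_mm"] >= 20) & df["annotators_agree"]]

# Positive, skewed measurements can be brought close to a normal
# distribution with a log transform followed by standardization.
for col in ["area", "variance"]:
    x = np.log(df[col])
    df[col + "_norm"] = (x - x.mean()) / x.std()

# Random 80/20 train/validation split (the ratio is an arbitrary choice).
valid = df.sample(frac=0.2, random_state=0)
train = df.drop(valid.index)
train.to_csv("train.csv", index=False)
valid.to_csv("valid.csv", index=False)
```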