# Eliminating Biasing Signals in Lung Cancer Images for Prognosis Predictions with Deep Learning

This repository contains the files needed to reproduce the results of the paper
"Eliminating Biasing Signals in Lung Cancer Images for Prognosis Predictions with Deep Learning"
by W.A.C. van Amsterdam, J.J.C. Verhoeff, P.A. de Jong, T. Leiner and M.J.C. Eijkemans,
published in npj Digital Medicine, 2019.

## Replicating the experiments

The code that generated the published results is archived in this release:

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3522229.svg)](https://doi.org/10.5281/zenodo.3522229)

Follow the steps below to replicate the published results.
The original Python scripts are (somewhat) self-explanatory.
They also contain code that was useful during initial experiments but was not used for the final publication.

### Installation

The easiest approach is to create a new conda environment and install all dependencies with conda and pip:

```
conda create --name elimbias
conda activate elimbias
conda install python=3.7.3 tqdm numpy pandas feather-format nibabel pillow scikit-learn tensorboard future seaborn
conda install -c pytorch pytorch=1.1.0 torchvision
pip install pyro-ppl==0.3.0 pypng pylidc
```

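
After installation, it can help to confirm that the pinned versions are the ones Python actually imports. The snippet below is a minimal sketch and not part of the repository: the helper `installed_version` is hypothetical, and it assumes each package exposes a `__version__` attribute (PyTorch imports as `torch`, pyro-ppl as `pyro`).

```python
import importlib

# Versions pinned in the install commands above, keyed by import name.
PINNED = {"torch": "1.1.0", "pyro": "0.3.0"}


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Report each pinned package; a mismatch is flagged but does not raise.
for name, expected in PINNED.items():
    found = installed_version(name)
    status = "ok" if found == expected else "check"
    print(f"{name}: expected {expected}, found {found} [{status}]")
```

A line marked `check` means the package is missing or has a different version than the one used for the published results.
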
### Pre-processing

Go to the subfolder elimbias/preproces and follow the steps in the README there.

The goal of these steps is to end up with a collection of neural-network-ready images, each with associated measurements (e.g. size and variance) that can be used in a structural causal model.

The result is a data folder that contains the images, separated into train / valid subfolders (a test subfolder is optional and not created by default), with the associated measurements in a labels.csv file.

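
To sanity-check the output of pre-processing, something like the following could be used. It is only a sketch under assumptions: the helper `summarize_data_folder` is hypothetical, labels.csv is assumed to sit at the top level of the data folder, and the column names and values written to the stand-in file are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def summarize_data_folder(data_dir):
    """Report which split subfolders exist and how many rows labels.csv holds."""
    data_dir = Path(data_dir)
    splits = [s for s in ("train", "valid", "test") if (data_dir / s).is_dir()]
    with open(data_dir / "labels.csv", newline="") as f:
        rows = list(csv.DictReader(f))
    columns = list(rows[0].keys()) if rows else []
    return {"splits": splits, "n_rows": len(rows), "columns": columns}


# Build a tiny stand-in folder so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train").mkdir()
    (root / "valid").mkdir()
    with open(root / "labels.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "size", "variance"])  # hypothetical column names
        writer.writerow(["img_0001.png", "12.5", "0.8"])  # made-up values
    summary = summarize_data_folder(root)

print(summary)
```

Pointing `summarize_data_folder` at the real data folder should list `train` and `valid` (and `test` only if it was created).
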
### Data simulation

This is where the statistical associations between the images and the 'clinical' data are simulated, based on a structural causal model (SCM) and the measurements of the images.

1. Define a structural causal model that will generate the data.

   See experiments/sims/README.md for short instructions on defining a structural causal model.
   See experiments/sims for an example csv file that defines a structural causal model.

2. Define a setting in the settings directory with a setting.json file that, together with the structural causal model, defines the experiment (see the example).

3. After defining the SCM and the setting, run simulate_data.py to create a dataset based on the SCM and sample images accordingly for the defined setting, like so:

   ```python simulate_data.py --setting <mysetting>```

   Run without the `--setting` argument to replicate the published results using the default setting.

   This will create a data folder in the setting/mysetting folder.
   The images are stored here, together with the simulated ground truth data that will be used for training and validation.

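
For readers new to structural causal models, the idea of generating data from one can be sketched in a few lines. This is a toy illustration only: the variables, coefficients and the function `simulate_toy_scm` are hypothetical and are not the SCM or the csv format used in this repository. Each variable is drawn from its parents plus noise, in causal order.

```python
import math
import random


def simulate_toy_scm(n, seed=0):
    """Draw n samples from a hypothetical three-variable SCM: size -> x, size -> y, x -> y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        size = rng.gauss(0.0, 1.0)                      # stand-in for an image measurement
        p_treat = 1.0 / (1.0 + math.exp(-size))         # larger size -> treatment more likely
        x = 1 if rng.random() < p_treat else 0          # binary 'treatment'
        y = 0.5 * x - 1.0 * size + rng.gauss(0.0, 0.1)  # 'outcome' with additive noise
        rows.append({"size": size, "x": x, "y": y})
    return rows


samples = simulate_toy_scm(1000)
print(samples[0])
```

Fixing the seed makes the simulated dataset reproducible, which matters when comparing models trained on the same simulated data.
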
### Running the models

To replicate the published results, run:

```python train.py```

To run on your own simulated data:

```python train.py --setting <mysetting>```

To evaluate the CNN's ability to predict the ground truth measurements, run:

```python train.py --setting <mysetting> --fase feature```

Results are saved in the setting directory, with a subfolder for each 'fase' (xybn: predict x and y and use the bottleneck loss; feature: predict the features).

[experiments/base_model/params.json](experiments/base_model/params.json) contains the hyperparameters that control how train.py runs.

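
When several settings or phases need to be run in sequence, the command lines above can be assembled programmatically. The sketch below uses only the `--setting` and `--fase` options mentioned in this README; the helper `build_train_command` and the setting name `mysetting` are placeholders, not part of the repository.

```python
import sys


def build_train_command(setting=None, fase=None):
    """Assemble the train.py command line from the options described above."""
    command = [sys.executable, "train.py"]
    if setting is not None:
        command += ["--setting", setting]
    if fase is not None:
        command += ["--fase", fase]
    return command


# "mysetting" is a placeholder for a setting directory you created yourself.
for kwargs in ({}, {"setting": "mysetting"}, {"setting": "mysetting", "fase": "feature"}):
    print(" ".join(build_train_command(**kwargs)))
```

The snippet only prints the commands. To start training, each list could be passed to `subprocess.run(command, check=True)` from the repository root.
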
### Evaluation

Run TensorBoard in this directory to visualize the results.