# Results folder format
In this work, results include trained models, training and validation logs, predicted masks, and metrics on the test set, among others. All of them are written to a folder called `results`, whose location is defined by the variable `RESULTS_FOLDER` in the `config.py` (`./../config.py`) file. This folder sits next to the `data` folder, as explained in [dataset_format.md](LINK).

## Results folder/filenaming convention

### `logs` and `models` folders
While a model is training (see `trainddp.md` for details), two folders, `logs` and `models`, are created within the `results` folder. The directory structure may look like this:

    └───lymphoma.segmentation/
        ├── data
        └── results
            ├── logs
            │    ├── fold0
            │    │   └── unet
            │    │       └── unet_fold0_randcrop192
            │    │           ├── trainlog_gpu0.csv
            │    │           ├── trainlog_gpu1.csv
            │    │           ├── validlog_gpu0.csv
            │    │           └── validlog_gpu1.csv
            │    └── fold1
            │        └── unet
            │            └── unet_fold1_randcrop192
            │                ├── trainlog_gpu0.csv
            │                ├── trainlog_gpu1.csv
            │                ├── validlog_gpu0.csv
            │                └── validlog_gpu1.csv
            ├── models
            │    ├── fold0
            │    │   └── unet
            │    │       └── unet_fold0_randcrop192
            │    │           ├── model_ep=0002.pth
            │    │           ├── model_ep=0004.pth
            │    │           ├── model_ep=0006.pth
            │    │           ├── model_ep=0008.pth
            │    │           ├── ...
            │    └── fold1
            │        └── unet
            │            └── unet_fold1_randcrop192
            │                ├── model_ep=0002.pth
            │                ├── model_ep=0004.pth
            │                ├── model_ep=0006.pth
            │                ├── model_ep=0008.pth
            │                ├── ...
            ├── ...

This directory structure shows that, so far, the model `unet` has been (or is being) trained on two folds: `fold0` and `fold1`. Within the `logs` or `models` folder, the directory structure is `{logs_or_models}/fold{fold}/{network_name}/{experiment_code}`, where `experiment_code` is defined as `{network_name}_fold{fold}_randcrop{input_patch_size}`. In the example above, the `experiment_code` is `unet_fold0_randcrop192` for `fold0` and `unet_fold1_randcrop192` for `fold1`, meaning that `unet` was (or is being) trained on fold 0 or 1 with `input_patch_size = 192`. If you train other networks (such as `segresnet`, `dynunet`, or `swinunetr`, as was done in this work), they will appear in the same directory structure.

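
The naming convention above can be sketched in a few lines of Python. The helper names `experiment_code` and `result_dir` are hypothetical and are not taken from the repository; only the path pattern comes from the description above.

```python
from pathlib import Path


def experiment_code(network_name: str, fold: int, input_patch_size: int) -> str:
    """Build the experiment code, e.g. 'unet_fold0_randcrop192'."""
    return f"{network_name}_fold{fold}_randcrop{input_patch_size}"


def result_dir(results_folder: str, kind: str, network_name: str,
               fold: int, input_patch_size: int) -> Path:
    """Return {results_folder}/{kind}/fold{fold}/{network_name}/{experiment_code}.

    `kind` is the top-level subfolder, e.g. 'logs' or 'models'.
    """
    code = experiment_code(network_name, fold, input_patch_size)
    return Path(results_folder) / kind / f"fold{fold}" / network_name / code


print(result_dir("results", "logs", "unet", 0, 192).as_posix())
# results/logs/fold0/unet/unet_fold0_randcrop192
```
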
Since training in this work was carried out using PyTorch's `torch.nn.parallel.DistributedDataParallel`, the files `trainlog_gpu0.csv`, `trainlog_gpu1.csv`, `validlog_gpu0.csv`, and `validlog_gpu1.csv` store the training and validation logs accumulated on the GPUs with device IDs 0 and 1. All the `validlog_gpu[i].csv` files are identical and hence redundant, so you can use any one of them for analysis (a later version will save only one file). The `trainlog_gpu[i].csv` files are NOT identical: each one stores the loss accumulated on its own GPU's share of the distributed data. In our work we used 4 GPUs; the directory structure above shows only 2 GPUs for illustration. A typical `trainlog_gpu[i].csv` file looks like this:

```
Loss
0.6536665889951918
0.6449973914358351
0.6385666595564948
0.6357755064964294
...
```

where each line shows the mean `DiceLoss` on the training inputs (averaged over all batches) at epoch `j+1`, with `j` in the range `np.arange(0, epochs)`; `epochs` is the total number of training epochs. Similarly, a typical `validlog_gpu[i].csv` file looks like this:

```
Metric
0.0011193332029506564
0.001015653251670301
...
```
where each line shows the mean `DiceMetric` on the validation inputs at epoch `j`, with `j` in the range `np.arange(2, epochs+1, val_interval)`. Here `epochs` is the total number of training epochs and `val_interval` (default = 2) is the epoch interval at which validation is run, the Dice metric is computed, and the trained model is saved. The variables `val_interval`, `epochs`, etc. can be set in the `train.sh` script, which is used to run the training.

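
Because each `trainlog_gpu[i].csv` holds the loss of a single GPU, one may want a single training curve per experiment. The sketch below averages the per-GPU losses epoch by epoch. It assumes that a plain mean over GPUs is an acceptable summary (that is, each GPU processes a similar amount of data), and the function names `read_log` and `mean_train_loss` are hypothetical.

```python
import csv
from pathlib import Path


def read_log(path):
    """Read a one-column log CSV (header row, then one float per line)."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return [float(row[0]) for row in rows[1:] if row]


def mean_train_loss(log_dir):
    """Average the per-GPU training losses epoch by epoch.

    Returns a list of (epoch, mean_loss) pairs, with epochs starting at 1.
    """
    files = sorted(Path(log_dir).glob("trainlog_gpu*.csv"))
    per_gpu = [read_log(f) for f in files]
    # zip stops at the shortest log, which matters if training is still running
    return [(epoch, sum(vals) / len(vals))
            for epoch, vals in enumerate(zip(*per_gpu), start=1)]
```

For example, `mean_train_loss("results/logs/fold0/unet/unet_fold0_randcrop192")` would return one `(epoch, loss)` pair per completed epoch for the experiment shown in the tree above.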
The models are saved in a similar way under the corresponding `fold{fold}/{network_name}/{experiment_code}` folder, with filenames `model_ep=0002.pth`, `model_ep=0004.pth`, etc. In this example `val_interval = 2`, so a model is saved every 2 epochs, starting from the second epoch.

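
The relationship between rows of a validation log, epochs, and checkpoint filenames can be sketched as follows. Row `k` (0-based) is mapped to epoch `2 + k * val_interval`, following the `np.arange(2, epochs+1, val_interval)` range stated above. Picking the checkpoint with the highest `DiceMetric` is an assumption made for illustration, and `best_checkpoint` is a hypothetical helper, not a function from the repository.

```python
import csv


def best_checkpoint(validlog_path, val_interval=2):
    """Return (epoch, metric, filename) for the row with the highest metric."""
    with open(validlog_path, newline="") as f:
        rows = list(csv.reader(f))
    # Skip the 'Metric' header row and any empty trailing lines
    metrics = [float(row[0]) for row in rows[1:] if row]
    if not metrics:
        raise ValueError(f"no validation entries in {validlog_path}")
    k = max(range(len(metrics)), key=metrics.__getitem__)
    # Validation starts at epoch 2 and then runs every val_interval epochs
    epoch = 2 + k * val_interval
    return epoch, metrics[k], f"model_ep={epoch:04d}.pth"
```
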
### `predictions` and `test_metrics` folders
After the trained models are used to predict segmentation masks on the test images (see `inference.md` for details), the predicted masks are written to `LYMPHOMA_SEGMENTATION_FOLDER/results/predictions/fold{fold}/{network_name}/{experiment_code}`, based on the `fold`, `network_name` and `experiment_code`. Once the predicted masks have been generated and saved, the metrics computed on the test set from the ground truth and predicted masks are written to `LYMPHOMA_SEGMENTATION_FOLDER/results/test_metrics/fold{fold}/{network_name}/{experiment_code}/testmetrics.csv`. We compute three segmentation metrics: `Dice similarity coefficient (DSC)`, `false positive volume (FPV) in ml`, and `false negative volume (FNV) in ml`. We also compute the detection metrics `true positive (TP)`, `false positive (FP)`, and `false negative (FN)` lesion detections under three different criteria, labeled `Criterion1`, `Criterion2`, and `Criterion3`. These metrics are defined in [metrics/metrics.py](./../metrics/metrics.py). After running inference and calculating the test metrics, the (relevant) directory structure may look like:

    └───lymphoma.segmentation/
            ├── data
            └── results
                ├── logs
                ├── models
                ├── predictions
                │   ├── fold0
                │   │   └── unet
                │   │       └── unet_fold0_randcrop192
                │   │           ├── Patient0003_20190402.nii.gz
                │   │           ├── Patient0004_20160204.nii.gz
                │   │           ├── ...
                │   └── fold1
                │       └── unet
                │           └── unet_fold1_randcrop192
                │               ├── Patient0003_20190402.nii.gz
                │               ├── Patient0004_20160204.nii.gz
                │               ├── ...
                └── test_metrics
                    ├── fold0
                    │   └── unet
                    │       └── unet_fold0_randcrop192
                    │           └── testmetrics.csv
                    └── fold1
                        └── unet
                            └── unet_fold1_randcrop192
                                └── testmetrics.csv

The predicted masks are in the same geometry (same size, spacing, origin, direction) as their corresponding ground truth masks. A typical `testmetrics.csv` file looks like:

| PatientID | DSC | FPV | FNV | TP_C1 | FP_C1 | FN_C1 | TP_C2 | FP_C2 | FN_C2 | TP_C3 | FP_C3 | FN_C3 |
|-----------|-----|-----|-----|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Patient0003_20190402 | 0.7221043699618158 | 17.5164623503173 | 1.173559512304143 | 3 | 6 | 2 | 2 | 7 | 3 | 3 | 6 | 2 |
| Patient0004_20160204 | 0.0807955251709131 | 53.4186903933997 | 5.563541391664086 | 2 | 8 | 1 | 0 | 10 | 3 | 2 | 8 | 1 |

Here, all the metrics are at the patient level and FPV and FNV are expressed in ml.
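
A `testmetrics.csv` file can be summarized across patients with the standard library alone. The sketch below assumes the file is comma-separated and that its header uses exactly the column names shown in the table above. The function name `summarize_test_metrics` and the choice of summary statistics (mean and median DSC, detection counts summed over patients) are illustrative assumptions, not part of the repository.

```python
import csv
from statistics import mean, median


def summarize_test_metrics(csv_path):
    """Summarize patient-level metrics from a testmetrics.csv file."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    dsc = [float(r["DSC"]) for r in rows]
    summary = {
        "n_patients": len(rows),
        "DSC_mean": mean(dsc),
        "DSC_median": median(dsc),
    }
    for criterion in ("C1", "C2", "C3"):
        for kind in ("TP", "FP", "FN"):
            col = f"{kind}_{criterion}"
            # Total detections over all patients; float() first so that
            # values stored as '3.0' are also accepted
            summary[col] = sum(int(float(r[col])) for r in rows)
    return summary
```
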
### `test_lesion_measures` folder
In this work, we performed further analyses on the predicted segmentation masks of the test set and compared them to the ground truth masks. These analyses compare the following patient-level lesion measures: lesion SUV<sub>mean</sub>, lesion SUV<sub>max</sub>, number of lesions, total metabolic tumor volume (TMTV) in ml, total lesion glycolysis (TLG) in ml, and lesion dissemination (D<sub>max</sub>) in cm. These measures are defined in [metrics/metrics.py](./../metrics/metrics.py). The test set lesion measures are written to `LYMPHOMA_SEGMENTATION_FOLDER/results/test_lesion_measures/fold{fold}/{network_name}/{experiment_code}/testlesionmeasures.csv`. After generating the `testlesionmeasures.csv` files, the relevant directory structure may look like:

    └───lymphoma.segmentation/
            ├── data
            └── results
                ├── logs
                ├── models
                ├── predictions
                ├── test_metrics
                └── test_lesion_measures
                    ├── fold0
                    │   └── unet
                    │       └── unet_fold0_randcrop192
                    │           └── testlesionmeasures.csv
                    └── fold1
                        └── unet
                            └── unet_fold1_randcrop192
                                └── testlesionmeasures.csv

A typical `testlesionmeasures.csv` file looks like:

| PatientID | DSC | SUVmean_orig | SUVmean_pred | SUVmax_orig | SUVmax_pred | LesionCount_orig | LesionCount_pred | TMTV_orig | TMTV_pred | TLG_orig | TLG_pred | Dmax_orig | Dmax_pred |
|-----------|-----|--------------|--------------|-------------|-------------|------------------|------------------|-----------|-----------|----------|----------|-----------|-----------|
| Patient0003_20190402 | 0.7221043699618158 | 2.935304139385291 | 4.362726242681123 | 6.1822732035904515 | 7.827266273892102 | 3 | 4 | 13.691527643548337 | 18.6272625128359097 | 40.18879776661558 | 50.2728492927217289 | 15.837606584884108 | 25.82763813918739 |
| Patient0004_20160204 | 0.0807955251709131 | 8.72882540822585 | 12.71524350987 | 40.294842200490244 | 45.9483628492382 | 9 | 6 | 20.732884717373196 | 16.756373846353748 | 180.9737309068245 | 120.2387139879348 | 14.737477375372881 | 7.652628627281008 |

Here, all the lesion measures are at the patient level. TMTV and TLG are expressed in ml and D<sub>max</sub> in cm.
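
Since each lesion measure is stored as an `_orig` (ground truth) and `_pred` (predicted) pair, the per-patient agreement can be inspected with a short script. The sketch below assumes a comma-separated file with the column names shown in the table above; `measure_errors` is a hypothetical helper, and the signed difference is only one of several reasonable ways to compare the two values.

```python
import csv


def measure_errors(csv_path, measure="TMTV"):
    """Per-patient signed difference (pred - orig) for one lesion measure.

    `measure` is a column prefix such as 'SUVmean', 'SUVmax',
    'LesionCount', 'TMTV', 'TLG' or 'Dmax'.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return {
        r["PatientID"]: float(r[f"{measure}_pred"]) - float(r[f"{measure}_orig"])
        for r in rows
    }
```

A positive value means the prediction overestimates the measure for that patient, and a negative value means it underestimates it.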