# Results folder format

In this work, results include trained models, training and validation logs, predicted masks, and metrics on the test set, among others. They are all written to a folder called `results`, whose location is defined by the variable `RESULTS_FOLDER` in [config.py](./../config.py). This folder sits next to the `data` folder, as explained in [dataset_format.md](LINK).

## Results folder/filenaming convention

### `logs` and `models` folders
While a model is training (see `trainddp.md` for details), two folders, `logs` and `models`, are created within the `results` folder. The directory structure may look like this:

    └───lymphoma.segmentation/
        ├── data
        └── results
            ├── logs
            │   ├── fold0
            │   │   └── unet
            │   │       └── unet_fold0_randcrop192
            │   │           ├── trainlog_gpu0.csv
            │   │           ├── trainlog_gpu1.csv
            │   │           ├── validlog_gpu0.csv
            │   │           └── validlog_gpu1.csv
            │   └── fold1
            │       └── unet
            │           └── unet_fold1_randcrop192
            │               ├── trainlog_gpu0.csv
            │               ├── trainlog_gpu1.csv
            │               ├── validlog_gpu0.csv
            │               └── validlog_gpu1.csv
            ├── models
            │   ├── fold0
            │   │   └── unet
            │   │       └── unet_fold0_randcrop192
            │   │           ├── model_ep=0002.pth
            │   │           ├── model_ep=0004.pth
            │   │           ├── model_ep=0006.pth
            │   │           ├── model_ep=0008.pth
            │   │           ├── ...
            │   └── fold1
            │       └── unet
            │           └── unet_fold1_randcrop192
            │               ├── model_ep=0002.pth
            │               ├── model_ep=0004.pth
            │               ├── model_ep=0006.pth
            │               ├── model_ep=0008.pth
            │               ├── ...
            ├── ...

This directory structure shows that, so far, the model `unet` has been (or is being) trained on two folds: `fold0` and `fold1`. Within the `logs` or `models` folder, the directory structure is `{logs_or_models}/fold{fold}/{network_name}/{experiment_code}`, where `experiment_code` is defined as `{network_name}_fold{fold}_randcrop{input_patch_size}`. In the structure above, the `experiment_code` for the two folds is `unet_fold0_randcrop192` and `unet_fold1_randcrop192`, meaning that `unet` was (or is being) trained on fold 0 or 1 with `input_patch_size = 192`. If you train other networks (such as `segresnet`, `dynunet`, or `swinunetr`, as was done in this work), they will appear accordingly within the same directory structure.
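
The naming convention above can be sketched in a few lines of Python. This is only an illustration of the pattern described in the text: the helper names `experiment_code` and `experiment_dir` are hypothetical and are not taken from the repository.

```python
from pathlib import Path


def experiment_code(network_name: str, fold: int, input_patch_size: int) -> str:
    """Build {network_name}_fold{fold}_randcrop{input_patch_size}."""
    return f"{network_name}_fold{fold}_randcrop{input_patch_size}"


def experiment_dir(results_folder, kind, network_name, fold, input_patch_size) -> Path:
    """Build {results}/{kind}/fold{fold}/{network_name}/{experiment_code}.

    `kind` is the top-level subfolder, e.g. "logs" or "models".
    """
    code = experiment_code(network_name, fold, input_patch_size)
    return Path(results_folder) / kind / f"fold{fold}" / network_name / code


print(experiment_dir("results", "logs", "unet", 0, 192).as_posix())
# results/logs/fold0/unet/unet_fold0_randcrop192
```

The same helper covers other networks, for example `experiment_dir("results", "models", "segresnet", 1, 192)`.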

Since the training in this work was carried out using PyTorch's `torch.nn.parallel.DistributedDataParallel`, the files `trainlog_gpu0.csv`, `trainlog_gpu1.csv`, `validlog_gpu0.csv`, and `validlog_gpu1.csv` store the training and validation logs accumulated on the GPUs with device IDs 0 and 1. All the `validlog_gpu[i].csv` files are identical and hence redundant, so you can use any one of them for analysis (later versions will save only one file). The `trainlog_gpu[i].csv` files are NOT identical: each file separately stores the loss accumulated on its own GPU's share of the distributed data. In our work, we used 4 GPUs; the directory structure above shows training on only 2 GPUs for the purpose of illustration. A typical `trainlog_gpu[i].csv` file looks like this:

```
Loss
0.6536665889951918
0.6449973914358351
0.6385666595564948
0.6357755064964294
...
```

Here, each line shows the mean `DiceLoss` on the training inputs (averaged over all batches) at epoch `j+1`, with `j` in the range `np.arange(0, epochs)`, where `epochs` is the total number of training epochs. Similarly, a typical `validlog_gpu[i].csv` file looks like this:

```
Metric
0.0011193332029506564
0.001015653251670301
...
```

Here, each line shows the mean `DiceMetric` on the validation inputs at epoch `j`, with `j` in the range `np.arange(2, epochs+1, val_interval)`, where `epochs` is the total number of training epochs and `val_interval` (default = 2) is the epoch interval at which validation is run, the Dice metric is computed, and the trained model is saved. The variables `val_interval`, `epochs`, etc. can be set in the `train.sh` script, which is used for running the training.
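
As a rough sketch of how the row-to-epoch mapping described above could be applied when reading the logs, the snippet below parses the single-column files and attaches epoch numbers. The function names are hypothetical, and averaging the per-GPU training losses with an unweighted mean is an assumption made here for illustration, not something the training code is stated to do.

```python
import csv


def read_column(lines, column):
    """Parse one float column from a log given as an iterable of text lines."""
    return [float(row[column]) for row in csv.DictReader(lines)]


def train_epochs(n_rows):
    """Row j (0-based) of a train log corresponds to epoch j + 1."""
    return [j + 1 for j in range(n_rows)]


def valid_epochs(n_rows, val_interval=2):
    """Rows of a valid log correspond to epochs 2, 2 + val_interval, ..."""
    return [2 + k * val_interval for k in range(n_rows)]


def mean_across_gpus(per_gpu_losses):
    """Assumption: summarize per-GPU train losses by an unweighted mean per epoch."""
    return [sum(values) / len(values) for values in zip(*per_gpu_losses)]


# Sample rows copied from the example log files shown in this document.
train_lines = ["Loss", "0.6536665889951918", "0.6449973914358351"]
valid_lines = ["Metric", "0.0011193332029506564", "0.001015653251670301"]

losses = read_column(train_lines, "Loss")
metrics = read_column(valid_lines, "Metric")
for epoch, loss in zip(train_epochs(len(losses)), losses):
    print(f"epoch {epoch}: train loss {loss:.4f}")
for epoch, metric in zip(valid_epochs(len(metrics)), metrics):
    print(f"epoch {epoch}: validation Dice {metric:.4f}")
```

To read a log from disk, pass an open file instead of a list, for example `with open(path, newline="") as f: read_column(f, "Loss")`.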

The trained models are saved in a similar way under the corresponding `fold{fold}/{network_name}/{experiment_code}` folder, with filenames `model_ep=0002.pth`, `model_ep=0004.pth`, etc. In this example, `val_interval = 2`, so a model is saved every 2 epochs, starting from the second epoch.
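
Because a model is saved at every validation epoch, a validation log row can be matched to its saved model file. The sketch below assumes the four-digit, zero-padded pattern seen in the example filenames, and it picks the model with the highest validation Dice; that selection rule is an assumption for illustration. Both function names are hypothetical.

```python
def checkpoint_name(epoch: int) -> str:
    """Filename pattern inferred from the examples, e.g. model_ep=0002.pth."""
    return f"model_ep={epoch:04d}.pth"


def best_checkpoint(valid_metrics, val_interval=2):
    """Return the filename of the model with the highest validation metric.

    Row k (0-based) of the validation log is taken to be epoch
    2 + k * val_interval, following the range given in the text.
    """
    best_row = max(range(len(valid_metrics)), key=valid_metrics.__getitem__)
    return checkpoint_name(2 + best_row * val_interval)


# Validation values copied from the example validlog shown in this document.
print(best_checkpoint([0.0011193332029506564, 0.001015653251670301]))
# model_ep=0002.pth
```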

### `predictions` and `test_metrics` folders
After the trained models are used to predict segmentation masks on the test images (see `inference.md` for details), the predicted masks are written to `LYMPHOMA_SEGMENTATION_FOLDER/results/predictions/fold{fold}/{network_name}/{experiment_code}`, based on the `fold`, `network_name`, and `experiment_code`. Once the predicted masks have been generated and saved, the metrics computed on the test set from the ground truth and predicted masks are written to `LYMPHOMA_SEGMENTATION_FOLDER/results/test_metrics/fold{fold}/{network_name}/{experiment_code}/testmetrics.csv`. We compute three segmentation metrics: `Dice similarity coefficient (DSC)`, `false positive volume (FPV) in ml`, and `false negative volume (FNV) in ml`. We also compute detection metrics, namely `true positive (TP)`, `false positive (FP)`, and `false negative (FN)` lesion detections, under three different criteria labeled `Criterion1`, `Criterion2`, and `Criterion3`. These metrics are defined in [metrics/metrics.py](./../metrics/metrics.py). After running inference and calculating the test metrics, the (relevant) directory structure may look like:

    └───lymphoma.segmentation/
        ├── data
        └── results
            ├── logs
            ├── models
            ├── predictions
            │   ├── fold0
            │   │   └── unet
            │   │       └── unet_fold0_randcrop192
            │   │           ├── Patient0003_20190402.nii.gz
            │   │           ├── Patient0004_20160204.nii.gz
            │   │           ├── ...
            │   └── fold1
            │       └── unet
            │           └── unet_fold1_randcrop192
            │               ├── Patient0003_20190402.nii.gz
            │               ├── Patient0004_20160204.nii.gz
            │               ├── ...
            │
            └── test_metrics
                ├── fold0
                │   └── unet
                │       └── unet_fold0_randcrop192
                │           └── testmetrics.csv
                └── fold1
                    └── unet
                        └── unet_fold1_randcrop192
                            └── testmetrics.csv

The predicted masks are in the same geometry (same size, spacing, origin, direction) as their corresponding ground truth masks. A typical `testmetrics.csv` file looks like:

| PatientID | DSC | FPV | FNV | TP_C1 | FP_C1 | FN_C1 | TP_C2 | FP_C2 | FN_C2 | TP_C3 | FP_C3 | FN_C3 |
|-----------|-----|-----|-----|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Patient0003_20190402 | 0.7221043699618158 | 17.5164623503173 | 1.173559512304143 | 3 | 6 | 2 | 2 | 7 | 3 | 3 | 6 | 2 |
| Patient0004_20160204 | 0.0807955251709131 | 53.4186903933997 | 5.563541391664086 | 2 | 8 | 1 | 0 | 10 | 3 | 2 | 8 | 1 |

Here, all the metrics are at the patient level and FPV and FNV are expressed in ml.
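
As an illustration of how these patient-level rows might be summarized, the sketch below computes the mean DSC and a pooled detection sensitivity, `TP / (TP + FN)`, for each criterion. It assumes the file is comma-separated with the column names shown in the table; the `summarize` helper and the choice of summary statistics are hypothetical and not part of the repository.

```python
import csv
import io
from statistics import mean

# The two example rows from the table above, written as CSV text.
SAMPLE = """PatientID,DSC,FPV,FNV,TP_C1,FP_C1,FN_C1,TP_C2,FP_C2,FN_C2,TP_C3,FP_C3,FN_C3
Patient0003_20190402,0.7221043699618158,17.5164623503173,1.173559512304143,3,6,2,2,7,3,3,6,2
Patient0004_20160204,0.0807955251709131,53.4186903933997,5.563541391664086,2,8,1,0,10,3,2,8,1
"""


def summarize(rows):
    """Return the mean DSC and the pooled sensitivity for each criterion."""
    rows = list(rows)
    summary = {"mean_DSC": mean(float(r["DSC"]) for r in rows)}
    for c in ("C1", "C2", "C3"):
        tp = sum(float(r[f"TP_{c}"]) for r in rows)
        fn = sum(float(r[f"FN_{c}"]) for r in rows)
        # Sensitivity is undefined when there are no ground truth lesions.
        summary[f"sensitivity_{c}"] = tp / (tp + fn) if tp + fn else float("nan")
    return summary


for name, value in summarize(csv.DictReader(io.StringIO(SAMPLE))).items():
    print(f"{name}: {value:.3f}")
```

For a file on disk, replace `io.StringIO(SAMPLE)` with an open file handle to `testmetrics.csv`.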

### `test_lesion_measures` folder
In this work, we performed further analyses on the predicted segmentation masks on the test set and compared them to the ground truth masks. These include comparing the patient-level lesion SUV<sub>mean</sub>, lesion SUV<sub>max</sub>, number of lesions, total metabolic tumor volume (TMTV) in ml, total lesion glycolysis (TLG) in ml, and lesion dissemination (D<sub>max</sub>) in cm. These measures are defined in [metrics/metrics.py](./../metrics/metrics.py). The lesion measures for the test set are written to `LYMPHOMA_SEGMENTATION_FOLDER/results/test_lesion_measures/fold{fold}/{network_name}/{experiment_code}/testlesionmeasures.csv`. After generating the `testlesionmeasures.csv` files, the relevant directory structure may look like:

    └───lymphoma.segmentation/
        ├── data
        └── results
            ├── logs
            ├── models
            ├── predictions
            ├── test_metrics
            └── test_lesion_measures
                ├── fold0
                │   └── unet
                │       └── unet_fold0_randcrop192
                │           └── testlesionmeasures.csv
                └── fold1
                    └── unet
                        └── unet_fold1_randcrop192
                            └── testlesionmeasures.csv

A typical `testlesionmeasures.csv` file looks like:

| PatientID | DSC | SUVmean_orig | SUVmean_pred | SUVmax_orig | SUVmax_pred | LesionCount_orig | LesionCount_pred | TMTV_orig | TMTV_pred | TLG_orig | TLG_pred | Dmax_orig | Dmax_pred |
|-----------|-----|--------------|--------------|-------------|-------------|------------------|------------------|-----------|-----------|----------|----------|----------|-----------|
| Patient0003_20190402 | 0.7221043699618158 | 2.935304139385291 | 4.362726242681123 | 6.1822732035904515 | 7.827266273892102 | 3 | 4 | 13.691527643548337 | 18.6272625128359097 | 40.18879776661558 | 50.2728492927217289 | 15.837606584884108 | 25.82763813918739 |
| Patient0004_20160204 | 0.0807955251709131 | 8.72882540822585 | 12.71524350987 | 40.294842200490244 | 45.9483628492382 | 9 | 6 | 20.732884717373196 | 16.756373846353748 | 180.9737309068245 | 120.2387139879348 | 14.737477375372881 | 7.652628627281008 |

Here, all the lesion measures are at the patient level. TMTV and TLG are expressed in ml and D<sub>max</sub> in cm.
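
One way to compare the `_orig` (ground truth) and `_pred` (predicted) columns is a signed percentage error for each measure. The sketch below is an illustration only: the `percent_errors` helper and the choice of percentage error as the comparison are assumptions, and the analysis in this work may use different statistics.

```python
MEASURES = ("SUVmean", "SUVmax", "LesionCount", "TMTV", "TLG", "Dmax")


def percent_errors(row):
    """Signed percentage error of each predicted measure against ground truth."""
    errors = {}
    for measure in MEASURES:
        orig = float(row[f"{measure}_orig"])
        pred = float(row[f"{measure}_pred"])
        # The error is undefined when the ground truth value is zero.
        errors[measure] = 100.0 * (pred - orig) / orig if orig else float("nan")
    return errors


# First example row from the table above (PatientID and DSC omitted).
row = {
    "SUVmean_orig": "2.935304139385291", "SUVmean_pred": "4.362726242681123",
    "SUVmax_orig": "6.1822732035904515", "SUVmax_pred": "7.827266273892102",
    "LesionCount_orig": "3", "LesionCount_pred": "4",
    "TMTV_orig": "13.691527643548337", "TMTV_pred": "18.6272625128359097",
    "TLG_orig": "40.18879776661558", "TLG_pred": "50.2728492927217289",
    "Dmax_orig": "15.837606584884108", "Dmax_pred": "25.82763813918739",
}

for measure, error in percent_errors(row).items():
    print(f"{measure}: {error:+.1f}%")
```

The rows of a `testlesionmeasures.csv` file read with `csv.DictReader` can be passed to `percent_errors` directly, assuming the file is comma-separated with the column names shown in the table.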