# RSNA Intracranial Hemorrhage Detection
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection

A video of our solution can be found here: https://www.youtube.com/watch?v=1zLBxwTAcAs

# Hardware used
* 2x NVidia RTX 2080Ti GPUs
* 128GB RAM
* 16-core AMD CPU
* Data stored on NVMe drives in RAID configuration
* OS: Ubuntu 19.04

# Steps to reproduce results
1. Modify the data paths at the top of `data_prep.py`, `datasets.py` & `model.py`

2. Run `data_prep.py`. This takes around 12-15 hours for each set of images and will create:
    * `train_metadata.parquet.gzip`
    * `stage_1_test_metadata.parquet.gzip`
    * `stage_2_test_metadata.parquet.gzip`
    * `train_triplets.csv`
    * `stage_1_test_triplets.csv`
    * `stage_2_test_triplets.csv`
    * A folder called `png` for each of the 3 image sets, containing the preprocessed & cropped images

3. Run `batch_run.sh`. This will train (using 5-fold CV) and make submission files (as well as
out-of-fold predictions) using:
    * EfficientNet-B0 (224x224 model for fast experimentation)
    * EfficientNet-B5 (456x456, 3-slice model)
    * EfficientNet-B3 (300x300, 3-window model)
    * ~~DenseNet-169~~
    * ~~SE-ResNeXt101_32x4d~~

    This will create a timestamped folder in the `OUTPUT_DIR` containing:
    * A submission file
    * Out-of-fold (OOF) predictions (for later stacking models)
    * Model checkpoints for each fold
    * QC plots (e.g. ROC curves, train/validation loss curves)

4. To infer on different datasets (a hedged `config.yml` sketch follows this list):
    * In `config.yml` set `stage` to either `test1` or `test2`
    * For the `checkpoint`, enter the name of the timestamped folder containing the model checkpoints
    * Set `epochs` to 1 (this will skip the training part)
    * Re-run the models using `batch_run.sh`. A new output directory will be created with the
      predictions
    * If a completely new dataset is being used, the file paths in `ICHDataset` found in
      `datasets.py` will need to be modified
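
For example, pointing the pipeline at the stage 2 test set might look like the sketch below. Only `stage`, `checkpoint`, `epochs` & `cv_scheme` are named in this README; the folder name and the surrounding layout are illustrative assumptions, not the repo's exact schema:

```yaml
# Hypothetical config.yml values for inference only (layout assumed)
stage: test2                      # test1 or test2
checkpoint: 2019-11-01_10-30-00   # name of the timestamped folder with checkpoints
epochs: 1                         # skips training, runs inference only
cv_scheme: v2                     # see "Cross validation scheme" below
```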

# Model summary

## Primary model (3-slice model)
1. First the metadata is collected from the individual DICOM images. This allows the studies to be
grouped by `PatientID`, which is important for a stable cross-validation because the same patient can
appear in multiple studies.
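
A minimal sketch of this metadata collection, assuming `pydicom` and a flat folder of `.dcm` files (the real `data_prep.py` may read more fields):

```python
from pathlib import Path

import pandas as pd
import pydicom

def collect_metadata(dicom_dir: str) -> pd.DataFrame:
    """Read per-slice DICOM headers so studies can be grouped by PatientID."""
    rows = []
    for path in Path(dicom_dir).glob("*.dcm"):
        ds = pydicom.dcmread(path, stop_before_pixels=True)  # headers only, much faster
        rows.append({
            "SOPInstanceUID": ds.SOPInstanceUID,
            "StudyInstanceUID": ds.StudyInstanceUID,
            "PatientID": ds.PatientID,
            # z-position, used later to sort slices into a volume
            "ImagePositionPatient_z": float(ds.ImagePositionPatient[2]),
        })
    return pd.DataFrame(rows)

# e.g. collect_metadata("stage_2_train").to_parquet("train_metadata.parquet.gzip", compression="gzip")
```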

2. Based on `StudyInstanceUID`, and sorting on `ImagePositionPatient`, it is possible to reconstruct
3D volumes for each study. However, since each study contained a variable number of axial slices
(between 20-60), this makes it difficult to create an architecture that implements 3D convolutions.
Instead, triplets of images were created from the 3D volumes to represent the RGB channels of an
image, i.e. the green channel being the target image and the red & blue channels being the adjacent
images. If an image was at the edge of the volume, then the green channel was repeated. This is
essentially a 3D volume but only using 3 axial slices. At this stage no windowing was applied
and the image is retained in Hounsfield units.
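
A sketch of the triplet construction, assuming the slices of a study are already sorted by z-position (the helper name is illustrative, not the exact code behind `train_triplets.csv`):

```python
def make_triplets(sorted_slice_ids: list[str]) -> list[tuple[str, str, str]]:
    """Map each slice to a (red, green, blue) triplet of adjacent slices.

    The target slice sits in the green channel; its neighbours fill
    red & blue. At the volume edges the target is repeated.
    """
    triplets = []
    n = len(sorted_slice_ids)
    for i, green in enumerate(sorted_slice_ids):
        red = sorted_slice_ids[i - 1] if i > 0 else green       # repeat at top edge
        blue = sorted_slice_ids[i + 1] if i < n - 1 else green  # repeat at bottom edge
        triplets.append((red, green, blue))
    return triplets

# e.g. make_triplets(["s0", "s1", "s2"]) -> [("s0","s0","s1"), ("s0","s1","s2"), ("s1","s2","s2")]
```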

3. The images then had their objects labelled using `scipy.ndimage.label`, which looks for groups of
connected pixels. The group with the second largest number of pixels was assumed to be the head
(the largest group being the background). This removes most of the dead space and the headrest
of the CT scanner. A 10-pixel border was retained to keep some space for rotation augmentations.
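
A sketch of this cropping step. The foreground threshold (`hu > 0`) and the helper name are assumptions; the repo's exact thresholding may differ:

```python
import numpy as np
from scipy import ndimage

def crop_to_head(hu_image: np.ndarray, border: int = 10) -> np.ndarray:
    """Crop an HU slice to its largest non-background connected component."""
    labels, n_groups = ndimage.label(hu_image > 0)  # assumed foreground threshold
    if n_groups == 0:
        return hu_image  # nothing found; leave the slice untouched
    # Pixel count per label; label 0 is the background, so the head is the
    # largest remaining group (the "second largest" overall).
    sizes = np.bincount(labels.ravel())
    head_label = sizes[1:].argmax() + 1
    ys, xs = np.where(labels == head_label)
    y0 = max(ys.min() - border, 0)
    y1 = min(ys.max() + border + 1, hu_image.shape[0])
    x0 = max(xs.min() - border, 0)
    x1 = min(xs.max() + border + 1, hu_image.shape[1])
    return hu_image[y0:y1, x0:x1]
```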

4. The images were clipped between 0-255 Hounsfield units and saved as 8-bit PNG files to pass to
the PyTorch dataset object. The reason for this was a) most of the interesting features are
between 0-255 HU, so we shouldn't be losing too much detail, and b) this makes it easier to try
different windows without recreating the images. I also found that processing DICOM images on the
fly was too slow to keep 2 GPUs busy when small images/large batch sizes were used (224x224,
batch size=256), which is why I went down the PNG route.
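
The clipping itself is a one-liner (a sketch; `imageio` as the PNG writer is an assumption, any writer works):

```python
import imageio.v3 as iio
import numpy as np

hu_image = np.random.randint(-1024, 2000, size=(512, 512))  # stand-in HU slice

# Keep 0-255 HU and store as 8-bit PNG; windowing happens later at load time.
png = np.clip(hu_image, 0, 255).astype(np.uint8)
iio.imwrite("example_slice.png", png)
```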

5. The images are then windowed once loaded into the dataset. A subdural window with
`window_width, window_length = 200, 80` was used on all 3 channels.
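
A sketch of a standard CT windowing function matching this parameterisation (here `window_length` plays the role of the window centre, following the README's naming; the repo's implementation may differ):

```python
import numpy as np

def apply_window(hu: np.ndarray, window_width: int, window_length: int) -> np.ndarray:
    """Clip HU to [centre - width/2, centre + width/2] and rescale to [0, 1]."""
    low = window_length - window_width / 2
    high = window_length + window_width / 2
    return (np.clip(hu, low, high) - low) / (high - low)

# Subdural window used for all 3 channels of the 3-slice model
windowed = apply_window(np.random.randint(-1024, 2000, (512, 512)), 200, 80)
```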

## Alternative model (3-window model)
An alternative model using different CT windowing for each channel is also used. Here the windows
are applied when the PNG images are made, according to the channels in `prepare_png` found in
`data_prep.py`. The images are cropped in the same way as above. The windows used were:
* Brain - `window_width, window_length = 80, 40`
* Subdural - `window_width, window_length = 200, 80`
* Bone - `window_width, window_length = 2000, 600`
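
Illustratively, the 3-window image is just three windowed copies of the same slice stacked as RGB channels (not the exact `prepare_png` code):

```python
import numpy as np

def apply_window(hu, window_width, window_length):
    low = window_length - window_width / 2
    high = window_length + window_width / 2
    return (np.clip(hu, low, high) - low) / (high - low)

hu_image = np.random.randint(-1024, 2000, (512, 512))  # stand-in HU slice
three_window = np.dstack([
    apply_window(hu_image, 80, 40),     # brain
    apply_window(hu_image, 200, 80),    # subdural
    apply_window(hu_image, 2000, 600),  # bone
])  # shape (512, 512, 3), one window per RGB channel
```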

## Model details

1. Image augmentations (assembled into a pipeline in the sketch after this list):
    1. `RandomHorizontalFlip(p=0.5)`
    2. `RandomRotation(degrees=15)`
    3. `RandomResizedCrop(scale=(0.85, 1.0), ratio=(0.8, 1.2))`
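
As a torchvision pipeline, this might look like the sketch below (`size=512` follows the image size noted in the training details; the ordering and remaining defaults are assumptions):

```python
from torchvision import transforms

train_tfms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(size=512, scale=(0.85, 1.0), ratio=(0.8, 1.2)),
    transforms.ToTensor(),  # no ImageNet normalisation, per the training notes
])
```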

2. The models were then trained as follows:
    * 5 folds defined by grouping on `PatientID`. See the section below on the CV scheme.
    * All the data used (no down/up sampling)
    * 10 epochs with early stopping (patience=3)
    * AdamW optimiser with default parameters
    * Learning rate decay using cosine annealing
    * Loss function: custom weighted multi-label log loss with `weights=[2, 1, 1, 1, 1, 1]`
      (see the sketch after this list)
    * Image size: 512x512. No pre-normalisation (i.e. ImageNet stats were not used)
    * Batch size: as large as possible depending on the architecture
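
A sketch of such a weighted multi-label log loss, assuming the weight of 2 belongs to the `any` label (matching the competition metric) and the 5 subtype labels get weight 1; the repo's exact loss may differ:

```python
import torch
import torch.nn.functional as F

LABEL_WEIGHTS = torch.tensor([2.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # assumed order: [any, 5 subtypes]

def weighted_log_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Weighted mean over labels of the per-label binary log loss."""
    w = LABEL_WEIGHTS.to(logits.device)
    # element-wise BCE of shape (batch, 6), weighted per label column
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return ((bce * w).sum(dim=1) / w.sum()).mean()
```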

3. Postprocessing:
    * Test time augmentation (TTA): Identity, horizontal flip, rotate -10 degrees & +10 degrees
      (see the sketch after this list)
    * Take the mean of all 5 folds
    * A prediction smoothing script based on the relative positions of the axial slices
      (this script was used by all teammates)
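
A sketch of the TTA averaging, assuming a model that returns per-label logits for a batch of image tensors:

```python
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def predict_tta(model: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Average sigmoid predictions over identity, h-flip and +/-10 degree rotations."""
    views = [
        images,
        torch.flip(images, dims=[-1]),  # horizontal flip
        TF.rotate(images, angle=-10.0),
        TF.rotate(images, angle=10.0),
    ]
    preds = [torch.sigmoid(model(v)) for v in views]
    return torch.stack(preds).mean(dim=0)
```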

# Cross validation scheme
The CV scheme was fixed using 5 pairs of CSV files agreed by the team, in the format `train_n.csv`
& `valid_n.csv`. The first version of these files was designed to prevent the same patient appearing
in the train & validation sets. A second version was made removing patients that were present in both
the train & stage 1 test sets, to prevent fitting to the overlapping patients on the stage 1 public LB.
Some of these models are trained with the V1 scheme and others with the V2 scheme (most of the team
used the latter).

These CSV files are included in a file called `team_folds.zip` and should be in the same folder
as the rest of the input data. The scheme is selected using the `cv_scheme` value in the config file.
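
The grouped splitting itself can be reproduced with scikit-learn's `GroupKFold` (a sketch of the idea only; the team's fixed CSVs should be used for actual reproduction):

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

meta = pd.read_parquet("train_metadata.parquet.gzip")
splitter = GroupKFold(n_splits=5)
for fold, (train_idx, valid_idx) in enumerate(
    splitter.split(meta, groups=meta["PatientID"])
):
    # No PatientID ever appears in both halves of a fold
    meta.iloc[train_idx].to_csv(f"train_{fold}.csv", index=False)
    meta.iloc[valid_idx].to_csv(f"valid_{fold}.csv", index=False)
```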