## Before cloning this repo

Make sure you have git-lfs installed:

```
sudo apt install git-lfs
git lfs install
```

## Start here

Directory tree:

```
.
├── data
│   ├── unzip
│   │   ├── stage_2_test_images
│   │   └── stage_2_train_images
│   ├── predictions
├── env
└── models
```

Set up the conda environment with:

```
conda env create -n ihd -f env/tfgpu.yml
conda activate ihd
```

Then run `jupyter notebook` from the repo's root directory:

```
jupyter notebook --no-browser --NotebookApp.iopub_msg_rate_limit=10000000000
```

## Steps to reproduce submission

1. Start with the notebooks:

* `0-preprocess-generate_csvs.ipynb`
* `1-preprocess-brain_norm.ipynb`
* `2-preprocess-pickle.ipynb`

... to pregenerate the DICOM metadata, the diagnosis pivot tables and various pickles.

For convenience, these outputs are already included in the git repository, so you can alternatively skip to step 2.
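
For orientation, the sketch below shows the general shape of this preprocessing step: reading each DICOM header into a pandas DataFrame and saving it as a CSV. It is a minimal illustration, not the notebooks' actual code; it assumes `pydicom` and `pandas` are available, and the selected tags and the output filename `data/train_metadata_example.csv` are made up for the example.

```
# Illustrative sketch only -- the real preprocessing lives in the notebooks above.
from pathlib import Path

import pandas as pd
import pydicom

rows = []
for fn in Path('data/unzip/stage_2_train_images').glob('*.dcm'):
    # read the header only; skipping pixel data makes this much faster
    dcm = pydicom.dcmread(fn, stop_before_pixels=True)
    rows.append({
        'SOPInstanceUID': dcm.SOPInstanceUID,
        'PatientID': dcm.PatientID,
        'WindowCenter': dcm.WindowCenter,
        'WindowWidth': dcm.WindowWidth,
    })

pd.DataFrame(rows).to_csv('data/train_metadata_example.csv', index=False)
```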

2. Train (level 1) L1 models:

a. fastai v1 library: use `3a-L1-train-and-generate-predictions-fastai_v1.ipynb` to train the following architectures:

* `resnet18`
* `resnet50`
* `resnet34`
* `resnet101`
* `densenet121`

For each architecture we need to train 5 models (one model for each of the 5 folds).

All the variables must be set in cell #4, e.g.

```
model_fn = None       # set to a saved model filename to resume/fine-tune, else None
SZ = 512              # input image size
arch = 'resnet34'     # one of the architectures listed above
fold = 0              # which fold to train (0-based)
n_folds = 5
n_epochs = 4
lr = 1e-3
n_tta = 10            # number of test-time augmentation passes

#model_fn = 'resnet34_sz512_cv0.0821_weighted_loss_fold1_of_5'

if model_fn is not None:
    # the fold number is encoded near the end of the filename ('fold1_of_5' -> fold 0)
    model_fn_fold = int(model_fn[-6]) - 1
    assert model_fn_fold == fold
```

b. fastai v2 library to train subdural-focused models: same instructions as a), but use `3b-L1-train-and-generate-predictions-fastai_v2.ipynb` and the following architectures:

* `resnet18`
* `resnet34`
* `resnet101`
* `resnext50_32x4d`
* `densenet121`

To train models from scratch and generate test and OOF predictions, you need to:

- Set the arch to each of the architectures above and train a model for each fold (set the `FOLD` variable from 0 to 4; all 5 folds must be trained).

- Comment out the second `model_fn` assignment (it is only used to fine-tune an existing model).

- Execute all code except the final section, which builds a CSV for submitting single-model predictions to Kaggle (we do NOT want to do that at this stage).

The fastai v1 notebook picks the batch size and learning rate dynamically, but in the fastai v2 notebook you also need to specify your GPU memory in cell #4.
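
If you prefer to script the full grid of architectures and folds instead of re-running the notebook by hand, something like the sketch below could drive it. This is a hypothetical helper, not part of the repo: it assumes `papermill` is installed, that cell #4 is tagged as papermill's `parameters` cell, that a `runs/` directory exists for the executed copies, and that the parameter names (`arch`, `FOLD`) match the notebook you are driving.

```
# Hypothetical automation, not part of the repo: execute the fastai v2 notebook
# once per (architecture, fold) pair via papermill.
import papermill as pm

archs = ['resnet18', 'resnet34', 'resnet101', 'resnext50_32x4d', 'densenet121']

for arch in archs:
    for fold in range(5):  # all 5 folds are required
        pm.execute_notebook(
            '3b-L1-train-and-generate-predictions-fastai_v2.ipynb',
            f'runs/3b_{arch}_fold{fold}.ipynb',          # executed copy, kept for logs
            parameters={'arch': arch, 'FOLD': fold},     # assumed parameter names
        )
```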

For convenience, since training the models takes a long time, we provide the trained models plus test and OOF predictions, so you can alternatively skip to step 3.

3. Train (level 2) L2 models and generate the submission: with all the models trained (5 models per architecture) and their predictions in `./data/predictions`, run `4-L2-train-and-submit.ipynb` to generate the final predictions/submission.
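
Conceptually, the L2 step is stacking: a small meta-model is fit on the L1 models' out-of-fold predictions and then applied to their test predictions. The sketch below only illustrates that idea; the actual L2 model, file names and column layout live in `4-L2-train-and-submit.ipynb`, and the CSV names and `label` column used here are invented for the example (it also assumes `pandas` and `scikit-learn` are installed).

```
# Conceptual stacking sketch only -- see 4-L2-train-and-submit.ipynb for the real thing.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# hypothetical files: one probability column per L1 model, plus OOF ground truth
oof = pd.read_csv('data/predictions/oof_example.csv')
test = pd.read_csv('data/predictions/test_example.csv')

feature_cols = [c for c in oof.columns if c != 'label']

# fit a simple meta-model on the out-of-fold predictions
meta = LogisticRegression(max_iter=1000)
meta.fit(oof[feature_cols], oof['label'])

# blend the L1 test predictions into a final probability
test['blend'] = meta.predict_proba(test[feature_cols])[:, 1]
```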

## Resources

* Dataset visualizer: https://rsna.md.ai/annotator/project/G9qOnN0m/workspace