--- a +++ b/README.md @@ -0,0 +1,105 @@ +## Before cloning this repo + +Make sure you have git-lfs installed: + +``` +sudo apt install git-lfs +git lfs install +``` + +## Start here + +Directory tree: + +``` +. +├── data +│ ├── unzip +│ │ ├── stage_2_test_images +│ │ └── stage_2_train_images +│ ├── predictions +├── env +└── models +``` + +Set up conda env with: + +``` +conda env create -n ihd -f=env/tfgpu.yml +conda activate ihd +``` + +Then run `jupyter-notebook` from the repo's root dir: + +``` +jupyter notebook --no-browser --NotebookApp.iopub_msg_rate_limit=10000000000 +``` + +## Steps to reproduce submission: + +1. Start with NBs: + + * `0-preprocess-generate_csvs.ipynb` + * `1-preprocess-brain_norm.ipynb` + * `2-preprocess-pickle.ipynb` + +... to pregenerate dcm metadata + diagnosis pivot tables + various pickles. + +For convenience we've already included these in the git repository so, altenatively, you can skip to step 2. + +2. Train (level 1) L1 models: + +a. fastai v1 library: `3a-L1-train-and-generate-predictions-fastai_v1.ipynb` to train the following architectures: + + * `resnet18` + * `resnet50` + * `resnet34` + * `resnet101` + * `densenet121` + +For each architecture we need to train 5 models (each model for each of 5 different folds). + +All the variables must be set in cell #4, e.g. + +``` +model_fn = None +SZ = 512 +arch = 'resnet34' +fold=0 +n_folds=5 +n_epochs = 4 +lr = 1e-3 +n_tta = 10 + +#model_fn = 'resnet34_sz512_cv0.0821_weighted_loss_fold1_of_5' + +if model_fn is not None: + model_fn_fold = int(model_fn[-6])-1 + assert model_fn_fold == fold +``` + +b. fastai v2 library to train subdural-focused models: same instructions as a) but use file `3b-L1-train-and-generate-predictions-fastai_v2.ipynb` + + * `resnet18` + * `resnet34` + * `resnet101` + * `resnext50_32x4d` + * `densenet121` + +To train models from scratch and generate test and OOF predictions, you need to: + +- Set arch to each of the archs above and train the model for each fold (set `FOLD` variable from 0 to 4 to train each fold. You NEED to train all 5 folds). + +- Comment the second `model_fn` instance (this is used if you need to fine-tune an existing model) + +- Execute all code except for the final section which builds CSV to send to submit single-model predictions to Kaggle (which we do NOT want to do at this stage). + +The code for fastai v1 allocates batch size and finds LR dynamically, but in the fastai v2 version you need to specify your GPU memory in cell #4 as well. + +For convenience: since training models takes a long time, we are providing trained models and test and OOF predictions so, altenatively, you can skip to step 3. + +3. Train (level 2) L2 models and generate submission: With all the models (5 models per arch) trained and predictions in `./data/predictions`, run `4-L2-train-and-submit.ipynb` to generate the final predictions/submission. + +## Resources + +* Dataset visualizer: https://rsna.md.ai/annotator/project/G9qOnN0m/workspace