# Kaggle competition:
# [RSNA Intracranial Hemorrhage Detection](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/overview)

Team "Mind Blowers":
====================

-   [Yuval Reina](https://www.kaggle.com/yuval6967)
-   [Zahar Chikishev](https://www.kaggle.com/zaharch)

Private Leaderboard Score: 0.04732

Private Leaderboard Place: 12

General
=======
This archive holds the code and weights used to create and run inference for the
12th place solution in the “RSNA Intracranial Hemorrhage Detection” competition.

The solution consists of the following components, run consecutively:

-   Preparing data and metadata
-   Training the feature-generating neural networks
-   Training shallow neural networks based on the features and metadata
    -   By Yuval
    -   By Zahar
-   Ensembling

ARCHIVE CONTENTS
================
-   Serialized – folder containing the files for serialized training and inference
    of the base models and the shallow pooled-res models.
-   Production – folder, kept as reference, holding the original notebooks used to
    train the models and create the submissions.
-   Notebooks – folder holding the Jupyter notebooks used to prepare the metadata,
    train and run inference for Zahar’s shallow networks, and ensemble the full
    solution. They should be run in the order in which they appear in this
    document below.

Setup
=====
## Yuval:

### HARDWARE: (The following specs were used to create the original solution)

CPU: Intel i9-9920, RAM: 64GB, GPUs: Tesla V100 and Titan RTX.

### SOFTWARE (python packages are detailed separately in requirements.txt):

OS: Ubuntu 18.04 LTS

CUDA – 10.1

## Zahar:

GCP virtual machine with 8 vCPU cores and a K80 GPU.

DATA SETUP
==========
1.  Download the train and test data from Kaggle and update
    `./Serialized/defenitions.py` with the locations of the train and test data.

2.  If you want to use our trained models, download and inflate the
    [models](https://drive.google.com/file/d/1TS2alfQ0AtURLPHXtDE9LhMHnLbfipIP/view?usp=sharing)
    archive (for the models in Serialized), put everything in one models folder,
    and update `./Serialized/defenitions.py` accordingly (see the illustrative
    sketch below).
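
The exact variable names in `./Serialized/defenitions.py` are whatever the repository
defines; purely as an illustration (all names and paths below are hypothetical), the
update amounts to pointing a few path constants at your local copies:

```python
# Hypothetical illustration only - the real variable names are defined by
# ./Serialized/defenitions.py in this repository.
train_images_dir = "/data/rsna/stage_2_train_images"
test_images_dir = "/data/rsna/stage_2_test_images"
models_dir = "/data/rsna/models"  # where the downloaded model weights were inflated
```
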
Data Processing
===============

Prepare data + metadata
-----------------------

`notebooks/DICOM_metadata_to_CSV.ipynb` - traverses the DICOM files and extracts
their metadata into a dataframe. It produces three dataframes, one for the train
images and two for the stage 1 & 2 test images.
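
The actual extraction code lives in that notebook; as a rough sketch of the idea
(a pydicom-based traversal with a hypothetical subset of header fields), it can
look like this:

```python
# Illustrative sketch only - the real logic is in notebooks/DICOM_metadata_to_CSV.ipynb.
from pathlib import Path

import pandas as pd
import pydicom

# A typical subset of header fields; the notebook may collect different ones.
FIELDS = ["SOPInstanceUID", "SeriesInstanceUID", "StudyInstanceUID", "PatientID",
          "ImagePositionPatient", "RescaleIntercept", "RescaleSlope", "WindowCenter"]

def dicom_dir_to_df(dicom_dir: str) -> pd.DataFrame:
    """Traverse dicom_dir recursively and collect selected header fields per file."""
    rows = []
    for path in Path(dicom_dir).rglob("*.dcm"):
        ds = pydicom.dcmread(path, stop_before_pixels=True)  # headers only, skip pixel data
        rows.append({field: getattr(ds, field, None) for field in FIELDS})
    return pd.DataFrame(rows)
```
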

`notebooks/Metadata.ipynb` - takes the output of the previous notebook and
post-processes the collected metadata. It prepares metadata features for training,
which are later used as an input to Zahar's shallow NNs. Specifically, it outputs
two dataframes, saved in `train_md.csv` and `test_md.csv`, with the metadata
features.

The last section of the notebook also prepares weights for the training images.
The weights are selected to simulate the distribution that we encounter in the
test images.
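
The exact weighting scheme is defined in the notebook; one common way to build such
weights (a sketch only, with a hypothetical grouping column) is a test-to-train
frequency ratio:

```python
# Illustrative sketch: per-image weights that push the training distribution of some
# metadata feature (hypothetical column "group") towards the test distribution.
import pandas as pd

def distribution_matching_weights(train_md: pd.DataFrame, test_md: pd.DataFrame,
                                  col: str = "group") -> pd.Series:
    train_freq = train_md[col].value_counts(normalize=True)
    test_freq = test_md[col].value_counts(normalize=True)
    ratio = (test_freq / train_freq).fillna(0.0)   # upweight values over-represented in test
    weights = train_md[col].map(ratio).fillna(0.0)
    return weights / weights.mean()                # normalize to mean 1
```
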
`Production/Prepare.ipynb` is used to prepare the `train.csv` and `test.csv` for
the base models and Yuval's shallow NN.
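
For reference, the competition's label file is long-format, with one row per
(image, hemorrhage subtype) pair; a hedged sketch (not necessarily the notebook's
exact code) of turning it into one row per image:

```python
# Illustrative sketch: pivot the long-format label file ("ID_<image>_<subtype>,Label")
# into one row per image with one column per hemorrhage subtype.
import pandas as pd

labels = pd.read_csv("stage_2_train.csv")  # hypothetical location of the Kaggle label file
labels[["image_id", "subtype"]] = labels["ID"].str.rsplit("_", n=1, expand=True)
train = labels.pivot(index="image_id", columns="subtype", values="Label").reset_index()
# columns: image_id, any, epidural, intraparenchymal, intraventricular, subarachnoid, subdural
```
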

Training Base Models
--------------------

`./Serialized/train_base_models.ipynb` is used to train the base models. You
should change the 2nd cell and enter part of the name of the GPU you use, and the
name of the model to train (see `defenitions.py` for a list of names):

```python
# here you should set which model parameters you want to choose (see definitions.py) and what GPU to use
params = parameters['se_resnet101_5']  # or: se_resnext101_32x4d_3, se_resnext101_32x4d_5
device = device_by_name("Tesla")       # or: "RTX", "cpu"
```

Beware, running this notebook to completion for a single base network will take a day or two.

Training Full Head Models
-------------------------
### Yuval’s shallow model - (Pooled – Res shallow model)

`./Serialized/Post Full Head Models Train .ipynb` is used to train these shallow
networks. This notebook trains all of them. You should change the 2nd cell to
reflect the GPU you use.
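
The real architecture is defined in the Serialized folder; the following is purely
a hypothetical sketch of the "pooled + residual" idea (per-slice features combined
with a pooled, series-level context), not the authors' exact model:

```python
# Hypothetical sketch only - NOT the exact Pooled-Res architecture of this repo.
import torch
import torch.nn as nn

class PooledResHead(nn.Module):
    def __init__(self, feat_dim: int, n_classes: int = 6, hidden: int = 256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(2 * feat_dim, hidden), nn.ReLU())
        self.res = nn.Linear(feat_dim, hidden)   # residual path from the raw slice features
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_slices, feat_dim), base-model features of one CT series
        pooled = feats.mean(dim=0, keepdim=True).expand_as(feats)  # series-level context
        x = self.proj(torch.cat([feats, pooled], dim=1)) + self.res(feats)
        return self.out(x)  # per-slice logits, shape (num_slices, n_classes)
```
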
### Shallow NN by Zahar

`notebooks/Training.ipynb` - trains a shallow neural network based on the
generated features and the metadata. All of the models are fine-tuned after a
regular training step. The fine-tuning differs in that it uses weighted random
sampling, with the weights defined by `notebooks/Metadata.ipynb`.
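
In PyTorch, weighted random sampling of this kind is typically implemented with
`WeightedRandomSampler`; a sketch assuming the per-sample weights produced by
`notebooks/Metadata.ipynb` are available as an array:

```python
# Illustrative sketch: fine-tuning with weighted random sampling, assuming `weights`
# holds one weight per training sample and `train_dataset` is the training Dataset.
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

sampler = WeightedRandomSampler(weights=torch.as_tensor(weights, dtype=torch.double),
                                num_samples=len(weights), replacement=True)
loader = DataLoader(train_dataset, batch_size=256, sampler=sampler)
```
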
Inferencing
-----------

### Yuval’s shallow model - (Pooled – Res shallow model):

`./Serialized/prepare_ensembling.ipynb` is used to run inference with this shallow
model and to prepare the results for ensembling.

Ensembling
----------

`notebooks/Ensembling.ipynb` - ensembles the results from all shallow NNs into the
final predictions and prepares the final submissions.

The two final submissions are obtained by running this notebook; the difference
between them is the following:

**Safe submission** ensembles Zahar's and Yuval's regular models.

**Risky submission** ensembles Zahar's weighted models and Yuval's regular models,
and the ensembling uses a per-sample weighted log-loss with the same weights as
defined before.
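
As a hedged sketch of what a per-sample weighted log-loss looks like (assuming
`y_true` and `y_pred` are `(n_samples, 6)` arrays and `sample_weights` are the
weights from `notebooks/Metadata.ipynb`; not necessarily the notebook's exact
expression):

```python
# Illustrative sketch of a per-sample weighted multi-label log-loss.
import numpy as np

def weighted_log_loss(y_true, y_pred, sample_weights, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)).mean(axis=1)
    return np.average(per_sample, weights=sample_weights)
```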