|
a/README.md |
|
b/README.md |
1 |
## Intracranial Hemorrhage Detection |
1 |
## Intracranial Hemorrhage Detection |
2 |
|
2 |
|
3 |
This blog post covers the [RSNA Intracranial Hemorrhage Detection](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection) challenge hosted on Kaggle. |
3 |
This blog post covers the [RSNA Intracranial Hemorrhage Detection](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection) challenge hosted on Kaggle. |
4 |
|
4 |
|
5 |
This post is divided into the following parts: |
5 |
This post is divided into the following parts: |
6 |
|
6 |
|
7 |
1. Overview |
7 |
1. Overview
|
8 |
2. Basic EDA [Ipython Notebook](https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-1) |
8 |
2. Basic EDA [Ipython Notebook](https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-1)
|
9 |
3. Data Visualization & Preprocessing |
9 |
3. Data Visualization & Preprocessing
|
10 |
4. Deep Learning Model |
10 |
4. Deep Learning Model
|
11 |
5. Demo |
11 |
5. Demo |
12 |
|
12 |
|
13 |
### 1. Overview |
13 |
### 1. Overview |
14 |
|
14 |
|
15 |
##### What is Intracranial Hemorrhage? |
15 |
##### What is Intracranial Hemorrhage? |
16 |
|
16 |
|
17 |
An intracranial hemorrhage is a type of bleeding that occurs inside the skull. Symptoms include sudden tingling, weakness, numbness, paralysis, severe headache, difficulty with swallowing or vision, loss of balance or coordination, difficulty understanding, speaking, reading, or writing, and a change in level of consciousness or alertness, marked by stupor, lethargy, sleepiness, or coma. Any type of bleeding inside the skull or brain is a medical emergency. It is important to get the person to a hospital emergency room immediately to determine the cause of the bleeding and begin medical treatment. Diagnosis requires highly trained specialists to review medical images of the patient’s cranium and look for the presence, location, and type of hemorrhage. The process is complicated and often time-consuming. In this post we will use deep learning techniques to detect acute intracranial hemorrhage and its subtypes. |
17 |
An intracranial hemorrhage is a type of bleeding that occurs inside the skull. Symptoms include sudden tingling, weakness, numbness, paralysis, severe headache, difficulty with swallowing or vision, loss of balance or coordination, difficulty understanding, speaking, reading, or writing, and a change in level of consciousness or alertness, marked by stupor, lethargy, sleepiness, or coma. Any type of bleeding inside the skull or brain is a medical emergency. It is important to get the person to a hospital emergency room immediately to determine the cause of the bleeding and begin medical treatment. Diagnosis requires highly trained specialists to review medical images of the patient’s cranium and look for the presence, location, and type of hemorrhage. The process is complicated and often time-consuming. In this post we will use deep learning techniques to detect acute intracranial hemorrhage and its subtypes. |
18 |
|
18 |
|
19 |
Hemorrhage Types |
19 |
Hemorrhage Types |
20 |
|
20 |
|
21 |
1. Epidural |
21 |
1. Epidural
|
22 |
2. Intraparenchymal |
22 |
2. Intraparenchymal
|
23 |
3. Intraventricular |
23 |
3. Intraventricular
|
24 |
4. Subarachnoid |
24 |
4. Subarachnoid
|
25 |
5. Subdural |
25 |
5. Subdural
|
26 |
6. Any |
26 |
6. Any |
27 |
|
27 |
|
28 |
##### What am I predicting? |
28 |
##### What am I predicting? |
29 |
|
29 |
|
30 |
In this competition our goal is to predict intracranial hemorrhage and its subtypes. Given an image, we need to predict the probability of each subtype, which makes this a multilabel classification problem. |
30 |
In this competition our goal is to predict intracranial hemorrhage and its subtypes. Given an image, we need to predict the probability of each subtype, which makes this a multilabel classification problem. |
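
To make the multilabel framing concrete, here is a minimal sketch. The logit values are made up for illustration and are not model output. Each of the six labels gets its own sigmoid, so the probabilities are independent and need not sum to 1, unlike a softmax over mutually exclusive classes.

```python
import numpy as np

# Label order follows the subtype list above.
LABELS = ["epidural", "intraparenchymal", "intraventricular",
          "subarachnoid", "subdural", "any"]

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores (logits) for a single image.
logits = np.array([-3.0, 1.2, -0.5, 0.8, -2.0, 1.5])
probs = sigmoid(logits)

# One independent probability per label.
for name, p in zip(LABELS, probs):
    print(f"{name:>17}: {p:.3f}")
```

Several labels can be likely at the same time, which is why the output layer later in the post uses six sigmoid units.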
31 |
|
31 |
|
32 |
##### Evaluation Metric |
32 |
##### Evaluation Metric |
33 |
|
33 |
|
34 |
The competition's evaluation metric is **weighted log loss**. The weights for each subtype were not disclosed as part of the competition, but in the discussion forums some teams found that the `any` label has a weight of 2 relative to the other subtypes; you can check more details [here](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109526#latest-630190). In this tutorial, however, I'm going to use plain accuracy as the evaluation metric and **binary cross entropy** as the loss, checkpointing the models based on the loss. |
34 |
The competition's evaluation metric is **weighted log loss**. The weights for each subtype were not disclosed as part of the competition, but in the discussion forums some teams found that the `any` label has a weight of 2 relative to the other subtypes; you can check more details [here](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109526#latest-630190). In this tutorial, however, I'm going to use plain accuracy as the evaluation metric and **binary cross entropy** as the loss, checkpointing the models based on the loss. |
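
For reference, a weighted log loss along these lines could be sketched in NumPy as follows. This is an illustration under assumptions, not the official scoring code: the weights (2 for `any`, 1 for the rest) come from the forum discussion linked above, and the column order with `any` last is my own choice.

```python
import numpy as np

# Assumed column order: five subtypes followed by `any`.
# `any` gets weight 2, every other label weight 1 (forum-reported, not official).
WEIGHTS = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 2.0])

def weighted_log_loss(y_true, y_pred, weights=WEIGHTS, eps=1e-7):
    """Per-label binary cross entropy, weight-averaged across labels,
    then averaged across images."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions so log() never sees exactly 0 or 1.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    bce = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    per_image = (bce * weights).sum(axis=1) / weights.sum()
    return float(per_image.mean())

# Two toy images with six labels each; an uninformative model predicts 0.5 everywhere.
y_true = np.array([[0, 0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0, 1]])
y_pred = np.full((2, 6), 0.5)
loss = weighted_log_loss(y_true, y_pred)
print(round(loss, 4))  # 0.6931, i.e. ln 2
```

With all weights equal to 1 this reduces to the plain binary cross entropy used as the training loss in this post.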
35 |
|
35 |
|
36 |
|
36 |
|
37 |
### 2. Basic EDA |
37 |
### 2. Basic EDA |
38 |
|
38 |
|
39 |
Let's look at the [data](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data) that is provided. |
39 |
Let's look at the [data](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data) that is provided. |
40 |
|
40 |
|
41 |
We have a train.csv containing file names and labels indicating whether a hemorrhage is present, a train images folder containing a set of [DICOM](https://www.dicomstandard.org/) files (medical images are stored in the DICOM format), and a test images folder containing the test DICOM files. |
41 |
We have a train.csv containing file names and labels indicating whether a hemorrhage is present, a train images folder containing a set of [DICOM](https://www.dicomstandard.org/) files (medical images are stored in the DICOM format), and a test images folder containing the test DICOM files. |
42 |
|
42 |
|
43 |
```python |
43 |
```python
|
44 |
# load the csv file |
44 |
# load the csv file
|
45 |
train_df = pd.read_csv(input_folder + 'stage_1_train.csv') |
45 |
train_df = pd.read_csv(input_folder + 'stage_1_train.csv')
|
46 |
train_df.head() |
46 |
train_df.head()
|
47 |
``` |
47 |
```
|
48 |
<img src='assets/df.png'/> |
|
|
49 |
|
48 |
|
50 |
It consists of two columns, ID and Label. ID has the format `FILE_ID_SUB_TYPE`; for example, in `ID_63eb1e259_epidural`, `ID_63eb1e259` is the file ID and `epidural` is the subtype. Label indicates whether that subtype of hemorrhage is present. |
49 |
It consists of two columns, ID and Label. ID has the format `FILE_ID_SUB_TYPE`; for example, in `ID_63eb1e259_epidural`, `ID_63eb1e259` is the file ID and `epidural` is the subtype. Label indicates whether that subtype of hemorrhage is present. |
51 |
|
50 |
|
52 |
Let's separate the file names and subtypes. |
51 |
Let's separate the file names and subtypes. |
53 |
|
52 |
|
54 |
```python |
53 |
```python
|
55 |
# extract subtype |
54 |
# extract subtype
|
56 |
train_df['sub_type'] = train_df['ID'].apply(lambda x: x.split('_')[-1]) |
55 |
train_df['sub_type'] = train_df['ID'].apply(lambda x: x.split('_')[-1])
|
57 |
# extract filename |
56 |
# extract filename
|
58 |
train_df['file_name'] = train_df['ID'].apply(lambda x: '_'.join(x.split('_')[:2]) + '.dcm') |
57 |
train_df['file_name'] = train_df['ID'].apply(lambda x: '_'.join(x.split('_')[:2]) + '.dcm')
|
59 |
train_df.head() |
58 |
train_df.head()
|
60 |
``` |
59 |
``` |
61 |
<img src='assets/df2.png'/> |
60 |
|
62 |
|
|
|
63 |
|
|
|
64 |
```python |
61 |
```python
|
65 |
train_df.shape |
62 |
train_df.shape
|
66 |
``` |
63 |
```
|
67 |
Output : (4045572, 4) |
64 |
Output : (4045572, 4) |
68 |
|
65 |
|
69 |
```python |
66 |
```python
|
70 |
print("Number of train images available:", len(os.listdir(path_train_img))) |
67 |
print("Number of train images available:", len(os.listdir(path_train_img)))
|
71 |
``` |
68 |
```
|
72 |
Output : Number of train images available: 674258 |
69 |
Output : Number of train images available: 674258 |
73 |
|
70 |
|
74 |
The csv file has a shape of (4045572, 4). Every DICOM file in the train folder has 6 entries in the csv, one for each of the 6 possible hemorrhage subtypes. |
71 |
The csv file has a shape of (4045572, 4). Every DICOM file in the train folder has 6 entries in the csv, one for each of the 6 possible hemorrhage subtypes. |
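
As a quick sanity check, 4,045,572 / 6 = 674,262, which is four more than the 674,258 files listed in the train folder, so it is worth checking the csv for a few duplicated or extra rows. The sketch below uses a toy frame with made-up IDs (not the real data) to show one way to drop duplicated rows and reshape the long table into one row per file with six label columns; `toy_df` and `wide_df` are illustrative names.

```python
import pandas as pd

SUBTYPES = ["epidural", "intraparenchymal", "intraventricular",
            "subarachnoid", "subdural", "any"]

# Toy rows in the same ID/Label layout as the train csv (IDs are made up).
rows = []
for file_id, positives in [("ID_aaa111", {"subdural", "any"}), ("ID_bbb222", set())]:
    for sub_type in SUBTYPES:
        rows.append({"ID": f"{file_id}_{sub_type}", "Label": int(sub_type in positives)})
rows.append(rows[0])  # simulate one duplicated row
toy_df = pd.DataFrame(rows)

# Split once from the right, because the file ID itself contains an underscore.
parts = toy_df["ID"].str.rsplit("_", n=1, expand=True)
toy_df["file_name"] = parts[0] + ".dcm"
toy_df["sub_type"] = parts[1]

# Drop repeated IDs, then pivot to one row per file and one column per subtype.
wide_df = (toy_df.drop_duplicates(subset="ID")
                 .pivot(index="file_name", columns="sub_type", values="Label"))
print(wide_df.shape)  # (2, 6)
```

A wide table like this, indexed by file name, is the shape a data generator needs when it serves one image with all six labels at once.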
75 |
|
72 |
|
76 |
Let's check the number of files available for each subtype. |
73 |
Let's check the number of files available for each subtype. |
77 |
|
74 |
|
78 |
```python |
75 |
```python
|
79 |
plt.figure(figsize=(16, 6)) |
76 |
plt.figure(figsize=(16, 6))
|
80 |
graph = sns.countplot(x="sub_type", hue="Label", data=(train_df)) |
77 |
graph = sns.countplot(x="sub_type", hue="Label", data=(train_df))
|
81 |
graph.set_xticklabels(graph.get_xticklabels(),rotation=90) |
78 |
graph.set_xticklabels(graph.get_xticklabels(),rotation=90)
|
82 |
plt.show() |
79 |
plt.show()
|
83 |
``` |
80 |
``` |
84 |
<img src='assets/counts.png'/> |
81 |
|
85 |
|
|
|
86 |
|
|
|
87 |
Let's check the counts for each subtype. |
82 |
Let's check the counts for each subtype. |
88 |
|
83 |
|
89 |
##### Epidural |
84 |
##### Epidural |
90 |
|
85 |
|
91 |
```python |
86 |
```python
|
92 |
train_df[train_df['sub_type'] == 'epidural']['Label'].value_counts() |
87 |
train_df[train_df['sub_type'] == 'epidural']['Label'].value_counts()
|
93 |
``` |
88 |
```
|
94 |
Output: |
89 |
Output: |
95 |
|
90 |
|
96 |
0 671501 |
91 |
0 671501 |
97 |
|
92 |
|
98 |
1 2761 |
93 |
1 2761 |
99 |
|
94 |
|
100 |
Name: Label, dtype: int64 |
95 |
Name: Label, dtype: int64 |
101 |
|
96 |
|
102 |
For the epidural subtype we have 671,501 images labeled as 0 and 2,761 labeled as 1. |
97 |
For the epidural subtype we have 671,501 images labeled as 0 and 2,761 labeled as 1. |
103 |
|
98 |
|
104 |
##### Intraparenchymal |
99 |
##### Intraparenchymal |
105 |
|
100 |
|
106 |
```python |
101 |
```python
|
107 |
train_df[train_df['sub_type'] == 'intraparenchymal']['Label'].value_counts() |
102 |
train_df[train_df['sub_type'] == 'intraparenchymal']['Label'].value_counts()
|
108 |
``` |
103 |
```
|
109 |
Output: <br/> |
104 |
Output: <br/>
|
110 |
0 641698<br/> |
105 |
0 641698<br/>
|
111 |
1 32564<br/> |
106 |
1 32564<br/>
|
112 |
Name: Label, dtype: int64 |
107 |
Name: Label, dtype: int64 |
113 |
|
108 |
|
114 |
For the intraparenchymal subtype we have 641,698 images labeled as 0 and 32,564 labeled as 1. |
109 |
For the intraparenchymal subtype we have 641,698 images labeled as 0 and 32,564 labeled as 1. |
115 |
|
110 |
|
116 |
|
111 |
|
117 |
##### Intraventricular |
112 |
##### Intraventricular |
118 |
|
113 |
|
119 |
```python |
114 |
```python
|
120 |
train_df[train_df['sub_type'] == 'intraventricular']['Label'].value_counts() |
115 |
train_df[train_df['sub_type'] == 'intraventricular']['Label'].value_counts()
|
121 |
``` |
116 |
```
|
122 |
Output: <br/> |
117 |
Output: <br/>
|
123 |
0 650496<br/> |
118 |
0 650496<br/>
|
124 |
1 23766<br/> |
119 |
1 23766<br/>
|
125 |
Name: Label, dtype: int64 |
120 |
Name: Label, dtype: int64 |
126 |
|
121 |
|
127 |
For the intraventricular subtype we have 650,496 images labeled as 0 and 23,766 labeled as 1. |
122 |
For the intraventricular subtype we have 650,496 images labeled as 0 and 23,766 labeled as 1. |
128 |
|
123 |
|
129 |
##### Subarachnoid |
124 |
##### Subarachnoid |
130 |
|
125 |
|
131 |
```python |
126 |
```python
|
132 |
train_df[train_df['sub_type'] == 'subarachnoid']['Label'].value_counts() |
127 |
train_df[train_df['sub_type'] == 'subarachnoid']['Label'].value_counts()
|
133 |
``` |
128 |
```
|
134 |
Output: <br/> |
129 |
Output: <br/>
|
135 |
0 642140<br/> |
130 |
0 642140<br/>
|
136 |
1 32122<br/> |
131 |
1 32122<br/>
|
137 |
Name: Label, dtype: int64 |
132 |
Name: Label, dtype: int64 |
138 |
|
133 |
|
139 |
For the subarachnoid subtype we have 642,140 images labeled as 0 and 32,122 labeled as 1. |
134 |
For the subarachnoid subtype we have 642,140 images labeled as 0 and 32,122 labeled as 1. |
140 |
|
135 |
|
141 |
|
136 |
|
142 |
##### Subdural |
137 |
##### Subdural |
143 |
|
138 |
|
144 |
```python |
139 |
```python
|
145 |
train_df[train_df['sub_type'] == 'subdural']['Label'].value_counts() |
140 |
train_df[train_df['sub_type'] == 'subdural']['Label'].value_counts()
|
146 |
``` |
141 |
```
|
147 |
Output: <br/> |
142 |
Output: <br/>
|
148 |
0 631766<br/> |
143 |
0 631766<br/>
|
149 |
1 42496<br/> |
144 |
1 42496<br/>
|
150 |
Name: Label, dtype: int64 |
145 |
Name: Label, dtype: int64 |
151 |
|
146 |
|
152 |
For the subdural subtype we have 631,766 images labeled as 0 and 42,496 labeled as 1. |
147 |
For the subdural subtype we have 631,766 images labeled as 0 and 42,496 labeled as 1. |
153 |
|
148 |
|
154 |
|
149 |
|
155 |
##### Any |
150 |
##### Any |
156 |
|
151 |
|
157 |
```python |
152 |
```python
|
158 |
train_df[train_df['sub_type'] == 'any']['Label'].value_counts() |
153 |
train_df[train_df['sub_type'] == 'any']['Label'].value_counts()
|
159 |
``` |
154 |
```
|
160 |
Output: <br/> |
155 |
Output: <br/>
|
161 |
0 577159<br/> |
156 |
0 577159<br/>
|
162 |
1 97103<br/> |
157 |
1 97103<br/>
|
163 |
Name: Label, dtype: int64 |
158 |
Name: Label, dtype: int64 |
164 |
|
159 |
|
165 |
For the `any` label we have 577,159 images labeled as 0 and 97,103 labeled as 1. |
160 |
For the `any` label we have 577,159 images labeled as 0 and 97,103 labeled as 1. |
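
The counts above point to a strong class imbalance. The short sketch below turns them into positive rates; the numbers are copied from the outputs in this section, with the 650,496 / 23,766 pair taken to be the intraventricular subtype, following the subtype list in the overview.

```python
# (negatives, positives) per label, copied from the value_counts() outputs above.
counts = {
    "epidural": (671501, 2761),
    "intraparenchymal": (641698, 32564),
    "intraventricular": (650496, 23766),  # assumed to be the third subtype in the list
    "subarachnoid": (642140, 32122),
    "subdural": (631766, 42496),
    "any": (577159, 97103),
}

# Fraction of images that are positive for each label.
rates = {name: pos / (neg + pos) for name, (neg, pos) in counts.items()}

for name, rate in rates.items():
    print(f"{name:>17}: {rate:.2%}")
```

Epidural is positive in well under 1% of images and even `any` is positive in only about 14%, so a model that always predicts 0 would already reach roughly 85.6% accuracy on `any`. This is one reason to watch the loss (or AUC) alongside accuracy.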
166 |
|
161 |
|
167 |
### 3. Data Visualization & Preprocessing |
162 |
### 3. Data Visualization & Preprocessing |
168 |
|
163 |
|
169 |
Let's look at the DICOM files in the dataset. |
164 |
Let's look at the DICOM files in the dataset. |
170 |
|
165 |
|
171 |
```python |
166 |
```python
|
172 |
dicom = pydicom.dcmread(path_train_img + 'ID_ffff922b9.dcm') |
167 |
dicom = pydicom.dcmread(path_train_img + 'ID_ffff922b9.dcm')
|
173 |
print(dicom) |
168 |
print(dicom)
|
174 |
``` |
169 |
``` |
175 |
<img src='assets/dicom.png'/> |
170 |
|
176 |
|
171 |
|
177 |
|
|
|
178 |
DICOM files contain the image's pixel data along with metadata such as patient name, instance ID, window width, etc. |
172 |
DICOM files contain the image's pixel data along with metadata such as patient name, instance ID, window width, etc. |
179 |
|
173 |
|
180 |
Original image |
174 |
Original image |
181 |
|
175 |
|
182 |
```python |
176 |
```python
|
183 |
plt.imshow(dicom.pixel_array, cmap=plt.cm.bone) |
177 |
plt.imshow(dicom.pixel_array, cmap=plt.cm.bone)
|
184 |
plt.show() |
178 |
plt.show()
|
185 |
``` |
179 |
``` |
186 |
<img src='assets/original.png'/> |
180 |
|
187 |
|
181 |
|
188 |
|
|
|
189 |
The original image is difficult to interpret. Let's check metadata fields such as Window Center, Window Width, Rescale Intercept, and Rescale Slope. |
182 |
The original image is difficult to interpret. Let's check metadata fields such as Window Center, Window Width, Rescale Intercept, and Rescale Slope. |
190 |
|
183 |
|
191 |
<img src='assets/meta.png'/> |
184 |
|
192 |
|
|
|
193 |
|
|
|
194 |
We can use these features to construct the new image. |
185 |
We can use these features to construct the new image. |
195 |
|
186 |
|
196 |
```python |
187 |
```python
|
197 |
def get_dicom_field_value(key, dicom): |
188 |
def get_dicom_field_value(key, dicom):
|
198 |
    """ |
189 |
    """
|
199 |
    @param key: key is tuple |
190 |
    @param key: key is tuple
|
200 |
    @param dicom: dicom file |
191 |
    @param dicom: dicom file
|
201 |
    """ |
192 |
    """
|
202 |
    return dicom[key].value |
193 |
    return dicom[key].value |
203 |
|
194 |
|
204 |
window_center = int(get_dicom_field_value(('0028', '1050'), dicom)) |
195 |
window_center = int(get_dicom_field_value(('0028', '1050'), dicom))
|
205 |
window_width = int(get_dicom_field_value(('0028', '1051'), dicom)) |
196 |
window_width = int(get_dicom_field_value(('0028', '1051'), dicom))
|
206 |
window_intercept = int(get_dicom_field_value(('0028', '1052'), dicom)) |
197 |
window_intercept = int(get_dicom_field_value(('0028', '1052'), dicom))
|
207 |
window_slope = int(get_dicom_field_value(('0028', '1053'), dicom)) |
198 |
window_slope = int(get_dicom_field_value(('0028', '1053'), dicom))
|
208 |
window_center, window_width, window_intercept, window_slope |
199 |
window_center, window_width, window_intercept, window_slope |
209 |
|
200 |
|
210 |
def get_windowed_image(image, wc,ww, intercept, slope): |
201 |
def get_windowed_image(image, wc,ww, intercept, slope):
|
211 |
    img = image * slope + intercept |
202 |
    img = image * slope + intercept
|
212 |
    img_min = wc - ww // 2 |
203 |
    img_min = wc - ww // 2
|
213 |
    img_max = wc + ww // 2 |
204 |
    img_max = wc + ww // 2
|
214 |
    img[img < img_min] = img_min |
205 |
    img[img < img_min] = img_min
|
215 |
    img[img > img_max] = img_max |
206 |
    img[img > img_max] = img_max
|
216 |
    return img |
207 |
    return img
|
217 |
|
208 |
|
218 |
windowed_image = get_windowed_image(dicom.pixel_array, window_center, window_width, \ |
209 |
windowed_image = get_windowed_image(dicom.pixel_array, window_center, window_width, \
|
219 |
window_intercept, window_slope) |
210 |
window_intercept, window_slope)
|
220 |
|
211 |
|
221 |
plt.imshow(windowed_image, cmap=plt.cm.bone) |
212 |
plt.imshow(windowed_image, cmap=plt.cm.bone)
|
222 |
plt.show() |
213 |
plt.show()
|
223 |
``` |
214 |
``` |
224 |
<img src='assets/windowed.png'/> |
215 |
|
225 |
|
216 |
|
226 |
|
217 |
|
227 |
|
|
|
228 |
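The windowed image built from the metadata is much clearer than the original image. The DICOM pixel array holds raw stored values; applying the rescale slope and intercept converts them to Hounsfield units (HU), and the window then clips them to the intensity range of interest. |
218 |
The windowed image built from the metadata is much clearer than the original image. The DICOM pixel array holds raw stored values; applying the rescale slope and intercept converts them to Hounsfield units (HU), and the window then clips them to the intensity range of interest. |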
229 |
|
219 |
|
230 |
Scaling the image: |
220 |
Scaling the image: |
231 |
|
221 |
|
232 |
Rescale the image to range 0-255. |
222 |
Rescale the image to range 0-255. |
233 |
|
223 |
|
234 |
```python |
224 |
```python
|
235 |
def get_scaled_windowed_image(img): |
225 |
def get_scaled_windowed_image(img):
|
236 |
    """ |
226 |
    """
|
237 |
    Get scaled image |
227 |
    Get scaled image
|
238 |
    1. Convert to float |
228 |
    1. Convert to float
|
239 |
    2. Rescale to 0-255 |
229 |
    2. Rescale to 0-255
|
240 |
    3. Convert to uint8 |
230 |
    3. Convert to uint8
|
241 |
    """ |
231 |
    """
|
242 |
    img_2d = img.astype(float) |
232 |
    img_2d = img.astype(float)
|
243 |
    img_2d_scaled = (np.maximum(img_2d, 0) / img_2d.max()) * 255.0 |
233 |
    img_2d_scaled = (np.maximum(img_2d, 0) / img_2d.max()) * 255.0
|
244 |
    img_2d_scaled = np.uint8(img_2d_scaled) |
234 |
    img_2d_scaled = np.uint8(img_2d_scaled)
|
245 |
    return img_2d_scaled |
235 |
    return img_2d_scaled
|
246 |
|
236 |
|
247 |
scaled_image = get_scaled_windowed_image(windowed_image) |
237 |
scaled_image = get_scaled_windowed_image(windowed_image)
|
248 |
plt.imshow(scaled_image, cmap=plt.cm.bone, vmin=0, vmax=255) |
238 |
plt.imshow(scaled_image, cmap=plt.cm.bone, vmin=0, vmax=255)
|
249 |
plt.show() |
239 |
plt.show()
|
250 |
``` |
240 |
``` |
251 |
<img src='assets/scaled.png'/> |
241 |
|
252 |
|
242 |
|
253 |
|
|
|
254 |
Hounsfield Units (HU) are the best basis for constructing CT images. [Here](https://en.wikipedia.org/wiki/Hounsfield_scale) is a detailed table showing substances and their HU ranges. |
243 |
Hounsfield Units (HU) are the best basis for constructing CT images. [Here](https://en.wikipedia.org/wiki/Hounsfield_scale) is a detailed table showing substances and their HU ranges. |
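
To tie the HU conversion, windowing, and scaling steps together, here is a small self-contained sketch on a synthetic array. The function name `window_to_uint8` and the sample values are my own, for illustration only. Unlike the scaling function above, it maps the window's lower bound to 0 and its upper bound to 255, which also handles windows whose lower bound is negative.

```python
import numpy as np

def window_to_uint8(raw, slope, intercept, center, width):
    """Convert raw stored values to HU, clip to a window, and scale to 0-255."""
    # 1. Raw stored values -> Hounsfield units.
    hu = raw.astype(np.float64) * slope + intercept
    # 2. Clip to [center - width/2, center + width/2].
    lo = center - width / 2
    hi = center + width / 2
    clipped = np.clip(hu, lo, hi)
    # 3. Min-max scale the window range to 0-255 and convert to uint8.
    scaled = (clipped - lo) / (hi - lo) * 255.0
    return scaled.astype(np.uint8)

# Synthetic 2x2 "scan"; with intercept -1024 these are -1024, -24, 16 and 1976 HU.
raw = np.array([[0, 1000], [1040, 3000]], dtype=np.int16)

# Brain window (center 40, width 80) keeps the 0 to 80 HU range.
out = window_to_uint8(raw, slope=1.0, intercept=-1024.0, center=40, width=80)
print(out)
```

Values below the window collapse to 0 and values above it to 255, so only the 0 to 80 HU range keeps any contrast.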
255 |
|
244 |
|
256 |
A detailed explanation of all the possible windowing techniques can be found in this great kernel [(Gradient Sigmoid Windowing)](https://www.kaggle.com/reppic/gradient-sigmoid-windowing) |
245 |
A detailed explanation of all the possible windowing techniques can be found in this great kernel [(Gradient Sigmoid Windowing)](https://www.kaggle.com/reppic/gradient-sigmoid-windowing) |
257 |
|
246 |
|
258 |
```python |
247 |
```python |
259 |
|
248 |
|
260 |
def correct_dcm(dcm): |
249 |
def correct_dcm(dcm):
|
261 |
    # Refer Jeremy Howard's Kernel https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai |
250 |
    # Refer Jeremy Howard's Kernel https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai
|
262 |
    x = dcm.pixel_array + 1000 |
251 |
    x = dcm.pixel_array + 1000
|
263 |
    px_mode = 4096 |
252 |
    px_mode = 4096
|
264 |
    x[x>=px_mode] = x[x>=px_mode] - px_mode |
253 |
    x[x>=px_mode] = x[x>=px_mode] - px_mode
|
265 |
    dcm.PixelData = x.tobytes() |
254 |
    dcm.PixelData = x.tobytes()
|
266 |
    dcm.RescaleIntercept = -1000 |
255 |
    dcm.RescaleIntercept = -1000 |
267 |
|
256 |
|
268 |
def window_image(dcm, window_center, window_width): |
257 |
def window_image(dcm, window_center, window_width):
|
269 |
|
258 |
|
270 |
    if (dcm.BitsStored == 12) and (dcm.PixelRepresentation == 0) and (int(dcm.RescaleIntercept) > -100): |
259 |
    if (dcm.BitsStored == 12) and (dcm.PixelRepresentation == 0) and (int(dcm.RescaleIntercept) > -100):
|
271 |
        correct_dcm(dcm) |
260 |
        correct_dcm(dcm)
|
272 |
|
261 |
|
273 |
    img = dcm.pixel_array * dcm.RescaleSlope + dcm.RescaleIntercept |
262 |
    img = dcm.pixel_array * dcm.RescaleSlope + dcm.RescaleIntercept
|
274 |
    img_min = window_center - window_width // 2 |
263 |
    img_min = window_center - window_width // 2
|
275 |
    img_max = window_center + window_width // 2 |
264 |
    img_max = window_center + window_width // 2
|
276 |
    img = np.clip(img, img_min, img_max) |
265 |
    img = np.clip(img, img_min, img_max) |
277 |
|
266 |
|
278 |
    return img |
267 |
    return img |
279 |
|
268 |
|
280 |
def bsb_window(dcm): |
269 |
def bsb_window(dcm):
|
281 |
    brain_img = window_image(dcm, 40, 80) |
270 |
    brain_img = window_image(dcm, 40, 80)
|
282 |
    subdural_img = window_image(dcm, 80, 200) |
271 |
    subdural_img = window_image(dcm, 80, 200)
|
283 |
    soft_img = window_image(dcm, 40, 380) |
272 |
    soft_img = window_image(dcm, 40, 380)
|
284 |
|
273 |
|
285 |
    brain_img = (brain_img - 0) / 80 |
274 |
    brain_img = (brain_img - 0) / 80
|
286 |
    subdural_img = (subdural_img - (-20)) / 200 |
275 |
    subdural_img = (subdural_img - (-20)) / 200
|
287 |
    soft_img = (soft_img - (-150)) / 380 |
276 |
    soft_img = (soft_img - (-150)) / 380
|
288 |
    bsb_img = np.array([brain_img, subdural_img, soft_img]).transpose(1,2,0) |
277 |
    bsb_img = np.array([brain_img, subdural_img, soft_img]).transpose(1,2,0) |
289 |
|
278 |
|
290 |
    return bsb_img |
279 |
    return bsb_img
|
291 |
|
280 |
|
292 |
display_dicom_image('ID_0005d340e.dcm') |
281 |
display_dicom_image('ID_0005d340e.dcm')
|
293 |
``` |
282 |
``` |
294 |
<img src='assets/dicom_all.png'/> |
283 |
|
295 |
|
284 |
|
296 |
|
|
|
297 |
It looks like the Brain + Subdural window is a good starting point for our models: it has three channels and can be fed easily to any pretrained model. |
285 |
It looks like the Brain + Subdural window is a good starting point for our models: it has three channels and can be fed easily to any pretrained model. |
298 |
|
286 |
|
299 |
|
287 |
|
300 |
### 4. Deep Learning Model |
288 |
### 4. Deep Learning Model |
301 |
|
289 |
|
302 |
The complete code for training the model can be found [here](/notebooks/Effnet-B0%20Windowed%20Image.ipynb). |
290 |
The complete code for training the model can be found [here](/notebooks/Effnet-B0%20Windowed%20Image.ipynb). |
303 |
|
291 |
|
304 |
We will use normal windowed images for training the model, with augmentations such as left-right flips and random cropping. |
292 |
We will use normal windowed images for training the model, with augmentations such as left-right flips and random cropping. |
305 |
|
293 |
|
306 |
Here are the steps for training the model: |
294 |
Here are the steps for training the model: |
307 |
|
295 |
|
308 |
1. Prepare the train and validation data generators. We split the data by stratifying on the labels using [multilabel stratification](https://github.com/trent-b/iterative-stratification), make two splits, and work only on the first split to check the results. |
296 |
1. Prepare the train and validation data generators. We split the data by stratifying on the labels using [multilabel stratification](https://github.com/trent-b/iterative-stratification), make two splits, and work only on the first split to check the results.
|
309 |
2. Load pretrained Efficient Net B0 model. |
297 |
2. Load pretrained Efficient Net B0 model.
|
310 |
3. For the first epoch, train on all the training images with every layer except the last five frozen (`trainable = False`), then save the model. |
298 |
3. For the first epoch, train on all the training images with every layer except the last five frozen (`trainable = False`), then save the model.
|
311 |
4. Load the saved model and, for the remaining epochs, train the whole model except the last layer so that it learns more complicated features. |
299 |
4. Load the saved model and, for the remaining epochs, train the whole model except the last layer so that it learns more complicated features.
|
312 |
5. Make predictions. |
300 |
5. Make predictions. |
313 |
|
301 |
|
314 |
Sample code: |
302 |
Sample code: |
315 |
|
303 |
|
316 |
```python |
304 |
```python
|
317 |
# 1. ---------prepare data generators-------------# |
305 |
# 1. ---------prepare data generators-------------#
|
318 |
# https://github.com/trent-b/iterative-stratification |
306 |
# https://github.com/trent-b/iterative-stratification
|
319 |
# Multilabel stratification |
307 |
# Multilabel stratification
|
320 |
splits = MultilabelStratifiedShuffleSplit(n_splits = 2, test_size = TEST_SIZE, random_state = SEED) |
308 |
splits = MultilabelStratifiedShuffleSplit(n_splits = 2, test_size = TEST_SIZE, random_state = SEED)
|
321 |
file_names = train_final_df.index |
309 |
file_names = train_final_df.index
|
322 |
labels = train_final_df.values |
310 |
labels = train_final_df.values
|
323 |
# Lets take only the first split |
311 |
# Lets take only the first split
|
324 |
split = next(splits.split(file_names, labels)) |
312 |
split = next(splits.split(file_names, labels))
|
325 |
train_idx = split[0] |
313 |
train_idx = split[0]
|
326 |
valid_idx = split[1] |
314 |
valid_idx = split[1]
|
327 |
submission_predictions = [] |
315 |
submission_predictions = []
|
328 |
len(train_idx), len(valid_idx) |
316 |
len(train_idx), len(valid_idx)
|
329 |
# train data generator |
317 |
# train data generator
|
330 |
data_generator_train = TrainDataGenerator(train_final_df.iloc[train_idx], |
318 |
data_generator_train = TrainDataGenerator(train_final_df.iloc[train_idx],
|
331 |
train_final_df.iloc[train_idx], |
319 |
train_final_df.iloc[train_idx],
|
332 |
TRAIN_BATCH_SIZE, |
320 |
TRAIN_BATCH_SIZE,
|
333 |
(WIDTH, HEIGHT), |
321 |
(WIDTH, HEIGHT),
|
334 |
augment = True) |
322 |
augment = True) |
335 |
|
323 |
|
336 |
# validation data generator |
324 |
# validation data generator
|
337 |
data_generator_val = TrainDataGenerator(train_final_df.iloc[valid_idx], |
325 |
data_generator_val = TrainDataGenerator(train_final_df.iloc[valid_idx],
|
338 |
train_final_df.iloc[valid_idx], |
326 |
train_final_df.iloc[valid_idx],
|
339 |
VALID_BATCH_SIZE, |
327 |
VALID_BATCH_SIZE,
|
340 |
(WIDTH, HEIGHT), |
328 |
(WIDTH, HEIGHT),
|
341 |
augment = False) |
329 |
augment = False)
|
342 |
# 2. ---------load efficient net B0 model-----------# |
330 |
# 2. ---------load efficient net B0 model-----------#
|
343 |
base_model = efn.EfficientNetB0(weights = 'imagenet', include_top = False, \ |
331 |
base_model = efn.EfficientNetB0(weights = 'imagenet', include_top = False, \
|
344 |
pooling = 'avg', input_shape = (HEIGHT, WIDTH, 3)) |
332 |
pooling = 'avg', input_shape = (HEIGHT, WIDTH, 3))
|
345 |
x = base_model.output |
333 |
x = base_model.output
|
346 |
x = Dropout(0.125)(x) |
334 |
x = Dropout(0.125)(x)
|
347 |
output_layer = Dense(6, activation = 'sigmoid')(x) |
335 |
output_layer = Dense(6, activation = 'sigmoid')(x)
|
348 |
model = Model(inputs=base_model.input, outputs=output_layer) |
336 |
model = Model(inputs=base_model.input, outputs=output_layer)
|
349 |
model.compile(optimizer = Adam(learning_rate = 0.0001), |
337 |
model.compile(optimizer = Adam(learning_rate = 0.0001),
|
350 |
loss = 'binary_crossentropy', |
338 |
loss = 'binary_crossentropy',
|
351 |
metrics = ['acc', tf.keras.metrics.AUC()]) |
339 |
metrics = ['acc', tf.keras.metrics.AUC()])
|
352 |
model.summary() |
340 |
model.summary() |
353 |
|
341 |
|
354 |
# 3. ---------for 1 st epoch train on whole dataset ------------# |
342 |
# 3. ---------for 1 st epoch train on whole dataset ------------#
|
355 |
for layer in model.layers[:-5]: |
343 |
for layer in model.layers[:-5]:
|
356 |
    layer.trainable = False |
344 |
    layer.trainable = False
|
357 |
|
345 |
|
358 |
model.compile(optimizer = Adam(learning_rate = 0.0001), |
346 |
model.compile(optimizer = Adam(learning_rate = 0.0001),
|
359 |
loss = 'binary_crossentropy', |
347 |
loss = 'binary_crossentropy',
|
360 |
metrics = ['acc']) |
348 |
metrics = ['acc'])
|
361 |
|
349 |
|
362 |
model.fit_generator(generator = data_generator_train, |
350 |
model.fit_generator(generator = data_generator_train,
|
363 |
validation_data = data_generator_val, |
351 |
validation_data = data_generator_val,
|
364 |
epochs = 1, |
352 |
epochs = 1,
|
365 |
callbacks = callbacks_list, |
353 |
callbacks = callbacks_list,
|
366 |
verbose = 1) |
354 |
verbose = 1) |
367 |
|
355 |
|
368 |
# 4. ---------for rest of epochs train on sample data----------# |
356 |
# 4. ---------for rest of epochs train on sample data----------#
|
369 |
model.load_weights('model.h5') |
357 |
model.load_weights('model.h5')
|
370 |
model.compile(optimizer = Adam(learning_rate = 0.0004), |
358 |
model.compile(optimizer = Adam(learning_rate = 0.0004),
|
371 |
loss = 'binary_crossentropy', |
359 |
loss = 'binary_crossentropy',
|
372 |
metrics = ['acc']) |
360 |
metrics = ['acc'])
|
373 |
model.fit_generator(generator = data_generator_train, |
361 |
model.fit_generator(generator = data_generator_train,
|
374 |
validation_data = data_generator_val, |
362 |
validation_data = data_generator_val,
|
375 |
steps_per_epoch=len(data_generator_train) // 6, |
363 |
steps_per_epoch=len(data_generator_train) // 6,
|
376 |
epochs = 10, |
364 |
epochs = 10,
|
377 |
callbacks = callbacks_list, |
365 |
callbacks = callbacks_list,
|
378 |
verbose = 1) |
366 |
verbose = 1)
|
379 |
# 5. --------Make Predictions ------- --------------------------# |
367 |
# 5. --------Make Predictions ------- --------------------------#
|
380 |
model.load_weights('model.h5') |
368 |
model.load_weights('model.h5') |
381 |
|
369 |
|
382 |
def get_scores(data_gen, file_name='scores.pkl'): |
370 |
def get_scores(data_gen, file_name='scores.pkl'):
|
383 |
    scores = model.evaluate_generator(data_gen, verbose=1) |
371 |
    scores = model.evaluate_generator(data_gen, verbose=1)
|
384 |
    joblib.dump(scores, file_name) |
372 |
    joblib.dump(scores, file_name)
|
385 |
    print(f"Loss: {scores[0]} and Accuracy: {scores[1]*100}") |
373 |
    print(f"Loss: {scores[0]} and Accuracy: {scores[1]*100}")
|
386 |
``` |
374 |
``` |
387 |
|
375 |
|
388 |
Let's evaluate the model on the train and validation generators. |
376 |
Let's evaluate the model on the train and validation generators. |
389 |
|
377 |
|
390 |
```python |
378 |
```python
|
391 |
get_scores(data_gen=data_generator_train, file_name='train_scores.pkl') |
379 |
get_scores(data_gen=data_generator_train, file_name='train_scores.pkl')
|
392 |
``` |
380 |
``` |
393 |
<img src='assets/train.png'/> |
381 |
|
394 |
|
|
|
395 |
```python |
382 |
```python
|
396 |
get_scores(data_gen=data_generator_val, file_name='val_scores.pkl') |
383 |
get_scores(data_gen=data_generator_val, file_name='val_scores.pkl')
|
397 |
``` |
384 |
``` |
398 |
<img src='assets/val.png'/> |
385 |
|
399 |
|
|
|
400 |
Let's load the test data frame. The test csv is in the same format as train.csv. |
386 |
Let's load the test data frame. The test csv is in the same format as train.csv. |
401 |
|
387 |
|
402 |
```python |
388 |
```python
|
403 |
# extract subtype |
389 |
# extract subtype
|
404 |
test_df['sub_type'] = test_df['ID'].apply(lambda x: x.split('_')[-1]) |
390 |
test_df['sub_type'] = test_df['ID'].apply(lambda x: x.split('_')[-1])
|
405 |
# extract filename |
391 |
# extract filename
|
406 |
test_df['file_name'] = test_df['ID'].apply(lambda x: '_'.join(x.split('_')[:2]) + '.dcm') |
392 |
test_df['file_name'] = test_df['ID'].apply(lambda x: '_'.join(x.split('_')[:2]) + '.dcm') |
407 |
|
393 |
|
408 |
test_df = pd.pivot_table(test_df.drop(columns='ID'), index="file_name", \ |
394 |
test_df = pd.pivot_table(test_df.drop(columns='ID'), index="file_name", \
|
409 |
columns="sub_type", values="Label") |
395 |
columns="sub_type", values="Label")
|
410 |
test_df.head() |
396 |
test_df.head() |
411 |
|
397 |
|
412 |
test_df.shape |
398 |
test_df.shape
|
413 |
``` |
399 |
``` |
414 |
|
400 |
|
415 |
Output: (78545, 6) |
401 |
Output: (78545, 6) |
416 |
|
402 |
|
417 |
So we have 78,545 test images and we need to predict 6 labels for each image. |
403 |
So we have 78,545 test images and we need to predict 6 labels for each image. |
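
The reshaping above can be illustrated without pandas. The following is a minimal sketch in plain Python of the same logic (split each ID into a file name and a subtype, then collect one row per file with one column per subtype). The IDs and label values are made up for illustration, and `split_id`, `wide`, and `table` are hypothetical names that do not appear in the notebook.

```python
from collections import defaultdict

# Toy rows in the long format of the competition CSV: (ID, Label).
# These IDs and labels are invented for illustration only.
rows = [
    ("ID_abc123_epidural", 0.1),
    ("ID_abc123_any", 0.9),
    ("ID_def456_any", 0.2),
    ("ID_def456_epidural", 0.3),
]


def split_id(row_id):
    """Split 'ID_<dicom>_<subtype>' into ('ID_<dicom>.dcm', '<subtype>')."""
    parts = row_id.split("_")
    file_name = "_".join(parts[:2]) + ".dcm"
    sub_type = parts[-1]
    return file_name, sub_type


# Group the labels by file name, keyed by subtype (the job of the pivot).
wide = defaultdict(dict)
for row_id, label in rows:
    file_name, sub_type = split_id(row_id)
    wide[file_name][sub_type] = label

# Use alphabetically sorted subtype columns, matching the order assumed
# by the prediction comment ("any" first, "subdural" last).
columns = sorted({s for labels in wide.values() for s in labels})
table = {f: [labels[c] for c in columns] for f, labels in sorted(wide.items())}

for file_name, values in table.items():
    print(file_name, dict(zip(columns, values)))
```

With the real test CSV this yields one row per DICOM file and six subtype columns, which is why the pivoted frame has shape (78545, 6).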
418 |
|
404 |
|
419 |
```python |
405 |
```python
|
420 |
preds = model.predict_generator(TestDataGenerator(test_df.index, None, VALID_BATCH_SIZE, \ |
406 |
preds = model.predict_generator(TestDataGenerator(test_df.index, None, VALID_BATCH_SIZE, \
|
421 |
(WIDTH, HEIGHT), path_test_img), |
407 |
(WIDTH, HEIGHT), path_test_img),
|
422 |
verbose=1) |
408 |
verbose=1)
|
423 |
print(preds.shape) |
409 |
print(preds.shape)
|
424 |
``` |
410 |
```
|
425 |
Output: (78545, 6) |
411 |
Output: (78545, 6) |
426 |
|
412 |
|
427 |
|
413 |
|
428 |
The sample submission provided by Kaggle uses a different format: it has an ID column and a Label column, where ID has the form <b>dicomId_subType</b> (e.g. ID_0fbf6a978_subarachnoid). We therefore need to convert each prediction into 6 rows, each holding the ID with its subtype and the predicted probability. The following code generates the required submission format. |
414 |
The sample submission provided by Kaggle uses a different format: it has an ID column and a Label column, where ID has the form <b>dicomId_subType</b> (e.g. ID_0fbf6a978_subarachnoid). We therefore need to convert each prediction into 6 rows, each holding the ID with its subtype and the predicted probability. The following code generates the required submission format. |
429 |
|
415 |
|
430 |
```python |
416 |
```python
|
431 |
def create_download_link(title = "Download CSV file", filename = "data.csv"): |
417 |
def create_download_link(title = "Download CSV file", filename = "data.csv"):
|
432 |
""" |
418 |
"""
|
433 |
Helper function to generate download link to files in kaggle kernel |
419 |
Helper function to generate download link to files in kaggle kernel
|
434 |
""" |
420 |
"""
|
435 |
html = '<a href={filename}>{title}</a>' |
421 |
html = '<a href={filename}>{title}</a>'
|
436 |
html = html.format(title=title,filename=filename) |
422 |
html = html.format(title=title,filename=filename)
|
437 |
return HTML(html) |
423 |
return HTML(html) |
438 |
|
424 |
|
439 |
def generate_submission_file(preds): |
425 |
def generate_submission_file(preds):
|
440 |
from tqdm import tqdm |
426 |
from tqdm import tqdm |
441 |
|
427 |
|
442 |
cols = list(train_final_df.columns) |
428 |
cols = list(train_final_df.columns) |
443 |
|
429 |
|
444 |
# We have predictions for each image |
430 |
# We have predictions for each image
|
445 |
# We need to make 6 rows for each of file according to the subtype |
431 |
# We need to make 6 rows for each of file according to the subtype
|
446 |
ids = [] |
432 |
ids = []
|
447 |
values = [] |
433 |
values = []
|
448 |
for i, j in tqdm(zip(preds, test_df.index.to_list()), total=preds.shape[0]): |
434 |
for i, j in tqdm(zip(preds, test_df.index.to_list()), total=preds.shape[0]):
|
449 |
# print(i, j) |
435 |
# print(i, j)
|
450 |
# i=[any_prob, epidural_prob, intraparenchymal_prob, intraventricular_prob, subarachnoid_prob, subdural_prob] |
436 |
# i=[any_prob, epidural_prob, intraparenchymal_prob, intraventricular_prob, subarachnoid_prob, subdural_prob]
|
451 |
# j = filename ==> ID_xyz.dcm |
437 |
# j = filename ==> ID_xyz.dcm
|
452 |
for k in range(i.shape[0]): |
438 |
for k in range(i.shape[0]):
|
453 |
ids.append([j.replace('.dcm', '_' + cols[k])]) |
439 |
ids.append([j.replace('.dcm', '_' + cols[k])])
|
454 |
values.append(i[k]) |
440 |
values.append(i[k]) |
455 |
|
441 |
|
456 |
df = pd.DataFrame(data=ids) |
442 |
df = pd.DataFrame(data=ids)
|
457 |
df.head() |
443 |
df.head() |
458 |
|
444 |
|
459 |
sample_df = pd.read_csv(input_folder + 'stage_1_sample_submission.csv') |
445 |
sample_df = pd.read_csv(input_folder + 'stage_1_sample_submission.csv')
|
460 |
sample_df.head() |
446 |
sample_df.head() |
461 |
|
447 |
|
462 |
df['Label'] = values |
448 |
df['Label'] = values
|
463 |
df.columns = sample_df.columns |
449 |
df.columns = sample_df.columns
|
464 |
df.head() |
450 |
df.head() |
465 |
|
451 |
|
466 |
df.to_csv('submission.csv', index=False) |
452 |
df.to_csv('submission.csv', index=False) |
467 |
|
453 |
|
468 |
return create_download_link(filename='submission.csv') |
454 |
return create_download_link(filename='submission.csv')
|
469 |
``` |
455 |
``` |
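
As a compact illustration of the wide-to-long conversion described above, here is a minimal sketch that uses only the standard library. It assumes the six probabilities in each prediction follow the alphabetical subtype order given in the code comment; `to_submission_rows` and `write_submission` are hypothetical helper names, and the file name and probabilities are made up.

```python
import csv
import io

# Assumed column order of each prediction vector (alphabetical subtypes).
SUBTYPES = ["any", "epidural", "intraparenchymal",
            "intraventricular", "subarachnoid", "subdural"]


def to_submission_rows(file_names, preds, subtypes=SUBTYPES):
    """Expand one prediction vector per file into one (ID, Label) row per subtype."""
    rows = []
    for file_name, probs in zip(file_names, preds):
        # 'ID_xyz.dcm' -> 'ID_xyz'
        base_id = file_name[:-len(".dcm")] if file_name.endswith(".dcm") else file_name
        for sub_type, prob in zip(subtypes, probs):
            rows.append((f"{base_id}_{sub_type}", prob))
    return rows


def write_submission(rows, handle):
    """Write the rows as CSV with the ID and Label header."""
    writer = csv.writer(handle)
    writer.writerow(["ID", "Label"])
    writer.writerows(rows)


# Invented example input: one file with six probabilities.
file_names = ["ID_abc123.dcm"]
preds = [[0.9, 0.1, 0.2, 0.3, 0.4, 0.5]]

rows = to_submission_rows(file_names, preds)
buffer = io.StringIO()  # stands in for open('submission.csv', 'w', newline='')
write_submission(rows, buffer)
print(buffer.getvalue())
```

Building the rows from two parallel `zip` loops keeps the ID and its probability together, so the two lists cannot drift out of step the way separately appended `ids` and `values` lists could.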
470 |
|
456 |
|
471 |
```python |
457 |
```python
|
472 |
df = pd.read_csv('submission.csv') |
458 |
df = pd.read_csv('submission.csv')
|
473 |
df.head() |
459 |
df.head()
|
474 |
``` |
460 |
``` |
475 |
|
461 |
|
476 |
<img src='assets/sample_sub.png'/> |
462 |
|
477 |
|
|
|
478 |
All notebooks can be found [here](https://github.com/suryachintu/RSNA-Intracranial-Hemorrhage-Detection/tree/master/notebooks) |
463 |
All notebooks can be found [here](https://github.com/suryachintu/RSNA-Intracranial-Hemorrhage-Detection/tree/master/notebooks) |
479 |
|
464 |
|
480 |
### 5. Demo |
465 |
### 5. Demo |
481 |
|
466 |
|
482 |
You can test the model by uploading a DICOM file [here](http://34.93.89.75:5325/) |
467 |
You can test the model by uploading a DICOM file [here](http://34.93.89.75:5325/) |
483 |
|
468 |
|
484 |
|
469 |
|
485 |
### References |
470 |
### References |
486 |
|
471 |
|
487 |
https://my.clevelandclinic.org/health/diseases/14480-intracranial-hemorrhage-cerebral-hemorrhage-and-hemorrhagic-stroke<br/> |
472 |
https://my.clevelandclinic.org/health/diseases/14480-intracranial-hemorrhage-cerebral-hemorrhage-and-hemorrhagic-stroke<br/>
|
488 |
https://github.com/MGH-LMIC/windows_optimization<br/> |
473 |
https://github.com/MGH-LMIC/windows_optimization<br/>
|
489 |
https://arxiv.org/abs/1812.00572 (Must read) |
474 |
https://arxiv.org/abs/1812.00572 (Must read)
|
490 |
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/111325#latest-650043 |
475 |
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/111325#latest-650043
|
491 |
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109261#latest-651855 |
476 |
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109261#latest-651855 |
492 |
|
477 |
|
493 |
### Kaggle Kernels |
478 |
### Kaggle Kernels |
494 |
|
479 |
|
495 |
https://www.kaggle.com/jhoward/some-dicom-gotchas-to-be-aware-of-fastai |
480 |
https://www.kaggle.com/jhoward/some-dicom-gotchas-to-be-aware-of-fastai
|
496 |
https://www.kaggle.com/reppic/gradient-sigmoid-windowing |
481 |
https://www.kaggle.com/reppic/gradient-sigmoid-windowing
|
497 |
https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai |
482 |
https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai
|
498 |
https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-1 |
483 |
https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-1
|
499 |
https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-2 |
484 |
https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-2
|