## Intracranial Hemorrhage Detection

This blog post covers the [RSNA Intracranial Hemorrhage Detection](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection) challenge hosted on Kaggle.

This post is divided into the following parts:

1. Overview
2. Basic EDA ([IPython Notebook](https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-1))
3. Data Visualization & Preprocessing
4. Deep Learning Model
5. Demo

### 1. Overview

##### What is Intracranial Hemorrhage?

An intracranial hemorrhage is bleeding that occurs inside the skull. Symptoms include sudden tingling, weakness, numbness, paralysis, severe headache, difficulty swallowing or seeing, loss of balance or coordination, difficulty understanding, speaking, reading, or writing, and a change in level of consciousness or alertness, marked by stupor, lethargy, sleepiness, or coma. Any bleeding inside the skull or brain is a medical emergency: the person must get to a hospital emergency room immediately so that the cause of the bleeding can be determined and treatment can begin. Diagnosis requires highly trained specialists to review medical images of the patient's cranium for the presence, location and type of hemorrhage, which is a complicated and often time-consuming process. In this post we will use deep learning techniques to detect acute intracranial hemorrhage and its subtypes.

Hemorrhage Types

1. Epidural
2. Intraparenchymal    
3. Intraventricular
4. Subarachnoid 
5. Subdural
6. Any (a hemorrhage of at least one type is present)

##### What am I predicting?

In this competition, the goal is to detect intracranial hemorrhage and its subtypes. Given an image, we need to predict the probability of each subtype. Because a single image can be positive for several labels at once (for example, a subtype and `any`), this is a multi-label classification problem.

##### Evaluation Metric

The competition's evaluation metric is **weighted log loss**. The weight of each subtype was not disclosed, but teams in the discussion forums found that the `any` label has a weight of 2 relative to the other subtypes; see [this thread](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109526#latest-630190) for details. In this tutorial, I use plain accuracy as the evaluation metric and **binary cross-entropy** as the loss, and checkpoint the models based on the loss.
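For reference, here is a minimal sketch of a weighted multi-label log loss. It assumes the weights reported in the forum thread (2 for `any`, 1 for each of the five subtypes), assumes the per-image loss is normalized by the sum of the weights, and assumes the alphabetical column order `any, epidural, intraparenchymal, intraventricular, subarachnoid, subdural`. The function name `weighted_log_loss` is illustrative; this is not the official scoring code.

```python
import numpy as np

# Assumed column order: any, epidural, intraparenchymal,
# intraventricular, subarachnoid, subdural ("any" weighted 2x)
CLASS_WEIGHTS = np.array([2.0, 1.0, 1.0, 1.0, 1.0, 1.0])


def weighted_log_loss(y_true, y_pred, weights=CLASS_WEIGHTS, eps=1e-7):
    """Weighted binary cross-entropy, averaged over images (a sketch)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions so that log(0) never occurs
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    # Per-label binary cross-entropy, shape (n_images, 6)
    bce = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    # Weighted mean over the labels of each image, then mean over images
    per_image = (bce * weights).sum(axis=1) / weights.sum()
    return per_image.mean()


y_true = np.array([[1, 0, 0, 0, 1, 0],
                   [0, 0, 0, 0, 0, 0]])
uninformed = np.full((2, 6), 0.5)
print(weighted_log_loss(y_true, uninformed))  # equals ln(2), about 0.693
```

Predicting 0.5 for every label scores ln 2 regardless of the weights, which makes a handy sanity baseline; the weights only matter once the per-label errors differ.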


### 2. Basic EDA 

Let's look at the [data](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data) that is provided.

We have a training CSV (`stage_1_train.csv`) containing IDs and labels that indicate whether a hemorrhage is present, a folder of training images stored as [DICOM](https://www.dicomstandard.org/) files (the standard format for medical images), and a folder of test DICOM files.

```python
import pandas as pd

# load the csv file (input_folder is the competition data directory, set earlier in the notebook)
train_df = pd.read_csv(input_folder + 'stage_1_train.csv')
train_df.head()
```

     
The file has two columns, `ID` and `Label`. `ID` has the format `FILE_ID_SUBTYPE`: for example, in `ID_63eb1e259_epidural`, `ID_63eb1e259` is the file ID and `epidural` is the subtype. `Label` indicates whether that subtype of hemorrhage is present (1) or not (0).

Let's separate the file names and subtypes:

```python
# extract subtype
train_df['sub_type'] = train_df['ID'].apply(lambda x: x.split('_')[-1])
# extract filename
train_df['file_name'] = train_df['ID'].apply(lambda x: '_'.join(x.split('_')[:2]) + '.dcm')
train_df.head()
```



```python
train_df.shape
```
Output: (4045572, 4)

```python
import os

print("Number of train images available:", len(os.listdir(path_train_img)))
```
Output: Number of train images available: 674258

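After adding the two new columns, the DataFrame has shape (4045572, 4). Every DICOM file in the training folder has 6 rows in the CSV, one for each of the 6 labels. Note that 6 × 674,258 = 4,045,548, which is 24 rows fewer than the CSV contains; together with the per-subtype counts below (each of which sums to 674,262), this suggests that a few files have duplicate entries.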
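The training code later uses a DataFrame `train_final_df` with one row per file and one column per subtype, but this post doesn't show how it is built. A minimal sketch, assuming it is produced the same way as the test DataFrame in section 4 (a `pivot_table` over `file_name` and `sub_type`), on a made-up four-row sample; `toy_df` and `wide_df` are illustrative names:

```python
import pandas as pd

# Tiny made-up sample in the same long format as stage_1_train.csv
toy_df = pd.DataFrame({
    "ID": ["ID_aaa_epidural", "ID_aaa_any", "ID_bbb_epidural", "ID_bbb_any"],
    "Label": [1, 1, 0, 0],
})
toy_df["sub_type"] = toy_df["ID"].apply(lambda x: x.split("_")[-1])
toy_df["file_name"] = toy_df["ID"].apply(lambda x: "_".join(x.split("_")[:2]) + ".dcm")

# One row per DICOM file, one column per subtype (pivot_table sorts both axes)
wide_df = pd.pivot_table(toy_df.drop(columns="ID"), index="file_name",
                         columns="sub_type", values="Label")
print(wide_df)
```

Because `pivot_table` aggregates with the mean by default, a duplicated (file, subtype) pair collapses into a single cell instead of producing an extra row.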

Let's check the label distribution for each subtype:

```python
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(16, 6))
graph = sns.countplot(x="sub_type", hue="Label", data=train_df)
graph.tick_params(axis="x", labelrotation=90)
plt.show()
```



Let's check the counts for each subtype.

##### Epidural

```python
train_df[train_df['sub_type'] == 'epidural']['Label'].value_counts()
```
Output: <br/>
0    671501<br/>
1      2761<br/>
Name: Label, dtype: int64

For the epidural subtype, 671,501 images are labeled 0 and 2,761 are labeled 1.

##### Intraparenchymal

```python
train_df[train_df['sub_type'] == 'intraparenchymal']['Label'].value_counts()
```
Output: <br/>
0    641698<br/>
1     32564<br/>
Name: Label, dtype: int64

For the intraparenchymal subtype, 641,698 images are labeled 0 and 32,564 are labeled 1.


##### Intraventricular

```python
train_df[train_df['sub_type'] == 'intraventricular']['Label'].value_counts()
```
Output: <br/>
0    650496<br/>
1     23766<br/>
Name: Label, dtype: int64

For the intraventricular subtype, 650,496 images are labeled 0 and 23,766 are labeled 1.

##### Subarachnoid

```python
train_df[train_df['sub_type'] == 'subarachnoid']['Label'].value_counts()
```
Output: <br/>
0    642140<br/>
1     32122<br/>
Name: Label, dtype: int64

For the subarachnoid subtype, 642,140 images are labeled 0 and 32,122 are labeled 1.


##### Subdural

```python
train_df[train_df['sub_type'] == 'subdural']['Label'].value_counts()
```
Output: <br/>
0    631766<br/>
1     42496<br/>
Name: Label, dtype: int64

For the subdural subtype, 631,766 images are labeled 0 and 42,496 are labeled 1.


##### Any

```python
train_df[train_df['sub_type'] == 'any']['Label'].value_counts()
```
Output: <br/>
0    577159<br/>
1     97103<br/>
Name: Label, dtype: int64

For the `any` label, 577,159 images are labeled 0 and 97,103 are labeled 1.
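The six separate queries above can be collapsed into one table. A minimal sketch using `pd.crosstab` and a `groupby` mean, shown on a small made-up frame with the same `sub_type` and `Label` columns as `train_df` (`toy_df` is illustrative):

```python
import pandas as pd

# Made-up rows with the same columns as train_df
toy_df = pd.DataFrame({
    "sub_type": ["any", "any", "any", "epidural", "epidural", "epidural"],
    "Label":    [1, 0, 1, 0, 0, 1],
})

# Rows: subtype, columns: label value (0/1), cells: number of rows
counts = pd.crosstab(toy_df["sub_type"], toy_df["Label"])
# Fraction of positive labels per subtype, a quick view of class imbalance
positive_rate = toy_df.groupby("sub_type")["Label"].mean()
print(counts)
print(positive_rate)
```

Applied to the real counts above, the positive rate ranges from about 0.41% for epidural (2,761 / 674,262) to about 14.4% for `any` (97,103 / 674,262), so the labels are heavily imbalanced.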

### 3. Data Visualization & Preprocessing

Let's look at one of the DICOM files in the dataset:

```python
import pydicom

# pydicom.dcmread replaces the deprecated pydicom.read_file
dicom = pydicom.dcmread(path_train_img + 'ID_ffff922b9.dcm')
print(dicom)
```



DICOM files contain the image's pixel data along with metadata such as patient name, instance ID, window width, etc.

Original image

```python
plt.imshow(dicom.pixel_array, cmap=plt.cm.bone)
plt.show()
```



The original image is hard to interpret, so let's check the metadata fields Window Center, Window Width, Rescale Intercept and Rescale Slope:




We can use these fields to construct a windowed image.

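```python
import numpy as np
import matplotlib.pyplot as plt
from pydicom.multival import MultiValue


def get_dicom_field_value(key, dicom):
    """
    @param key: DICOM tag as a (group, element) tuple of hex strings
    @param dicom: pydicom Dataset
    """
    value = dicom[key].value
    # Some files store several values for a window field; use the first one
    if isinstance(value, MultiValue):
        value = value[0]
    return value

window_center = int(get_dicom_field_value(('0028', '1050'), dicom))
window_width = int(get_dicom_field_value(('0028', '1051'), dicom))
window_intercept = int(get_dicom_field_value(('0028', '1052'), dicom))
window_slope = int(get_dicom_field_value(('0028', '1053'), dicom))
window_center, window_width, window_intercept, window_slope

def get_windowed_image(image, wc, ww, intercept, slope):
    # Convert stored values to Hounsfield units; cast first so that a
    # negative intercept cannot overflow an unsigned pixel dtype
    img = image.astype(np.float64) * slope + intercept
    img_min = wc - ww // 2
    img_max = wc + ww // 2
    # Clip everything outside the window to the window bounds
    return np.clip(img, img_min, img_max)

windowed_image = get_windowed_image(dicom.pixel_array, window_center, window_width,
                                    window_intercept, window_slope)

plt.imshow(windowed_image, cmap=plt.cm.bone)
plt.show()
```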




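The windowed image is much easier to read than the original. The raw pixel array holds stored values rather than calibrated intensities: multiplying by the rescale slope and adding the rescale intercept converts them to Hounsfield units (HU), and clipping to the window then spreads the narrow HU range that matters for brain tissue over the full display range.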
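A small worked example of the conversion and windowing, on made-up stored values with a rescale slope of 1 and an intercept of -1024 (illustrative values; real files carry their own slope and intercept). With a brain window of center 40 and width 80, everything outside 0 to 80 HU is clipped:

```python
import numpy as np

# Made-up stored pixel values, as they might appear in pixel_array
stored = np.array([0, 1000, 1064, 1104, 2000], dtype=np.uint16)
slope, intercept = 1, -1024

# Stored values -> Hounsfield units (cast first so negative HU cannot overflow uint16)
hu = stored.astype(np.float64) * slope + intercept
print(hu.tolist())  # [-1024.0, -24.0, 40.0, 80.0, 976.0]

center, width = 40, 80
low, high = center - width // 2, center + width // 2  # 0 and 80 HU
windowed = np.clip(hu, low, high)
print(windowed.tolist())  # [0.0, 0.0, 40.0, 80.0, 80.0]

# Map the window onto [0, 1] for display or model input
normalized = (windowed - low) / (high - low)
print(normalized.tolist())  # [0.0, 0.0, 0.5, 1.0, 1.0]
```

Air (around -1024 HU) and dense material (976 HU) both end up at the window edges, while the values inside the window keep their full contrast.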

Scaling the image:

Rescale the windowed image to the range 0-255 and convert it to 8-bit integers.

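```python
import numpy as np
import matplotlib.pyplot as plt


def get_scaled_windowed_image(img):
    """
    Get scaled image
    1. Convert to float
    2. Min-max rescale to 0-255 (keeps windows whose lower bound is negative)
    3. Convert to uint8
    """
    img_2d = img.astype(float)
    value_range = img_2d.max() - img_2d.min()
    if value_range == 0:
        # Constant image: avoid dividing by zero
        return np.zeros(img_2d.shape, dtype=np.uint8)
    img_2d_scaled = (img_2d - img_2d.min()) / value_range * 255.0
    return np.uint8(img_2d_scaled)

scaled_image = get_scaled_windowed_image(windowed_image)
plt.imshow(scaled_image, cmap=plt.cm.bone, vmin=0, vmax=255)
plt.show()
```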



Hounsfield units (HU) are the standard scale for interpreting CT intensities. [Here](https://en.wikipedia.org/wiki/Hounsfield_scale) is a detailed table of substances and their HU ranges.

A detailed explanation of several windowing techniques can be found in this great kernel: [Gradient Sigmoid Windowing](https://www.kaggle.com/reppic/gradient-sigmoid-windowing).

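```python
import numpy as np
import matplotlib.pyplot as plt
import pydicom


def correct_dcm(dcm):
    # Refer Jeremy Howard's Kernel https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai
    # Shift stored values by 1000, wrap values >= 4096, and set the intercept to -1000
    x = dcm.pixel_array + 1000
    px_mode = 4096
    x[x >= px_mode] = x[x >= px_mode] - px_mode
    dcm.PixelData = x.tobytes()
    dcm.RescaleIntercept = -1000

def window_image(dcm, window_center, window_width):
    # Apply the correction only to 12-bit unsigned images with a suspicious intercept
    if (dcm.BitsStored == 12) and (dcm.PixelRepresentation == 0) and (int(dcm.RescaleIntercept) > -100):
        correct_dcm(dcm)

    img = dcm.pixel_array * dcm.RescaleSlope + dcm.RescaleIntercept
    img_min = window_center - window_width // 2
    img_max = window_center + window_width // 2
    return np.clip(img, img_min, img_max)

def bsb_window(dcm):
    # Brain, subdural and soft-tissue windows, each normalized to [0, 1]
    brain_img = window_image(dcm, 40, 80)
    subdural_img = window_image(dcm, 80, 200)
    soft_img = window_image(dcm, 40, 380)

    brain_img = (brain_img - 0) / 80
    subdural_img = (subdural_img - (-20)) / 200
    soft_img = (soft_img - (-150)) / 380
    # Stack as channels: (3, H, W) -> (H, W, 3)
    return np.array([brain_img, subdural_img, soft_img]).transpose(1, 2, 0)

def display_dicom_image(file_name):
    # Hypothetical stand-in for the notebook's plotting helper:
    # read one training DICOM and show its 3-channel BSB image
    dcm = pydicom.dcmread(path_train_img + file_name)
    plt.imshow(bsb_window(dcm))
    plt.axis('off')
    plt.show()

display_dicom_image('ID_0005d340e.dcm')
```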



The Brain + Subdural + Soft tissue (BSB) image looks like a good starting point for our models: it has three channels, so it can be fed directly to any pretrained model that expects a 3-channel (RGB) input.
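To see what `bsb_window` produces without reading a DICOM file, here is a sketch that applies the same three windows to a synthetic array already in HU. The helper `bsb_from_hu` is hypothetical, written only for this illustration; it mirrors the clipping and per-window normalization in `bsb_window`.

```python
import numpy as np

# (center, width) pairs used by bsb_window: brain, subdural, soft tissue
WINDOWS = [(40, 80), (80, 200), (40, 380)]


def bsb_from_hu(hu):
    """Stack three windowed, [0, 1]-normalized copies of an HU image as channels."""
    channels = []
    for center, width in WINDOWS:
        low = center - width // 2
        high = center + width // 2
        channels.append((np.clip(hu, low, high) - low) / width)
    # (3, H, W) -> (H, W, 3), the channel-last layout the Keras model expects
    return np.stack(channels).transpose(1, 2, 0)


# -1000 HU is air and 0 HU is water; the other two values are arbitrary
hu = np.array([[-1000.0, 0.0], [40.0, 300.0]])
img = bsb_from_hu(hu)
print(img.shape)  # (2, 2, 3)
print(img[1, 0].tolist())  # 40 HU seen through each window: [0.5, 0.3, 0.5]
```

The same HU value lands at different positions in each channel, which is why the three windows give the network more contrast information than a single grayscale window.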


### 4. Deep Learning Model

The full training code is available [here](/notebooks/Effnet-B0%20Windowed%20Image.ipynb).

We train the model on the standard windowed images, with augmentations such as left-right flips and random cropping.
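The notebook's data generator isn't reproduced in this post. As a rough illustration only, a random left-right flip followed by a random crop can be written in NumPy as below; the function name `augment`, the crop size and the 50% flip probability are assumptions, not values taken from the notebook.

```python
import numpy as np


def augment(image, crop_size, rng):
    """Randomly flip an (H, W, C) image left-right, then take a random crop."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # flip along the width axis
    h, w = image.shape[:2]
    ch, cw = crop_size
    # Random top-left corner such that the crop stays inside the image
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return image[top:top + ch, left:left + cw, :]


rng = np.random.default_rng(0)
image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
crop = augment(image, (6, 6), rng)
print(crop.shape)  # (6, 6, 3)
```

Depending on the generator, the crop may then be resized back to the model's input size before batching.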

Here are the steps for training the model:

1. Prepare training and validation data generators. The data is split with [multilabel stratification](https://github.com/trent-b/iterative-stratification) so that the label distribution is similar in both parts. We create two splits but only use the first one and check the results.
2. Load an ImageNet-pretrained EfficientNet-B0 model.
3. For the first epoch, freeze all layers except the last five by setting `trainable = False`, train on all training images, and save the model.
4. Load the saved model, unfreeze it, and train the whole network for the remaining epochs so it can learn more complex features. Each of these epochs uses only a sixth of the training batches.
5. Make predictions.

Sample code:

```python
import joblib
import tensorflow as tf
import efficientnet.tfkeras as efn  # assumes the `efficientnet` package
from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# TEST_SIZE, SEED, TRAIN_BATCH_SIZE, VALID_BATCH_SIZE, WIDTH, HEIGHT,
# TrainDataGenerator and callbacks_list are defined in the notebook;
# callbacks_list is assumed to checkpoint the best model to 'model.h5'

# 1. ---------prepare data generators-------------#
# https://github.com/trent-b/iterative-stratification
# Multilabel stratification
splits = MultilabelStratifiedShuffleSplit(n_splits = 2, test_size = TEST_SIZE, random_state = SEED)
file_names = train_final_df.index
labels = train_final_df.values
# Lets take only the first split
train_idx, valid_idx = next(splits.split(file_names, labels))
submission_predictions = []
len(train_idx), len(valid_idx)
# train data generator
data_generator_train = TrainDataGenerator(train_final_df.iloc[train_idx],
                                          train_final_df.iloc[train_idx],
                                          TRAIN_BATCH_SIZE,
                                          (WIDTH, HEIGHT),
                                          augment = True)

# validation data generator
data_generator_val = TrainDataGenerator(train_final_df.iloc[valid_idx],
                                        train_final_df.iloc[valid_idx],
                                        VALID_BATCH_SIZE,
                                        (WIDTH, HEIGHT),
                                        augment = False)
# 2. ---------load efficient net B0 model-----------#
base_model = efn.EfficientNetB0(weights = 'imagenet', include_top = False,
                                pooling = 'avg', input_shape = (HEIGHT, WIDTH, 3))
x = base_model.output
x = Dropout(0.125)(x)
# One sigmoid per label: the six outputs are independent probabilities
output_layer = Dense(6, activation = 'sigmoid')(x)
model = Model(inputs=base_model.input, outputs=output_layer)
model.compile(optimizer = Adam(learning_rate = 0.0001),
              loss = 'binary_crossentropy',
              metrics = ['acc', tf.keras.metrics.AUC()])
model.summary()

# 3. ---------for 1st epoch train on whole dataset ------------#
# Freeze everything except the last five layers
for layer in model.layers[:-5]:
    layer.trainable = False

# Recompile so that the new trainable flags take effect
model.compile(optimizer = Adam(learning_rate = 0.0001),
              loss = 'binary_crossentropy',
              metrics = ['acc'])

# model.fit accepts Keras Sequence generators (fit_generator is deprecated)
model.fit(data_generator_train,
          validation_data = data_generator_val,
          epochs = 1,
          callbacks = callbacks_list,
          verbose = 1)

# 4. ---------for rest of epochs train on sample data----------#
model.load_weights('model.h5')
# Unfreeze all layers so the whole network is fine-tuned
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer = Adam(learning_rate = 0.0004),
              loss = 'binary_crossentropy',
              metrics = ['acc'])
model.fit(data_generator_train,
          validation_data = data_generator_val,
          # each epoch sees a sixth of the training batches
          steps_per_epoch = len(data_generator_train) // 6,
          epochs = 10,
          callbacks = callbacks_list,
          verbose = 1)
# 5. --------Make Predictions ----------------------------------#
model.load_weights('model.h5')

def get_scores(data_gen, file_name='scores.pkl'):
    # evaluate returns [loss, acc] for the metrics compiled above
    scores = model.evaluate(data_gen, verbose=1)
    joblib.dump(scores, file_name)
    print(f"Loss: {scores[0]} and Accuracy: {scores[1]*100}")
```

Let's evaluate the model on the training and validation generators:

```python
get_scores(data_gen=data_generator_train, file_name='train_scores.pkl')
```


```python
get_scores(data_gen=data_generator_val, file_name='val_scores.pkl')
```


Next, let's load the test DataFrame; the test CSV uses the same format as the training CSV.

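```python
# assumption: in stage 1 the test IDs are listed in the sample submission file
test_df = pd.read_csv(input_folder + 'stage_1_sample_submission.csv')
# extract subtype
test_df['sub_type'] = test_df['ID'].apply(lambda x: x.split('_')[-1])
# extract filename
test_df['file_name'] = test_df['ID'].apply(lambda x: '_'.join(x.split('_')[:2]) + '.dcm')

# one row per test image, one column per subtype
test_df = pd.pivot_table(test_df.drop(columns='ID'), index="file_name",
                         columns="sub_type", values="Label")
test_df.head()

test_df.shape
```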

Output: (78545, 6)

So we have 78,545 test images and we need to predict 6 labels for each image. 

```python
# model.predict accepts Keras Sequence generators (predict_generator is deprecated)
preds = model.predict(TestDataGenerator(test_df.index, None, VALID_BATCH_SIZE,
                                        (WIDTH, HEIGHT), path_test_img),
                      verbose=1)
print(preds.shape)
```
Output: (78545, 6)


Kaggle's sample submission uses a different format: each row has an `ID` of the form <b>dicomId_subType</b> (e.g. `ID_0fbf6a978_subarachnoid`) and a `Label` holding the predicted probability. So we need to expand each prediction into 6 rows, one per subtype. The following code generates the submission in the required format.

```python
from IPython.display import HTML
from tqdm import tqdm


def create_download_link(title = "Download CSV file", filename = "data.csv"):
    """
    Helper function to generate download link to files in kaggle kernel
    """
    html = '<a href="{filename}">{title}</a>'.format(title=title, filename=filename)
    return HTML(html)

def generate_submission_file(preds):
    # Prediction columns follow the (alphabetical) subtype order of train_final_df
    cols = list(train_final_df.columns)

    # We have one prediction vector per image;
    # expand it into 6 rows, one per subtype
    ids = []
    values = []
    for i, j in tqdm(zip(preds, test_df.index.to_list()), total=preds.shape[0]):
        # i = [any_prob, epidural_prob, intraparenchymal_prob, intraventricular_prob, subarachnoid_prob, subdural_prob]
        # j = filename ==> ID_xyz.dcm
        for k in range(i.shape[0]):
            ids.append(j.replace('.dcm', '_' + cols[k]))
            values.append(i[k])

    # Use the same column names as Kaggle's sample submission (ID, Label)
    sample_df = pd.read_csv(input_folder + 'stage_1_sample_submission.csv')
    df = pd.DataFrame({'ID': ids, 'Label': values})
    df.columns = sample_df.columns

    df.to_csv('submission.csv', index=False)

    return create_download_link(filename='submission.csv')
```
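The nested loop in `generate_submission_file` can also be written more compactly. A minimal sketch, assuming the prediction columns follow the alphabetical subtype order produced by `pivot_table`; `to_submission_format` and the toy inputs are hypothetical:

```python
import numpy as np
import pandas as pd


def to_submission_format(preds, file_names, sub_types):
    """Turn an (n_images, n_subtypes) prediction array into ID/Label rows."""
    # Image-major order: all subtypes of the first image, then the next image
    ids = [f"{name.replace('.dcm', '')}_{sub_type}"
           for name in file_names for sub_type in sub_types]
    # Row-major flattening matches the order of the ids built above
    return pd.DataFrame({"ID": ids, "Label": np.asarray(preds).reshape(-1)})


sub_types = ["any", "epidural", "intraparenchymal",
             "intraventricular", "subarachnoid", "subdural"]
preds = np.array([[0.9, 0.1, 0.2, 0.3, 0.4, 0.5],
                  [0.0, 0.0, 0.0, 0.0, 0.0, 0.1]])
sub = to_submission_format(preds, ["ID_aaa.dcm", "ID_bbb.dcm"], sub_types)
print(sub.head(3))
```

The key design point is that the ID list and the flattened predictions must be generated in the same order; iterating files in the outer loop and subtypes in the inner loop matches NumPy's row-major `reshape`.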

```python
generate_submission_file(preds)

df = pd.read_csv('submission.csv')
df.head()
```



All notebooks can be found [here](https://github.com/suryachintu/RSNA-Intracranial-Hemorrhage-Detection/tree/master/notebooks)

### 5. Demo

You can test the model by uploading a DICOM file [here](http://34.93.89.75:5325/).


### References

https://my.clevelandclinic.org/health/diseases/14480-intracranial-hemorrhage-cerebral-hemorrhage-and-hemorrhagic-stroke<br/>
https://github.com/MGH-LMIC/windows_optimization<br/>
https://arxiv.org/abs/1812.00572 (must read)<br/>
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/111325#latest-650043<br/>
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109261#latest-651855

### Kaggle Kernels

https://www.kaggle.com/jhoward/some-dicom-gotchas-to-be-aware-of-fastai<br/>
https://www.kaggle.com/reppic/gradient-sigmoid-windowing<br/>
https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai<br/>
https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-1<br/>
https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-2

	a/README.md		b/README.md
1	## Intracranial Hemorrhage Detection	1	## Intracranial Hemorrhage Detection
2		2
3	This blog post is about the challenge that is hosted on kaggle on [RSNA Intracranial Hemorrhage Detection](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection).	3	This blog post is about the challenge that is hosted on kaggle on [RSNA Intracranial Hemorrhage Detection](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection).
4		4
5	This post is divided into following parts	5	This post is divided into following parts
6		6
7	1. Overview	7	1. Overview
8	2. Basic EDA [Ipython Notebook](https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-1)	8	2. Basic EDA [Ipython Notebook](https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-1)
9	3. Data Visualization & Preprocessing	9	3. Data Visualization & Preprocessing
10	4. Deep Learning Model	10	4. Deep Learning Model
11	5. Demo	11	5. Demo
12		12
13	### 1. Overview	13	### 1. Overview
14		14
15	##### What is Intracranial Hemorrhage?	15	##### What is Intracranial Hemorrhage?
16		16
17	An intracranial hemorrhage is a type of bleeding that occurs inside the skull. Symptoms include sudden tingling, weakness, numbness, paralysis, severe headache, difficulty with swallowing or vision, loss of balance or coordination, difficulty understanding, speaking , reading, or writing, and a change in level of consciousness or alertness, marked by stupor, lethargy, sleepiness, or coma. Any type of bleeding inside the skull or brain is a medical emergency. It is important to get the person to a hospital emergency room immediately to determine the cause of the bleeding and begin medical treatment. It rquires highly trained specialists review medical images of the patient’s cranium to look for the presence, location and type of hemorrhage. The process is complicated and often time consuming. So as part of this we will be deep learning techniques to detect acute intracranial hemorrhage and its subtypes.	17	An intracranial hemorrhage is a type of bleeding that occurs inside the skull. Symptoms include sudden tingling, weakness, numbness, paralysis, severe headache, difficulty with swallowing or vision, loss of balance or coordination, difficulty understanding, speaking , reading, or writing, and a change in level of consciousness or alertness, marked by stupor, lethargy, sleepiness, or coma. Any type of bleeding inside the skull or brain is a medical emergency. It is important to get the person to a hospital emergency room immediately to determine the cause of the bleeding and begin medical treatment. It rquires highly trained specialists review medical images of the patient’s cranium to look for the presence, location and type of hemorrhage. The process is complicated and often time consuming. So as part of this we will be deep learning techniques to detect acute intracranial hemorrhage and its subtypes.
18		18
19	Hemorrhage Types	19	Hemorrhage Types
20		20
21	1. Epidural	21	1. Epidural
22	2. Intraparenchymal	22	2. Intraparenchymal
23	3. Intraventricular	23	3. Intraventricular
24	4. Subarachnoid	24	4. Subarachnoid
25	5. Subdural	25	5. Subdural
26	6. Any	26	6. Any
27		27
28	##### What am i predicting?	28	##### What am i predicting?
29		29
30	In this competition our goal is to predict intracranial hemorrhage and its subtypes. Given an image the we need to predict probablity of each subtype. This indicates its a multilabel classification problem.	30	In this competition our goal is to predict intracranial hemorrhage and its subtypes. Given an image the we need to predict probablity of each subtype. This indicates its a multilabel classification problem.
31		31
32	##### Evaluation Metric	32	##### Evaluation Metric
33		33
34	Competition evaluation metric is weighted log loss but weights for each subtype is not disclosed as part of the competition but in the discussion forms some of the teams found it out that the any label has a weight of 2 compared to other subtypes, you can check more details [here](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109526#latest-630190). But as part of this tutorial i'm going to use normal accuracy as evaluation metric and loss as binary cross entropy loss and checkpointing the models based on the loss.	34	Competition evaluation metric is weighted log loss but weights for each subtype is not disclosed as part of the competition but in the discussion forms some of the teams found it out that the any label has a weight of 2 compared to other subtypes, you can check more details [here](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109526#latest-630190). But as part of this tutorial i'm going to use normal accuracy as evaluation metric and loss as binary cross entropy loss and checkpointing the models based on the loss.
35		35
36		36
37	### 2. Basic EDA	37	### 2. Basic EDA
38		38
39	Lets look at the [data](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data) that is provided.	39	Lets look at the [data](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data) that is provided.
40		40
41	We have a train.csv containing file names and label indicating whether hemorrhage is present or not and train images folder which is set of [Dicom](https://www.dicomstandard.org/) files (Medical images are stored in dicom formats) and test images folder containing test dicom files.	41	We have a train.csv containing file names and label indicating whether hemorrhage is present or not and train images folder which is set of [Dicom](https://www.dicomstandard.org/) files (Medical images are stored in dicom formats) and test images folder containing test dicom files.
42		42
43	```python	43	```python
44	# load the csv file	44	# load the csv file
45	train_df = pd.read_csv(input_folder + 'stage_1_train.csv')	45	train_df = pd.read_csv(input_folder + 'stage_1_train.csv')
46	train_df.head()	46	train_df.head()
47	```	47	```
48	<img src='assets/df.png'/>
49		48
50	It consists of two columns ID and Label. ID has a format FILE_ID_SUB_TYPE for example ID_63eb1e259_epidural so ID_63eb1e259 is file id and epidural is subtype and Label indicating whether subtype hemorrhage is present or not.	49	It consists of two columns ID and Label. ID has a format FILE_ID_SUB_TYPE for example ID_63eb1e259_epidural so ID_63eb1e259 is file id and epidural is subtype and Label indicating whether subtype hemorrhage is present or not.
51		50
52	Lets seperate file names and subtypes	51	Lets seperate file names and subtypes
53		52
54	```python	53	```python
55	# extract subtype	54	# extract subtype
56	train_df['sub_type'] = train_df['ID'].apply(lambda x: x.split('_')[-1])	55	train_df['sub_type'] = train_df['ID'].apply(lambda x: x.split('_')[-1])
57	# extract filename	56	# extract filename
58	train_df['file_name'] = train_df['ID'].apply(lambda x: '_'.join(x.split('_')[:2]) + '.dcm')	57	train_df['file_name'] = train_df['ID'].apply(lambda x: '_'.join(x.split('_')[:2]) + '.dcm')
59	train_df.head()	58	train_df.head()
60	```	59	```
61	<img src='assets/df2.png'/>	60
62
63
64	```python	61	```python
65	train_df.shape	62	train_df.shape
66	````	63	````
67	Output : (4045572, 4)	64	Output : (4045572, 4)
68		65
69	```python	66	```python
70	print("Number of train images availabe:", len(os.listdir(path_train_img)))	67	print("Number of train images availabe:", len(os.listdir(path_train_img)))
71	```	68	```
72	Output : Number of train images availabe: 674258	69	Output : Number of train images availabe: 674258
73		70
74	The csv file has a shape of (4045572, 4). For every file(dicom file) present in the train folder has 6 entries in csv indicating possible 6 subtype hemorrhages.	71	The csv file has a shape of (4045572, 4). For every file(dicom file) present in the train folder has 6 entries in csv indicating possible 6 subtype hemorrhages.
75		72
76	Lets check the files available for each subtype	73	Lets check the files available for each subtype
77		74
78	```python	75	```python
79	plt.figure(figsize=(16, 6))	76	plt.figure(figsize=(16, 6))
80	graph = sns.countplot(x="sub_type", hue="Label", data=(train_df))	77	graph = sns.countplot(x="sub_type", hue="Label", data=(train_df))
81	graph.set_xticklabels(graph.get_xticklabels(),rotation=90)	78	graph.set_xticklabels(graph.get_xticklabels(),rotation=90)
82	plt.show()	79	plt.show()
83	```	80	```
84	<img src='assets/counts.png'/>	81
85
86
87	Lets check the counts for each subtype	82	Lets check the counts for each subtype
88		83
89	##### Epidural	84	##### Epidural
90		85
91	```python	86	```python
92	train_df[train_df['sub_type'] == 'epidural']['Label'].value_counts()	87	train_df[train_df['sub_type'] == 'epidural']['Label'].value_counts()
93	```	88	```
94	Output:	89	Output:
95		90
96	0 671501	91	0 671501
97		92
98	1 2761	93	1 2761
99		94
100	Name: Label, dtype: int64	95	Name: Label, dtype: int64
101		96
102	For epidural sub type we have 6,71,501 images labeled as 0 and 2,761 labelled as 1.	97	For epidural sub type we have 6,71,501 images labeled as 0 and 2,761 labelled as 1.
103		98
104	##### Intraparenchymal	99	##### Intraparenchymal
105		100
106	```python	101	```python
107	train_df[train_df['sub_type'] == 'intraparenchymal']['Label'].value_counts()	102	train_df[train_df['sub_type'] == 'intraparenchymal']['Label'].value_counts()
108	```	103	```
109	Output: <br/>	104	Output: <br/>
110	0 641698<br/>	105	0 641698<br/>
111	1 32564<br/>	106	1 32564<br/>
112	Name: Label, dtype: int64	107	Name: Label, dtype: int64
113		108
114	For intraparenchymal sub type we have 6,41,698 images labeled as 0 and 32,564 labelled as 1.	109	For intraparenchymal sub type we have 6,41,698 images labeled as 0 and 32,564 labelled as 1.
115		110
116		111
117	##### Intraparenchymal	112	##### Intraparenchymal
118		113
119	```python	114	```python
120	train_df[train_df['sub_type'] == 'intraparenchymal']['Label'].value_counts()	115	train_df[train_df['sub_type'] == 'intraparenchymal']['Label'].value_counts()
121	```	116	```
122	Output: <br/>	117	Output: <br/>
123	0 650496<br/>	118	0 650496<br/>
124	1 23766<br/>	119	1 23766<br/>
125	Name: Label, dtype: int64	120	Name: Label, dtype: int64
126		121
127	For intraparenchymal sub type we have 6,50,496 images labeled as 0 and 23,766 labelled as 1.	122	For intraparenchymal sub type we have 6,50,496 images labeled as 0 and 23,766 labelled as 1.
128		123
129	##### Subarachnoid	124	##### Subarachnoid
130		125
131	```python	126	```python
132	train_df[train_df['sub_type'] == 'subarachnoid']['Label'].value_counts()	127	train_df[train_df['sub_type'] == 'subarachnoid']['Label'].value_counts()
133	```	128	```
134	Output: <br/>	129	Output: <br/>
135	0 642140<br/>	130	0 642140<br/>
136	1 32122<br/>	131	1 32122<br/>
137	Name: Label, dtype: int64	132	Name: Label, dtype: int64
138		133
139	For subarachnoid sub type we have 6,42,140 images labeled as 0 and 32,122 labelled as 1.	134	For subarachnoid sub type we have 6,42,140 images labeled as 0 and 32,122 labelled as 1.
140		135
141		136
142	##### Subdural	137	##### Subdural
143		138
144	```python	139	```python
145	train_df[train_df['sub_type'] == 'subdural']['Label'].value_counts()	140	train_df[train_df['sub_type'] == 'subdural']['Label'].value_counts()
146	```	141	```
147	Output: <br/>	142	Output: <br/>
148	0 631766<br/>	143	0 631766<br/>
149	1 42496<br/>	144	1 42496<br/>
150	Name: Label, dtype: int64	145	Name: Label, dtype: int64
151		146
152	For Subdural sub type we have 6,31,766 images labeled as 0 and 42,496 labelled as 1.	147	For Subdural sub type we have 6,31,766 images labeled as 0 and 42,496 labelled as 1.
153		148
154		149
155	##### Any	150	##### Any
156		151
157	```python	152	```python
158	train_df[train_df['sub_type'] == 'any']['Label'].value_counts()	153	train_df[train_df['sub_type'] == 'any']['Label'].value_counts()
159	```	154	```
160	Output: <br/>	155	Output: <br/>
161	0 577159<br/>	156	0 577159<br/>
162	1 97103<br/>	157	1 97103<br/>
163	Name: Label, dtype: int64	158	Name: Label, dtype: int64
164		159
165	For any sub type we have 5,77,159 images labeled as 0 and 97,103 labelled as 1.	160	For any sub type we have 5,77,159 images labeled as 0 and 97,103 labelled as 1.
166		161
167	### 3. Data Visualization & Preprocessing	162	### 3. Data Visualization & Preprocessing
168		163
169	Lets look at the dicom files in the dataset	164	Lets look at the dicom files in the dataset
170		165
171	```python	166	```python
172	dicom = pydicom.read_file(path_train_img + 'ID_ffff922b9.dcm')	167	dicom = pydicom.read_file(path_train_img + 'ID_ffff922b9.dcm')
173	print(dicom)	168	print(dicom)
174	```	169	```
175	<img src='assets/dicom.png'/>	170
176		171
177
178	Dicom data format files contain pixel data of image and other meta data like patient name, instance id, window width etc...	172	Dicom data format files contain pixel data of image and other meta data like patient name, instance id, window width etc...
179		173
180	Original image	174	Original image
181		175
182	```python	176	```python
183	plt.imshow(dicom.pixel_array, cmap=plt.cm.bone)	177	plt.imshow(dicom.pixel_array, cmap=plt.cm.bone)
184	plt.show()	178	plt.show()
185	```	179	```
186	<img src='assets/original.png'/>	180
187		181
188
189	The orginal image seems to have difficult to understand, lets check meta deta features like Window Center, Window Width, Rescale Intercept, Rescale Slope	182	The orginal image seems to have difficult to understand, lets check meta deta features like Window Center, Window Width, Rescale Intercept, Rescale Slope
190		183
191	<img src='assets/meta.png'/>	184

We can use these features to construct a windowed image.

```python
import numpy as np
import matplotlib.pyplot as plt
from pydicom.multival import MultiValue

def get_dicom_field_value(key, dicom):
    """
    @param key: DICOM tag as a (group, element) tuple, e.g. ('0028', '1050')
    @param dicom: dataset returned by pydicom.dcmread
    """
    value = dicom[key].value
    # Some files store several window settings (a MultiValue); use the first one
    if isinstance(value, MultiValue):
        value = value[0]
    return value

window_center = int(get_dicom_field_value(('0028', '1050'), dicom))
window_width = int(get_dicom_field_value(('0028', '1051'), dicom))
window_intercept = float(get_dicom_field_value(('0028', '1052'), dicom))
window_slope = float(get_dicom_field_value(('0028', '1053'), dicom))
print(window_center, window_width, window_intercept, window_slope)

def get_windowed_image(image, wc, ww, intercept, slope):
    # Convert stored values to HU; cast to float first so a negative
    # intercept cannot wrap around or overflow the unsigned pixel type
    img = image.astype(np.float64) * slope + intercept
    img_min = wc - ww // 2
    img_max = wc + ww // 2
    # Clip every value outside the window to the window bounds
    return np.clip(img, img_min, img_max)

windowed_image = get_windowed_image(dicom.pixel_array, window_center, window_width,
                                    window_intercept, window_slope)

plt.imshow(windowed_image, cmap=plt.cm.bone)
plt.show()
```
<img src='assets/windowed.png'/>

The windowed image is much clearer than the original. The raw pixel array holds the scanner's stored values; applying the rescale slope and intercept converts them to Hounsfield units (HU), and the window then keeps only the HU range of the tissue we care about, clipping everything outside it.
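
To make the conversion concrete, here is a tiny worked example on a made-up 2x3 array of stored values. The slope of 1 and intercept of -1024 are assumed typical CT values, not values read from this file; the window is the brain window (center 40, width 80) used later in this post.

```python
import numpy as np

# Made-up stored pixel values, unsigned as in many CT files
stored = np.array([[0, 1024, 1064],
                   [1104, 1200, 3000]], dtype=np.uint16)
slope, intercept = 1.0, -1024.0  # assumed rescale parameters

# Stored value -> HU; cast first to avoid unsigned-integer wraparound
hu = stored.astype(np.float64) * slope + intercept
# hu holds [[-1024, 0, 40], [80, 176, 1976]]

center, width = 40, 80  # brain window
windowed = np.clip(hu, center - width // 2, center + width // 2)
# windowed holds [[0, 0, 40], [80, 80, 80]]: everything outside [0, 80] HU is clipped
print(windowed)
```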

Scaling the image:

Rescale the image to the range 0-255.

```python
def get_scaled_windowed_image(img):
    """
    Get scaled image
    1. Convert to float
    2. Min-max rescale to 0-255
    3. Convert to uint8
    """
    img_2d = img.astype(float)
    img_range = img_2d.max() - img_2d.min()
    if img_range == 0:
        # Constant image (e.g. entirely outside the window): avoid dividing by zero
        return np.zeros(img_2d.shape, dtype=np.uint8)
    # Shift by the minimum so windows with a negative lower bound keep their full range
    img_2d_scaled = (img_2d - img_2d.min()) / img_range * 255.0
    return np.uint8(img_2d_scaled)

scaled_image = get_scaled_windowed_image(windowed_image)
plt.imshow(scaled_image, cmap=plt.cm.bone, vmin=0, vmax=255)
plt.show()
```
<img src='assets/scaled.png'/>

Hounsfield units (HU) measure radiodensity on a standardized scale on which water is 0 HU and air is -1000 HU, so a given tissue falls in a predictable HU range; this is what makes fixed window settings meaningful for CT images. [Here](https://en.wikipedia.org/wiki/Hounsfield_scale) is a detailed table of substances and their HU ranges.
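
A window with center c and width w keeps the HU interval [c - w/2, c + w/2]. As a quick check of that arithmetic, here are the bounds of the three windows (brain, subdural, soft tissue) used by `bsb_window` further down; the lower bound and the width are exactly the offset and divisor that function uses to map each window to [0, 1]. The helper name is illustrative.

```python
def window_bounds(center, width):
    """Return the (lower, upper) HU bounds kept by a window."""
    return center - width // 2, center + width // 2


# (center, width) pairs as used in bsb_window
windows = {'brain': (40, 80), 'subdural': (80, 200), 'soft tissue': (40, 380)}
for name, (c, w) in windows.items():
    lo, hi = window_bounds(c, w)
    print(f"{name:12s} center={c:4d} width={w:4d} -> HU range [{lo}, {hi}]")
```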

A detailed explanation of the possible windowing techniques can be found in this great kernel: [(Gradient Sigmoid Windowing)](https://www.kaggle.com/reppic/gradient-sigmoid-windowing).

```python
import numpy as np

def correct_dcm(dcm):
    # Refer Jeremy Howard's Kernel https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai
    # Repairs the scans flagged in window_image (12 bits stored, unsigned pixels,
    # RescaleIntercept > -100) so that the standard -1000 intercept applies
    x = dcm.pixel_array + 1000
    px_mode = 4096
    x[x >= px_mode] = x[x >= px_mode] - px_mode
    dcm.PixelData = x.tobytes()
    dcm.RescaleIntercept = -1000

def window_image(dcm, window_center, window_width):
    if (dcm.BitsStored == 12) and (dcm.PixelRepresentation == 0) and (int(dcm.RescaleIntercept) > -100):
        correct_dcm(dcm)

    # Convert to HU, then clip to the window
    img = dcm.pixel_array * dcm.RescaleSlope + dcm.RescaleIntercept
    img_min = window_center - window_width // 2
    img_max = window_center + window_width // 2
    img = np.clip(img, img_min, img_max)

    return img

def bsb_window(dcm):
    brain_img = window_image(dcm, 40, 80)
    subdural_img = window_image(dcm, 80, 200)
    soft_img = window_image(dcm, 40, 380)

    # Map each window to [0, 1]: (value - lower bound) / window width
    brain_img = (brain_img - 0) / 80
    subdural_img = (subdural_img - (-20)) / 200
    soft_img = (soft_img - (-150)) / 380
    # Stack as channels: (3, H, W) -> (H, W, 3)
    bsb_img = np.array([brain_img, subdural_img, soft_img]).transpose(1, 2, 0)

    return bsb_img

# display_dicom_image is a plotting helper that is not shown in this post
display_dicom_image('ID_0005d340e.dcm')
```
<img src='assets/dicom_all.png'/>

The combined brain + subdural + soft-tissue (BSB) image looks like a good starting point for our models: it has three channels, so it can easily be fed to any pretrained model.
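
Because `bsb_window` needs a pydicom dataset, here is a minimal stand-alone sketch of the same channel construction on a synthetic HU array, which makes the output shape and value range easy to check. The function name `bsb_from_hu` is hypothetical; the window settings are the ones used above.

```python
import numpy as np

# (center, width) for the brain, subdural and soft-tissue windows, as in bsb_window
BSB_WINDOWS = [(40, 80), (80, 200), (40, 380)]


def bsb_from_hu(hu):
    """Stack three windowed, [0, 1]-normalized copies of an HU image as channels."""
    channels = []
    for center, width in BSB_WINDOWS:
        lower = center - width // 2
        upper = center + width // 2
        clipped = np.clip(hu, lower, upper)
        channels.append((clipped - lower) / width)
    # (3, H, W) -> (H, W, 3), the channels-last layout used by Keras models
    return np.stack(channels).transpose(1, 2, 0)


rng = np.random.default_rng(0)
hu = rng.uniform(-1000, 1000, size=(4, 4))  # made-up HU values
bsb = bsb_from_hu(hu)
print(bsb.shape, bsb.min() >= 0, bsb.max() <= 1)
```

A pixel at 40 HU, for example, maps to 0.5 in the brain channel, 0.3 in the subdural channel and 0.5 in the soft-tissue channel.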

### 4. Deep Learning Model

The complete training code can be found [here](</notebooks/Effnet-B0 Windowed Image.ipynb>).

We will use the plain windowed images to train the model, with augmentations such as left-right flips and random cropping.
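
A minimal NumPy sketch of these two augmentations, assuming images are (H, W, C) arrays. The author's `TrainDataGenerator` is not shown in this post and may implement them differently; the flip probability of 0.5 and the crop size here are assumptions, and in practice the crop would usually be resized back to the model's input size.

```python
import numpy as np


def augment(img, crop_size, rng):
    """Randomly flip an (H, W, C) image left-right, then take a random crop.

    crop_size is (crop_h, crop_w) and must not exceed the image size.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # horizontal (left-right) flip
    h, w = img.shape[:2]
    crop_h, crop_w = crop_size
    # integers() excludes the upper bound, so every valid offset is possible
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w, :]


rng = np.random.default_rng(42)
image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
patch = augment(image, (6, 6), rng)
print(patch.shape)
```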

Here are the steps for training the model:

1. Prepare the train and validation data generators. The data is split with [multilabel stratification](https://github.com/trent-b/iterative-stratification) so that the label proportions are roughly preserved in each split. We make two splits, but only work on the first one and check the results.
2. Load a pretrained EfficientNet-B0 model.
3. For the first epoch, train on all the training images with every layer except the last five frozen (`trainable = False`), so that mostly the new classification head is trained on top of the pretrained features, then save the model.
4. Load the saved model, unfreeze all layers and fine-tune the whole network for the remaining epochs (each epoch uses a sample of the training batches), so that the model can learn more complex features.
5. Make predictions.

Sample code:

```python
import joblib
import tensorflow as tf
import efficientnet.tfkeras as efn
from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# TEST_SIZE, SEED, WIDTH, HEIGHT, TRAIN_BATCH_SIZE, VALID_BATCH_SIZE, train_final_df and
# TrainDataGenerator are assumed to be defined in the training notebook (not shown here).
# Assumed checkpoint callback: keep the weights with the lowest validation loss in model.h5
callbacks_list = [ModelCheckpoint('model.h5', monitor='val_loss',
                                  save_best_only=True, save_weights_only=True)]

# 1. ---------prepare data generators-------------#
# https://github.com/trent-b/iterative-stratification
# Multilabel stratification
splits = MultilabelStratifiedShuffleSplit(n_splits=2, test_size=TEST_SIZE, random_state=SEED)
file_names = train_final_df.index
labels = train_final_df.values
# Let's take only the first split
train_idx, valid_idx = next(splits.split(file_names, labels))
submission_predictions = []
print(len(train_idx), len(valid_idx))
# train data generator
data_generator_train = TrainDataGenerator(train_final_df.iloc[train_idx],
                                          train_final_df.iloc[train_idx],
                                          TRAIN_BATCH_SIZE,
                                          (WIDTH, HEIGHT),
                                          augment=True)

# validation data generator
data_generator_val = TrainDataGenerator(train_final_df.iloc[valid_idx],
                                        train_final_df.iloc[valid_idx],
                                        VALID_BATCH_SIZE,
                                        (WIDTH, HEIGHT),
                                        augment=False)
# 2. ---------load efficient net B0 model-----------#
base_model = efn.EfficientNetB0(weights='imagenet', include_top=False,
                                pooling='avg', input_shape=(HEIGHT, WIDTH, 3))
x = base_model.output
x = Dropout(0.125)(x)
# One sigmoid output per label: independent probabilities for a multilabel problem
output_layer = Dense(6, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=output_layer)
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['acc', tf.keras.metrics.AUC()])
model.summary()

# 3. ---------for 1st epoch train on whole dataset ------------#
# Freeze everything except the last 5 layers (which include the new head)
for layer in model.layers[:-5]:
    layer.trainable = False

# Recompile so the new trainable flags take effect
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['acc'])

model.fit(data_generator_train,
          validation_data=data_generator_val,
          epochs=1,
          callbacks=callbacks_list,
          verbose=1)

# 4. ---------for rest of epochs train on sample data----------#
model.load_weights('model.h5')
# Unfreeze all layers to fine-tune the whole network
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=Adam(learning_rate=0.0004),
              loss='binary_crossentropy',
              metrics=['acc'])
# Each epoch uses only a sixth of the training batches (must be an integer)
model.fit(data_generator_train,
          validation_data=data_generator_val,
          steps_per_epoch=len(data_generator_train) // 6,
          epochs=10,
          callbacks=callbacks_list,
          verbose=1)
# 5. --------Make Predictions ---------------------------------#
model.load_weights('model.h5')

def get_scores(data_gen, file_name='scores.pkl'):
    # Returns [loss, accuracy] for the metrics compiled above
    scores = model.evaluate(data_gen, verbose=1)
    joblib.dump(scores, file_name)
    print(f"Loss: {scores[0]} and Accuracy: {scores[1]*100}")
```
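
The model is trained with `binary_crossentropy` on six sigmoid outputs, i.e. the mean of six independent binary log losses per image. As a rough sketch of what that loss computes, and of the weighted variant discussed in the overview (where `any` is reported to carry weight 2 and the other subtypes weight 1), here is a NumPy version. The label order, the clipping epsilon and the exact normalization of the competition metric are assumptions, since the official weights were not disclosed.

```python
import numpy as np


def multilabel_log_loss(y_true, y_pred, weights=None, eps=1e-7):
    """Weighted mean binary cross-entropy over images and labels.

    y_true, y_pred: arrays of shape (n_images, n_labels); weights: one per label.
    With weights=None every label counts equally, mirroring the mean that
    Keras' binary_crossentropy takes over the label axis.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    if weights is None:
        weights = np.ones(y_true.shape[1])
    weights = np.asarray(weights, dtype=float)
    # Weighted average over labels for each image, then mean over images
    per_image = (bce * weights).sum(axis=1) / weights.sum()
    return per_image.mean()


# Assumed label order: any, epidural, intraparenchymal, intraventricular, subarachnoid, subdural
y_true = np.array([[1, 0, 0, 0, 1, 0],
                   [0, 0, 0, 0, 0, 0]])
y_pred = np.full((2, 6), 0.5)
# Both calls give log(2) ~ 0.693, because every prediction is 0.5
print(multilabel_log_loss(y_true, y_pred))
print(multilabel_log_loss(y_true, y_pred, weights=[2, 1, 1, 1, 1, 1]))
```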

Let's evaluate the model on the train and validation generators.

```python
get_scores(data_gen=data_generator_train, file_name='train_scores.pkl')
```
<img src='assets/train.png'/>

```python
get_scores(data_gen=data_generator_val, file_name='val_scores.pkl')
```
<img src='assets/val.png'/>

Let's load the test data frame; the test CSV is in the same format as train.csv.

```python
import pandas as pd

# test_df is assumed to be already read from the test CSV (columns: ID, Label)
# extract subtype
test_df['sub_type'] = test_df['ID'].apply(lambda x: x.split('_')[-1])
# extract filename
test_df['file_name'] = test_df['ID'].apply(lambda x: '_'.join(x.split('_')[:2]) + '.dcm')

# One row per image, one column per subtype
test_df = pd.pivot_table(test_df.drop(columns='ID'), index="file_name",
                         columns="sub_type", values="Label")
print(test_df.head())
print(test_df.shape)
```

Output: (78545, 6)

So we have 78,545 test images and we need to predict 6 labels for each image.
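
The two `apply` calls above rely on the ID layout `ID_<hash>_<subtype>`. A small pure-Python sketch of the same parsing makes it easy to check on a single ID; the function name is illustrative, and it assumes (as the code above does) that neither the hash nor the subtype contains an underscore.

```python
def parse_test_id(row_id):
    """Split 'ID_<hash>_<subtype>' into ('ID_<hash>.dcm', '<subtype>')."""
    parts = row_id.split('_')
    file_name = '_'.join(parts[:2]) + '.dcm'
    sub_type = parts[-1]
    return file_name, sub_type


print(parse_test_id('ID_0fbf6a978_subarachnoid'))
# ('ID_0fbf6a978.dcm', 'subarachnoid')
```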

```python
preds = model.predict(TestDataGenerator(test_df.index, None, VALID_BATCH_SIZE,
                                        (WIDTH, HEIGHT), path_test_img),
                      verbose=1)
print(preds.shape)
```
Output: (78545, 6)

Kaggle's sample submission uses a different format: it has an ID and a Label column, where ID has the form <b>dicomId_subType</b> (e.g. ID_0fbf6a978_subarachnoid). So we need to expand each image's prediction into 6 rows, one per subtype, each holding the ID with its subtype and the predicted probability. The following code generates the required submission format.

```python
import pandas as pd
from IPython.display import HTML
from tqdm import tqdm

def create_download_link(title="Download CSV file", filename="data.csv"):
    """
    Helper function to generate a download link to files in a Kaggle kernel
    """
    html = '<a href={filename}>{title}</a>'
    html = html.format(title=title, filename=filename)
    return HTML(html)

def generate_submission_file(preds):
    cols = list(train_final_df.columns)

    # We have predictions for each image;
    # we need to make 6 rows for each file, one per subtype
    ids = []
    values = []
    for i, j in tqdm(zip(preds, test_df.index.to_list()), total=preds.shape[0]):
        # i = [any_prob, epidural_prob, intraparenchymal_prob, intraventricular_prob, subarachnoid_prob, subdural_prob]
        # j = filename ==> ID_xyz.dcm
        for k in range(i.shape[0]):
            ids.append(j.replace('.dcm', '_' + cols[k]))
            values.append(i[k])

    df = pd.DataFrame({'ID': ids})
    df['Label'] = values

    # Reuse the column names of Kaggle's sample submission
    sample_df = pd.read_csv(input_folder + 'stage_1_sample_submission.csv')
    df.columns = sample_df.columns

    df.to_csv('submission.csv', index=False)

    return create_download_link(filename='submission.csv')

generate_submission_file(preds)
```
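
The nested loop above can also be vectorized. Here is a sketch that builds the same ID/Label pairs for all 78,545 x 6 = 471,270 rows with `np.repeat` and `np.tile`; the function name and the toy inputs are illustrative, and it assumes the prediction columns are in the same order as the subtype list.

```python
import numpy as np


def submission_rows(preds, file_names, subtypes):
    """Flatten an (n_images, n_subtypes) prediction array into ID and Label arrays.

    Row order matches the nested loop: all subtypes of the first image,
    then all subtypes of the second image, and so on.
    """
    preds = np.asarray(preds)
    stems = np.array([name.replace('.dcm', '') for name in file_names])
    # Repeat each image stem once per subtype, and cycle the subtypes per image
    image_part = np.repeat(stems, len(subtypes))
    subtype_part = np.tile(np.asarray(subtypes), len(stems))
    ids = np.char.add(np.char.add(image_part, '_'), subtype_part)
    labels = preds.reshape(-1)  # row-major order: image by image
    return ids, labels


ids, labels = submission_rows(
    preds=[[0.9, 0.1], [0.2, 0.3]],
    file_names=['ID_a.dcm', 'ID_b.dcm'],
    subtypes=['any', 'epidural'],
)
print(list(zip(ids, labels)))
```

The two arrays can then be wrapped in a DataFrame with the sample submission's column names before calling `to_csv`.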

```python
df = pd.read_csv('submission.csv')
df.head()
```

<img src='assets/sample_sub.png'/>

All notebooks can be found [here](https://github.com/suryachintu/RSNA-Intracranial-Hemorrhage-Detection/tree/master/notebooks).

### 5. Demo

You can test the model by uploading a DICOM file [here](http://34.93.89.75:5325/).

### References

https://my.clevelandclinic.org/health/diseases/14480-intracranial-hemorrhage-cerebral-hemorrhage-and-hemorrhagic-stroke<br/>
https://github.com/MGH-LMIC/windows_optimization<br/>
https://arxiv.org/abs/1812.00572 (must read)<br/>
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/111325#latest-650043<br/>
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109261#latest-651855

### Kaggle Kernels

https://www.kaggle.com/jhoward/some-dicom-gotchas-to-be-aware-of-fastai<br/>
https://www.kaggle.com/reppic/gradient-sigmoid-windowing<br/>
https://www.kaggle.com/jhoward/from-prototyping-to-submission-fastai<br/>
https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-1<br/>
https://www.kaggle.com/suryaparsa/rsna-basic-eda-part-2