*To run this notebook, please provide the following three paths:*

In [0]:
# Images to be windowed
path_to_images = '/path/to/pngs/'

# Where to store windowed images
path_to_train = '/path/to/Windowed-PNGs-train/'
path_to_test = '/path/to/Windowed-PNGs-test/'

# **Installing & Importing Dependencies**

In [0]:
!pip install imageio
!pip install pillow

In [0]:
from PIL import Image

import numpy as np
import pandas as pd
import imageio
import pickle
import glob
import random

## **Reading in File Names**

In [0]:
patient_df = pd.read_pickle('C:\\Users\\Administrator\\Downloads\\ordered_slices_by_patient.pkl')

# We find the total number of patients (usually, a patient has something like 30-50 associated PNGs, i.e., CT slices of their brain)
len(patient_df)

17079

## **Randomly Subsample 2,500 Patients**

For **faster prototyping**, we randomly subsample 2,500 of the 17,079 patients. This still yields more than enough images to train and evaluate our models.

In [0]:
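# random.sample requires a sequence (a dict view raises TypeError on Python 3.11+), so convert the items to a list first
fewer_images = dict(random.sample(list(patient_df.items()), k=2500))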

In [0]:
# Save the list of randomly subsampled patients for reproducibility
with open("ordered_slices_by_patient_randsubset.pkl", "wb") as f:
    pickle.dump(fewer_images, f)
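
If the draw itself should be repeatable (in addition to saving its result), a seeded generator is one option. The following is a minimal sketch on toy data; the helper name `subsample_patients`, the toy dictionary, and the seed value are illustrative assumptions and not part of the original notebook.

```python
import random

def subsample_patients(slices_by_patient, k, seed=0):
    """Return a dict holding k randomly chosen patients; same seed, same draw."""
    rng = random.Random(seed)  # private generator, leaves the global random state untouched
    # random.sample needs a sequence, so materialize the items view first
    chosen = rng.sample(list(slices_by_patient.items()), k=k)
    return dict(chosen)

# Toy stand-in for the patient -> slice-ID mapping (hypothetical names)
toy = {f"patient_{i}": [f"slice_{i}_{j}" for j in range(3)] for i in range(10)}

subset_a = subsample_patients(toy, k=4, seed=42)
subset_b = subsample_patients(toy, k=4, seed=42)
print(subset_a == subset_b)  # both calls select the same patients
```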

In [0]:
# Unpack the images (in this NumPy array, they are no longer associated with a given patient)
fewer_images_flat = np.concatenate(list(fewer_images.values()))
nb_ims = len(fewer_images_flat)
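
To make the flattening step concrete, here is a small sketch with made-up patient and slice IDs (the names are assumptions for illustration only). `np.concatenate` joins the per-patient lists end to end, so slices of the same patient stay adjacent in the flat array.

```python
import numpy as np

# Hypothetical mapping: patient -> ordered list of slice IDs
toy_subset = {
    "patient_a": ["ID_001", "ID_002"],
    "patient_b": ["ID_003"],
    "patient_c": ["ID_004", "ID_005", "ID_006"],
}

# Lists of different lengths are fine: each one is treated as a 1-D array
flat = np.concatenate(list(toy_subset.values()))
print(flat.shape)  # one entry per slice, patient association is dropped
```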

# **Windowing**

When examining brain CT scans, radiologists rarely look at the raw images (they appear mostly gray to the human eye). Instead, they use so-called "windows": simple transformations that clip the raw intensities to a chosen range, highlighting structures of a particular density in the human brain. The three most common windows for hemorrhage detection are the **bone, brain, and subdural window**.

These are also the three windows that we apply to help our model detect hemorrhages. Specifically, we read in black-and-white, one-channel PNGs and turn them into **RGB**, three-channel PNGs where each channel contains one specific window.
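
As a rough illustration of what a single window does, the sketch below clips a toy intensity array to the range `[center - width // 2, center + width // 2]`. The helper name `clip_to_window` and the toy values are assumptions made for this example; it uses `np.clip` rather than the masking approach in the cell below, but the clipping range is the same.

```python
import numpy as np

def clip_to_window(image, center, width):
    """Clip intensities to [center - width // 2, center + width // 2]."""
    lower = center - width // 2
    upper = center + width // 2
    return np.clip(image, lower, upper)

# Toy "image" with values below, inside, and above the brain window (center 40, width 80)
toy = np.array([[-100, 0, 40],
                [80, 120, 500]])

brain = clip_to_window(toy, center=40, width=80)
print(brain)
# [[ 0  0 40]
#  [80 80 80]]
```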

In [0]:
# NOTE: The code in this cell is from
# https://github.com/darraghdog/rsna/blob/master/scripts/prepare_meta_dicom.py

# This function can apply any window to a given image (passed as a NumPy array)
# Note that a window is specified by only two parameters: center and width
def apply_window(image, center, width):
    image = image.copy()
    min_value = center - width // 2
    max_value = center + width // 2
    image[image < min_value] = min_value
    image[image > max_value] = max_value
    return image

# This function contains our specific windowing policy:
# Namely, we perform brain, subdural, and bone windowing, then we concatenate these three windows
def apply_window_policy(image):
    image1 = apply_window(image, 40, 80) # brain
    image2 = apply_window(image, 80, 200) # subdural
    image3 = apply_window(image, 40, 380) # bone
    image1 = (image1 - 0) / 80
    image2 = (image2 - (-20)) / 200
    image3 = (image3 - (-150)) / 380
    image = np.array([image1 - image1.mean(),
                      image2 - image2.mean(),
                      image3 - image3.mean(),
                      ]).transpose(1,2,0)

    return image
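
The constants subtracted in `apply_window_policy` (0, -20, and -150) are the lower bounds of the three windows, so each channel lands in [0, 1] before its mean is removed. The sketch below checks that arithmetic; `scale_window` is a hypothetical helper written for this illustration, not a function from the referenced repository.

```python
import numpy as np

# (center, width) pairs as used in apply_window_policy
windows = {"brain": (40, 80), "subdural": (80, 200), "bone": (40, 380)}

# Lower bound of each window: center - width // 2
lower_bounds = {name: c - w // 2 for name, (c, w) in windows.items()}
print(lower_bounds)  # {'brain': 0, 'subdural': -20, 'bone': -150}

def scale_window(image, center, width):
    """Clip to the window, then map the window range onto [0, 1]."""
    lower = center - width // 2
    clipped = np.clip(image, lower, center + width // 2)
    return (clipped - lower) / width

# Values below, inside, and above the subdural window
scaled = scale_window(np.array([-1000, 40, 3000]), center=80, width=200)
print(scaled)  # [0.  0.3 1. ]
```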

Performing the actual windowing:

In [0]:
# Iterate over all PNGs to window them
for i, image in enumerate(fewer_images_flat):

    try:
        # Load PNG as NumPy array
        raw_im = imageio.imread(path_to_images + image + '.png')
        print(i, 'out of {} loaded'.format(nb_ims))
        
        # Window PNG
        windowed_image = apply_window_policy(raw_im)
        print(i, 'out of {} windowed'.format(nb_ims))
    
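        # Rescale the image to have pixel values in range [0, 255] & convert to uint8
        # (min-max scaling; dividing by max alone overshoots 255 because min is negative after mean subtraction)
        value_range = windowed_image.max() - windowed_image.min()
        rescaled_image = 255.0 * (windowed_image - windowed_image.min()) / value_range
        rescaled_image = rescaled_image.astype(np.uint8)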
        print('Rescaled image {} out of {}'.format(i, nb_ims))

        # Turn NumPy array into PNG again
        final_im = Image.fromarray(rescaled_image)
        
        # Use 16,500 images for testing and the rest for training purposes (this is approximately a 70/30 split)
        if i < 16500:
            final_im.save(path_to_test + fewer_images_flat[i] + '.png')
        else:
            final_im.save(path_to_train + fewer_images_flat[i] + '.png')
        print('Saved image {} out of {}'.format(i + 1, nb_ims))
        
    except FileNotFoundError:
        print('Skipping', image)
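
For reference, a min-max rescale to the `uint8` range can be sketched as a standalone helper. The name `to_uint8` and the guard for constant images are assumptions added for this illustration. The key point is that the divisor is `max - min`, so that the largest value maps to exactly 255 even when the minimum is negative.

```python
import numpy as np

def to_uint8(array):
    """Min-max rescale a float array to [0, 255] and cast to uint8."""
    lo, hi = array.min(), array.max()
    if hi == lo:
        # Constant image: avoid dividing by zero and return all zeros
        return np.zeros(array.shape, dtype=np.uint8)
    scaled = 255.0 * (array - lo) / (hi - lo)
    return scaled.astype(np.uint8)  # truncates toward zero

# Mean-centered toy values, similar in spirit to a windowed channel
demo = np.array([[-0.5, 0.0],
                 [0.25, 0.5]])
out = to_uint8(demo)
print(out)
# [[  0 127]
#  [191 255]]
```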