{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to the dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p><b>Intracranial hemorrhage (ICH)</b> is caused by bleeding within the brain tissue itself, a life-threatening type of stroke. A stroke occurs when the brain is deprived of oxygen and blood supply. ICH is most commonly caused by hypertension, arteriovenous malformations, or head trauma. Treatment focuses on stopping the bleeding, removing the blood clot (hematoma), and relieving the pressure on the brain.</p>\n",
"<br/><br/>\n",
"<p><b>Diagnosis</b> requires an urgent procedure. When a patient shows acute neurological symptoms such as severe headache or loss of consciousness, highly trained specialists review medical images of the patient's cranium to look for the presence, location, and type of hemorrhage. The process is complicated and often time-consuming.</p>\n",
"<br/><br/>\n",
"<p>The current clinical protocol for diagnosing ICH is for radiologists to examine computerized tomography (CT) scans to determine whether ICH has occurred and, if so, to identify its type and localize its regions. However, this process relies on the availability of a subspecialty-trained neuroradiologist and, as a result, can be slow and even inaccurate, especially in remote areas where specialized care is scarce.</p>\n",
"<br/><br/>\n",
"<p>In recent years, advances in <b>deep learning</b> have made it possible to solve a wide range of problems, in some cases with better results than humans. We will try to detect and segment intracranial hemorrhage using a CT scan dataset of the brain annotated by expert radiologists.</p>\n",
"<p>The challenge is to build an algorithm to detect acute intracranial hemorrhage and its subtypes.</p>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">Intraparenchymal hemorrhage is blood that is located completely within the brain itself; intraventricular or subarachnoid hemorrhage is blood that has leaked into the spaces of the brain that normally contain cerebrospinal fluid (the ventricles or subarachnoid cisterns). Extra-axial hemorrhages are blood that collects in the tissue coverings that surround the brain (e.g. subdural or epidural subtypes). (See figure.) Patients may exhibit more than one type of cerebral hemorrhage, which may appear on the same image. While small hemorrhages are typically less morbid than large hemorrhages, even a small hemorrhage can lead to death because it is an indicator of another type of serious abnormality (e.g. cerebral aneurysm).\n",
">\n",
"> #### There are five types of ICH:\n",
"> * **Intraparenchymal hemorrhage**\n",
"> * **Epidural hemorrhage**\n",
"> * **Subdural hemorrhage**\n",
"> * **Subarachnoid hemorrhage**\n",
"> * **Intraventricular hemorrhage**\n",
">\n",
"> One patient can exhibit more than one type of hemorrhage."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This dataset has six classes:\n",
"1. any - whether any of the five hemorrhage subtypes below is present\n",
"2. epidural\n",
"3. intraparenchymal\n",
"4. intraventricular\n",
"5. subarachnoid\n",
"6. subdural\n",
"\n",
"A single patient may have more than one type of hemorrhage."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"base_url = '~/kaggle/rsna-intracranial-hemorrhage-detection/'"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"TRAIN_DIR = '/home/ubuntu/kaggle/rsna-intracranial-hemorrhage-detection/stage_2_train'\n",
"TEST_DIR = '/home/ubuntu/kaggle/rsna-intracranial-hemorrhage-detection/stage_2_test'"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd\n",
"import swifter\n",
"import numpy as np\n",
"from tqdm import *\n",
"import re\n",
"import seaborn as sns\n",
"import pydicom\n",
"import joblib"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pydicom in /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages (2.0.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install pydicom"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"752803\r\n"
]
}
],
"source": [
"! ls {TRAIN_DIR} | wc -l\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"121232\r\n"
]
}
],
"source": [
"! ls {TEST_DIR} | wc -l"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ID_000012eaf.dcm\n",
"ID_000039fa0.dcm\n",
"ID_00005679d.dcm\n",
"ID_00008ce3c.dcm\n",
"ID_0000950d7.dcm\n",
"ls: write error: Broken pipe\n"
]
}
],
"source": [
"! ls {TRAIN_DIR} | head -n 5"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(4516842, 2)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ID_12cadc6af_epidural</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ID_12cadc6af_intraparenchymal</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ID_12cadc6af_intraventricular</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ID_12cadc6af_subarachnoid</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ID_12cadc6af_subdural</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>ID_12cadc6af_any</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>ID_38fd7baa0_epidural</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>ID_38fd7baa0_intraparenchymal</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>ID_38fd7baa0_intraventricular</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>ID_38fd7baa0_subarachnoid</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Label\n",
"0 ID_12cadc6af_epidural 0\n",
"1 ID_12cadc6af_intraparenchymal 0\n",
"2 ID_12cadc6af_intraventricular 0\n",
"3 ID_12cadc6af_subarachnoid 0\n",
"4 ID_12cadc6af_subdural 0\n",
"5 ID_12cadc6af_any 0\n",
"6 ID_38fd7baa0_epidural 0\n",
"7 ID_38fd7baa0_intraparenchymal 0\n",
"8 ID_38fd7baa0_intraventricular 0\n",
"9 ID_38fd7baa0_subarachnoid 0"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df = pd.read_csv(base_url+'stage_2_train.csv')\n",
"print(train_df.shape)\n",
"train_df.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(4516842, 3)\n"
]
}
],
"source": [
"train_df[['ID', 'Subtype']] = train_df['ID'].str.rsplit(pat='_', n=1, expand=True)\n",
"print(train_df.shape)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each image appears in six rows, one per class, each labeled True (1) or False (0). We will pivot the table so that each image occupies a single row with one 0/1 indicator column per class, similar to a one-hot encoding except that several columns can be 1 at once."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Label</th>\n",
" <th>Subtype</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4516836</th>\n",
" <td>ID_4a85a3a3f</td>\n",
" <td>0</td>\n",
" <td>epidural</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4516837</th>\n",
" <td>ID_4a85a3a3f</td>\n",
" <td>0</td>\n",
" <td>intraparenchymal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4516838</th>\n",
" <td>ID_4a85a3a3f</td>\n",
" <td>0</td>\n",
" <td>intraventricular</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4516839</th>\n",
" <td>ID_4a85a3a3f</td>\n",
" <td>0</td>\n",
" <td>subarachnoid</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4516840</th>\n",
" <td>ID_4a85a3a3f</td>\n",
" <td>0</td>\n",
" <td>subdural</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4516841</th>\n",
" <td>ID_4a85a3a3f</td>\n",
" <td>0</td>\n",
" <td>any</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Label Subtype\n",
"4516836 ID_4a85a3a3f 0 epidural\n",
"4516837 ID_4a85a3a3f 0 intraparenchymal\n",
"4516838 ID_4a85a3a3f 0 intraventricular\n",
"4516839 ID_4a85a3a3f 0 subarachnoid\n",
"4516840 ID_4a85a3a3f 0 subdural\n",
"4516841 ID_4a85a3a3f 0 any"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df.tail(6)"
]
},
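{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"def fix_id(img_id, img_dir=TRAIN_DIR):\n",
"    # Return img_id unchanged if it already looks like 'ID_<alphanumerics>'.\n",
"    if not re.match(r'ID_[a-z0-9]+', img_id):\n",
"        sop = re.search(r'[a-z0-9]+', img_id)\n",
"        if sop:\n",
"            # Rebuild the ID from the first alphanumeric run found.\n",
"            img_id_new = f'ID_{sop[0]}'\n",
"            return img_id_new\n",
"        else:\n",
"            # Nothing recoverable: report the offending ID.\n",
"            print(img_id)\n",
"    return img_id"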
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 ID_12cadc6af\n",
"1 ID_12cadc6af\n",
"2 ID_12cadc6af\n",
"3 ID_12cadc6af\n",
"4 ID_12cadc6af\n",
" ... \n",
"4516837 ID_4a85a3a3f\n",
"4516838 ID_4a85a3a3f\n",
"4516839 ID_4a85a3a3f\n",
"4516840 ID_4a85a3a3f\n",
"4516841 ID_4a85a3a3f\n",
"Name: ID, Length: 4516842, dtype: object"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df['ID'].apply(fix_id)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(752803, 7)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th colspan=\"6\" halign=\"left\">Label</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Subtype</th>\n",
" <th></th>\n",
" <th>any</th>\n",
" <th>epidural</th>\n",
" <th>intraparenchymal</th>\n",
" <th>intraventricular</th>\n",
" <th>subarachnoid</th>\n",
" <th>subdural</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ID_000012eaf</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ID_000039fa0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ID_00005679d</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ID_00008ce3c</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ID_0000950d7</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Label \\\n",
"Subtype any epidural intraparenchymal intraventricular \n",
"0 ID_000012eaf 0 0 0 0 \n",
"1 ID_000039fa0 0 0 0 0 \n",
"2 ID_00005679d 0 0 0 0 \n",
"3 ID_00008ce3c 0 0 0 0 \n",
"4 ID_0000950d7 0 0 0 0 \n",
"\n",
" \n",
"Subtype subarachnoid subdural \n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_new = train_df.pivot_table(index='ID', columns='Subtype').reset_index()\n",
"print(train_new.shape)\n",
"train_new.head()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Subtype\n",
"any 107933\n",
"epidural 3145\n",
"intraparenchymal 36118\n",
"intraventricular 26205\n",
"subarachnoid 35675\n",
"subdural 47166\n",
"dtype: int64\n"
]
}
],
"source": [
"subtype_ct = train_new['Label'].sum(axis=0)\n",
"print(subtype_ct)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Distribution of each hemorrhage type"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAcoAAAD4CAYAAABsWabOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAa6klEQVR4nO3deZhlVX3u8e8rgw3diEwqitigBEVl6oKAIoOiURNjjNygwaBo5N4YJUbRq1FRNMaHGI0XcaCNggPhooATPipGAQUCTTdTN7MDKIErwaEFBGngd/84q+R0WbXrNFbVqSq+n+c5T+299tp7rXV2db29h7NPqgpJkjS+hwy7A5IkzWYGpSRJHQxKSZI6GJSSJHUwKCVJ6rD+sDugqbflllvW4sWLh90NSZpTVqxYcWtVbTW23KCchxYvXszy5cuH3Q1JmlOS3DBeuadeJUnqYFBKktTBU6/z0FU3/owlb/rMsLshSTNqxfsPnZbtekQpSVIHg1KSpA4GpSRJHQxKSZI6GJSSJHUwKCVJ6mBQSpLUwaCUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjoYlEOQ5EtJViS5Isnhrez2JO9NclmSC5I8MskmSX6UZINW52FJrh+dlyRNP4NyOF5ZVUuAEeCIJFsAC4ELqmoX4LvAq6vqNuBs4I/bei8BTquqNWM3mOTwJMuTLL/n17fNyCAk6cHAoByOI5JcBlwAPBbYAbgbOKMtXwEsbtP/BhzWpg8DThhvg1W1tKpGqmpk/Y03ma5+S9KDjt9HOcOS7A8cCOxdVb9OcjawAFhTVdWq3UvbN1V1XpLFSfYD1quqVUPotiQ9aHlEOfM2BX7RQvKJwF4DrPMZ4GQmOJqUJE0fg3LmfQNYP8nlwHvonX6dzEnAZvTCUpI0gzz1OsOq6jfA88ZZtKivzqnAqX3L9gFOrapfTnP3JEljGJSzXJIP0wvW5w+7L5L0YGRQznJV9bph90GSHsy8RilJUgeDUpKkDgalJEkdDEpJkjoYlJIkdTAoJUnqYFBKktTBz1HOQ0/aZguWv//QYXdDkuYFjyglSepgUEqS1MGglCSpg0EpSVIHg1KSpA4GpSRJHfx4yDx0981X8ON3P7WzzrZHrZyh3kjS3OYRpSRJHQxKSZI6GJSSJHUwKCVJ6mBQSpLUwaCUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjoYlJIkdTAop0CSdyc5cJzy/ZOcMYXtnJ1kZKq2J0manA9FnwJVddRUbCdJgFTVfVOxPUnS788jygkkeVmSZUkuTXJ8kvWS3J7kA0kuTvLtJFu1uicmOahNPzfJ1UnOBf68b3vvSnJk3/yqJIvb66okHwUuBh6b5GNJlie5IsnRMzx0SVIfg3IcSZ4EHAw8vap2Be4FDgEWAhdX1e7AOcA7x6y3APgE8ALgGcCjBmxyR+AzVbVbVd0AvK2qRoCdgf2S7DxAnw9v4br853fcO2CzkqTJGJTjexawBLgoyaVtfnvgPuCUVudzwD5j1nsi8KOquq6qqtUZxA1VdUHf/F8kuRi4BHgysNNkG6iqpVU1UlUjmy9cb8BmJUmT8Rrl+AJ8uqreulZh8o4x9WqcdccrA7iHtf9jsqBv+o6+NrYDjgT2qKpfJDlxTF1J0gzyiHJ83wYOSvIIgCSbJ3kcvffroFbnL4Fzx6x3NbBdkse3+Zf2Lbse2L1tb3dguwnafhi94Fyd5JHA836/oUiSfh8eUY6jqq5M8nbgzCQPAdYAf0svwJ6cZAWwmt51zP717kpyOPC1JLfSC9KntMWnAYe2U7kXAddO0PZlSS4BrgB+CJw35QOUJA0svUtpGkSS26tq0bD7MZmdH7NRnfE/n9BZZ9ujVs5QbyRpbkiyot1IuRZPvUqS1MGgXAdz4WhSkjS1DEpJkjoYlJIkdTAoJUnqYFBKktTBoJQkqYNBKUlSB5/MMw9tuPWT2f
ao5cPuhiTNCx5RSpLUwaCUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjr48ZB56OpbrubpH376sLuhAZz3Or+XW5rtPKKUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjoYlJIkdTAoJUnqYFBKktTBoJQkqYNBKUlSB4NSkqQO0xaUSc4foM7rk2w8XX2Ybkn2T3LGDLa3OMmqmWpPkjSNQVlVTxug2uuBcYMyyXpT26PfbtcHwUuSBjadR5S3t5/7Jzk7yalJrk5yUnqOAB4NnJXkrNF1krw7yYXA3kmOSnJRklVJliZJq3d2kg8lOb8t27OV79nKLmk/d2zlr0jyhSRfBc5sZW9q2748ydGtbHGSq5J8IskVSc5MslFb9oQk/5HksiQXJ3l8G+qiccb2rCRf7Hsvnp3k9L4xHpNkRdvenm08P0zyp339+F5r5+Ikg/ynQ5I0DQYOyiT7JDmsTW+VZLt1aGc3ekePOwHbA0+vqmOBm4ADquqAVm8hsKqq/rCqzgWOq6o9quopwEbAn/Rtc2E7an0N8KlWdjWwb1XtBhwF/FNf/b2Bl1fVM5M8B9gB2BPYFViSZN9WbwfgI1X1ZOCXwItb+UmtfBfgacDNE40N+A7wpCRbtTqHASf0jfHsqloC3Ab8I/Bs4EXAu1udW4BnV9XuwMHAsd1vLyQ5PMnyJMvX3L5msuqSpAENdBoyyTuBEWBHen/wNwA+Ry8UBrGsqm5s27oUWAycO069e4HT+uYPSPJmeqdnNweuAL7alp0MUFXfTfKwJA8HNgE+nWQHoFo/R32rqn7epp/TXpe0+UX0AvLHwI+q6tJWvgJYnGQT4DFV9cXW5l1tLOOOrarOTfJZ4GVJTqAX0oe2bd4NfKNNrwR+U1Vrkqxs7wut38cl2bW9J38wznu1lqpaCiwFWLTtopqsviRpMINer3sRvSOniwGq6qYWHoP6Td/0vR3t3lVV9wIkWQB8FBipqp8keRewoK/u2DAo4D3AWVX1oiSLgbP7lt/RNx3gfVV1fP8G2jpj+7pRqz+RicZ2Ar1Qvwv4QlXd08rXVNVo3+8bXb+q7uu7fvr3wE+BXegd9d/V0b4kaRoNeur17vbHvQCSLJyi9m+jdxQ4ntFQvDXJIuCgMcsPbn3ZB1hdVauBTYH/astf0dHuN4FXtu2S5DFJHjFR5ar6FXBjkj9r9R862d26VXUTvVPLbwdO7Ko7jk2Bm6vqPuCvgGm5sUmSNLlBg/LzSY4HHp7k1cB/AJ+YgvaXAl8fvZmnX1X9srWxEvgScNGYKr9oH0H5OPCqVvbPwPuSnEdHuFTVmcC/A//ZTnmeysSBPeqvgCOSXA6cDzxqkvrQu675k6q6coC6/T4KvDzJBfROu94xSX1J0jTJ/WcBJ6mYPJvedT2AM6vqW9PWq8n7cjZwZFUtH1YfBpHkOOCSqvrkTLa7aNtFtcubdpnJJvUAnfe684bdBUlNkhVVNTK2fF0+U7iS3vW6atPqkGQFvSPBNw67L5KkB26gU69J/hpYBvw5vWuFFyR55XR2rEtV7T/bjyaraklV7VtVv5m8tiRpthr0iPJNwG5V9TOAJFvQu073qc61JEma4wa9medGeneojroN+MnUd0eSpNll0CPK/wIuTPJletcoXwgsS/IGgKr64DT1T5KkoRo0KH/QXqO+3H6uy0MHJEmacwYNytOqyq93kiQ96Ax6jfLjSZYleU17pqokSQ8KAx1RVtU+Sf6A3rdgLE+yDDixPeFGs8wTH/FEP8guSVNk4K/Zqqpr6T239H8D+wH/p30H459PV+ckSRq2QR84sHOSfwWuAp4JvKCqntSm/3Ua+ydJ0lANejPPcfQeUP4PVXXnaGH7uq23T0vPJEmaBQY99Xp6VX22PyST/B1AVX12WnomSdIsMGhQHjpO2SumsB+SJM1Knadek7wU+EtguyRf6Vu0CfCz6eyYJEmzwWTXKM8Hbga2BD7QV34bcPl0dUqSpNmiMyir6gbgBm
DvJI8C9qT3rNdrquqeGeifHoDbrrmGc/bdb9jdmFP2++45w+6CpFlq0I+HvIpZ9H2UkiTNlEE/HvJm/D5KSdKDkN9HKUlSh8nuen1Dmxz3+yinuW+SJA3dZKdeR79vcqLvo5QkaV6b7K7Xo2eqI5IkzUYD3cyT5Cx6p1zXUlXPnPIeSZI0iwx61+uRfdMLgBcDfo5SkjTvDfrFzSvGFJ2XxE9oS5LmvUFPvW7eN/sQYAR41LT0SJKkWWTQz1GuAJa31/nAG4BXTVUnkpw/QJ3XJ9l4qtpcV0l2TfL8juUjSY59gNt+V5IjJ68pSZppnUGZZI8kj6qq7apqe+Bo4Or2unKqOlFVTxug2uuBcYMyyXpT1ZcOuwLjBmWS9atqeVUdMQP9IMmg15YlSb+nyY4ojwfuBkiyL/A+4NPAamDpVHUiye3t5/5Jzk5yapKrk5yUniOARwNntTtwSXJ7kncnuZDeQ9uPSnJRklVJlrb1npRkWV87i5Nc3qaXJDknyYok30yydSs/O8kxSZYluTbJM5JsCLwbODjJpUkObkeBS5OcCXym9f2Mto1FSU5IsjLJ5Ule3D/ONn1QkhPHeS9e3cZxWZLTRo+ik5yY5INt/MdM1XsvSeo2WVCuV1U/b9MHA0ur6rSqegfwhGnq0270jh53ArYHnl5VxwI3AQdU1QGt3kJgVVX9YVWdCxxXVXtU1VOAjYA/qaqrgA2TbN83hs8n2QD4MHBQVS2h98za9/b1Yf2q2rP1451VdTdwFHBKVe1aVae0ekuAF1bVX44ZwzuA1VX11KraGfjOOoz/9DaOXYCrWPsU9x8AB1bVG8eulOTwJMuTLF+9Zs06NCdJ6jJpUPad5nsWa//Bn67Tf8uq6saqug+4FFg8Qb17gdP65g9IcmGSlcAzgSe38s8Df9GmDwZOAXYEngJ8K8mlwNuBbfq2dXr7uaKjfYCvVNWd45QfCHxkdKaqftGxjbGekuR7bRyH9I0D4AtVde94K1XV0qoaqaqRTTfYYB2akyR1mSzsTgbOSXIrcCfwPYAkT6B3+nU6/KZv+l4m7uNdo6GRZAHwUWCkqn6S5F30Pu8JvWD8QpLTgaqq65I8FbiiqvaepA9d7QPcMUF5GOcBDWPKFoyzHOBE4M+q6rIkrwD2H6A9SdI06TyirKr3Am+k98d7n6oa/UP/EOB109u133Eb9z97dqzR0Lk1ySJ635kJQFX9gF7gvYNeaAJcA2yVZG+AJBsk6T9yW9f2xzoTeO3oTJLN2uRP23XThwAvmmDdTYCb2+nhQwZsT5I0TSb9eEhVXVBVX6yqO/rKrq2qi6e3a79jKfD10Zt5+lXVL4FPACuBLwEXjalyCvAyeqdhadccDwKOSXIZvVO8k915exaw0+jNPJPU/Udgs3Zj0WXA6HXVtwBn0DuFffME674DuBD4Fr27iyVJQ5T7DxI1X+y4ySa1dLfdh92NOWW/7/qgKenBLsmKqhoZWz7oAwckSXpQMiglSepgUEqS1MGglCSpg0EpSVIHg1KSpA4GpSRJHQxKSZI6GJSSJHXwC4DnoU123NEnzUjSFPGIUpKkDgalJEkdDEpJkjoYlJIkdTAoJUnqYFBKktTBj4fMQ7fcuJrj3vjVYXdDHV77gRcMuwuSBuQRpSRJHQxKSZI6GJSSJHUwKCVJ6mBQSpLUwaCUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjrM26BM8q4kR85ge69IctwUbet/JTl0nPLFSVZNRRuSpMH4rNdxJAmQqrpvGO1X1ceH0a4k6XfNqSPKJAuTfC3JZUlWJTk4yfVJtmzLR5Kc3bfKLkm+k+S6JK9udRYl+XaSi5OsTPLCVr44yVVJPgpcDDw2yceSLE9yRZKj+/qxR5LzWz+WJdmkLXp0km+09v65r/7tSd7b6l+Q5JGt/HGtL5e3n9u28t8eDSdZ0tb7T+Bvp+u9lSSNb04FJfBc4Kaq2qWqngJ8Y5L6OwN/DOwNHJXk0c
BdwIuqanfgAOAD7QgSYEfgM1W1W1XdALytqkbadvZLsnOSDYFTgL+rql2AA4E72/q7AgcDTwUOTvLYVr4QuKDV/y7w6lZ+XGtvZ+Ak4NhxxnACcERV7d010CSHt1BffvuvV0/ytkiSBjXXgnIlcGCSY5I8o6omS4QvV9WdVXUrcBawJxDgn5JcDvwH8Bjgka3+DVV1Qd/6f5HkYuAS4MnATvTC9Oaqugigqn5VVfe0+t+uqtVVdRdwJfC4Vn43cEabXgEsbtN7A//epj8L7NPf+SSbAg+vqnP66oyrqpZW1UhVjSzaeNNJ3hZJ0qDm1DXKqro2yRLg+cD7kpwJ3MP9gb9g7CrjzB8CbAUsqao1Sa7vW++O0YpJtgOOBPaoql8kObHVyzjbHfWbvul7uf/9XVNVNU757wxxzHxXW5KkGTCnjijbqdNfV9XngH8BdgeuB5a0Ki8es8oLkyxIsgWwP3ARsClwSwvJA7j/qG+sh9ELztXtmuLzWvnV9K5F7tH6tEmSB/ofjvOBl7TpQ4Bz+xdW1S9b+/v01ZEkzaA5dURJ79rf+5PcB6wB/gbYCPhkkn8ALhxTfxnwNWBb4D1VdVOSk4CvJlkOXEov+H5HVV2W5BLgCuCHwHmt/O4kBwMfTrIRveuTBz7A8RwBfCrJm4D/Bg4bp85hrc6vgW8+wHYkSQ9Q7j8jqPli20ftUG8+5IPD7oY6vPYDLxh2FySNkWRFu4FzLXPq1KskSTPNoJQkqYNBKUlSB4NSkqQOBqUkSR0MSkmSOhiUkiR1MCglSepgUEqS1GGuPcJOA3jENpv65BdJmiIeUUqS1MGglCSpg0EpSVIHg1KSpA4GpSRJHQxKSZI6+PGQeejmH/2A977soGF3Q3PA2z536rC7IM16HlFKktTBoJQkqYNBKUlSB4NSkqQOBqUkSR0MSkmSOhiUkiR1MCglSepgUEqS1MGglCSpg0EpSVIHg/IBSvKuJEcOUO/EJFPy4NUki5OsmoptSZIGY1DOMkl8UL0kzSIGZZ8kC5N8LcllSVYlOTjJ9Um2bMtHkpzdt8ouSb6T5Lokr251kuS4JFcm+RrwiL7tj7utdnS6NMmZwGfakeP3klzcXk+bobdAkjSGRy9rey5wU1X9MUCSTYFjOurvDOwFLAQuacG4F7Aj8FTgkcCVwKcGaHsJsE9V3ZlkY+DZVXVXkh2Ak4GRrpWTHA4cDrDpxhsN0JwkaRAeUa5tJXBgkmOSPKOqVk9S/8tVdWdV3QqcBewJ7AucXFX3VtVNwHcGbPsrVXVnm94A+ESSlcAXgJ0mW7mqllbVSFWNLFzw0AGblCRNxiPKPlV1bZIlwPOB97VTofdw/38oFoxdZYL5seWjurZ1R9/03wM/BXZp9e8aaACSpCnnEWWfJI8Gfl1VnwP+BdgduJ7eaVGAF49Z5YVJFiTZAtgfuAj4LvCSJOsl2Ro4oK9+17b6bQrcXFX3AX8FrPdAxyRJ+v14RLm2pwLvT3IfsAb4G2Aj4JNJ/gG4cEz9ZcDXgG2B91TVTUm+CDyT3mnca4Fz+uof3bGtfh8FTkvyP+id0r2jo64kaRqlaqKzhJqrHrPFZvWa5z1r2N3QHPC2z5067C5Is0aSFVX1OzdOeupVkqQOBqUkSR0MSkmSOhiUkiR1MCglSepgUEqS1MGglCSpg0EpSVIHn8wzD2293eP9ILkkTRGPKCVJ6mBQSpLUwaCUJKmDD0Wfh5LcBlwz7H7MkC2BW4fdiRniWOcnxzp7PK6qthpb6M0889M14z0Bfz5Kstyxzj+OdX6aq2P11KskSR0MSkmSOhiU89PSYXdgBjnW+cmxzk9zcqzezCNJUgePKCVJ6mBQSpLUwaCcR5I8N8k1Sb6f5C3D7s+gkjw2yVlJrkpyRZK/a+WbJ/lWkuvaz81aeZIc28Z5eZLd+7b18lb/uiQv7ytfkmRlW+fYJJn5kd4vyXpJLklyRpvfLsmFrd+nJNmwlT+0zX+/LV/ct4
23tvJrkvxRX/ms+T1I8vAkpya5uu3fvefrfk3y9+33d1WSk5MsmE/7NcmnktySZFVf2bTvy4namFFV5WsevID1gB8A2wMbApcBOw27XwP2fWtg9za9CXAtsBPwz8BbWvlbgGPa9POBrwMB9gIubOWbAz9sPzdr05u1ZcuAvds6XweeN+QxvwH4d+CMNv954CVt+uPA37Tp1wAfb9MvAU5p0zu1ffxQYLu279ebbb8HwKeBv27TGwIPn4/7FXgM8CNgo779+Yr5tF+BfYHdgVV9ZdO+LydqY0bHPoxfKl/TsCN7v2Df7Jt/K/DWYffrAY7ly8Cz6T1daOtWtjW9BykAHA+8tK/+NW35S4Hj+8qPb2VbA1f3la9Vbwjj2wb4NvBM4Iz2h+FWYP2x+xL4JrB3m16/1cvY/Ttabzb9HgAPa+GRMeXzbr/SC8qftABYv+3XP5pv+xVYzNpBOe37cqI2ZvLlqdf5Y/Qf6qgbW9mc0k5B7QZcCDyyqm4GaD8f0apNNNau8hvHKR+WDwFvBu5r81sAv6yqe9p8f/9+O6a2fHWrv67vwTBsD/w3cEI7zfxvSRYyD/drVf0X8C/Aj4Gb6e2nFczP/dpvJvblRG3MGINy/hjv2syc+uxPkkXAacDrq+pXXVXHKasHUD7jkvwJcEtVregvHqdqTbJs1o+V3pHS7sDHqmo34A56p84mMmfH2q6bvZDe6dJHAwuB541TdT7s10HMq/EZlPPHjcBj++a3AW4aUl/WWZIN6IXkSVV1eiv+aZKt2/KtgVta+URj7SrfZpzyYXg68KdJrgf+L73Trx8CHp5k9NnL/f377Zja8k2Bn7Pu78Ew3AjcWFUXtvlT6QXnfNyvBwI/qqr/rqo1wOnA05if+7XfTOzLidqYMQbl/HERsEO7y25DejcIfGXIfRpIu7vtk8BVVfXBvkVfAUbvins5vWuXo+WHtjvr9gJWt1My3wSek2Sz9j/859C7rnMzcFuSvVpbh/Zta0ZV1VurapuqWkxvH32nqg4BzgIOatXGjnX0PTio1a9W/pJ29+R2wA70boaYNb8HVfX/gJ8k2bEVPQu4knm4X+mdct0rycatL6NjnXf7dYyZ2JcTtTFzZvqiqK/pe9G70+xaenfHvW3Y/VmHfu9D7zTL5cCl7fV8etdsvg1c135u3uoH+Egb50pgpG9brwS+316H9ZWPAKvaOscx5gaTIY17f+6/63V7en8Qvw98AXhoK1/Q5r/flm/ft/7b2niuoe9uz9n0ewDsCixv+/ZL9O50nJf7FTgauLr157P07lydN/sVOJne9dc19I4AXzUT+3KiNmby5SPsJEnq4KlXSZI6GJSSJHUwKCVJ6mBQSpLUwaCUJKmDQSlJUgeDUpKkDv8f5E4WM3lUSeYAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(x=subtype_ct.values, y=subtype_ct.index);"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"def id_to_filepath(img_id, img_dir=TRAIN_DIR):\n",
"    filepath = f'{img_dir}/{img_id}.dcm'  # pydicom doesn't play nice with Path objects\n",
"    if os.path.exists(filepath):\n",
"        return filepath\n",
"    else:\n",
"        # Sentinel for a missing file ('does not exist').\n",
"        return 'DNE'"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th colspan=\"6\" halign=\"left\">Label</th>\n",
" <th>filepath</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Subtype</th>\n",
" <th></th>\n",
" <th>any</th>\n",
" <th>epidural</th>\n",
" <th>intraparenchymal</th>\n",
" <th>intraventricular</th>\n",
" <th>subarachnoid</th>\n",
" <th>subdural</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ID_000012eaf</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ID_000039fa0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ID_00005679d</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ID_00008ce3c</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ID_0000950d7</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Label \\\n",
"Subtype any epidural intraparenchymal intraventricular \n",
"0 ID_000012eaf 0 0 0 0 \n",
"1 ID_000039fa0 0 0 0 0 \n",
"2 ID_00005679d 0 0 0 0 \n",
"3 ID_00008ce3c 0 0 0 0 \n",
"4 ID_0000950d7 0 0 0 0 \n",
"\n",
" \\\n",
"Subtype subarachnoid subdural \n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" filepath \n",
"Subtype \n",
"0 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... \n",
"1 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... \n",
"2 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... \n",
"3 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... \n",
"4 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... "
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_new['filepath'] = train_new['ID'].apply(id_to_filepath)\n",
"train_new.head()"
]
},
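{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"def get_patient_data(filepath):\n",
"    # Read only the DICOM header; the pixel data is not needed for these IDs.\n",
"    if filepath != 'DNE':\n",
"        dcm_data = pydicom.dcmread(filepath, stop_before_pixels=True)\n",
"        return dcm_data.PatientID, dcm_data.StudyInstanceUID, dcm_data.SeriesInstanceUID\n",
"    # Missing file: return a 3-tuple so the zip(*...) unpacking of the results still works.\n",
"    return None, None, None"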
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 752803/752803 [17:31<00:00, 716.03it/s] \n"
]
}
],
"source": [
"tqdm.pandas()\n",
"train_new['PatientID'], train_new['StudyID'], train_new['SeriesID'] = zip(*train_new['filepath'].progress_apply(get_patient_data))"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th colspan=\"6\" halign=\"left\">Label</th>\n",
" <th>filepath</th>\n",
" <th>PatientID</th>\n",
" <th>StudyID</th>\n",
" <th>SeriesID</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Subtype</th>\n",
" <th></th>\n",
" <th>any</th>\n",
" <th>epidural</th>\n",
" <th>intraparenchymal</th>\n",
" <th>intraventricular</th>\n",
" <th>subarachnoid</th>\n",
" <th>subdural</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ID_000012eaf</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_f15c0eee</td>\n",
" <td>ID_30ea2b02d4</td>\n",
" <td>ID_0ab5820b2a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ID_000039fa0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_eeaf99e7</td>\n",
" <td>ID_134d398b61</td>\n",
" <td>ID_5f8484c3e0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ID_00005679d</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_18f2d431</td>\n",
" <td>ID_b5c26cda09</td>\n",
" <td>ID_203cd6ec46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ID_00008ce3c</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_ce8a3cd2</td>\n",
" <td>ID_974735bf79</td>\n",
" <td>ID_3780d48b28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ID_0000950d7</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>ID_0000aee4b</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_ce5f0b6c</td>\n",
" <td>ID_9aad90e421</td>\n",
" <td>ID_1e59488a44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>ID_0000ca2f6</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_8c5a14af</td>\n",
" <td>ID_a84b7a0dcd</td>\n",
" <td>ID_d6ba679446</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>ID_0000f1657</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_df70c823</td>\n",
" <td>ID_04ef429610</td>\n",
" <td>ID_245e16180c</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>ID_000178e76</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_462abff7</td>\n",
" <td>ID_4fef99f0df</td>\n",
" <td>ID_72952d87fa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>ID_00019828f</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_fc08e4cf</td>\n",
" <td>ID_ade653597d</td>\n",
" <td>ID_c0d8754a07</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Label \\\n",
"Subtype any epidural intraparenchymal intraventricular \n",
"0 ID_000012eaf 0 0 0 0 \n",
"1 ID_000039fa0 0 0 0 0 \n",
"2 ID_00005679d 0 0 0 0 \n",
"3 ID_00008ce3c 0 0 0 0 \n",
"4 ID_0000950d7 0 0 0 0 \n",
"5 ID_0000aee4b 0 0 0 0 \n",
"6 ID_0000ca2f6 0 0 0 0 \n",
"7 ID_0000f1657 0 0 0 0 \n",
"8 ID_000178e76 0 0 0 0 \n",
"9 ID_00019828f 0 0 0 0 \n",
"\n",
" \\\n",
"Subtype subarachnoid subdural \n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"5 0 0 \n",
"6 0 0 \n",
"7 0 0 \n",
"8 0 0 \n",
"9 0 0 \n",
"\n",
" filepath PatientID \\\n",
"Subtype \n",
"0 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_f15c0eee \n",
"1 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_eeaf99e7 \n",
"2 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_18f2d431 \n",
"3 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_ce8a3cd2 \n",
"4 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"5 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_ce5f0b6c \n",
"6 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_8c5a14af \n",
"7 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_df70c823 \n",
"8 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_462abff7 \n",
"9 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_fc08e4cf \n",
"\n",
" StudyID SeriesID \n",
"Subtype \n",
"0 ID_30ea2b02d4 ID_0ab5820b2a \n",
"1 ID_134d398b61 ID_5f8484c3e0 \n",
"2 ID_b5c26cda09 ID_203cd6ec46 \n",
"3 ID_974735bf79 ID_3780d48b28 \n",
"4 ID_8881b1c4b1 ID_84296c3845 \n",
"5 ID_9aad90e421 ID_1e59488a44 \n",
"6 ID_a84b7a0dcd ID_d6ba679446 \n",
"7 ID_04ef429610 ID_245e16180c \n",
"8 ID_4fef99f0df ID_72952d87fa \n",
"9 ID_ade653597d ID_c0d8754a07 "
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_new.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"752803\n",
"18938\n",
"21744\n",
"21744\n"
]
}
],
"source": [
"# Row count (one row per image), then the number of distinct patients, studies, and series\n",
"print(train_new.shape[0])\n",
"print(len(train_new['PatientID'].unique()))\n",
"print(len(train_new['StudyID'].unique()))\n",
"print(len(train_new['SeriesID'].unique()))"
]
},
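{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [],
"source": [
"# Save the merged label and metadata table to disk (the file is written without a .csv extension)\n",
"train_new.to_csv('train_new')"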
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th colspan=\"6\" halign=\"left\">Label</th>\n",
" <th>filepath</th>\n",
" <th>PatientID</th>\n",
" <th>StudyID</th>\n",
" <th>SeriesID</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Subtype</th>\n",
" <th></th>\n",
" <th>any</th>\n",
" <th>epidural</th>\n",
" <th>intraparenchymal</th>\n",
" <th>intraventricular</th>\n",
" <th>subarachnoid</th>\n",
" <th>subdural</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ID_000012eaf</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_f15c0eee</td>\n",
" <td>ID_30ea2b02d4</td>\n",
" <td>ID_0ab5820b2a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ID_000039fa0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_eeaf99e7</td>\n",
" <td>ID_134d398b61</td>\n",
" <td>ID_5f8484c3e0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ID_00005679d</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_18f2d431</td>\n",
" <td>ID_b5c26cda09</td>\n",
" <td>ID_203cd6ec46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ID_00008ce3c</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_ce8a3cd2</td>\n",
" <td>ID_974735bf79</td>\n",
" <td>ID_3780d48b28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ID_0000950d7</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Label \\\n",
"Subtype any epidural intraparenchymal intraventricular \n",
"0 ID_000012eaf 0 0 0 0 \n",
"1 ID_000039fa0 0 0 0 0 \n",
"2 ID_00005679d 0 0 0 0 \n",
"3 ID_00008ce3c 0 0 0 0 \n",
"4 ID_0000950d7 0 0 0 0 \n",
"\n",
" \\\n",
"Subtype subarachnoid subdural \n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" filepath PatientID \\\n",
"Subtype \n",
"0 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_f15c0eee \n",
"1 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_eeaf99e7 \n",
"2 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_18f2d431 \n",
"3 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_ce8a3cd2 \n",
"4 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"\n",
" StudyID SeriesID \n",
"Subtype \n",
"0 ID_30ea2b02d4 ID_0ab5820b2a \n",
"1 ID_134d398b61 ID_5f8484c3e0 \n",
"2 ID_b5c26cda09 ID_203cd6ec46 \n",
"3 ID_974735bf79 ID_3780d48b28 \n",
"4 ID_8881b1c4b1 ID_84296c3845 "
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_new.head()"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th colspan=\"6\" halign=\"left\">Label</th>\n",
" <th>filepath</th>\n",
" <th>PatientID</th>\n",
" <th>StudyID</th>\n",
" <th>SeriesID</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Subtype</th>\n",
" <th></th>\n",
" <th>any</th>\n",
" <th>epidural</th>\n",
" <th>intraparenchymal</th>\n",
" <th>intraventricular</th>\n",
" <th>subarachnoid</th>\n",
" <th>subdural</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ID_0000950d7</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39014</th>\n",
" <td>ID_0d428e6ca</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41787</th>\n",
" <td>ID_0e320ef83</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55957</th>\n",
" <td>ID_12fa73df7</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62110</th>\n",
" <td>ID_15089384e</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>78497</th>\n",
" <td>ID_1aa35c21e</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95072</th>\n",
" <td>ID_204e3c67f</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95086</th>\n",
" <td>ID_204f882af</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116802</th>\n",
" <td>ID_27ba4a0ed</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>168054</th>\n",
" <td>ID_39129ab61</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>210093</th>\n",
" <td>ID_4763efbcd</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>245270</th>\n",
" <td>ID_533fbdf73</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>277098</th>\n",
" <td>ID_5df494f62</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>347634</th>\n",
" <td>ID_75fe135a4</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>369043</th>\n",
" <td>ID_7d436ad0f</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>377907</th>\n",
" <td>ID_803e590cf</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>394664</th>\n",
" <td>ID_85f4ec603</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>423116</th>\n",
" <td>ID_8f9695aef</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>432554</th>\n",
" <td>ID_92cafa5f6</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>436494</th>\n",
" <td>ID_941d491aa</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>467763</th>\n",
" <td>ID_9ea9667e9</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>480805</th>\n",
" <td>ID_a31ccb407</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>501610</th>\n",
" <td>ID_aa4ce8ca8</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>512286</th>\n",
" <td>ID_ade7354a7</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>514805</th>\n",
" <td>ID_aec1177e4</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>521980</th>\n",
" <td>ID_b1281c12f</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>550282</th>\n",
" <td>ID_bad859f87</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>632620</th>\n",
" <td>ID_d6fef64ad</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>639962</th>\n",
" <td>ID_d97ba5b0d</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>642046</th>\n",
" <td>ID_da36abbca</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>659382</th>\n",
" <td>ID_e024df4d1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>671395</th>\n",
" <td>ID_e43f11a72</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>717566</th>\n",
" <td>ID_f40ac6f95</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>731883</th>\n",
" <td>ID_f8e0c635e</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>740708</th>\n",
" <td>ID_fbe090828</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>747831</th>\n",
" <td>ID_fe506a641</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
" <td>ID_d278c67b</td>\n",
" <td>ID_8881b1c4b1</td>\n",
" <td>ID_84296c3845</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Label \\\n",
"Subtype any epidural intraparenchymal intraventricular \n",
"4 ID_0000950d7 0 0 0 0 \n",
"39014 ID_0d428e6ca 0 0 0 0 \n",
"41787 ID_0e320ef83 0 0 0 0 \n",
"55957 ID_12fa73df7 0 0 0 0 \n",
"62110 ID_15089384e 0 0 0 0 \n",
"78497 ID_1aa35c21e 0 0 0 0 \n",
"95072 ID_204e3c67f 0 0 0 0 \n",
"95086 ID_204f882af 0 0 0 0 \n",
"116802 ID_27ba4a0ed 1 0 1 0 \n",
"168054 ID_39129ab61 0 0 0 0 \n",
"210093 ID_4763efbcd 0 0 0 0 \n",
"245270 ID_533fbdf73 0 0 0 0 \n",
"277098 ID_5df494f62 0 0 0 0 \n",
"347634 ID_75fe135a4 0 0 0 0 \n",
"369043 ID_7d436ad0f 0 0 0 0 \n",
"377907 ID_803e590cf 0 0 0 0 \n",
"394664 ID_85f4ec603 0 0 0 0 \n",
"423116 ID_8f9695aef 0 0 0 0 \n",
"432554 ID_92cafa5f6 0 0 0 0 \n",
"436494 ID_941d491aa 0 0 0 0 \n",
"467763 ID_9ea9667e9 0 0 0 0 \n",
"480805 ID_a31ccb407 0 0 0 0 \n",
"501610 ID_aa4ce8ca8 0 0 0 0 \n",
"512286 ID_ade7354a7 0 0 0 0 \n",
"514805 ID_aec1177e4 0 0 0 0 \n",
"521980 ID_b1281c12f 0 0 0 0 \n",
"550282 ID_bad859f87 0 0 0 0 \n",
"632620 ID_d6fef64ad 1 0 1 0 \n",
"639962 ID_d97ba5b0d 0 0 0 0 \n",
"642046 ID_da36abbca 0 0 0 0 \n",
"659382 ID_e024df4d1 0 0 0 0 \n",
"671395 ID_e43f11a72 0 0 0 0 \n",
"717566 ID_f40ac6f95 1 0 1 0 \n",
"731883 ID_f8e0c635e 0 0 0 0 \n",
"740708 ID_fbe090828 0 0 0 0 \n",
"747831 ID_fe506a641 0 0 0 0 \n",
"\n",
" \\\n",
"Subtype subarachnoid subdural \n",
"4 0 0 \n",
"39014 0 0 \n",
"41787 0 0 \n",
"55957 0 0 \n",
"62110 0 0 \n",
"78497 0 0 \n",
"95072 0 0 \n",
"95086 0 0 \n",
"116802 0 0 \n",
"168054 0 0 \n",
"210093 0 0 \n",
"245270 0 0 \n",
"277098 0 0 \n",
"347634 0 0 \n",
"369043 0 0 \n",
"377907 0 0 \n",
"394664 0 0 \n",
"423116 0 0 \n",
"432554 0 0 \n",
"436494 0 0 \n",
"467763 0 0 \n",
"480805 0 0 \n",
"501610 0 0 \n",
"512286 0 0 \n",
"514805 0 0 \n",
"521980 0 0 \n",
"550282 0 0 \n",
"632620 0 0 \n",
"639962 0 0 \n",
"642046 0 0 \n",
"659382 0 0 \n",
"671395 0 0 \n",
"717566 0 0 \n",
"731883 0 0 \n",
"740708 0 0 \n",
"747831 0 0 \n",
"\n",
" filepath PatientID \\\n",
"Subtype \n",
"4 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"39014 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"41787 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"55957 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"62110 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"78497 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"95072 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"95086 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"116802 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"168054 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"210093 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"245270 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"277098 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"347634 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"369043 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"377907 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"394664 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"423116 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"432554 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"436494 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"467763 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"480805 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"501610 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"512286 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"514805 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"521980 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"550282 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"632620 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"639962 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"642046 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"659382 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"671395 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"717566 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"731883 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"740708 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"747831 /home/ubuntu/kaggle/rsna-intracranial-hemorrha... ID_d278c67b \n",
"\n",
" StudyID SeriesID \n",
"Subtype \n",
"4 ID_8881b1c4b1 ID_84296c3845 \n",
"39014 ID_8881b1c4b1 ID_84296c3845 \n",
"41787 ID_8881b1c4b1 ID_84296c3845 \n",
"55957 ID_8881b1c4b1 ID_84296c3845 \n",
"62110 ID_8881b1c4b1 ID_84296c3845 \n",
"78497 ID_8881b1c4b1 ID_84296c3845 \n",
"95072 ID_8881b1c4b1 ID_84296c3845 \n",
"95086 ID_8881b1c4b1 ID_84296c3845 \n",
"116802 ID_8881b1c4b1 ID_84296c3845 \n",
"168054 ID_8881b1c4b1 ID_84296c3845 \n",
"210093 ID_8881b1c4b1 ID_84296c3845 \n",
"245270 ID_8881b1c4b1 ID_84296c3845 \n",
"277098 ID_8881b1c4b1 ID_84296c3845 \n",
"347634 ID_8881b1c4b1 ID_84296c3845 \n",
"369043 ID_8881b1c4b1 ID_84296c3845 \n",
"377907 ID_8881b1c4b1 ID_84296c3845 \n",
"394664 ID_8881b1c4b1 ID_84296c3845 \n",
"423116 ID_8881b1c4b1 ID_84296c3845 \n",
"432554 ID_8881b1c4b1 ID_84296c3845 \n",
"436494 ID_8881b1c4b1 ID_84296c3845 \n",
"467763 ID_8881b1c4b1 ID_84296c3845 \n",
"480805 ID_8881b1c4b1 ID_84296c3845 \n",
"501610 ID_8881b1c4b1 ID_84296c3845 \n",
"512286 ID_8881b1c4b1 ID_84296c3845 \n",
"514805 ID_8881b1c4b1 ID_84296c3845 \n",
"521980 ID_8881b1c4b1 ID_84296c3845 \n",
"550282 ID_8881b1c4b1 ID_84296c3845 \n",
"632620 ID_8881b1c4b1 ID_84296c3845 \n",
"639962 ID_8881b1c4b1 ID_84296c3845 \n",
"642046 ID_8881b1c4b1 ID_84296c3845 \n",
"659382 ID_8881b1c4b1 ID_84296c3845 \n",
"671395 ID_8881b1c4b1 ID_84296c3845 \n",
"717566 ID_8881b1c4b1 ID_84296c3845 \n",
"731883 ID_8881b1c4b1 ID_84296c3845 \n",
"740708 ID_8881b1c4b1 ID_84296c3845 \n",
"747831 ID_8881b1c4b1 ID_84296c3845 "
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select every image recorded for a single patient to inspect its labels and study/series IDs\n",
"train_new[train_new.PatientID == 'ID_d278c67b']"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True 733865\n",
"False 18938\n",
"Name: PatientID, dtype: int64"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
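],
"source": [
"# duplicated() flags every repeat of a PatientID after its first occurrence,\n",
"# so the False count equals the number of distinct patients\n",
"train_new.PatientID.duplicated().value_counts()"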
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}