--- a
+++ b/Notebook/Week 2/load_data.ipynb
@@ -0,0 +1,807 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Introduction to the dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<p><b>Intracranial hemorrhage🧠 (ICH)</b> is bleeding that occurs inside the cranium, within or around the brain tissue, and is a life-threatening type of stroke. A stroke occurs when the brain is deprived of oxygen and blood supply. ICH is most commonly caused by hypertension, arteriovenous malformations, or head trauma. Treatment focuses on stopping the bleeding, removing the blood clot (hematoma), and relieving the pressure on the brain.</p>\n",
+    "<br/><br/>\n",
+    "<p><b>Diagnosis</b> requires an urgent procedure. When a patient shows acute neurological symptoms such as severe headache or loss of consciousness, highly trained specialists review medical images of the patient’s cranium to look for the presence, location and type of hemorrhage. The process is complicated and often time-consuming.</p>\n",
+    "<br/><br/>\n",
+    "<p>The current clinical protocol for diagnosing ICH is for radiologists to examine computed tomography (CT) scans, determine whether a hemorrhage has occurred and, if so, identify its type and region. This process relies on the availability of a subspecialty-trained neuroradiologist, so it can be slow and even inaccurate, especially in remote areas where specialized care is scarce.</p>\n",
+    "<br/><br/>\n",
+    "<p>In recent years, advances in <b>deep learning</b> have made it possible to solve a wide range of problems, in some cases with better results than humans. We will try to solve intracranial hemorrhage detection and segmentation using a dataset of brain CT scans annotated by expert radiologists.</p>\n",
+    "<p>The challenge is to build an algorithm to detect acute intracranial hemorrhage and its subtypes.</p>\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    ">Intraparenchymal hemorrhage is blood that is located completely within the brain itself; intraventricular or subarachnoid hemorrhage is blood that has leaked into the spaces of the brain that normally contain cerebrospinal fluid (the ventricles or subarachnoid cisterns). Extra-axial hemorrhages are blood that collects in the tissue coverings that surround the brain (e.g. subdural or epidural subtypes). (See figure.) Patients may exhibit more than one type of cerebral hemorrhage, which may appear on the same image. While small hemorrhages are typically less morbid than large hemorrhages, even a small hemorrhage can lead to death because it is an indicator of another type of serious abnormality (e.g. cerebral aneurysm).\n",
+    ">\n",
+    "> #### There are five types of ICH:\n",
+    ">    * **Intraparenchymal hemorrhage**\n",
+    ">    * **Epidural hemorrhage**\n",
+    ">    * **Subdural hemorrhage**\n",
+    ">    * **Subarachnoid hemorrhage**\n",
+    ">    * **Intraventricular hemorrhage**\n",
+    ">\n",
+    "> One patient can exhibit more than one type of hemorrhage."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![file](https://user-images.githubusercontent.com/58046531/89164136-4eac1d00-d594-11ea-9408-6d271518b3a7.png)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This dataset has six label classes:\n",
+    "1. any: whether any of the five hemorrhage subtypes below is present\n",
+    "2. epidural\n",
+    "3. intraparenchymal\n",
+    "4. intraventricular\n",
+    "5. subarachnoid\n",
+    "6. subdural\n",
+    "\n",
+    "One patient can have more than one type of hemorrhage, so several of these labels can be positive at once."
+   ]
+  },
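+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The label layout described above can be sketched on a toy table. This is only an illustration: the IDs and label values are made up, `toy` and `multi` are hypothetical names, and it assumes that `any` is simply the logical OR of the five subtype labels, which matches the description above but is not checked here against the real CSV.\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "\n",
+    "# The five subtype labels listed above.\n",
+    "SUBTYPES = ['epidural', 'intraparenchymal', 'intraventricular',\n",
+    "            'subarachnoid', 'subdural']\n",
+    "\n",
+    "# Toy multi-label table: one row per image, one 0/1 column per subtype.\n",
+    "# IDs and values are invented for illustration only.\n",
+    "toy = pd.DataFrame(\n",
+    "    {\n",
+    "        'ID': ['ID_aaa', 'ID_bbb', 'ID_ccc'],\n",
+    "        'epidural': [0, 0, 0],\n",
+    "        'intraparenchymal': [0, 1, 0],\n",
+    "        'intraventricular': [0, 1, 0],\n",
+    "        'subarachnoid': [0, 0, 0],\n",
+    "        'subdural': [0, 0, 1],\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "# Assumption: 'any' is 1 when at least one subtype column is 1.\n",
+    "toy['any'] = toy[SUBTYPES].max(axis=1)\n",
+    "\n",
+    "# Rows with more than one positive subtype.\n",
+    "multi = toy[toy[SUBTYPES].sum(axis=1) > 1]\n",
+    "\n",
+    "print(toy[['ID', 'any']])\n",
+    "print(multi['ID'].tolist())\n",
+    "```\n",
+    "\n",
+    "Rows with more than one positive subtype, such as `ID_bbb` here, are why this is a multi-label rather than a multi-class problem."
+   ]
+  },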
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "base_url = '~/kaggle/rsna-intracranial-hemorrhage-detection/'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "TRAIN_DIR = '/home/ubuntu/kaggle/rsna-intracranial-hemorrhage-detection/stage_2_train'\n",
+    "TEST_DIR = '/home/ubuntu/kaggle/rsna-intracranial-hemorrhage-detection/stage_2_test'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "from tqdm import tqdm\n",
+    "import re\n",
+    "import seaborn as sns"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "752803\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "! ls {TRAIN_DIR} | wc -l\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "121232\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "! ls {TEST_DIR} | wc -l"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "ID_000012eaf.dcm\n",
+      "ID_000039fa0.dcm\n",
+      "ID_00005679d.dcm\n",
+      "ID_00008ce3c.dcm\n",
+      "ID_0000950d7.dcm\n",
+      "ls: write error: Broken pipe\n"
+     ]
+    }
+   ],
+   "source": [
+    "! ls {TRAIN_DIR} | head -n 5"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(4516842, 2)\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>ID</th>\n",
+       "      <th>Label</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>ID_12cadc6af_epidural</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>ID_12cadc6af_intraparenchymal</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>ID_12cadc6af_intraventricular</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>ID_12cadc6af_subarachnoid</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>ID_12cadc6af_subdural</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>ID_12cadc6af_any</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>ID_38fd7baa0_epidural</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>ID_38fd7baa0_intraparenchymal</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>ID_38fd7baa0_intraventricular</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>9</th>\n",
+       "      <td>ID_38fd7baa0_subarachnoid</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                              ID  Label\n",
+       "0          ID_12cadc6af_epidural      0\n",
+       "1  ID_12cadc6af_intraparenchymal      0\n",
+       "2  ID_12cadc6af_intraventricular      0\n",
+       "3      ID_12cadc6af_subarachnoid      0\n",
+       "4          ID_12cadc6af_subdural      0\n",
+       "5               ID_12cadc6af_any      0\n",
+       "6          ID_38fd7baa0_epidural      0\n",
+       "7  ID_38fd7baa0_intraparenchymal      0\n",
+       "8  ID_38fd7baa0_intraventricular      0\n",
+       "9      ID_38fd7baa0_subarachnoid      0"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "train_df = pd.read_csv(base_url+'stage_2_train.csv')\n",
+    "print(train_df.shape)\n",
+    "train_df.head(10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(4516842, 3)\n"
+     ]
+    }
+   ],
+   "source": [
+    "train_df[['ID', 'Subtype']] = train_df['ID'].str.rsplit(pat='_', n=1, expand=True)\n",
+    "print(train_df.shape)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Looking at the data, each image has six rows, one per class, each with a label of True (1) or False (0). We will pivot the table so that each image has a single row with one 0/1 column per class (a multi-hot encoding)."
+   ]
+  },
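+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The reshaping described above can be sketched on a toy table before running it on the full file. The IDs and labels below are made up, and `long_df` and `wide_df` are hypothetical names. The sketch uses `DataFrame.pivot` rather than the `pivot_table` call used in this notebook.\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Toy version of the long-format label file (invented IDs and labels).\n",
+    "long_df = pd.DataFrame(\n",
+    "    {\n",
+    "        'ID': ['ID_aaa_epidural', 'ID_aaa_any', 'ID_bbb_epidural', 'ID_bbb_any'],\n",
+    "        'Label': [0, 0, 1, 1],\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "# Split 'ID_<hash>_<subtype>' on the last underscore only.\n",
+    "long_df[['ID', 'Subtype']] = long_df['ID'].str.rsplit(pat='_', n=1, expand=True)\n",
+    "\n",
+    "# One row per image, one column per subtype.\n",
+    "wide_df = long_df.pivot(index='ID', columns='Subtype', values='Label').reset_index()\n",
+    "wide_df.columns.name = None  # drop the leftover 'Subtype' axis label\n",
+    "\n",
+    "print(wide_df)\n",
+    "```\n",
+    "\n",
+    "`pivot` raises an error if an (ID, Subtype) pair appears twice, whereas `pivot_table` averages duplicates by default, so duplicate rows in the real file would surface differently with the two calls."
+   ]
+  },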
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>ID</th>\n",
+       "      <th>Label</th>\n",
+       "      <th>Subtype</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>4516836</th>\n",
+       "      <td>ID_4a85a3a3f</td>\n",
+       "      <td>0</td>\n",
+       "      <td>epidural</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4516837</th>\n",
+       "      <td>ID_4a85a3a3f</td>\n",
+       "      <td>0</td>\n",
+       "      <td>intraparenchymal</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4516838</th>\n",
+       "      <td>ID_4a85a3a3f</td>\n",
+       "      <td>0</td>\n",
+       "      <td>intraventricular</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4516839</th>\n",
+       "      <td>ID_4a85a3a3f</td>\n",
+       "      <td>0</td>\n",
+       "      <td>subarachnoid</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4516840</th>\n",
+       "      <td>ID_4a85a3a3f</td>\n",
+       "      <td>0</td>\n",
+       "      <td>subdural</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4516841</th>\n",
+       "      <td>ID_4a85a3a3f</td>\n",
+       "      <td>0</td>\n",
+       "      <td>any</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                   ID  Label           Subtype\n",
+       "4516836  ID_4a85a3a3f      0          epidural\n",
+       "4516837  ID_4a85a3a3f      0  intraparenchymal\n",
+       "4516838  ID_4a85a3a3f      0  intraventricular\n",
+       "4516839  ID_4a85a3a3f      0      subarachnoid\n",
+       "4516840  ID_4a85a3a3f      0          subdural\n",
+       "4516841  ID_4a85a3a3f      0               any"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "train_df.tail(6)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
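+    "def fix_id(img_id, img_dir=TRAIN_DIR):\n",
+    "    # Normalize an image ID to the 'ID_<hash>' form; well-formed IDs pass through unchanged.\n",
+    "    if not re.match(r'ID_[a-z0-9]+', img_id):\n",
+    "        # Take the first run of lowercase letters/digits and prefix it with 'ID_'.\n",
+    "        sop = re.search(r'[a-z0-9]+', img_id)\n",
+    "        if sop:\n",
+    "            img_id_new = f'ID_{sop[0]}'\n",
+    "            return img_id_new\n",
+    "        else:\n",
+    "            # Nothing recoverable: report the ID and return it as-is.\n",
+    "            print(img_id)\n",
+    "    return img_id"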
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0          ID_12cadc6af\n",
+       "1          ID_12cadc6af\n",
+       "2          ID_12cadc6af\n",
+       "3          ID_12cadc6af\n",
+       "4          ID_12cadc6af\n",
+       "               ...     \n",
+       "4516837    ID_4a85a3a3f\n",
+       "4516838    ID_4a85a3a3f\n",
+       "4516839    ID_4a85a3a3f\n",
+       "4516840    ID_4a85a3a3f\n",
+       "4516841    ID_4a85a3a3f\n",
+       "Name: ID, Length: 4516842, dtype: object"
+      ]
+     },
+     "execution_count": 14,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Preview the normalized IDs; the result is displayed but not assigned back to train_df.\n",
+    "train_df['ID'].apply(fix_id)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(752803, 7)\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead tr th {\n",
+       "        text-align: left;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr>\n",
+       "      <th></th>\n",
+       "      <th>ID</th>\n",
+       "      <th colspan=\"6\" halign=\"left\">Label</th>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>Subtype</th>\n",
+       "      <th></th>\n",
+       "      <th>any</th>\n",
+       "      <th>epidural</th>\n",
+       "      <th>intraparenchymal</th>\n",
+       "      <th>intraventricular</th>\n",
+       "      <th>subarachnoid</th>\n",
+       "      <th>subdural</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>ID_000012eaf</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>ID_000039fa0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>ID_00005679d</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>ID_00008ce3c</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>ID_0000950d7</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                   ID Label                                             \\\n",
+       "Subtype                 any epidural intraparenchymal intraventricular   \n",
+       "0        ID_000012eaf     0        0                0                0   \n",
+       "1        ID_000039fa0     0        0                0                0   \n",
+       "2        ID_00005679d     0        0                0                0   \n",
+       "3        ID_00008ce3c     0        0                0                0   \n",
+       "4        ID_0000950d7     0        0                0                0   \n",
+       "\n",
+       "                               \n",
+       "Subtype subarachnoid subdural  \n",
+       "0                  0        0  \n",
+       "1                  0        0  \n",
+       "2                  0        0  \n",
+       "3                  0        0  \n",
+       "4                  0        0  "
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "train_new = train_df.pivot_table(index='ID', columns='Subtype').reset_index()\n",
+    "print(train_new.shape)\n",
+    "train_new.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Subtype\n",
+      "any                 107933\n",
+      "epidural              3145\n",
+      "intraparenchymal     36118\n",
+      "intraventricular     26205\n",
+      "subarachnoid         35675\n",
+      "subdural             47166\n",
+      "dtype: int64\n"
+     ]
+    }
+   ],
+   "source": [
+    "subtype_ct = train_new['Label'].sum(axis=0)\n",
+    "print(subtype_ct)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Distribution of each hemorrhage type"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "image/png": "\n",
+      "text/plain": [
+       "<Figure size 432x288 with 1 Axes>"
+      ]
+     },
+     "metadata": {
+      "needs_background": "light"
+     },
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "sns.barplot(x=subtype_ct.values, y=subtype_ct.index);"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def id_to_filepath(img_id, img_dir=TRAIN_DIR):\n",
+    "    filepath = f'{img_dir}/{img_id}.dcm' # pydicom doesn't play nice with Path objects\n",
+    "    if os.path.exists(filepath):\n",
+    "        return filepath\n",
+    "    else:\n",
+    "        return 'DNE'  # sentinel: no DICOM file exists for this ID"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead tr th {\n",
+       "        text-align: left;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr>\n",
+       "      <th></th>\n",
+       "      <th>ID</th>\n",
+       "      <th colspan=\"6\" halign=\"left\">Label</th>\n",
+       "      <th>filepath</th>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>Subtype</th>\n",
+       "      <th></th>\n",
+       "      <th>any</th>\n",
+       "      <th>epidural</th>\n",
+       "      <th>intraparenchymal</th>\n",
+       "      <th>intraventricular</th>\n",
+       "      <th>subarachnoid</th>\n",
+       "      <th>subdural</th>\n",
+       "      <th></th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>ID_000012eaf</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>ID_000039fa0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>ID_00005679d</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>ID_00008ce3c</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>ID_0000950d7</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                   ID Label                                             \\\n",
+       "Subtype                 any epidural intraparenchymal intraventricular   \n",
+       "0        ID_000012eaf     0        0                0                0   \n",
+       "1        ID_000039fa0     0        0                0                0   \n",
+       "2        ID_00005679d     0        0                0                0   \n",
+       "3        ID_00008ce3c     0        0                0                0   \n",
+       "4        ID_0000950d7     0        0                0                0   \n",
+       "\n",
+       "                               \\\n",
+       "Subtype subarachnoid subdural   \n",
+       "0                  0        0   \n",
+       "1                  0        0   \n",
+       "2                  0        0   \n",
+       "3                  0        0   \n",
+       "4                  0        0   \n",
+       "\n",
+       "                                                  filepath  \n",
+       "Subtype                                                     \n",
+       "0        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  \n",
+       "1        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  \n",
+       "2        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  \n",
+       "3        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  \n",
+       "4        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  "
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "train_new['filepath'] = train_new['ID'].apply(id_to_filepath)\n",
+    "train_new.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_new.to_csv('train_df')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}