{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5" }, "outputs": [], "source": [ "# This Python 3 environment comes with many helpful analytics libraries installed\n", "# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python\n", "# For example, here are several helpful packages to load\n", "\n", "import numpy as np # linear algebra\n", "import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n", "\n", "# Input data files are available in the \"../input/\" directory.\n", "# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory\n", "\n", "import os\n", "# for dirname, _, filenames in os.walk('/kaggle/input'):\n", "#     for filename in filenames:\n", "#         print(os.path.join(dirname, filename))\n", "\n", "# Any results you write to the current directory are saved as output." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. RSNA Intracranial Hemorrhage Dataset Overview\n", "\n", "#### Intracranial Hemorrhage Types\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dataset Overview\n", "- Train.csv: includes the ID and Label columns:\n", " - ID is a combined string of the image filename and the hemorrhage subtype.\n", " - Label is the target column, giving the probability that this type of hemorrhage is present in the indicated image. 
\n", " Format:\n", " [Image Id]_[Sub-type_Name], for example:\n", " - Id,Label\n", " - 1_epidural_hemorrhage,0\n", " - 1_intraparenchymal_hemorrhage,0\n", " - 1_intraventricular_hemorrhage,0\n", " - 1_subarachnoid_hemorrhage,0.6\n", " - 1_subdural_hemorrhage,0\n", " - 1_any,0.9\n", " \n", " \n", " - DICOM Images:\n", " - DICOM is the standard for the communication and management of medical imaging information and related data.\n", " - DICOM data can be exchanged between any two entities capable of receiving image and patient data in DICOM format.\n", " - Each image carries associated metadata, including PatientID, StudyInstanceUID, SeriesInstanceUID, and other fields.\n", " \n", " #### Data Files\n", "\n", " - **stage_1_train.csv** - Contains the IDs and target labels.\n", " - **stage_1_train_images.zip** and **stage_1_test_images.zip** - DICOM images for the training and test sets.\n", "\n", " \n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0", "_uuid": "d629ff2d2480ee46fbb7e2d37f6b5fab8052498a" }, "outputs": [], "source": [ "import glob, pylab, pandas as pd\n", "import pydicom, numpy as np\n", "from os import listdir\n", "from os.path import isfile, join\n", "import matplotlib.pylab as plt\n", "import os\n", "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Train.csv Dataset EDA" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | ID | \n", "Label | \n", "
---|---|---|
0 | \n", "ID_63eb1e259_epidural | \n", "0 | \n", "
1 | \n", "ID_63eb1e259_intraparenchymal | \n", "0 | \n", "
2 | \n", "ID_63eb1e259_intraventricular | \n", "0 | \n", "
3 | \n", "ID_63eb1e259_subarachnoid | \n", "0 | \n", "
4 | \n", "ID_63eb1e259_subdural | \n", "0 | \n", "