Electronic-Health-Records / Git / [9c7551] /Working with Health Care EHR Data Part 1-Generating fake EHR data.ipynb

Models:
philipB/
Electronic-Health-Records
Downloads: 1
[9c7551]: / Working with Health Care EHR Data Part 1-Generating fake EHR data.ipynb
History
Download this file
719 lines (718 with data), 26.5 kB

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generating fake EHR data\n",
    "by: Sparkle Russell-Puleri and Dorian Puleri\n",
    "\n",
    "For the purpose of this three part tutorial, we generated some artificial EHR data to demonstrate how EHR data should be processed for use in sequence models. Please note that this data has no clinical relevance and was just created for training purposes only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Patient Admission Table\n",
    "This table contains information on the patient admission history and times. The features generated were:\n",
    "1. `PatientID`- Unique identifier that stay with the patient permanently\n",
    "2. `Admission ID` - Specific to each visit\n",
    "3. `AdmissionStartDate` - Date and time of admission \n",
    "4. `AdmissionEndDate` - Date and time of discharge after care for a specific admission ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [],
   "source": [
    "admission_table = {'Patient 1': {'PatientID':'A1234-B456', \n",
    "                          'Admission ID':[12,34,15], \n",
    "                          'AdmissionStartDate':['2019-01-03 9:34:55','2019-02-03 10:50:55','2019-04-03 12:34:55'],\n",
    "                          'AdmissionEndDate':['2019-01-07 8:45:43','2019-03-04 1:50:32','2019-04-03 5:38:18']},\n",
    "                   'Patient 2': {'PatientID':'B1234-C456', \n",
    "                          'Admission ID':[13,34], \n",
    "                          'AdmissionStartDate':['2018-01-03 9:34:55','2018-02-03 10:50:55'],\n",
    "                          'AdmissionEndDate':['2018-01-07 8:45:43','2018-03-04 1:50:32']}}\n",
    "admission_table = (pd.concat({k: pd.DataFrame(v) for k, v in admission_table.items()}).reset_index(level=1, drop=True))\n",
    "admission_table = admission_table.reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Patient Diagnosis Table\n",
    "The diagnosis table is quite unique, as it can contain several diagnosis codes for the same visit. For example, Patient 1 was diagnosised with diabetes (`PrimaryDiagnosisCode`:E11.64) during his/her first visit (`Admission ID`:12). However, this code also shows up on subsequent visits (`Admission ID`:34, 15), why is that? Well if a patient is diagnosised with an uncurable condition he/she that code will always be associated all subsequent visits. On the other hand, codes associated with acute care, will come and go as seen with `PrimaryDiagnosisCode`:780.96(Headache). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [],
   "source": [
    "Patient_1 = {'PatientID':'A1234-B456', \n",
    "             'Admission ID':[12,34,15], \n",
    "             'PrimaryDiagnosisCode':[['E11.64','I25.812','I25.10'],\n",
    "                                     ['E11.64','I25.812','I25.10','780.96','784.0'],\n",
    "                                     ['E11.64','I25.812','I25.10','786.50','401.9','789.00']],\n",
    "             'CodingSystem':['ICD-9','ICD-9','ICD-9'],\n",
    "             'DiagnosisCodeDescription':[['Type 2 diabetes mellitus with hypoglycemia',\n",
    "                                          'Atherosclerosis of bypass graft of coronary artery of transplanted heart without angina pectoris',\n",
    "                                          'Atherosclerotic heart disease of native coronary artery without angina pectoris'],\n",
    "                                         ['Type 2 diabetes mellitus with hypoglycemia',\n",
    "                                          'Atherosclerosis of bypass graft of coronary artery of transplanted heart without angina pectoris',\n",
    "                                          'Atherosclerotic heart disease of native coronary artery without angina pectoris',\n",
    "                                          'Generalized Pain', 'Dizziness and giddiness'],\n",
    "                                         ['Type 2 diabetes mellitus with hypoglycemia',\n",
    "                                          'Atherosclerosis of bypass graft of coronary artery of transplanted heart without angina pectoris',\n",
    "                                          'Atherosclerotic heart disease of native coronary artery without angina pectoris',\n",
    "                                          'Chest pain, unspecified','Essential hypertension, unspecified',\n",
    "                                          'Abdominal pain, unspecified site']]}\n",
    "Patient_2 = {'PatientID':'B1234-C456', \n",
    "              'Admission ID':[13,34], \n",
    "              'PrimaryDiagnosisCode':[['M05.59','Z13.85','O99.35'],['M05.59','Z13.85','O99.35','D37.0']],\n",
    "              'CodingSystem':['ICD-9','ICD-9'],\n",
    "              'DiagnosisCodeDescription':[['Rheumatoid polyneuropathy with rheumatoid arthritis of multiple sites',\n",
    "                                           'Encounter for screening for nervous system disorders',\n",
    "                                           'Diseases of the nervous system complicating pregnancy, childbirth, and the puerperium'],\n",
    "                                          ['Rheumatoid polyneuropathy with rheumatoid arthritis of multiple sites',\n",
    "                                           'Encounter for screening for nervous system disorders',\n",
    "                                           'Diseases of the nervous system complicating pregnancy, childbirth, and the puerperium',\n",
    "                                           'Neoplasm of uncertain behavior of lip, oral cavity and pharynx']]}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Helper functions for parsing data from a dictionary to DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [],
   "source": [
    "def process_ehr(Patient1,Patient2):\n",
    "    pt_diagnosis_table = [Patient1,Patient2]\n",
    "    pt_diagnosis_table = pd.concat([pd.DataFrame({k:v for k,v in d.items()}) for d in pt_diagnosis_table])\n",
    "    \n",
    "    pt_diagnosis_table = (pt_diagnosis_table.set_index(['PatientID', 'Admission ID','CodingSystem'])\n",
    "              .apply(lambda x: x.apply(pd.Series).stack())\n",
    "              .reset_index()\n",
    "              .drop('level_3', 1))\n",
    "    return pt_diagnosis_table\n",
    "def hash_key(df):\n",
    "    df['HashKey'] = df['PatientID'].\\\n",
    "    apply(lambda x: x.split('-')[0]) + '-' + df['Admission ID'].astype('str')\n",
    "    cols = [df.columns[-1]] + [col for col in df if col != df.columns[-1]]\n",
    "    print(cols)\n",
    "    return df[cols]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PatientID</th>\n",
       "      <th>Admission ID</th>\n",
       "      <th>CodingSystem</th>\n",
       "      <th>PrimaryDiagnosisCode</th>\n",
       "      <th>DiagnosisCodeDescription</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>12</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>E11.64</td>\n",
       "      <td>Type 2 diabetes mellitus with hypoglycemia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>12</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>I25.812</td>\n",
       "      <td>Atherosclerosis of bypass graft of coronary ar...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>12</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>I25.10</td>\n",
       "      <td>Atherosclerotic heart disease of native corona...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>E11.64</td>\n",
       "      <td>Type 2 diabetes mellitus with hypoglycemia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>I25.812</td>\n",
       "      <td>Atherosclerosis of bypass graft of coronary ar...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    PatientID  Admission ID CodingSystem PrimaryDiagnosisCode  \\\n",
       "0  A1234-B456            12        ICD-9               E11.64   \n",
       "1  A1234-B456            12        ICD-9              I25.812   \n",
       "2  A1234-B456            12        ICD-9               I25.10   \n",
       "3  A1234-B456            34        ICD-9               E11.64   \n",
       "4  A1234-B456            34        ICD-9              I25.812   \n",
       "\n",
       "                            DiagnosisCodeDescription  \n",
       "0         Type 2 diabetes mellitus with hypoglycemia  \n",
       "1  Atherosclerosis of bypass graft of coronary ar...  \n",
       "2  Atherosclerotic heart disease of native corona...  \n",
       "3         Type 2 diabetes mellitus with hypoglycemia  \n",
       "4  Atherosclerosis of bypass graft of coronary ar...  "
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "diagnosis_table = process_ehr(Patient_1,Patient_2)\n",
    "diagnosis_table.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a hashkey for Admission ID\n",
    "Why do this step? Unless your EHR system has uniqely identifiable Admission IDs for each patients visit, it would be difficult to associate each patient ID with a unique `Admission ID`. To demonstrate this, we deliberately created double digit `Admission ID`s one of which was repeated ( `Admission ID`: 34) for both patients. To avoid this, we took a pre-cautionary step to create a hash key that is a unique combination of the first half of the the unique `PatientID` hyphenated with the patient's specific `Admission ID`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PatientID</th>\n",
       "      <th>Admission ID</th>\n",
       "      <th>CodingSystem</th>\n",
       "      <th>PrimaryDiagnosisCode</th>\n",
       "      <th>DiagnosisCodeDescription</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>E11.64</td>\n",
       "      <td>Type 2 diabetes mellitus with hypoglycemia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>I25.812</td>\n",
       "      <td>Atherosclerosis of bypass graft of coronary ar...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>I25.10</td>\n",
       "      <td>Atherosclerotic heart disease of native corona...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>780.96</td>\n",
       "      <td>Generalized Pain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>784.0</td>\n",
       "      <td>Dizziness and giddiness</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>B1234-C456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>M05.59</td>\n",
       "      <td>Rheumatoid polyneuropathy with rheumatoid arth...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>B1234-C456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>Z13.85</td>\n",
       "      <td>Encounter for screening for nervous system dis...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>B1234-C456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>O99.35</td>\n",
       "      <td>Diseases of the nervous system complicating pr...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>B1234-C456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>D37.0</td>\n",
       "      <td>Neoplasm of uncertain behavior of lip, oral ca...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     PatientID  Admission ID CodingSystem PrimaryDiagnosisCode  \\\n",
       "3   A1234-B456            34        ICD-9               E11.64   \n",
       "4   A1234-B456            34        ICD-9              I25.812   \n",
       "5   A1234-B456            34        ICD-9               I25.10   \n",
       "6   A1234-B456            34        ICD-9               780.96   \n",
       "7   A1234-B456            34        ICD-9                784.0   \n",
       "17  B1234-C456            34        ICD-9               M05.59   \n",
       "18  B1234-C456            34        ICD-9               Z13.85   \n",
       "19  B1234-C456            34        ICD-9               O99.35   \n",
       "20  B1234-C456            34        ICD-9                D37.0   \n",
       "\n",
       "                             DiagnosisCodeDescription  \n",
       "3          Type 2 diabetes mellitus with hypoglycemia  \n",
       "4   Atherosclerosis of bypass graft of coronary ar...  \n",
       "5   Atherosclerotic heart disease of native corona...  \n",
       "6                                    Generalized Pain  \n",
       "7                             Dizziness and giddiness  \n",
       "17  Rheumatoid polyneuropathy with rheumatoid arth...  \n",
       "18  Encounter for screening for nervous system dis...  \n",
       "19  Diseases of the nervous system complicating pr...  \n",
       "20  Neoplasm of uncertain behavior of lip, oral ca...  "
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "diagnosis_table[diagnosis_table['Admission ID']==34]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['HashKey', 'PatientID', 'Admission ID', 'CodingSystem', 'PrimaryDiagnosisCode', 'DiagnosisCodeDescription']\n",
      "['HashKey', 'PatientID', 'Admission ID', 'AdmissionStartDate', 'AdmissionEndDate']\n"
     ]
    }
   ],
   "source": [
    "# Diag\n",
    "diagnosis_table = hash_key(diagnosis_table)\n",
    "admission_table = hash_key(admission_table)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final Admission and Diagnosis Tables generated with fake EHR data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>HashKey</th>\n",
       "      <th>PatientID</th>\n",
       "      <th>Admission ID</th>\n",
       "      <th>AdmissionStartDate</th>\n",
       "      <th>AdmissionEndDate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A1234-12</td>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>12</td>\n",
       "      <td>2019-01-03 9:34:55</td>\n",
       "      <td>2019-01-07 8:45:43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1234-34</td>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>2019-02-03 10:50:55</td>\n",
       "      <td>2019-03-04 1:50:32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A1234-15</td>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>15</td>\n",
       "      <td>2019-04-03 12:34:55</td>\n",
       "      <td>2019-04-03 5:38:18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>B1234-13</td>\n",
       "      <td>B1234-C456</td>\n",
       "      <td>13</td>\n",
       "      <td>2018-01-03 9:34:55</td>\n",
       "      <td>2018-01-07 8:45:43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>B1234-34</td>\n",
       "      <td>B1234-C456</td>\n",
       "      <td>34</td>\n",
       "      <td>2018-02-03 10:50:55</td>\n",
       "      <td>2018-03-04 1:50:32</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    HashKey   PatientID  Admission ID   AdmissionStartDate    AdmissionEndDate\n",
       "0  A1234-12  A1234-B456            12   2019-01-03 9:34:55  2019-01-07 8:45:43\n",
       "1  A1234-34  A1234-B456            34  2019-02-03 10:50:55  2019-03-04 1:50:32\n",
       "2  A1234-15  A1234-B456            15  2019-04-03 12:34:55  2019-04-03 5:38:18\n",
       "3  B1234-13  B1234-C456            13   2018-01-03 9:34:55  2018-01-07 8:45:43\n",
       "4  B1234-34  B1234-C456            34  2018-02-03 10:50:55  2018-03-04 1:50:32"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "admission_table.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cast Admission dates to Date time object "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 5 entries, 0 to 4\n",
      "Data columns (total 5 columns):\n",
      "HashKey               5 non-null object\n",
      "PatientID             5 non-null object\n",
      "Admission ID          5 non-null int64\n",
      "AdmissionStartDate    5 non-null datetime64[ns]\n",
      "AdmissionEndDate      5 non-null datetime64[ns]\n",
      "dtypes: datetime64[ns](2), int64(1), object(2)\n",
      "memory usage: 280.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "admission_table[['AdmissionStartDate','AdmissionEndDate']] = admission_table[['AdmissionStartDate','AdmissionEndDate']]\\\n",
    "                .apply(lambda x: pd.to_datetime(x, format='%Y-%m-%d'))\n",
    "admission_table.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>HashKey</th>\n",
       "      <th>PatientID</th>\n",
       "      <th>Admission ID</th>\n",
       "      <th>CodingSystem</th>\n",
       "      <th>PrimaryDiagnosisCode</th>\n",
       "      <th>DiagnosisCodeDescription</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A1234-12</td>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>12</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>E11.64</td>\n",
       "      <td>Type 2 diabetes mellitus with hypoglycemia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1234-12</td>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>12</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>I25.812</td>\n",
       "      <td>Atherosclerosis of bypass graft of coronary ar...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A1234-12</td>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>12</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>I25.10</td>\n",
       "      <td>Atherosclerotic heart disease of native corona...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A1234-34</td>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>E11.64</td>\n",
       "      <td>Type 2 diabetes mellitus with hypoglycemia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A1234-34</td>\n",
       "      <td>A1234-B456</td>\n",
       "      <td>34</td>\n",
       "      <td>ICD-9</td>\n",
       "      <td>I25.812</td>\n",
       "      <td>Atherosclerosis of bypass graft of coronary ar...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    HashKey   PatientID  Admission ID CodingSystem PrimaryDiagnosisCode  \\\n",
       "0  A1234-12  A1234-B456            12        ICD-9               E11.64   \n",
       "1  A1234-12  A1234-B456            12        ICD-9              I25.812   \n",
       "2  A1234-12  A1234-B456            12        ICD-9               I25.10   \n",
       "3  A1234-34  A1234-B456            34        ICD-9               E11.64   \n",
       "4  A1234-34  A1234-B456            34        ICD-9              I25.812   \n",
       "\n",
       "                            DiagnosisCodeDescription  \n",
       "0         Type 2 diabetes mellitus with hypoglycemia  \n",
       "1  Atherosclerosis of bypass graft of coronary ar...  \n",
       "2  Atherosclerotic heart disease of native corona...  \n",
       "3         Type 2 diabetes mellitus with hypoglycemia  \n",
       "4  Atherosclerosis of bypass graft of coronary ar...  "
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "diagnosis_table.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write tables to csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "mkdir: data: File exists\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "mkdir data   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write files to data directory\n",
    "diagnosis_table.to_csv('data/Diagnosis_Table.csv',encoding='UTF-8',index=False)\n",
    "admission_table.to_csv('data/Admissions_Table.csv',encoding='UTF-8',index=False,date_format='%Y-%m-%d')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next:\n",
    "We will process the data in preparation for modeling"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}