{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Description\n",
"\n",
"Before using this notebook please read the literature of this project. We try to classify the brain signal data provided by University of Bonn for epilepsy detection in three classes (epileptic ictal, epileptic non-ictal, and non-epileptic). [Bonn Dataset](http://epileptologie-bonn.de/cms/front_content.php?idcat=193&lang=3)\n",
"\n",
"This notebook uses pre-processed data generated by the file `read-and-visualize-data` in this project. Go through that iPython notebook first which generates random shuffled permutations of the given dataset along with shapes and sizes compatible for this notebook. Use the default folder names, prefixes, and suffixes for each permutation of data you generate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Importing modules\n",
"\n",
"Let's first add these libraries to our project:\n",
"\n",
"[`numpy`](https://pypi.org/project/numpy/): for matrix operations\n",
"\n",
"[`tensorlfow`](https://pypi.org/project/tensorflow/): for creating neural networks and evaluating their computation graphs (version 1.10.0 was used for this project)\n",
"\n",
"[`maplotlitb`](https://pypi.org/project/matplotlib/): for visualization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import tensorflow as tf\n",
"import matplotlib.pyplot as plt\n",
"import h5py as h5'\n",
"from tensorflow.python.framework import ops\n",
"from os import path\n",
"import os"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Steps for solving the problem\n",
"\n",
"<ol>\n",
" <li>Read data and format it.</li>\n",
" <li>Use sliding window approach to augment data.</li>\n",
" <li>Split data into training/dev/test sets.</li>\n",
" <li>Create procedure for randomly initializing parameters with specified shape using Xavier's initialization.</li>\n",
" <li>Create convolution and pooling procedures.</li>\n",
" <li>Implement forward propagation.</li>\n",
" <li>Implement cost function.</li>\n",
" <li>Create model (uses Adam optimizer for minimization).</li>\n",
" <li>Train model.</li>\n",
" <li>Hyperparameter tuning using cross-validation sets.</li>\n",
" <li>Retrain model until higher accuracy is achevied.</li>\n",
"</ol>"
]
},
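{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 2 (sliding-window augmentation) is performed in the `read-and-visualize-data` notebook, not here. As a rough illustration of the idea, a fixed-size window is slid over each recording with some stride, and every window becomes a separate example. The sketch below is illustrative only; the window size and stride are assumptions and may differ from the values used in the preprocessing notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch of sliding-window augmentation (the actual preprocessing is\n",
"# done in the read-and-visualize-data notebook; window_size and stride are assumptions).\n",
"def sliding_windows(signal, window_size=1024, stride=512):\n",
"    return np.array([signal[i:i + window_size]\n",
"                     for i in range(0, len(signal) - window_size + 1, stride)])\n",
"\n",
"print(sliding_windows(np.arange(4096)).shape)  # (7, 1024) with these assumed values"
]
},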
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading data\n",
"\n",
"Use `dataset_relative_path` to point to the directory where the dataset after processing has been stored. \n",
"\n",
"<blockquote>\n",
" <b>Note:</b> Please do not change the prefixes or suffixes \"dataset/random-iter-\" as they're used throughout this project in other files too. You can use datafile1024.h5 or dataset512.h5 to point to versions of dataset with window sizes 1024 or 512.\n",
"</blockquote>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset_relative_path = 'dataset/random-iter-5/'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datafile = dataset_relative_path + 'datafile1024.h5'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with h5.File(datafile, 'r') as datafile:\n",
" X_train = np.array(datafile['X_train'])\n",
" Y_train = np.array(datafile['Y_train'])\n",
" \n",
" X_dev = np.array(datafile['X_dev'])\n",
" Y_dev = np.array(datafile['Y_dev'])\n",
" \n",
" X_test = np.array(datafile['X_test'])\n",
" Y_test = np.array(datafile['Y_test'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def make_dimensions_compatible(arr):\n",
" \n",
" return arr.reshape(arr.shape[0],-1,1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train = make_dimensions_compatible(X_train)\n",
"X_dev = make_dimensions_compatible(X_dev)\n",
"X_test = make_dimensions_compatible(X_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(X_train.shape)\n",
"print(Y_train.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Normalization\n",
"\n",
"It is a standard procedure to use normalization formula that subtracts by mean and divides by standard deviation. However, for the purpose of simplicity it won't hurt the performance of the models too much to just divide by 1000 since most of the data points are voltage measures with values ranging significantly within -1000 and 1000."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train = X_train / 1000\n",
"X_dev = X_dev / 1000\n",
"X_test = X_test / 1000"
]
},
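{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the standard mean/std normalization mentioned above would look roughly like the sketch below (illustrative only; this notebook keeps the simpler divide-by-1000 scaling). The statistics should be computed on the training set only and reused for the dev and test sets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch (not used by this notebook): standard mean/std normalization,\n",
"# with statistics computed on the training set only to avoid dev/test leakage.\n",
"def standardize(train, dev, test):\n",
"    mu = train.mean()\n",
"    sigma = train.std()\n",
"    return (train - mu) / sigma, (dev - mu) / sigma, (test - mu) / sigma\n",
"\n",
"# Example (commented out so it does not replace the divide-by-1000 scaling above):\n",
"# X_train, X_dev, X_test = standardize(X_train, X_dev, X_test)"
]
},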
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Initialization\n",
"\n",
"Deep neural networks suffer from a problem of exploding or vanishing gradients. To reduce the effect, we use Xavier's initialization which is already built into the Tensorflow.\n",
"\n",
"`intiialize_parameters` receives the shapes and values of different parameters and hyper parameters to be initialized."
]
},
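{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough reference, Xavier (Glorot) initialization scales the initial weights by the fan-in $n_{in}$ and fan-out $n_{out}$ of each layer so that $\\mathrm{Var}(W) \\approx \\frac{2}{n_{in} + n_{out}}$. By default, `tf.contrib.layers.xavier_initializer` draws the weights from a uniform distribution over $[-\\sqrt{6/(n_{in}+n_{out})},\\ \\sqrt{6/(n_{in}+n_{out})}]$."
]
},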
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def initialize_parameters(parameter_shapes, parameter_values = {}):\n",
" \"\"\"\n",
" Initializes weight parameters to build a neural network with tensorflow using Xaviar's initialization.\n",
" The parameters are:\n",
" parameter_shapes: a dictionary where keys represent tensorflow variable names, and values\n",
" are shapes of the parameters in a list format\n",
" Returns:\n",
" params -- a dictionary of tensors containing parameters\n",
" \"\"\"\n",
" \n",
" params = { }\n",
" \n",
" for n,s in parameter_shapes.items():\n",
" param = tf.get_variable(n, s, initializer = tf.contrib.layers.xavier_initializer())\n",
" params[n] = param\n",
" \n",
" for n,v in parameter_values.items():\n",
" params[n] = v\n",
" \n",
" return params"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Forward Propagation\n",
"\n",
"Forward propagation builds most of the computation graph of the models and defines the layers for each model.\n",
"\n",
"For models with different set of layers or different architecture, we define different forward propagation function. Models having similar architecture and which only differ by parameter shapes or hyper parameters share common function.\n",
"\n",
"The architectures of the models we trained are described in the literature of this project.\n",
"\n",
"CNN1, CNN2, CNN4 use common function: `forward_propagation_cnn1`<br>\n",
"CNN3 uses `forward_propagation_cnn3`<br>\n",
"CNN5, CNN6, CNN7, CNN8 use: `forward_propagation_cnn8`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def forward_propagation_cnn1(X, parameters, training=False):\n",
" \"\"\"\n",
" Implements the forward propagation for the model:\n",
" (CONV BN RELU) -> (CONV BN RELU) -> (CONV BN RELU) -> (FC RELU DROPOUT) -> FC\n",
" \n",
" Arguments:\n",
" X -- input dataset placeholder, of shape (input size, number of examples)\n",
" parameters -- python dictionary containing your parameters\n",
" \"CONV1_W\", \"CONV2_W\", \"CONV3_W\", \"FC1_units\", \"DO_prob\", \"output_classes\"\n",
" the shapes are given in initialize_parameters\n",
"\n",
" Returns:\n",
" Z3 -- the output of the last LINEAR unit (without softmax)\n",
" \"\"\"\n",
" \n",
" # Retrieve the parameters from the dictionary \"parameters\" \n",
" CONV1_W = parameters['CONV1_W']\n",
" CONV1_Str = parameters['CONV1_Str']\n",
" CONV2_W = parameters['CONV2_W']\n",
" CONV2_Str = parameters['CONV2_Str']\n",
" CONV3_W = parameters['CONV3_W']\n",
" CONV3_Str = parameters['CONV3_Str']\n",
" FC1_units = parameters['FC1_units']\n",
" DO_prob = parameters['DO_prob']\n",
" output_classes = parameters[\"output_classes\"]\n",
" \n",
" \n",
" #Layer 1\n",
" # CONV\n",
" Z1 = tf.nn.conv1d(X, CONV1_W, stride=CONV1_Str, padding='VALID', data_format='NWC', name='conv1')\n",
" # Batch Normalization\n",
" B1 = tf.contrib.layers.batch_norm(Z1, is_training=training, updates_collections=None)\n",
" # RELU\n",
" A1 = tf.nn.relu(B1)\n",
" \n",
" #Layer 2\n",
" # CONV\n",
" Z2 = tf.nn.conv1d(A1, CONV2_W, stride=CONV2_Str, padding='VALID', data_format='NWC', name='conv2')\n",
" # Batch Normalization\n",
" B2 = tf.contrib.layers.batch_norm(Z2, is_training=training, updates_collections=None)\n",
" # RELU\n",
" A2 = tf.nn.relu(B2)\n",
" \n",
" #Layer 3\n",
" # CONV\n",
" Z3 = tf.nn.conv1d(A2, CONV3_W, stride=CONV3_Str, padding='VALID', data_format='NWC', name='conv3')\n",
" # Batch Normalization\n",
" B3 = tf.contrib.layers.batch_norm(Z3, is_training=training, updates_collections=None)\n",
" # RELU\n",
" A3 = tf.nn.relu(B3)\n",
" \n",
" # Flatten activations for FC layer\n",
" A3_flat = tf.contrib.layers.flatten(A3)\n",
" \n",
" # Layer 4\n",
" # FC\n",
" A4 = tf.contrib.layers.fully_connected(A3_flat, FC1_units, activation_fn=tf.nn.relu)\n",
" # Dropout\n",
" A4_dropped = tf.contrib.layers.dropout(A4, keep_prob=DO_prob, is_training=training)\n",
" \n",
" # Layer 5\n",
" # FC\n",
" logits = tf.contrib.layers.fully_connected(A4_dropped, output_classes, activation_fn=None)\n",
" \n",
" # Although the cost function we use will have in-built softmax computations,\n",
" # for predictions it'll be feasible to have a named tensor\n",
" softmax_output = tf.nn.softmax(logits, name='softmax_output')\n",
" \n",
" return logits, softmax_output\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def forward_propagation_cnn3(X, parameters, training=False):\n",
" \"\"\"\n",
" Implements the forward propagation for the model:\n",
" (CONV BN RELU) -> (CONV BN RELU DROPOUT) -> (CONV BN RELU) -> (FC RELU DROPOUT) -> FC\n",
" \n",
" Arguments:\n",
" X -- input dataset placeholder, of shape (input size, number of examples)\n",
" parameters -- python dictionary containing your parameters\n",
" \"CONV1_W\", \"CONV2_W\", \"CONV3_W\", \"FC1_units\", \"DO_prob\", \"output_classes\"\n",
" the shapes are given in initialize_parameters\n",
"\n",
" Returns:\n",
" Z3 -- the output of the last LINEAR unit (without softmax)\n",
" \"\"\"\n",
" \n",
" # Retrieve the parameters from the dictionary \"parameters\" \n",
" CONV1_W = parameters['CONV1_W']\n",
" CONV1_Str = parameters['CONV1_Str']\n",
" CONV2_W = parameters['CONV2_W']\n",
" CONV2_Str = parameters['CONV2_Str']\n",
" DO_prob_middle_layer = parameters['DO_prob_middle_layer']\n",
" CONV3_W = parameters['CONV3_W']\n",
" CONV3_Str = parameters['CONV3_Str']\n",
" FC1_units = parameters['FC1_units']\n",
" DO_prob = parameters['DO_prob']\n",
" output_classes = parameters[\"output_classes\"]\n",
" \n",
" \n",
" #Layer 1\n",
" # CONV\n",
" Z1 = tf.nn.conv1d(X, CONV1_W, stride=CONV1_Str, padding='VALID', data_format='NWC', name='conv1')\n",
" # Batch Normalization\n",
" B1 = tf.contrib.layers.batch_norm(Z1, is_training=training, updates_collections=None)\n",
" # RELU\n",
" A1 = tf.nn.relu(B1)\n",
" \n",
" #Layer 2\n",
" # CONV\n",
" Z2 = tf.nn.conv1d(A1, CONV2_W, stride=CONV2_Str, padding='VALID', data_format='NWC', name='conv2')\n",
" # Batch Normalization\n",
" B2 = tf.contrib.layers.batch_norm(Z2, is_training=training, updates_collections=None)\n",
" # RELU\n",
" A2 = tf.nn.relu(B2)\n",
" # Dropout\n",
" A2_dropped = tf.contrib.layers.dropout(A2, keep_prob=DO_prob_middle_layer, is_training=training)\n",
" \n",
" #Layer 3\n",
" # CONV\n",
" Z3 = tf.nn.conv1d(A2_dropped, CONV3_W, stride=CONV3_Str, padding='VALID', data_format='NWC', name='conv3')\n",
" # Batch Normalization\n",
" B3 = tf.contrib.layers.batch_norm(Z3, is_training=training, updates_collections=None)\n",
" # RELU\n",
" A3 = tf.nn.relu(B3)\n",
" \n",
" # Flatten activations for FC layer\n",
" A3_flat = tf.contrib.layers.flatten(A3)\n",
" \n",
" # Layer 4\n",
" # FC\n",
" A4 = tf.contrib.layers.fully_connected(A3_flat, FC1_units, activation_fn=tf.nn.relu)\n",
" # Dropout\n",
" A4_dropped = tf.contrib.layers.dropout(A4, keep_prob=DO_prob, is_training=training)\n",
" \n",
" # Layer 5\n",
" # FC\n",
" logits = tf.contrib.layers.fully_connected(A4_dropped, output_classes, activation_fn=None)\n",
" \n",
" # Although the cost function we use will have in-built softmax computations,\n",
" # for predictions it'll be feasible to have a named tensor\n",
" softmax_output = tf.nn.softmax(logits, name='softmax_output')\n",
" \n",
" return logits, softmax_output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def forward_propagation_cnn8(X, parameters, training=False):\n",
" \"\"\"\n",
" Implements the forward propagation for the model:\n",
" (CONV BN RELU) -> (CONV BN RELU) -> (FC RELU DROPOUT) -> FC\n",
" \n",
" Arguments:\n",
" X -- input dataset placeholder, of shape (input size, number of examples)\n",
" parameters -- python dictionary containing your parameters\n",
" \"CONV1_W\", \"CONV2_W\", \"CONV3_W\", \"FC1_units\", \"DO_prob\", \"output_classes\"\n",
" the shapes are given in initialize_parameters\n",
"\n",
" Returns:\n",
" Z3 -- the output of the last LINEAR unit (without softmax)\n",
" \"\"\"\n",
" \n",
" # Retrieve the parameters from the dictionary \"parameters\" \n",
" CONV1_W = parameters['CONV1_W']\n",
" CONV1_Str = parameters['CONV1_Str']\n",
" CONV2_W = parameters['CONV2_W']\n",
" CONV2_Str = parameters['CONV2_Str']\n",
" DO_prob_middle_layer = parameters['DO_prob_middle_layer']\n",
" FC1_units = parameters['FC1_units']\n",
" DO_prob = parameters['DO_prob']\n",
" output_classes = parameters[\"output_classes\"]\n",
" \n",
" \n",
" #Layer 1\n",
" # CONV\n",
" Z1 = tf.nn.conv1d(X, CONV1_W, stride=CONV1_Str, padding='VALID', data_format='NWC', name='conv1')\n",
" # Batch Normalization\n",
" B1 = tf.contrib.layers.batch_norm(Z1, is_training=training, updates_collections=None)\n",
" # RELU\n",
" A1 = tf.nn.relu(B1)\n",
" \n",
" #Layer 2\n",
" # CONV\n",
" Z2 = tf.nn.conv1d(A1, CONV2_W, stride=CONV2_Str, padding='VALID', data_format='NWC', name='conv2')\n",
" # Batch Normalization\n",
" B2 = tf.contrib.layers.batch_norm(Z2, is_training=training, updates_collections=None)\n",
" # RELU\n",
" A2 = tf.nn.relu(B2)\n",
" \n",
" # Flatten activations for FC layer\n",
" A2_flat = tf.contrib.layers.flatten(A2)\n",
" \n",
" # Layer 3\n",
" # FC\n",
" A3 = tf.contrib.layers.fully_connected(A2_flat, FC1_units, activation_fn=tf.nn.relu)\n",
" # Dropout\n",
" A3_dropped = tf.contrib.layers.dropout(A3, keep_prob=DO_prob, is_training=training)\n",
" \n",
" # Layer 4\n",
" # FC\n",
" logits = tf.contrib.layers.fully_connected(A3_dropped, output_classes, activation_fn=None)\n",
" \n",
" # Although the cost function we use will have in-built softmax computations,\n",
" # for predictions it'll be feasible to have a named tensor\n",
" softmax_output = tf.nn.softmax(logits, name='softmax_output')\n",
" \n",
" return logits, softmax_output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Computing cost function\n",
"\n",
"We use cross entropy loss for our classification problem which takes logits from forward propagation as one of its input. The softmax layer's output from forward propagation functions defined above is not used for computing cost and is used for making predictions at the end of this notebook. The cost function of cross entropy which is built in the Tensorflow computes its own softmax."
]
},
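{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, with one-hot labels $y$ and logits $z$, the quantity computed by `tf.nn.softmax_cross_entropy_with_logits_v2` and averaged by `tf.reduce_mean` over the $m$ examples is\n",
"\n",
"$$\\mathcal{L} = -\\frac{1}{m}\\sum_{i=1}^{m}\\sum_{c=1}^{C} y_{i,c}\\,\\log\\big(\\mathrm{softmax}(z_i)_c\\big)$$\n",
"\n",
"where $C = 3$ is the number of output classes."
]
},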
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def compute_cost(X, Y, parameters, nn_key, training):\n",
" \n",
" \"\"\"\n",
" Apply softmax to the output classes and find cross entropy loss\n",
" X - Input data\n",
" Y - One-hot output class training labels\n",
" \n",
" Returns:\n",
" cost - cross entropy loss\n",
" \"\"\"\n",
" \n",
" # FIXME: setting training=training causes problems during evaluation time\n",
" if nn_key == 'cnn1':\n",
" logits, Y_hat = forward_propagation_cnn1(X, parameters, training=training)\n",
" elif nn_key == 'cnn2':\n",
" logits, Y_hat = forward_propagation_cnn1(X, parameters, training=training)\n",
" elif nn_key == 'cnn3':\n",
" logits, Y_hat = forward_propagation_cnn3(X, parameters, training=training)\n",
" elif nn_key == 'cnn4':\n",
" logits, Y_hat = forward_propagation_cnn1(X, parameters, training=training)\n",
" elif nn_key == 'cnn5':\n",
" logits, Y_hat = forward_propagation_cnn8(X, parameters, training=training)\n",
" elif nn_key == 'cnn6':\n",
" logits, Y_hat = forward_propagation_cnn8(X, parameters, training=training)\n",
" elif nn_key == 'cnn7':\n",
" logits, Y_hat = forward_propagation_cnn8(X, parameters, training=training)\n",
" elif nn_key == 'cnn8':\n",
" logits, Y_hat = forward_propagation_cnn8(X, parameters, training=training)\n",
" else:\n",
" KeyError('Provided nn_key doesn\\'t match with any model')\n",
" \n",
" cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=Y))\n",
" \n",
" return cost, Y_hat"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pre-requisites for training\n",
"\n",
"Here are some procedures that are necessary to execute before the actual training."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create placeholders\n",
"\n",
"Tensorflow functions take input in the form of `feed_dict`. The variables in other functions are placeholders for the actual input."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def create_placeholders(n_x, n_y):\n",
" \"\"\"\n",
" Creates Tensorflow placeholders that act for input data and their labels\n",
" \n",
" Arguments:\n",
" n_x - no. of features for X\n",
" n_x - no. of classes for Y\n",
" \n",
" Returns:\n",
" X - placeholder for data that contains input featurs,\n",
" shape: (no. of examples, no. of features). No. of examples is set to None\n",
" Y - placeholder for data that contains output class labels,\n",
" shape (no. of examples, no. of classes). No. of examples is set ot None\n",
" \"\"\"\n",
" \n",
" X = tf.placeholder(tf.float32, name='X', shape=(None, n_x, 1))\n",
" Y = tf.placeholder(tf.float32, name='Y', shape=(None, n_y))\n",
" is_train = tf.placeholder(tf.bool, name='is_train')\n",
" \n",
" return X,Y,is_train"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parameter shapes\n",
"\n",
"To initialize model parameters, we've created a procedure above. It takes as an argument a dictionary in which we supply the model parameter shapes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def parameter_shapes(nn_key):\n",
" \"\"\"\n",
" Get tha shapes of all parameters used in the model.\n",
" Convolutional layer parameter shapes (filters) are in list format\n",
" \n",
" Arguments:\n",
" nn_key - Provide the key for the neural network model used\n",
" could be, 'cnn1', 'cnn2'\n",
" \n",
" Returns:\n",
" param_shapes - dict that contains all the parameters as follows\n",
" CONV1_W, CONV2_W, CONV3_W\n",
" param_values:\n",
" CONV1_Str, CONV2_Str, CONV3_Str,\n",
" FC1_units, DO_prob, output_classes\n",
" \"\"\"\n",
" \n",
" param_shapes = {}\n",
" param_values = {}\n",
" \n",
" do_prob = {\n",
" 'cnn1': 0.5,\n",
" 'cnn2': 0.3,\n",
" 'cnn3': 0.3,\n",
" 'cnn4': 0.9,\n",
" 'cnn5': 0.5,\n",
" 'cnn6': 0.7,\n",
" 'cnn7': 0.3,\n",
" 'cnn8': 0.3\n",
" }\n",
" \n",
" do_prob_middle_layer = {\n",
" 'cnn1': 0, # not used\n",
" 'cnn2': 0, # not used\n",
" 'cnn3': 0.8,\n",
" 'cnn4': 0, # not used\n",
" 'cnn5': 0, # not used\n",
" 'cnn6': 0, # not used\n",
" 'cnn7': 0, # not used\n",
" 'cnn8': 0 # not used\n",
" }\n",
" \n",
" fc1_units = {\n",
" 'cnn1': 20,\n",
" 'cnn2': 15,\n",
" 'cnn3': 15,\n",
" 'cnn4': 15,\n",
" 'cnn5': 20,\n",
" 'cnn6': 15,\n",
" 'cnn7': 15,\n",
" 'cnn8': 10\n",
" }\n",
"\n",
" # Conv Layer 1 parameter shapes\n",
" # No. of channels: 24, Filter size: 5, Stride: 3\n",
" param_shapes['CONV1_W'] = [5, 1, 24]\n",
" param_values['CONV1_Str'] = 3\n",
" \n",
" # Conv Layer 2 parameter shapes\n",
" # No. of channels: 16, Filter size: 3, Stride: 2\n",
" param_shapes['CONV2_W'] = [3, 24, 16]\n",
" param_values['CONV2_Str'] = 2\n",
" \n",
" # Dropout after the convolutional layer 2\n",
" # Not used in some cases\n",
" param_values['DO_prob_middle_layer'] = do_prob_middle_layer[nn_key]\n",
" \n",
" # Conv Layer 3 parameter shapes\n",
" # No. of channels: 8, Filter size: 3, Stride: 2\n",
" param_shapes['CONV3_W'] = [3, 16, 8]\n",
" param_values['CONV3_Str'] = 2\n",
" \n",
" # Fully connected layer 1 units = 20\n",
" param_values['FC1_units'] = fc1_units[nn_key]\n",
" \n",
" # Dropout layer after fully connected layer 1 probability\n",
" param_values['DO_prob'] = do_prob[nn_key]\n",
" \n",
" # Fully connected layer 2 units (also last layer)\n",
" # No. of units = no. of output classes = 3\n",
" param_values['output_classes'] = 3\n",
" \n",
" return param_shapes, param_values"
]
},
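{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (illustrative only, assuming a window size of 1024), the output length of a VALID-padded 1-D convolution is $\\lfloor (n - f)/s \\rfloor + 1$ for input length $n$, filter size $f$, and stride $s$. The cell below walks the assumed window size through the three convolutional layers defined above (for the three-conv architectures)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Quick sanity check (illustrative only, not used by the model): output length\n",
"# of a VALID-padded 1-D convolution is floor((n - f) / s) + 1.\n",
"def conv_out_len(n, f, s):\n",
"    return (n - f) // s + 1\n",
"\n",
"n = 1024                   # assumed input window size\n",
"n = conv_out_len(n, 5, 3)  # CONV1: filter 5, stride 3 -> 340\n",
"n = conv_out_len(n, 3, 2)  # CONV2: filter 3, stride 2 -> 169\n",
"n = conv_out_len(n, 3, 2)  # CONV3: filter 3, stride 2 -> 84\n",
"print(n, '->', n * 8, 'flattened units before FC1')  # 8 channels after CONV3"
]
},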
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Random mini-batches\n",
"\n",
"For each epoch we'll use different sets of mini-batches to avoid any possible overfitting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def random_mini_batches(X, Y, mini_batch_size = 64):\n",
" \"\"\"\n",
" Creates a list of random minibatches from (X, Y)\n",
" \n",
" Arguments:\n",
" X -- input data, of shape (number of examples, window size) (m, n_x)\n",
" Y -- output classes, of shape (number of examples, output classes) (m, n_y)\n",
" mini_batch_size - size of the mini-batches, integer\n",
" Returns:\n",
" mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)\n",
" \"\"\"\n",
" \n",
" m = X.shape[0] # number of training examples\n",
" mini_batches = []\n",
" \n",
" # Step 1: Shuffle (X, Y)\n",
" permutation = list(np.random.permutation(m))\n",
" shuffled_X = X[permutation,:,:]\n",
" shuffled_Y = Y[permutation,:]\n",
"\n",
" # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.\n",
" num_complete_minibatches = np.floor(m/mini_batch_size).astype(int) # number of mini batches of size mini_batch_size in your partitionning\n",
" for k in range(0, num_complete_minibatches):\n",
" mini_batch_X = shuffled_X[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:,:]\n",
" mini_batch_Y = shuffled_Y[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:]\n",
" mini_batch = (mini_batch_X, mini_batch_Y)\n",
" mini_batches.append(mini_batch)\n",
" \n",
" # Handling the end case (last mini-batch < mini_batch_size)\n",
" if m % mini_batch_size != 0:\n",
" mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m,:,:]\n",
" mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m,:]\n",
" mini_batch = (mini_batch_X, mini_batch_Y)\n",
" mini_batches.append(mini_batch)\n",
" \n",
" return mini_batches"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plotting costs\n",
"\n",
"At the end of each training session, we'll plot the learning curves for training and dev sets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def plot_costs(costs, dev_costs, learning_rate, total_epochs):\n",
" # plot the cost\n",
" plt.plot(costs, color='blue', label='training')\n",
" plt.plot(dev_costs, color='green', label='dev')\n",
" plt.ylabel('cost')\n",
" plt.xlabel('iterations')\n",
" plt.title(\"Learning rate = %f\\nTotal Epochs = %i\" % (learning_rate, total_epochs))\n",
" plt.legend()\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training data\n",
"\n",
"To train the model we used Adam optimizer with mini batches. Training each model to 500 epochs is sufficient enough.\n",
"\n",
"Whlie calling the model tune the function parameters, set `nn_key` to one of cnn1, cnn2, cnn3, and so on.\n",
"\n",
"The authors use `model_file` and `save_session_path` as per the following format:<br>\n",
"**model_file**: \"model\" for model file name model.meta<br>\n",
"**save_session_path**: use `\"train/dataset-512-1/cnn8_lr-0.00002_mbs-128/\"` for window_size 512, dataset permutation number 1 (as generated by `read-and-visualize-data` file in this project). Model 'cnn8', learning rate (0.00002), and mini-batch size 128\n",
"\n",
"In case you want continue training a previously saved model, use `restore_session=True`\n",
"\n",
"\n",
"Read [UPDATE_OPS](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/batch_norm) for Batch Normalization layer. Due to some errors faced during training we disabled UPDATE_OPS"
]
},
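{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, this is roughly how the `UPDATE_OPS` collection would normally be wired into the training op when `updates_collections` is left at its default (a sketch only; this notebook passes `updates_collections=None` instead, so this wrapper is not needed):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only -- not used in this notebook, which passes updates_collections=None\n",
"# to tf.contrib.layers.batch_norm instead of relying on the UPDATE_OPS collection.\n",
"def minimize_with_batchnorm_updates(cost, learning_rate):\n",
"    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
"    with tf.control_dependencies(update_ops):\n",
"        # the moving-average update ops run together with each optimizer step\n",
"        return tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)"
]
},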
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def model(X_train, Y_train, X_dev, Y_dev,\n",
" learning_rate = 0.009, num_epochs = 100, minibatch_size = 64, print_cost = True,\n",
" save_session_path=None, model_file=None, restore_session=False, save_session_interval=5, max_to_keep=10,\n",
" nn_key='cnn1'):\n",
" \"\"\"\n",
" \n",
" Arguments:\n",
" X_train -- training set, of shape (None, 64, 64, 3)\n",
" Y_train -- test set, of shape (None, n_y = 6)\n",
" X_test -- training set, of shape (None, 64, 64, 3)\n",
" Y_test -- test set, of shape (None, n_y = 6)\n",
" learning_rate -- learning rate of the optimization\n",
" num_epochs -- number of epochs of the optimization loop\n",
" minibatch_size -- size of a minibatch\n",
" print_cost -- True to print the cost every 100 epochs\n",
" restore_session -- load previously trained model whose path is derived from save_session_path and model_file\n",
" max_to_keep -- no. of models to be saved\n",
" nn_key -- can be one of cnn1, cnn2, cnn3 ... cnn8 (or the keys described in the literature)\n",
" \n",
" Returns:\n",
" train_accuracy -- real number, accuracy on the train set (X_train)\n",
" test_accuracy -- real number, testing accuracy on the test set (X_test)\n",
" parameters -- parameters learnt by the model. They can then be used to predict.\n",
" \"\"\"\n",
" \n",
" ops.reset_default_graph() # to be able to rerun the model without overwriting tf variables\n",
" (m, n_x,_) = X_train.shape \n",
" n_y = Y_train.shape[1] \n",
" costs = [] # To keep track of the cost\n",
" dev_costs = []\n",
" \n",
" model_path = None\n",
" if (save_session_path != None and model_file != None):\n",
" model_path = save_session_path + model_file\n",
" \n",
" \n",
" # Create Placeholders of the correct shape\n",
" X, Y, is_train = create_placeholders(n_x, n_y)\n",
"\n",
" # Initialize parameters\n",
" param_shapes, param_values = parameter_shapes(nn_key)\n",
" parameters = initialize_parameters(param_shapes, param_values)\n",
" \n",
" # Forward propagation: Build the forward propagation in the tensorflow graph\n",
" # Prediction: Use Y_hat to compute the output class during prediction\n",
" cost, Y_hat = compute_cost(X, Y, parameters, nn_key, is_train)\n",
" \n",
" # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.\n",
" optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)\n",
" # optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate).minimize(cost)\n",
" \n",
" # For saving / restoring sesison when training for long\n",
" epoch_counter = tf.get_variable('epoch_counter', shape=[], initializer=tf.zeros_initializer)\n",
" counter_op = tf.assign_add(epoch_counter, 1)\n",
" saver = tf.train.Saver(max_to_keep=max_to_keep)\n",
" \n",
" # Calculate the correct predictions\n",
" predict_op = tf.argmax(Y_hat, 1)\n",
" correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))\n",
"\n",
" # Calculate accuracy on the test set\n",
" accuracy = tf.reduce_mean(tf.cast(correct_prediction, \"float\"))\n",
" \n",
" # For impementation of batch norm the tf.GraphKeys.UPDATE_OPS dependency needs to be added\n",
" # see documentation on tf.contrib.layers.batch_norm\n",
"# update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
" \n",
" # Initialize all the variables globally\n",
" init = tf.global_variables_initializer()\n",
" \n",
" # Start the session to compute the tensorflow graph\n",
" with tf.Session() as sess: #, tf.control_dependencies(update_ops):\n",
" \n",
" if (restore_session == False and path.exists(save_session_path)):\n",
" raise FileExistsError('Session already exists, either restore the session, or manually delete the files.')\n",
" \n",
" # restore the previous session if the path already exists\n",
" if (model_path != None and restore_session==True):\n",
" print(\"Restoring session...\\n\")\n",
" saver.restore(sess, model_path)\n",
" print(\"Previous epoch counter: %i\\n\\n\" % epoch_counter.eval())\n",
" else:\n",
" sess.run(init)\n",
" \n",
" tf.train.export_meta_graph(model_path + '.meta') # save the model file (.meta) only once\n",
" \n",
" print(\"Cost at start: %f\" % cost.eval({X: X_train, Y: Y_train, is_train: False}))\n",
" print(\"Dev cost: %f\" % cost.eval({X: X_dev, Y: Y_dev, is_train: False}))\n",
" \n",
" train_accuracy = accuracy.eval({X: X_train, Y: Y_train, is_train: False})\n",
" dev_accuracy = accuracy.eval({X: X_dev, Y: Y_dev, is_train: False})\n",
" print(\"Train Accuracy:\", train_accuracy)\n",
" print(\"Dev Accuracy:\", dev_accuracy)\n",
" \n",
" # Do the training loop\n",
" for epoch in range(num_epochs):\n",
"\n",
" epoch_cost = 0.\n",
" num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set\n",
" minibatches = random_mini_batches(X_train, Y_train, minibatch_size)\n",
"\n",
" for minibatch in minibatches:\n",
" \n",
" try:\n",
"\n",
" # Select a minibatch\n",
" (minibatch_X, minibatch_Y) = minibatch\n",
"\n",
" # IMPORTANT: The line that runs the graph on a minibatch.\n",
" # Run the session to execute the optimizer and the cost, the feedict should contain a minibatch for (X,Y).\n",
" _,minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y, is_train: True})\n",
"\n",
" epoch_cost += minibatch_cost / num_minibatches\n",
" \n",
" # Implement early stopping mechanism on KeyboardInterrupt\n",
" except KeyboardInterrupt:\n",
" print(\"KeyboardInterrupt received. Stopping early\")\n",
" plot_costs(np.squeeze(costs), np.squeeze(dev_costs), learning_rate, epoch_counter.eval())\n",
" return parameters\n",
" \n",
" \n",
" if (epoch % save_session_interval == 0 and save_session_path != None):\n",
" saver.save(sess, model_path, write_meta_graph=False)\n",
" \n",
" # Save the costs after each epoch for plotting learning curve\n",
" if print_cost == True and epoch % 1 == 0:\n",
" costs.append(epoch_cost)\n",
" dev_cost = cost.eval({X: X_dev, Y: Y_dev, is_train: False})\n",
" dev_costs.append(dev_cost)\n",
" \n",
" \n",
" # Print the cost every epoch\n",
" if print_cost == True and (epoch + 1) % 5 == 0:\n",
" print (\"\\nCost after epoch %i: %f\" % (epoch + 1, epoch_cost))\n",
" print (\"Dev cost after epoch %i: %f\" % (epoch + 1, dev_cost))\n",
" \n",
" train_accuracy = accuracy.eval({X: X_train, Y: Y_train, is_train: False})\n",
" dev_accuracy = accuracy.eval({X: X_dev, Y: Y_dev, is_train: False})\n",
" print(\"Train Accuracy:\", train_accuracy)\n",
" print(\"Dev Accuracy:\", dev_accuracy)\n",
" \n",
" # increment the epoch_counter in case the session is saved\n",
" # and restored later\n",
" sess.run(counter_op)\n",
" \n",
" \n",
" if (save_session_path != None):\n",
" saver.save(sess, model_path, write_meta_graph=False)\n",
" \n",
" \n",
" plot_costs(np.squeeze(costs), np.squeeze(dev_costs), learning_rate, epoch_counter.eval())\n",
"\n",
" # Calculate the correct predictions\n",
" train_accuracy = accuracy.eval({X: X_train, Y: Y_train, is_train: False})\n",
" dev_accuracy = accuracy.eval({X: X_dev, Y: Y_dev, is_train: False})\n",
" print(\"Train Accuracy:\", train_accuracy)\n",
" print(\"Dev Accuracy:\", dev_accuracy)\n",
" \n",
" return parameters\n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"parameters = model(X_train, Y_train, X_dev, Y_dev,\n",
" learning_rate=0.00002,\n",
" num_epochs=500,\n",
" minibatch_size=128,\n",
" save_session_path='train/dataset-512-1/cnn8_lr-0.00002_mbs-128/',\n",
" model_file='model',\n",
" restore_session=False,\n",
" save_session_interval=10,\n",
" nn_key='cnn8')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prediction and restoring saved model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def predict(X_test, session_path, model_file, Y_test_onehot=None):\n",
"\n",
" tf.reset_default_graph()\n",
"\n",
" checkpoint_path = session_path\n",
" model_path = session_path + model_file\n",
"\n",
" with tf.Session() as sess:\n",
" loader = tf.train.import_meta_graph(model_path)\n",
" loader.restore(sess, tf.train.latest_checkpoint(checkpoint_path))\n",
"\n",
" graph = tf.get_default_graph()\n",
"\n",
" X = graph.get_tensor_by_name('X:0')\n",
" Y = graph.get_tensor_by_name('Y:0')\n",
" is_train = graph.get_tensor_by_name('is_train:0')\n",
" \n",
" epoch_counter = graph.get_tensor_by_name('epoch_counter:0')\n",
" print(epoch_counter.eval())\n",
"\n",
" Y_hat = graph.get_tensor_by_name('softmax_output:0')\n",
"\n",
" predict_op = tf.argmax(Y_hat, 1)\n",
"\n",
" y_hat_test = predict_op.eval({X: X_test, is_train: False})\n",
" \n",
" # print the accuracy of the test set if the labels are provided\n",
" if (Y_test_onehot is not None):\n",
" y_test = np.argmax(Y_test_onehot, 1)\n",
" print('Accuracy: %f' % (y_hat_test == y_test).mean())\n",
" \n",
"\n",
" return y_hat_test\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def predict_voting(X_test_voting, session_path, model_file):\n",
"\n",
" tf.reset_default_graph()\n",
"\n",
" checkpoint_path = session_path\n",
" model_path = session_path + model_file\n",
" \n",
" y_hat_test_voting = []\n",
"\n",
" with tf.Session() as sess:\n",
" loader = tf.train.import_meta_graph(model_path)\n",
" loader.restore(sess, tf.train.latest_checkpoint(checkpoint_path))\n",
"\n",
" graph = tf.get_default_graph()\n",
"\n",
" X = graph.get_tensor_by_name('X:0')\n",
" is_train = graph.get_tensor_by_name('is_train:0')\n",
"\n",
" Y_hat = graph.get_tensor_by_name('softmax_output:0')\n",
"\n",
" predict_op = tf.argmax(Y_hat, 1)\n",
" \n",
" classname, idx, counts = tf.unique_with_counts(predict_op)\n",
" predict_voting_op = tf.gather(classname, tf.argmax(counts))\n",
"\n",
" # no. of training examples with the original feature size\n",
" m = X_test_voting.shape[0]\n",
" \n",
" # no. of split training examples of each original example\n",
" m_each = X_test_voting.shape[1]\n",
" \n",
" for ex in range(m):\n",
" x_test_voting = make_dimensions_compatible(X_test_voting[ex])\n",
" pred = predict_voting_op.eval({X: x_test_voting, is_train: False})\n",
" \n",
" y_hat_test_voting.append(pred)\n",
"\n",
" return y_hat_test_voting"
]
},
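{
"cell_type": "markdown",
"metadata": {},
"source": [
"The majority vote above is implemented with `tf.unique_with_counts` and `tf.gather`. It is roughly equivalent to the following numpy snippet (an illustrative sketch; `window_preds` is a hypothetical array of per-window class predictions for one example):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative numpy equivalent of the majority vote used in predict_voting.\n",
"# window_preds is a hypothetical array of per-window class predictions for one example.\n",
"window_preds = np.array([2, 0, 2, 2, 1])\n",
"classes, counts = np.unique(window_preds, return_counts=True)\n",
"print(classes[np.argmax(counts)])  # -> 2, the most frequent class"
]
},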
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictions = predict(X_dev, 'train/dataset-1024-6/cnn6_lr-0.00002_mbs-128/', 'model.meta', Y_test_onehot=Y_dev)\n",
"\n",
"print(\"\\nPredicted values:\")\n",
"print(predictions)\n",
"\n",
"print(\"\\nActual values:\")\n",
"print(np.argmax(Y_test, 1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluating test set for accuracy with voting"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"testfile = dataset_relative_path + 'testset_voting_1024.h5'\n",
"session_path = 'train/dataset-1024-5/cnn6_lr-0.00002_mbs-128/'\n",
"model_file = 'model.meta'\n",
"\n",
"with h5.File(testfile, 'r') as testfile:\n",
" X_test_voting = testfile['X']\n",
" X_test_voting = np.array(X_test_voting) / 1000\n",
" y_test_voting = np.array(testfile['Y'])\n",
" \n",
" y_hat_test_voting = predict_voting(X_test_voting, session_path, model_file)\n",
" \n",
" print(\"Accuracy with voting: %f\" % (y_test_voting == y_hat_test_voting).mean())\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"print(y_hat_test_voting)\n",
"print(y_test_voting)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more options on accuracy measures go to `prediction-and-accuracy-measures` file in this project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}