--- a
+++ b/demo/notebooks/demo_validate.ipynb
@@ -0,0 +1,307 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Bootstrap Internal Validation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this notebook, we will describe the code that implements the procedure we use to perform bootstrap-based internal validation of the models compared in our paper. We show the application of this validation procedure to the conventional parameter model. A similar procedure was applied to the deep learning model. \n",
+    "\n",
+    "In order to get a sense of how well our model would generalize to an external validation cohort, we assessed its predictive accuracy within the training sample using a bootstrap-based procedure recommended in the guidelines for *Transparent Reporting of a multivariable model for Individual Prognosis Or Diagnosis (Tripod)*. This procedure attempts to derive realistic, 'optimism-adjusted' estimates of the model's generalization accuracy using the training sample\n",
+    "\n",
+    "We first import required libraries:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "import os, sys, pickle\n",
+    "import optunity, lifelines\n",
+    "from lifelines.utils import concordance_index\n",
+    "import numpy as np"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we import the functions required to train the conventional parameter model:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "sys.path.insert(0, '../code')\n",
+    "from CoxReg_Single_run import *\n",
+    "from hypersearch import *"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Import data, where `x_full` is the $n \\times p$ input matrix of volumetric measures ($n$=sample size, $p$=dimensionality of input vector), `y_full` is an $n \\times 2$ matrix of outcomes where column 1 represents censoring status and column 2 represents survival/censoring time. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "with open('../data/inputdata_conv.pkl', 'rb') as f: c3 = pickle.load(f)\n",
+    "x_full = c3[0]\n",
+    "y_full = c3[1]\n",
+    "del c3"
+   ]
+  },
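+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, we can confirm that the loaded arrays have the expected shapes (illustrative only; this assumes `x_full` and `y_full` were pickled as NumPy arrays, and the exact dimensions depend on the dataset):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#x_full should be n x p, y_full should be n x 2\n",
+    "print('x_full shape:', x_full.shape)\n",
+    "print('y_full shape:', y_full.shape)"
+   ]
+  },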
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Initialize empty lists to store predictions and performance measures:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "preds_bootfull = []\n",
+    "inds_inbag = []\n",
+    "Cb_opts  = []"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we implement each step of the bootstrap internal validation procedure, as outlined in the manuscript:\n",
+    "\n",
+    "### Step 1\n",
+    "Train a prediction model on the full sample:\n",
+    "#### 1(a) : Find optimal hyperparameters"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "opars, osummary = hypersearch_cox(x_data=x_full, y_data=y_full, method='particle swarm', nfolds=6, nevals=50, penalty_range=[-2,1])"
+   ]
+  },
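+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`hypersearch_cox` is defined in `../code/hypersearch.py`; as its arguments indicate, it selects the penalty exponent by maximizing the cross-validated concordance index with optunity's particle swarm solver. The cell below is a minimal sketch of that idea, not the repository implementation: it assumes lifelines' `CoxPHFitter` as the underlying model and the outcome layout described above (column 1 = status, column 2 = time)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#illustrative sketch only -- not the repository implementation of hypersearch_cox\n",
+    "import pandas as pd\n",
+    "from lifelines import CoxPHFitter\n",
+    "\n",
+    "def hypersearch_cox_sketch(x_data, y_data, nfolds, nevals, penalty_range):\n",
+    "    @optunity.cross_validated(x=x_data, y=y_data, num_folds=nfolds)\n",
+    "    def cv_cindex(x_train, y_train, x_test, y_test, penalty):\n",
+    "        #fit a penalized Cox model on the training folds\n",
+    "        df = pd.DataFrame(x_train)\n",
+    "        df['status'], df['time'] = y_train[:, 0], y_train[:, 1]\n",
+    "        cph = CoxPHFitter(penalizer=10**penalty)\n",
+    "        cph.fit(df, duration_col='time', event_col='status')\n",
+    "        #score on the held-out fold with Harrell's C\n",
+    "        preds = cph.predict_partial_hazard(pd.DataFrame(x_test))\n",
+    "        return concordance_index(y_test[:, 1], -preds, y_test[:, 0])\n",
+    "\n",
+    "    opt_pars, search_log, _ = optunity.maximize(cv_cindex, num_evals=nevals,\n",
+    "                                                solver_name='particle swarm',\n",
+    "                                                penalty=penalty_range)\n",
+    "    return opt_pars, search_log\n",
+    "\n",
+    "#hypothetical usage: pars, log = hypersearch_cox_sketch(x_full, y_full, 6, 50, [-2, 1])"
+   ]
+  },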
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 1(b) : using optimal hyperparameters, train a model on full sample"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "omod = coxreg_single_run(xtr=x_full, ytr=y_full, penalty=10**opars['penalty'])"
+   ]
+  },
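+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Similarly, `coxreg_single_run` (from `../code/CoxReg_Single_run.py`) fits a single penalized Cox proportional hazards model and returns an object exposing `predict_partial_hazard`. A minimal sketch under the same assumptions as above:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#illustrative sketch only -- not the repository implementation of coxreg_single_run\n",
+    "import pandas as pd\n",
+    "from lifelines import CoxPHFitter\n",
+    "\n",
+    "def coxreg_single_run_sketch(xtr, ytr, penalty):\n",
+    "    #assemble the DataFrame layout lifelines expects\n",
+    "    df = pd.DataFrame(xtr)\n",
+    "    df['status'], df['time'] = ytr[:, 0], ytr[:, 1]\n",
+    "    cph = CoxPHFitter(penalizer=penalty)\n",
+    "    cph.fit(df, duration_col='time', event_col='status')\n",
+    "    return cph"
+   ]
+  },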
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 1(c) : Compute Harrell's Concordance index ($C_{full}^{full}$)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "predfull = omod.predict_partial_hazard(x_full)\n",
+    "C_app = concordance_index(y_full[:,1], -predfull, y_full[:,0])\n",
+    "\n",
+    "print('\\n\\n==================================================')\n",
+    "print('Apparent concordance index = {0:.4f}'.format(C_app))\n",
+    "print('==================================================\\n\\n')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`C_app` ($C_{full}^{full}$) represents the apparent predictive accuracy, i.e. the inflated accuracy obtained when a model is tested on the same sample on which it was trained/optimized \n",
+    "\n",
+    "In the next steps, we use bootstrap sampling to estimate the optimism, which we then use to adjust the apparent predictive accuracy.\n",
+    "\n",
+    "#### Bootstrap sampling\n",
+    "We will take B = 100 bootstrap samples"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "#define useful variables\n",
+    "nsmp = len(x_full)\n",
+    "rowids = [_ for _ in range(nsmp)]\n",
+    "B = 100\n",
+    "\n",
+    "for b in range(B):\n",
+    "    print('\\n-------------------------------------')\n",
+    "    print('Current bootstrap sample:', b, 'of', B-1)\n",
+    "    print('-------------------------------------')\n",
+    "\n",
+    "    #STEP 2: Generate a bootstrap sample by doing n random selections with replacement (where n is the sample size)\n",
+    "    b_inds = np.random.choice(rowids, size=nsmp, replace=True)\n",
+    "    xboot = x_full[b_inds]\n",
+    "    yboot = y_full[b_inds]\n",
+    "\n",
+    "    #(2a) find optimal hyperparameters\n",
+    "    bpars, bsummary = hypersearch_cox(x_data=xboot, y_data=yboot, method='particle swarm', nfolds=6, nevals=50, penalty_range=[-2,1])\n",
+    "    \n",
+    "    #(2b) using optimal hyperparameters, train a model on bootstrap sample\n",
+    "    bmod = coxreg_single_run(xtr=xboot, ytr=yboot, penalty=10**bpars['penalty'])\n",
+    "    \n",
+    "    #(2c[i])  Using bootstrap-trained model, compute predictions on bootstrap sample. Evaluate accuracy of predictions (Harrell's Concordance index)\n",
+    "    predboot = bmod.predict_partial_hazard(xboot)\n",
+    "    Cb_boot = concordance_index(yboot[:,1], -predboot, yboot[:,0])\n",
+    "    \n",
+    "    #(2c[ii]) Using bootstrap-trained model, compute predictions on FULL sample.     Evaluate accuracy of predictions (Harrell's Concordance index)\n",
+    "    predbootfull = bmod.predict_partial_hazard(x_full)\n",
+    "    Cb_full = concordance_index(y_full[:,1], -predbootfull, y_full[:,0])\n",
+    "\n",
+    "    #STEP 3: Compute optimism for bth bootstrap sample, as difference between results from 2c[i] and 2c[ii]\n",
+    "    Cb_opt = Cb_boot - Cb_full\n",
+    "    \n",
+    "    #store data on current bootstrap sample (predictions, C-indices)\n",
+    "    preds_bootfull.append(predbootfull)\n",
+    "    inds_inbag.append(b_inds)\n",
+    "    Cb_opts.append(Cb_opt)\n",
+    "\n",
+    "    del bpars, bmod"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we compute bootstrap-estimated optimism, by averaging the optimism estimates across the B bootstrap samples: $$\\frac{1}{B}\\sum_{b=1}^{B} \\bigg( C_{b}^{b} - C_{b}^{full} \\bigg)$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "C_opt = np.mean(Cb_opts)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we adjust the apparent C using the bootstrap-estimated optimism:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "C_adj = C_app - C_opt"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we compute confidence intervals for optimism-adjusted C:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "C_opt_95confint = np.percentile([C_app - o for o in Cb_opts], q=[2.5, 97.5])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "print('Optimism bootstrap estimate = {0:.4f}'.format(C_opt))\n",
+    "print('Optimism-adjusted concordance index = {0:.4f}, and 95% CI = {1}'.format(C_adj, C_opt_95confint))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}