308 lines (307 with data), 9.3 kB
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bootstrap Internal Validation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we will describe the code that implements the procedure we use to perform bootstrap-based internal validation of the models compared in our paper. We show the application of this validation procedure to the conventional parameter model. A similar procedure was applied to the deep learning model. \n",
"\n",
"In order to get a sense of how well our model would generalize to an external validation cohort, we assessed its predictive accuracy within the training sample using a bootstrap-based procedure recommended in the guidelines for *Transparent Reporting of a multivariable model for Individual Prognosis Or Diagnosis (Tripod)*. This procedure attempts to derive realistic, 'optimism-adjusted' estimates of the model's generalization accuracy using the training sample\n",
"\n",
"We first import required libraries:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"import os, sys, pickle\n",
"import optunity, lifelines\n",
"from lifelines.utils import concordance_index\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we import the functions required to train the conventional parameter model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"sys.path.insert(0, '../code')\n",
"from CoxReg_Single_run import *\n",
"from hypersearch import *"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import data, where `x_full` is the $n \\times p$ input matrix of volumetric measures ($n$=sample size, $p$=dimensionality of input vector), `y_full` is an $n \\times 2$ matrix of outcomes where column 1 represents censoring status and column 2 represents survival/censoring time. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"with open('../data/inputdata_conv.pkl', 'rb') as f: c3 = pickle.load(f)\n",
"x_full = c3[0]\n",
"y_full = c3[1]\n",
"del c3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize empty lists to store predictions and performance measures:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"preds_bootfull = []\n",
"inds_inbag = []\n",
"Cb_opts = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we implement each step of the bootstrap internal validation procedure, as outlined in the manuscript:\n",
"\n",
"### Step 1\n",
"Train a prediction model on the full sample:\n",
"#### 1(a) : Find optimal hyperparameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"opars, osummary = hypersearch_cox(x_data=x_full, y_data=y_full, method='particle swarm', nfolds=6, nevals=50, penalty_range=[-2,1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1(b) : using optimal hyperparameters, train a model on full sample"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"omod = coxreg_single_run(xtr=x_full, ytr=y_full, penalty=10**opars['penalty'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1(c) : Compute Harrell's Concordance index ($C_{full}^{full}$)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"predfull = omod.predict_partial_hazard(x_full)\n",
"C_app = concordance_index(y_full[:,1], -predfull, y_full[:,0])\n",
"\n",
"print('\\n\\n==================================================')\n",
"print('Apparent concordance index = {0:.4f}'.format(C_app))\n",
"print('==================================================\\n\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`C_app` ($C_{full}^{full}$) represents the apparent predictive accuracy, i.e. the inflated accuracy obtained when a model is tested on the same sample on which it was trained/optimized \n",
"\n",
"In the next steps, we use bootstrap sampling to estimate the optimism, which we then use to adjust the apparent predictive accuracy.\n",
"\n",
"#### Bootstrap sampling\n",
"We will take B = 100 bootstrap samples"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"#define useful variables\n",
"nsmp = len(x_full)\n",
"rowids = [_ for _ in range(nsmp)]\n",
"B = 100\n",
"\n",
"for b in range(B):\n",
" print('\\n-------------------------------------')\n",
" print('Current bootstrap sample:', b, 'of', B-1)\n",
" print('-------------------------------------')\n",
"\n",
" #STEP 2: Generate a bootstrap sample by doing n random selections with replacement (where n is the sample size)\n",
" b_inds = np.random.choice(rowids, size=nsmp, replace=True)\n",
" xboot = x_full[b_inds]\n",
" yboot = y_full[b_inds]\n",
"\n",
" #(2a) find optimal hyperparameters\n",
" bpars, bsummary = hypersearch_cox(x_data=xboot, y_data=yboot, method='particle swarm', nfolds=6, nevals=50, penalty_range=[-2,1])\n",
" \n",
" #(2b) using optimal hyperparameters, train a model on bootstrap sample\n",
" bmod = coxreg_single_run(xtr=xboot, ytr=yboot, penalty=10**bpars['penalty'])\n",
" \n",
" #(2c[i]) Using bootstrap-trained model, compute predictions on bootstrap sample. Evaluate accuracy of predictions (Harrell's Concordance index)\n",
" predboot = bmod.predict_partial_hazard(xboot)\n",
" Cb_boot = concordance_index(yboot[:,1], -predboot, yboot[:,0])\n",
" \n",
" #(2c[ii]) Using bootstrap-trained model, compute predictions on FULL sample. Evaluate accuracy of predictions (Harrell's Concordance index)\n",
" predbootfull = bmod.predict_partial_hazard(x_full)\n",
" Cb_full = concordance_index(y_full[:,1], -predbootfull, y_full[:,0])\n",
"\n",
" #STEP 3: Compute optimism for bth bootstrap sample, as difference between results from 2c[i] and 2c[ii]\n",
" Cb_opt = Cb_boot - Cb_full\n",
" \n",
" #store data on current bootstrap sample (predictions, C-indices)\n",
" preds_bootfull.append(predbootfull)\n",
" inds_inbag.append(b_inds)\n",
" Cb_opts.append(Cb_opt)\n",
"\n",
" del bpars, bmod"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we compute bootstrap-estimated optimism, by averaging the optimism estimates across the B bootstrap samples: $$\\frac{1}{B}\\sum_{b=1}^{B} \\bigg( C_{b}^{b} - C_{b}^{full} \\bigg)$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"C_opt = np.mean(Cb_opts)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we adjust the apparent C using the bootstrap-estimated optimism:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"C_adj = C_app - C_opt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we compute confidence intervals for optimism-adjusted C:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"C_opt_95confint = np.percentile([C_app - o for o in Cb_opts], q=[2.5, 97.5])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false},
"outputs": [],
"source": [
"print('Optimism bootstrap estimate = {0:.4f}'.format(C_opt))\n",
"print('Optimism-adjusted concordance index = {0:.4f}, and 95% CI = {1}'.format(C_adj, C_opt_95confint))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}