4Dsurvival / Git / Diff of /demo/notebooks/demo_modelcomp

Models:
tobiasharvey/
4Dsurvival
Downloads: 1
Diff of /demo/notebooks/demo_modelcomp_pvalue.ipynb [000000] .. [2cc208]
Switch to side-by-side view

--- a
+++ b/demo/notebooks/demo_modelcomp_pvalue.ipynb
@@ -0,0 +1,149 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Statistical comparison of model performance"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this notebook, we will describe how we compare the accuracy of the 2 predictive models used in this study: the deep learning network (4DSurvival) and the conventional parameter model (utilizing volumetric indices of RV function)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To compare 2 models, we compare their optimism-corrected concordance indices (which are derived by the bootstrap internal validation procedure outlined in our paper). From equation (9) in the paper, the optimism-corrected concordance index for each model is given by: $$C_{corrected} = C_{full}^{full} - \\frac{1}{B}\\sum_{b=1}^{B} \\bigg( C_{b}^{b} - C_{b}^{full} \\bigg)$$\n",
+    "In the above equation, the symbol $C_{s_{1}}^{s_{2}}$ refers to the concordance index of a model trained on sample $s_1$ and tested on sample $s_2$. The first term refers to the *apparent* predictive accuracy, i.e. the (inflated) concordance index obtained when a model trained on the full sample is then tested on the same sample. The second term is the average *optimism* (difference between *bootstrap performance* and *test performance*) over the $B$ bootstrap samples.\n",
+    "Note that we can rewrite the equation above as: $$C_{corrected} = \\frac{1}{B}\\sum_{b=1}^{B} \\bigg[ C_{full}^{full} - \\bigg( C_{b}^{b} - C_{b}^{full} \\bigg) \\bigg]$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this formulation, we can think of the summand (the term within the summation) as the optimism-corrected concordance index based on only 1 particular bootstrap sample $b$ ($C_{b,corrected}$). Averaging this quantity across $b=\\{1,...,B\\}$ bootstrap samples gives $C_{corrected}$. \n",
+    "To compare $C_{corrected}$ between 2 competing models, we perform a statistical test comparing the distributions of $C_{b,corrected}$ ($b=\\{1,...,B\\}$) between the 2 models. We use the Wilcoxon rank-sum test for this purpose. This is implemented in the code below:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "------------------------------------------------------------------------------------------------------\n",
+    "Import required libraries:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "import scipy\n",
+    "from scipy.stats import wilcoxon\n",
+    "import numpy as np\n",
+    "import pickle"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For each model under comparison, read in bootstrap data and compute $C_{b,corrected}$ ($b = \\{1,...,B\\}$):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "def p_reader(pfile):\n",
+    "    with open(pfile, 'rb') as f: mlist = pickle.load(f)\n",
+    "    return mlist[0], mlist[1]\n",
+    "\n",
+    "C_app_model1, opts_model1 = p_reader('../data/modelCstats_DL.pkl')\n",
+    "C_app_model2, opts_model2 = p_reader('../data/modelCstats_conv.pkl')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "Cb_adjs_model1 = [C_app_model1 - o for o in opts_model1]\n",
+    "Cb_adjs_model2 = [C_app_model2 - o for o in opts_model2]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Perform Wilcoxon signed-rank test and compute p-value using the:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "pval = wilcoxon(Cb_adjs_model1, Cb_adjs_model2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Print output:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "deletable": false,
+    "editable": false},
+   "outputs": [],
+   "source": [
+    "print('Model 1 optimism-adjusted concordance index = {0:.4f}\\nModel 2 optimism-adjusted concordance index = {1:.4f}\\np-value = {2}'.format(np.mean(Cb_adjs_model1), np.mean(Cb_adjs_model2), pval.pvalue))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}