--- a +++ b/demo/notebooks/demo_modelcomp_pvalue.ipynb @@ -0,0 +1,149 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Statistical comparison of model performance" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this notebook, we will describe how we compare the accuracy of the 2 predictive models used in this study: the deep learning network (4DSurvival) and the conventional parameter model (utilizing volumetric indices of RV function)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To compare 2 models, we compare their optimism-corrected concordance indices (which are derived by the bootstrap internal validation procedure outlined in our paper). From equation (9) in the paper, the optimism-corrected concordance index for each model is given by: $$C_{corrected} = C_{full}^{full} - \\frac{1}{B}\\sum_{b=1}^{B} \\bigg( C_{b}^{b} - C_{b}^{full} \\bigg)$$\n", + "In the above equation, the symbol $C_{s_{1}}^{s_{2}}$ refers to the concordance index of a model trained on sample $s_1$ and tested on sample $s_2$. The first term refers to the *apparent* predictive accuracy, i.e. the (inflated) concordance index obtained when a model trained on the full sample is then tested on the same sample. The second term is the average *optimism* (difference between *bootstrap performance* and *test performance*) over the $B$ bootstrap samples.\n", + "Note that we can rewrite the equation above as: $$C_{corrected} = \\frac{1}{B}\\sum_{b=1}^{B} \\bigg[ C_{full}^{full} - \\bigg( C_{b}^{b} - C_{b}^{full} \\bigg) \\bigg]$$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this formulation, we can think of the summand (the term within the summation) as the optimism-corrected concordance index based on only 1 particular bootstrap sample $b$ ($C_{b,corrected}$). Averaging this quantity across $b=\\{1,...,B\\}$ bootstrap samples gives $C_{corrected}$. \n", + "To compare $C_{corrected}$ between 2 competing models, we perform a statistical test comparing the distributions of $C_{b,corrected}$ ($b=\\{1,...,B\\}$) between the 2 models. We use the Wilcoxon rank-sum test for this purpose. This is implemented in the code below:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "------------------------------------------------------------------------------------------------------\n", + "Import required libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "deletable": false, + "editable": false}, + "outputs": [], + "source": [ + "import scipy\n", + "from scipy.stats import wilcoxon\n", + "import numpy as np\n", + "import pickle" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For each model under comparison, read in bootstrap data and compute $C_{b,corrected}$ ($b = \\{1,...,B\\}$):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "deletable": false, + "editable": false}, + "outputs": [], + "source": [ + "def p_reader(pfile):\n", + " with open(pfile, 'rb') as f: mlist = pickle.load(f)\n", + " return mlist[0], mlist[1]\n", + "\n", + "C_app_model1, opts_model1 = p_reader('../data/modelCstats_DL.pkl')\n", + "C_app_model2, opts_model2 = p_reader('../data/modelCstats_conv.pkl')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "deletable": false, + "editable": false}, + "outputs": [], + "source": [ + "Cb_adjs_model1 = [C_app_model1 - o for o in opts_model1]\n", + "Cb_adjs_model2 = [C_app_model2 - o for o in opts_model2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Perform Wilcoxon signed-rank test and compute p-value using the:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "deletable": false, + "editable": false}, + "outputs": [], + "source": [ + "pval = wilcoxon(Cb_adjs_model1, Cb_adjs_model2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Print output:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "deletable": false, + "editable": false}, + "outputs": [], + "source": [ + "print('Model 1 optimism-adjusted concordance index = {0:.4f}\\nModel 2 optimism-adjusted concordance index = {1:.4f}\\np-value = {2}'.format(np.mean(Cb_adjs_model1), np.mean(Cb_adjs_model2), pval.pvalue))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}