b/demo/notebooks/trainDL.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deep Learning Training code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook describes our deep learning function. It sets up the architecture, pre-processes the input data, and runs the specified number of epochs.\n",
    "We will describe the code step by step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "The deep learning architecture is built using `Keras`, a model-level library that provides high-level building blocks for developing deep learning models. It provides a user-friendly interface to low-level tensor manipulation libraries that handle the computationally intensive tensor operations involved in model fitting (e.g. `Tensorflow`, `Theano`, `CNTK`). For this work, we used `Keras` interfaced with `Tensorflow` as the backend computational engine.\n",
    "\n",
    "### 1. Importing Libraries\n",
    "We begin by importing the required libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import keras\n",
    "from keras import backend as K\n",
    "from keras.models import Sequential, Model\n",
    "from keras.layers import Input\n",
    "from keras.layers.core import Dense, Dropout\n",
    "from keras.optimizers import Adam\n",
    "from keras.regularizers import l1, l2, l1_l2\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.preprocessing import scale\n",
    "import os, sys, time, pickle, copy"
   ]
  },
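  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next cell is an optional check that is not part of the original pipeline: it simply confirms the `Keras` version and that the `Tensorflow` backend described in the overview is the one actually in use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Optional check (not part of the original pipeline): confirm Keras version and active backend\n",
    "print(keras.__version__)\n",
    "print(K.backend())   #expected to print 'tensorflow' when the TensorFlow backend is active"
   ]
  },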
|
|
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Data formatting\n",
    "We define a function that converts the input to the required formats for analysis:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def prepare_data(x, e, t):\n",
    "    return (x.astype(\"float32\"), e.astype(\"int32\"), t.astype(\"float32\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define `DL_single_run` function\n",
    "We define the function `DL_single_run` below. The first two arguments are `xtr` and `ytr`, which represent the training data. `xtr` is an $n \\times p$ matrix (where $n$ is the sample size and $p$ is the number of variables [motion descriptors, in our case]). `ytr` contains the survival outcomes for the training data; it is an $n \\times 2$ matrix whose first column is the censoring status and whose second column is the survival/censoring time.\n",
    "The `DL_single_run` function also has arguments representing network parameters: `units1` (hidden layer size), `units2` (latent code size), `dro` (dropout rate), `lr` (learning rate), `l1r` (L1 penalty for the activity regularizer), and `alpha` (trade-off parameter). The final arguments are `batchsize` (batch size) and `numepochs` (number of epochs).\n",
    "\n",
    "The first steps of the function format the data into the correct numeric types and sort it into minibatches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def DL_single_run(xtr, ytr, units1, units2, dro, lr, l1r, alpha, batchsize, numepochs):\n",
    "    #Data preparation: create X_tr, E_tr and TM_tr, where X=input vector, E=censoring status and TM=survival time. Apply formatting\n",
    "    X_tr, E_tr, TM_tr = prepare_data(xtr, ytr[:,0,np.newaxis], ytr[:,1])\n",
    "\n",
    "    #Arrange data into minibatches (based on the specified batch size); within each minibatch, sort in descending order of survival/censoring time (see explanation of the Cox PH loss function definition)\n",
    "    X_tr, E_tr, TM_tr, _ = sort4minibatches(X_tr, E_tr, TM_tr, batchsize)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Model Architecture\n",
    "Still within the `DL_single_run` function definition, we set up the denoising autoencoder model in *Keras*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "    #before defining the network architecture, clear the current computational graph (if one exists), and specify the input dimensionality\n",
    "    K.clear_session()\n",
    "    inpshape = xtr.shape[1]\n",
    "\n",
    "    #Define Network Architecture\n",
    "    inputvec = Input(shape=(inpshape,))\n",
    "    x = Dropout(dro, input_shape=(inpshape,))(inputvec)\n",
    "    x = Dense(units=int(units1), activation='relu', activity_regularizer=l1(10**l1r))(x)\n",
    "    encoded = Dense(units=int(units2), activation='relu', name='encoded')(x)\n",
    "    riskpred = Dense(units=1, activation='linear', name='predicted_risk')(encoded)\n",
    "    z = Dense(units=int(units1), activation='relu')(encoded)\n",
    "    decoded = Dense(units=inpshape, activation='linear', name='decoded')(z)\n",
    "\n",
    "    model = Model(inputs=inputvec, outputs=[decoded, riskpred])\n",
    "    model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Define Network prediction loss function\n",
    "We use the negative Cox Proportional Hazards partial likelihood function: $$L_{s} = -\\sum_{i=1}^{n} \\delta_{i} \\bigg\\{\\mathbf{W}'\\phi(\\mathbf{x}_i) - \\log \\sum_{j \\in R(t_i)} e^{\\mathbf{W}'\\phi(\\mathbf{x}_j)} \\bigg\\}$$\n",
    "for the survival prediction component of our network. This loss function is not one of the standard losses included in *Keras*. However, *Keras* allows the creation of user-defined custom loss functions written using symbolic Keras backend operations, which operate on tensor objects (as opposed to Python primitives). We show our user-defined function below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "    def _negative_log_likelihood(E, risk):\n",
    "        hazard_ratio = K.exp(risk)\n",
    "        log_risk = K.log(K.cumsum(hazard_ratio))\n",
    "        uncensored_likelihood = risk - log_risk\n",
    "        censored_likelihood = uncensored_likelihood * E\n",
    "        neg_likelihood = -K.sum(censored_likelihood)\n",
    "        return neg_likelihood"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the function, the argument `E` represents the censoring status ($\\delta_{i}$) and `risk` ($\\mathbf{W}'\\phi(\\mathbf{x}_i)$) represents the network output, i.e. the log hazard ratio. Before being passed to the `_negative_log_likelihood` function above, the data for the `E` and `risk` arguments must be sorted in order of descending survival time. This is because of the use of the `cumsum` (cumulative sum) Keras backend function on *Line 3* of the definition, which sums hazard ratios over the risk set of each patient (see the second term in the equation above)."
   ]
  },
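  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The toy example below is not part of the original pipeline; all values in it are made up purely for illustration. It shows why the descending-time ordering matters: once samples are sorted by descending survival/censoring time, a cumulative sum of the hazard ratios reproduces the risk-set sums $\\sum_{j \\in R(t_i)} e^{\\mathbf{W}'\\phi(\\mathbf{x}_j)}$ that appear in the loss."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Toy illustration (hypothetical values, not study data) of the cumulative-sum trick used in the loss\n",
    "toy_times = np.array([300., 250., 100., 75.])   #already sorted in descending order\n",
    "toy_risk = np.array([0.2, -0.1, 0.5, 0.3])      #stand-ins for the network outputs W'phi(x_i)\n",
    "\n",
    "toy_hr = np.exp(toy_risk)                       #hazard ratios\n",
    "cumsum_sums = np.cumsum(toy_hr)                 #entry i = sum of hazard ratios over {j : t_j >= t_i}\n",
    "\n",
    "#Direct evaluation of the risk-set sums for comparison\n",
    "direct_sums = np.array([toy_hr[toy_times >= t].sum() for t in toy_times])\n",
    "print(np.allclose(cumsum_sums, direct_sums))    #should print True"
   ]
  },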
|
|
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4b. Arrange data into minibatches sorted by descending time\n",
    "We use the function below, with arguments `xvals` (input vector of motion descriptors), `evals` (censoring statuses), `tvals` (survival/censoring times [in days]), and `batchsize` (desired batch size):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sort4minibatches(xvals, evals, tvals, batchsize):\n",
    "    #Shuffle the sample indices, then cut them into consecutive minibatches\n",
    "    ntot = len(xvals)\n",
    "    indices = np.arange(ntot)\n",
    "    np.random.shuffle(indices)\n",
    "    start_idx = 0\n",
    "    esall = []\n",
    "    for end_idx in list(range(batchsize, batchsize*(ntot//batchsize)+1, batchsize))+[ntot]:\n",
    "        #Within each minibatch, sort samples in descending order of survival/censoring time\n",
    "        excerpt = indices[start_idx:end_idx]\n",
    "        sort_idx = np.argsort(tvals[excerpt])[::-1]\n",
    "        es = excerpt[sort_idx]\n",
    "        esall += list(es)\n",
    "        start_idx = end_idx\n",
    "    return (xvals[esall], evals[esall], tvals[esall], esall)"
   ]
  },
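  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (not part of the original pipeline, and using made-up values rather than the study data), the cell below applies `sort4minibatches` to a toy dataset and prints the reordered times; within each minibatch of 3, the times should come out in descending order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Quick sanity check of sort4minibatches on made-up data (not the study data)\n",
    "x_demo = np.arange(12, dtype=\"float32\").reshape(6, 2)                   #6 samples, 2 descriptors\n",
    "e_demo = np.array([[1], [0], [1], [1], [0], [1]], dtype=\"int32\")        #censoring statuses\n",
    "t_demo = np.array([100., 50., 300., 20., 250., 75.], dtype=\"float32\")   #times in days\n",
    "\n",
    "xs, es, ts, order = sort4minibatches(x_demo, e_demo, t_demo, batchsize=3)\n",
    "print(ts)   #each block of 3 consecutive entries is in descending order"
   ]
  },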
|
|
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Compile and Run Network\n",
    "Still within the `DL_single_run` function definition, we compile and run the model. For compilation, we specify our two losses: the reconstruction loss (`mean_squared_error`) and our custom Cox prediction loss (`_negative_log_likelihood`). Their corresponding weights are set based on the `alpha` parameter.\n",
    "Note that when running the model (`model.fit`), we set `shuffle=False` to ensure that the predefined batch ordering (see `sort4minibatches`) is preserved from epoch to epoch.\n",
    "The last line (`return mlog`) of the `DL_single_run` function definition returns the training log produced by `model.fit`. This contains the epoch history (losses at each epoch), the fitted model as a `Keras` model object (architecture, weights, etc.), and other information about the fit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "    #Model compilation\n",
    "    optimdef = Adam(lr=lr)\n",
    "    model.compile(loss=[keras.losses.mean_squared_error, _negative_log_likelihood], loss_weights=[alpha, 1-alpha], optimizer=optimdef, metrics={'decoded': keras.metrics.mean_squared_error})\n",
    "\n",
    "    #Run model\n",
    "    mlog = model.fit(X_tr, [X_tr, E_tr], batch_size=batchsize, epochs=numepochs, shuffle=False, verbose=1)\n",
    "\n",
    "    K.clear_session()\n",
    "    return mlog"
   ]
  },
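  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, the cell below is a minimal usage sketch that is not part of the original pipeline. It assumes the fragments above have been assembled into the complete `DL_single_run` definition, and it uses small synthetic data and arbitrary hyperparameter values purely to show the calling convention; these are not the settings used in the study."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Minimal usage sketch with synthetic data (all values below are illustrative only)\n",
    "np.random.seed(0)\n",
    "n, p = 100, 20\n",
    "x_demo = np.random.randn(n, p)\n",
    "y_demo = np.column_stack([np.random.binomial(1, 0.7, n),      #column 1: censoring status\n",
    "                          np.random.uniform(50, 2000, n)])    #column 2: survival/censoring time (days)\n",
    "\n",
    "mlog_demo = DL_single_run(x_demo, y_demo, units1=16, units2=4, dro=0.2, lr=0.001,\n",
    "                          l1r=-4, alpha=0.5, batchsize=16, numepochs=5)\n",
    "print(mlog_demo.history.keys())   #per-epoch losses recorded during the fit"
   ]
  }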
|
|
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}