<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Tutorial: Tuning DeepProg &mdash; DeepProg documentation</title>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<!--[if lt IE 9]>
<script src="_static/js/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="License" href="LICENSE.html" />
<link rel="prev" title="Case study: Analyzing TCGA HCC dataset" href="case_study.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home"> DeepProg
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="installation.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="usage.html">Tutorial: Simple DeepProg model</a></li>
<li class="toctree-l1"><a class="reference internal" href="usage_ensemble.html">Tutorial: Ensemble of DeepProg model</a></li>
<li class="toctree-l1"><a class="reference internal" href="usage_advanced.html">Tutorial: Advanced usage of DeepProg model</a></li>
<li class="toctree-l1"><a class="reference internal" href="case_study.html">Case study: Analyzing TCGA HCC dataset</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Tutorial: Tuning DeepProg</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#a-first-example">A first example</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#simdeeptuning-hyperparameters">SimDeepTuning hyperparameters</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#tuning-using-one-or-multiple-test-datasets">Tuning using one or multiple test datasets</a></li>
<li class="toctree-l2"><a class="reference internal" href="#results">Results</a></li>
<li class="toctree-l2"><a class="reference internal" href="#recommendation">Recommendation</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="LICENSE.html">License</a></li>
<li class="toctree-l1"><a class="reference internal" href="api/simdeep.html">simdeep package</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">DeepProg</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home"></a> &raquo;</li>
<li>Tutorial: Tuning DeepProg</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/usage_tuning.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="tutorial-tuning-deepprog">
<h1>Tutorial: Tuning DeepProg<a class="headerlink" href="#tutorial-tuning-deepprog" title="Permalink to this headline"></a></h1>
<p>DeepProg accepts various alternative hyperparameters to fit a model, including alternative clustering, normalisation and embedding methods, the choice of autoencoder hyperparameters, the use or restriction of embedding and survival feature selection, the size of the holdout sample set, and the ensemble model merging criterion. Furthermore, it can accept external methods to perform clustering, normalisation or embedding. To help find the optimal combination of hyperparameters for a given dataset, we implemented an optional hyperparameter search module based on sequential model-based optimisation, relying on the <a class="reference external" href="https://docs.ray.io/en/master/tune.html">tune</a> and <a class="reference external" href="https://scikit-optimize.github.io/stable/">scikit-optimize</a> python libraries. This optional hyperparameter tuning performs a non-random iterative grid search and selects each new set of hyperparameters based on the performance of the past iterations. The computation can be entirely distributed thanks to the ray interface (see above).</p>
<p>A DeepProg instance depends on many hyperparameters. The most important hyperparameters to tune are:</p>
<ul class="simple">
<li><p>The combination of <code class="docutils literal notranslate"><span class="pre">-nb_it</span></code> (number of submodels), <code class="docutils literal notranslate"><span class="pre">-split_n_fold</span></code> (how each submodel is randomly constructed) and <code class="docutils literal notranslate"><span class="pre">-seed</span></code> (random seed).</p></li>
<li><p>The number of clusters <code class="docutils literal notranslate"><span class="pre">-nb_clusters</span></code></p></li>
<li><p>The clustering algorithm (implemented: <code class="docutils literal notranslate"><span class="pre">kmeans</span></code>, <code class="docutils literal notranslate"><span class="pre">mixture</span></code>, <code class="docutils literal notranslate"><span class="pre">coxPH</span></code>, <code class="docutils literal notranslate"><span class="pre">coxPHMixture</span></code>)</p></li>
<li><p>The preprocessing normalization (<code class="docutils literal notranslate"><span class="pre">-normalization</span></code> option, see <code class="docutils literal notranslate"><span class="pre">Tutorial:</span> <span class="pre">Advanced</span> <span class="pre">usage</span> <span class="pre">of</span> <span class="pre">DeepProg</span> <span class="pre">model</span></code>)</p></li>
<li><p>The embedding used (<code class="docutils literal notranslate"><span class="pre">alternative_embedding</span></code> option; an illustrative sketch follows this list)</p></li>
<li><p>The way of creating the new survival features (<code class="docutils literal notranslate"><span class="pre">-feature_selection_usage</span></code> option)</p></li>
</ul>
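<p>As an illustration, an external embedding can itself be part of the screened hyperparameters. The sketch below is hypothetical: whether <code class="docutils literal notranslate"><span class="pre">args_to_optimize</span></code> accepts an <code class="docutils literal notranslate"><span class="pre">alternative_embedding</span></code> key, and whether a class or a configured instance is expected, should be checked against the advanced-usage tutorial; <code class="docutils literal notranslate"><span class="pre">PCA</span></code> is used here purely as an example:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
# Hedged sketch: screening an external embedding via `alternative_embedding`.
# PCA is an arbitrary scikit-learn example, not the method used in this tutorial.
from sklearn.decomposition import PCA

args_to_optimize = {
    'nb_clusters': [2, 3],
    'alternative_embedding': [None, PCA],
}
</pre></div>
</div>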
<div class="section" id="a-first-example">
<h2>A first example<a class="headerlink" href="#a-first-example" title="Permalink to this headline"></a></h2>
<p>A first example of tuning is available in the <a class="reference external" href="../../../examples/example_hyperparameters_tuning.py">example</a> folder (example_hyperparameters_tuning.py). The first part of the script defines the array of hyperparameters to screen. An instance of <code class="docutils literal notranslate"><span class="pre">SimDeepTuning</span></code> is then created, in which the output folder and the project name are defined.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
<span class="kn">from</span> <span class="nn">simdeep.simdeep_tuning</span> <span class="kn">import</span> <span class="n">SimDeepTuning</span>
<span class="c1"># AgglomerativeClustering is an external class that can be used as</span>
<span class="c1"># a clustering algorithm since it has a fit_predict method</span>
<span class="kn">from</span> <span class="nn">sklearn.cluster.hierarchical</span> <span class="kn">import</span> <span class="n">AgglomerativeClustering</span>
<span class="c1"># Array of hyperparameters</span>
<span class="n">args_to_optimize</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&#39;seed&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mi">100</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">300</span><span class="p">,</span> <span class="mi">400</span><span class="p">],</span>
<span class="s1">&#39;nb_clusters&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span>
<span class="s1">&#39;cluster_method&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;mixture&#39;</span><span class="p">,</span> <span class="s1">&#39;coxPH&#39;</span><span class="p">,</span>
<span class="n">AgglomerativeClustering</span><span class="p">],</span>
<span class="s1">&#39;use_autoencoders&#39;</span><span class="p">:</span> <span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="kc">False</span><span class="p">),</span>
<span class="s1">&#39;class_selection&#39;</span><span class="p">:</span> <span class="p">(</span><span class="s1">&#39;mean&#39;</span><span class="p">,</span> <span class="s1">&#39;max&#39;</span><span class="p">),</span>
<span class="p">}</span>
<span class="n">tuning</span> <span class="o">=</span> <span class="n">SimDeepTuning</span><span class="p">(</span>
<span class="n">args_to_optimize</span><span class="o">=</span><span class="n">args_to_optimize</span><span class="p">,</span>
<span class="n">nb_threads</span><span class="o">=</span><span class="n">nb_threads</span><span class="p">,</span>
<span class="n">survival_tsv</span><span class="o">=</span><span class="n">SURVIVAL_TSV</span><span class="p">,</span>
<span class="n">training_tsv</span><span class="o">=</span><span class="n">TRAINING_TSV</span><span class="p">,</span>
<span class="n">path_data</span><span class="o">=</span><span class="n">PATH_DATA</span><span class="p">,</span>
<span class="n">project_name</span><span class="o">=</span><span class="n">PROJECT_NAME</span><span class="p">,</span>
<span class="n">path_results</span><span class="o">=</span><span class="n">PATH_DATA</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
</div>
<p>The SimDeepTuning module requires the use of the <code class="docutils literal notranslate"><span class="pre">ray</span></code> and <code class="docutils literal notranslate"><span class="pre">tune</span></code> python modules.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
<span class="n">ray</span><span class="o">.</span><span class="n">init</span><span class="p">()</span>
</pre></div>
</div>
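<p>If ray should not use all of the machine resources, they can be restricted at initialisation. For instance (<code class="docutils literal notranslate"><span class="pre">num_cpus</span></code> is a standard <code class="docutils literal notranslate"><span class="pre">ray.init</span></code> option, but check the documentation of the installed ray version):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import ray

# Optional sketch: limit the tuning run to 8 CPUs (adjust to your machine)
ray.init(num_cpus=8)
</pre></div>
</div>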
<div class="section" id="simdeeptuning-hyperparameters">
<h3>SimDeepTuning hyperparameters<a class="headerlink" href="#simdeeptuning-hyperparameters" title="Permalink to this headline"></a></h3>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">num_samples</span></code> is the number of experiements</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">distribute_deepprog</span></code> is used to further distribute each DeepProg instance into the ray framework. If set to True, be sure to either have a large number of CPUs to use and/or to use a small number of <code class="docutils literal notranslate"><span class="pre">max_concurrent</span></code> (which is the number of concurrent experiments run in parallel). <code class="docutils literal notranslate"><span class="pre">iterations</span></code> is the number of iterations to run for each experiment (results will be averaged).</p></li>
</ul>
<p>DeepProg can be tuned using different objective metrics:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">&quot;log_test_fold_pvalue&quot;</span></code>: uses the stacked <em>out of bags</em> samples (survival and labels) to predict the -log10(log-rank Cox-PH pvalue)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">&quot;log_full_pvalue&quot;</span></code>: minimizes the Cox-PH log-rank pvalue of the model (This metric can lead to overfitting since it relies on all the samples included in the model)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">&quot;test_fold_cindex&quot;</span></code>: Maximizes the mean c-index of the test folds.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">&quot;cluster_consistency&quot;</span></code>: Maximizes the adjusted Rand scores computed for all the model pairs. (Higher values imply stable clusters)</p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">tuning</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="c1"># We will use the holdout samples Cox-PH pvalue as objective</span>
<span class="n">metric</span><span class="o">=</span><span class="s1">&#39;log_test_fold_pvalue&#39;</span><span class="p">,</span>
<span class="n">num_samples</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span>
<span class="c1"># Experiment run concurently using ray as dispatcher</span>
<span class="n">max_concurrent</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="c1"># In addition, each deeprog model will be distributed</span>
<span class="n">distribute_deepprog</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">iterations</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="c1"># We recommend using large `max_concurrent` and distribute_deepprog=True</span>
<span class="c1"># when a large number CPUs and large RAMs are availables</span>
<span class="c1"># Results</span>
<span class="n">table</span> <span class="o">=</span> <span class="n">tuning</span><span class="o">.</span><span class="n">get_results_table</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="n">table</span><span class="p">)</span>
</pre></div>
</div>
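<p>Assuming <code class="docutils literal notranslate"><span class="pre">get_results_table</span></code> returns a pandas <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> with one row per experiment and one column per metric (an assumption to verify for your DeepProg version), the screened configurations can then be ranked directly, for example:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
# Hypothetical post-processing: rank the experiments by the chosen objective.
# The column name 'log_test_fold_pvalue' is assumed to match the metric
# passed to tuning.fit() above.
table = tuning.get_results_table()
best = table.sort_values('log_test_fold_pvalue', ascending=False)
print(best.head(5))
</pre></div>
</div>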
</div>
</div>
<div class="section" id="tuning-using-one-or-multiple-test-datasets">
<h2>Tuning using one or multiple test datasets<a class="headerlink" href="#tuning-using-one-or-multiple-test-datasets" title="Permalink to this headline"></a></h2>
<p>The computation of labels and the associated metrics from external test datasets can be included in the tuning workflow and used as objective metrics. Please refer to the <a class="reference external" href="../../../examples/example_hyperparameters_tuning_with_dataset.py">example</a> folder (see example_hyperparameters_tuning_with_dataset.py).</p>
<p>Let’s define two dummy test datasets:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
<span class="c1"># We will use the methylation and the RNA value as test datasets</span>
<span class="n">test_datasets</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&#39;testdataset1&#39;</span><span class="p">:</span> <span class="p">({</span><span class="s1">&#39;METH&#39;</span><span class="p">:</span> <span class="s1">&#39;meth_dummy.tsv&#39;</span><span class="p">},</span> <span class="s1">&#39;survival_dummy.tsv&#39;</span><span class="p">)</span>
<span class="s1">&#39;testdataset2&#39;</span><span class="p">:</span> <span class="p">({</span><span class="n">RNA</span><span class="p">:</span> <span class="n">rna_dummy</span><span class="o">.</span><span class="n">tsv</span><span class="s1">&#39;}, &#39;</span><span class="n">survival_dummy</span><span class="o">.</span><span class="n">tsv</span><span class="s1">&#39;)</span>
<span class="p">}</span>
</pre></div>
</div>
<p>We then include these two datasets when instantiating the <code class="docutils literal notranslate"><span class="pre">SimDeepTuning</span></code> instance:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="n">tuning</span> <span class="o">=</span> <span class="n">SimDeepTuning</span><span class="p">(</span>
<span class="n">args_to_optimize</span><span class="o">=</span><span class="n">args_to_optimize</span><span class="p">,</span>
<span class="n">test_datasets</span><span class="o">=</span><span class="n">test_datasets</span><span class="p">,</span>
<span class="n">survival_tsv</span><span class="o">=</span><span class="n">SURVIVAL_TSV</span><span class="p">,</span>
<span class="n">training_tsv</span><span class="o">=</span><span class="n">TRAINING_TSV</span><span class="p">,</span>
<span class="n">path_data</span><span class="o">=</span><span class="n">PATH_DATA</span><span class="p">,</span>
<span class="n">project_name</span><span class="o">=</span><span class="n">PROJECT_NAME</span><span class="p">,</span>
<span class="n">path_results</span><span class="o">=</span><span class="n">PATH_DATA</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
</div>
<p>Finally, we fit the model using an objective metric that accounts for the test datasets:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">&quot;log_test_pval&quot;</span></code> maximizes the sum of the -log10(log-rank Cox-PH pvalue) for each test dataset</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">&quot;test_cindex&quot;</span></code> maximizes the mean on the test C-indexes</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">&quot;sum_log_pval&quot;</span></code> maximizes the sum of the model -log10(log-rank Cox-PH pvalue) with all the test datasets p-value</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">&quot;mix_score&quot;</span></code>: maximizes the product of <code class="docutils literal notranslate"><span class="pre">&quot;sum_log_pval&quot;</span></code>, <code class="docutils literal notranslate"><span class="pre">&quot;cluster_consistency&quot;</span></code>, <code class="docutils literal notranslate"><span class="pre">&quot;test_fold_cindex&quot;</span></code></p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="n">tuning</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="n">metric</span><span class="o">=</span><span class="s1">&#39;log_test_pval&#39;</span><span class="p">,</span>
<span class="n">num_samples</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="n">distribute_deepprog</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">max_concurrent</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="c1"># iterations is usefull to take into account the DL parameter fitting variations</span>
<span class="n">iterations</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">table</span> <span class="o">=</span> <span class="n">tuning</span><span class="o">.</span><span class="n">get_results_table</span><span class="p">()</span>
<span class="n">tuning</span><span class="o">.</span><span class="n">save_results_table</span><span class="p">()</span>
</pre></div>
</div>
</div>
<div class="section" id="results">
<h2>Results<a class="headerlink" href="#results" title="Permalink to this headline"></a></h2>
<p>The results will be generated in the <code class="docutils literal notranslate"><span class="pre">path_results</span></code> folder, with one result subfolder per experiment. A report of all the experiments and metrics will be written in the result tables generated in the same <code class="docutils literal notranslate"><span class="pre">path_results</span></code> folder. Once a model achieves satisfactory performance, it can be used directly by loading the generated labels with the <code class="docutils literal notranslate"><span class="pre">fit_on_pretrained_label_file</span></code> API (see the section <code class="docutils literal notranslate"><span class="pre">Save</span> <span class="pre">/</span> <span class="pre">load</span> <span class="pre">models</span> <span class="pre">from</span> <span class="pre">precomputed</span> <span class="pre">sample</span> <span class="pre">labels</span></code>).</p>
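<p>A minimal sketch of this reuse is shown below, assuming a <code class="docutils literal notranslate"><span class="pre">SimDeepBoosting</span></code> instance configured like the training run; the labels file name is illustrative and the exact signature of <code class="docutils literal notranslate"><span class="pre">fit_on_pretrained_label_file</span></code> should be checked against the advanced-usage tutorial:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
from simdeep.simdeep_boosting import SimDeepBoosting

# Hypothetical reuse of a tuned configuration: re-create the ensemble with the
# winning hyperparameters, then load the labels generated by the tuning run
# instead of refitting from scratch. The file path below is illustrative.
boosting = SimDeepBoosting(
    survival_tsv=SURVIVAL_TSV,
    training_tsv=TRAINING_TSV,
    path_data=PATH_DATA,
    project_name=PROJECT_NAME,
    path_results=PATH_DATA,
)

boosting.fit_on_pretrained_label_file('path/to/generated_labels.tsv')
</pre></div>
</div>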
</div>
<div class="section" id="recommendation">
<h2>Recommendation<a class="headerlink" href="#recommendation" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p>Depending on the size N of the hyperparameter search space (i.e. the number of possible combinations), it is recommended to perform at least sqrt(N) experiments; a larger number of experiments will always explore more of the hyperparameter space and can increase performance (a worked example follows this list).</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">seed</span></code> is definitely a hyperparameter to screen, especially for a small number of submodels <code class="docutils literal notranslate"><span class="pre">nb_it</span></code> (less than 50). It is recommended to screen at least 8-10 different seeds when using <code class="docutils literal notranslate"><span class="pre">nb_it</span></code> &lt; 20.</p></li>
<li><p>Please test your configuration using a small <code class="docutils literal notranslate"><span class="pre">num_samples</span></code> first.</p></li>
</ul>
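<p>As an illustration of the sqrt(N) rule of thumb, the short sketch below (plain Python, independent of DeepProg) computes the search-space size of the first example above and the corresponding minimum number of experiments:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
# Search space of the first example:
# 4 seeds x 4 cluster numbers x 3 clustering methods x 2 autoencoder flags x 2 class selections
from math import prod, sqrt

n_combinations = prod([4, 4, 3, 2, 2])           # 192 combinations
min_experiments = int(sqrt(n_combinations)) + 1  # at least ~14 experiments

print(n_combinations, min_experiments)
</pre></div>
</div>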
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="LICENSE.html" class="btn btn-neutral float-right" title="License" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
<a href="case_study.html" class="btn btn-neutral float-left" title="Case study: Analyzing TCGA HCC dataset" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
&#169; Copyright 2019, Olivier Poirion.
</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>