<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Tutorial: Tuning DeepProg — DeepProg documentation</title>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<!--[if lt IE 9]>
<script src="_static/js/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="License" href="LICENSE.html" />
<link rel="prev" title="Case study: Analyzing TCGA HCC dataset" href="case_study.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home"> DeepProg
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="installation.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="usage.html">Tutorial: Simple DeepProg model</a></li>
<li class="toctree-l1"><a class="reference internal" href="usage_ensemble.html">Tutorial: Ensemble of DeepProg model</a></li>
<li class="toctree-l1"><a class="reference internal" href="usage_advanced.html">Tutorial: Advanced usage of DeepProg model</a></li>
<li class="toctree-l1"><a class="reference internal" href="case_study.html">Case study: Analyzing TCGA HCC dataset</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Tutorial: Tuning DeepProg</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#a-first-example">A first example</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#simdeeptuning-hyperparameters">SimDeepTuning hyperparameters</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#tuning-using-one-or-multiple-test-datasets">Tuning using one or multiple test datasets</a></li>
<li class="toctree-l2"><a class="reference internal" href="#results">Results</a></li>
<li class="toctree-l2"><a class="reference internal" href="#recommendation">Recommendation</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="LICENSE.html">License</a></li>
<li class="toctree-l1"><a class="reference internal" href="api/simdeep.html">simdeep package</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">DeepProg</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home"></a> »</li>
<li>Tutorial: Tuning DeepProg</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/usage_tuning.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="tutorial-tuning-deepprog">
<h1>Tutorial: Tuning DeepProg<a class="headerlink" href="#tutorial-tuning-deepprog" title="Permalink to this headline">¶</a></h1>
<p>DeepProg accepts various alternative hyperparameters to fit a model, including alternative clustering, normalisation, and embedding methods, the choice of autoencoder hyperparameters, the use or restriction of embedding and survival feature selection, the size of the holdout samples, and the ensemble model merging criterion. Furthermore, it can accept external methods to perform clustering, normalisation, or embedding. To help find the optimal combination of hyperparameters for a given dataset, we implemented an optional hyperparameter search module based on sequential model-based optimisation, relying on the <a class="reference external" href="https://docs.ray.io/en/master/tune.html">tune</a> and <a class="reference external" href="https://scikit-optimize.github.io/stable/">scikit-optimize</a> python libraries. The optional hyperparameter tuning performs a non-random iterative grid search and selects each new set of hyperparameters based on the performance of the past iterations. The computation can be entirely distributed thanks to the Ray interface (see above).</p>
<p>A DeepProg instance depends on many hyperparameters. The most important hyperparameters to tune are listed below (a sketch of a possible search grid follows the list):</p>
<ul class="simple">
<li><p>The combination of <code class="docutils literal notranslate"><span class="pre">-nb_it</span></code> (number of submodels), <code class="docutils literal notranslate"><span class="pre">-split_n_fold</span></code> (how each submodel is randomly constructed), and <code class="docutils literal notranslate"><span class="pre">-seed</span></code> (random seed).</p></li>
<li><p>The number of clusters <code class="docutils literal notranslate"><span class="pre">-nb_clusters</span></code></p></li>
<li><p>The clustering algorithm (implemented: <code class="docutils literal notranslate"><span class="pre">kmeans</span></code>, <code class="docutils literal notranslate"><span class="pre">mixture</span></code>, <code class="docutils literal notranslate"><span class="pre">coxPH</span></code>, <code class="docutils literal notranslate"><span class="pre">coxPHMixture</span></code>)</p></li>
<li><p>The preprocessing normalization (<code class="docutils literal notranslate"><span class="pre">-normalization</span></code> option, see <code class="docutils literal notranslate"><span class="pre">Tutorial:</span> <span class="pre">Advanced</span> <span class="pre">usage</span> <span class="pre">of</span> <span class="pre">DeepProg</span> <span class="pre">model</span></code>)</p></li>
<li><p>The embedding used (<code class="docutils literal notranslate"><span class="pre">alternative_embedding</span></code> option)</p></li>
<li><p>The way of creating the new survival features (<code class="docutils literal notranslate"><span class="pre">-feature_selection_usage</span></code> option)</p></li>
</ul>
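<p>As a sketch of how these options can be screened jointly (the value ranges below are purely illustrative, and passing <code class="docutils literal notranslate"><span class="pre">nb_it</span></code> and <code class="docutils literal notranslate"><span class="pre">split_n_fold</span></code> through <code class="docutils literal notranslate"><span class="pre">args_to_optimize</span></code> is assumed to work the same way as <code class="docutils literal notranslate"><span class="pre">seed</span></code> and <code class="docutils literal notranslate"><span class="pre">nb_clusters</span></code> do in the example of the next section):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
# Illustrative search grid only: adapt the values to the dataset at hand
args_to_optimize = {
    'nb_it': [5, 10, 20],          # number of submodels
    'split_n_fold': [3, 5],        # random construction of each submodel
    'seed': [100, 200, 300],       # random seed
    'nb_clusters': [2, 3, 4],      # number of clusters
}
</pre></div>
</div>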
<div class="section" id="a-first-example">
<h2>A first example<a class="headerlink" href="#a-first-example" title="Permalink to this headline">¶</a></h2>
<p>A first example of tuning is available in the <a class="reference external" href="../../../examples/example_hyperparameters_tuning.py">example</a> folder (example_hyperparameters_tuning.py). The first part of the script defines the array of hyperparameters to screen. An instance of <code class="docutils literal notranslate"><span class="pre">SimDeepTuning</span></code> is then created, in which the output folder and the project name are defined.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
<span class="kn">from</span> <span class="nn">simdeep.simdeep_tuning</span> <span class="kn">import</span> <span class="n">SimDeepTuning</span>
<span class="c1"># AgglomerativeClustering is an external class that can be used as</span>
<span class="c1"># a clustering algorithm since it has a fit_predict method</span>
<span class="kn">from</span> <span class="nn">sklearn.cluster</span> <span class="kn">import</span> <span class="n">AgglomerativeClustering</span>
<span class="c1"># Array of hyperparameters</span>
<span class="n">args_to_optimize</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'seed'</span><span class="p">:</span> <span class="p">[</span><span class="mi">100</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">300</span><span class="p">,</span> <span class="mi">400</span><span class="p">],</span>
<span class="s1">'nb_clusters'</span><span class="p">:</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span>
<span class="s1">'cluster_method'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'mixture'</span><span class="p">,</span> <span class="s1">'coxPH'</span><span class="p">,</span>
<span class="n">AgglomerativeClustering</span><span class="p">],</span>
<span class="s1">'use_autoencoders'</span><span class="p">:</span> <span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="kc">False</span><span class="p">),</span>
<span class="s1">'class_selection'</span><span class="p">:</span> <span class="p">(</span><span class="s1">'mean'</span><span class="p">,</span> <span class="s1">'max'</span><span class="p">),</span>
<span class="p">}</span>
<span class="c1"># nb_threads, SURVIVAL_TSV, TRAINING_TSV, PATH_DATA and PROJECT_NAME</span>
<span class="c1"># are defined earlier in example_hyperparameters_tuning.py</span>
<span class="n">tuning</span> <span class="o">=</span> <span class="n">SimDeepTuning</span><span class="p">(</span>
<span class="n">args_to_optimize</span><span class="o">=</span><span class="n">args_to_optimize</span><span class="p">,</span>
<span class="n">nb_threads</span><span class="o">=</span><span class="n">nb_threads</span><span class="p">,</span>
<span class="n">survival_tsv</span><span class="o">=</span><span class="n">SURVIVAL_TSV</span><span class="p">,</span>
<span class="n">training_tsv</span><span class="o">=</span><span class="n">TRAINING_TSV</span><span class="p">,</span>
<span class="n">path_data</span><span class="o">=</span><span class="n">PATH_DATA</span><span class="p">,</span>
<span class="n">project_name</span><span class="o">=</span><span class="n">PROJECT_NAME</span><span class="p">,</span>
<span class="n">path_results</span><span class="o">=</span><span class="n">PATH_DATA</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
</div>
<p>The SimDeepTuning module requires the <code class="docutils literal notranslate"><span class="pre">ray</span></code> and <code class="docutils literal notranslate"><span class="pre">tune</span></code> Python modules. Ray must be initialised before launching the tuning:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
<span class="kn">import</span> <span class="nn">ray</span>
<span class="n">ray</span><span class="o">.</span><span class="n">init</span><span class="p">()</span>
</pre></div>
</div>
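<p>Ray can also be initialised with an explicit resource budget, which is useful on shared machines. This is standard Ray API rather than anything DeepProg-specific; adjust the value to the available hardware:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
import ray
# Cap the number of CPUs Ray is allowed to use
ray.init(num_cpus=16)
</pre></div>
</div>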
<div class="section" id="simdeeptuning-hyperparameters">
<h3>SimDeepTuning hyperparameters<a class="headerlink" href="#simdeeptuning-hyperparameters" title="Permalink to this headline">¶</a></h3>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">num_samples</span></code> is the number of experiments</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">distribute_deepprog</span></code> is used to further distribute each DeepProg instance within the ray framework. If set to True, be sure to either have a large number of CPUs available or use a small <code class="docutils literal notranslate"><span class="pre">max_concurrent</span></code> value.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">max_concurrent</span></code> is the number of concurrent experiments run in parallel.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">iterations</span></code> is the number of iterations to run for each experiment (results will be averaged).</p></li>
</ul>
<p>DeepProg can be tuned using different objective metrics:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">"log_test_fold_pvalue"</span></code>: uses the stacked <em>out-of-bag</em> samples (survival and labels) to compute the -log10(log-rank Cox-PH p-value)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">"log_full_pvalue"</span></code>: minimizes the Cox-PH log-rank p-value of the model (this metric can lead to overfitting since it relies on all the samples included in the model)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">"test_fold_cindex"</span></code>: maximizes the mean c-index of the test folds</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">"cluster_consistency"</span></code>: maximizes the adjusted Rand scores computed for all the model pairs (higher values imply more stable clusters)</p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">tuning</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="c1"># We will use the holdout samples Cox-PH pvalue as objective</span>
<span class="n">metric</span><span class="o">=</span><span class="s1">'log_test_fold_pvalue'</span><span class="p">,</span>
<span class="n">num_samples</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span>
<span class="c1"># Experiments run concurrently using ray as dispatcher</span>
<span class="n">max_concurrent</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="c1"># In addition, each deeprog model will be distributed</span>
<span class="n">distribute_deepprog</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">iterations</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="c1"># We recommend using a large `max_concurrent` and distribute_deepprog=True</span>
<span class="c1"># when a large number of CPUs and a large amount of RAM are available</span>
<span class="c1"># Results</span>
<span class="n">table</span> <span class="o">=</span> <span class="n">tuning</span><span class="o">.</span><span class="n">get_results_table</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="n">table</span><span class="p">)</span>
</pre></div>
</div>
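<p>The returned table can then be ranked to identify the best-performing hyperparameter sets. A minimal sketch, assuming the table is a pandas DataFrame and that the objective metric appears as a column named after the metric (check the actual column names of the generated table):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
# Sort experiments by the objective metric (column name assumed here)
best = table.sort_values('log_test_fold_pvalue', ascending=False)
print(best.head())
</pre></div>
</div>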
</div>
</div>
<div class="section" id="tuning-using-one-or-multiple-test-datasets">
<h2>Tuning using one or multiple test datasets<a class="headerlink" href="#tuning-using-one-or-multiple-test-datasets" title="Permalink to this headline">¶</a></h2>
<p>The computation of labels and the associated metrics from external test datasets can be included in the tuning workflow and used as objective metrics. Please refer to the <a class="reference external" href="../../../examples/example_hyperparameters_tuning_with_dataset.py">example</a> folder (see example_hyperparameters_tuning_with_dataset.py).</p>
<p>Let’s define two dummy test datasets:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
<span class="c1"># We will use the methylation and the RNA value as test datasets</span>
<span class="n">test_datasets</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'testdataset1'</span><span class="p">:</span> <span class="p">({</span><span class="s1">'METH'</span><span class="p">:</span> <span class="s1">'meth_dummy.tsv'</span><span class="p">},</span> <span class="s1">'survival_dummy.tsv'</span><span class="p">),</span>
<span class="s1">'testdataset2'</span><span class="p">:</span> <span class="p">({</span><span class="s1">'RNA'</span><span class="p">:</span> <span class="s1">'rna_dummy.tsv'</span><span class="p">},</span> <span class="s1">'survival_dummy.tsv'</span><span class="p">),</span>
<span class="p">}</span>
</pre></div>
</div>
<p>We then include these two datasets when instantiating the <code class="docutils literal notranslate"><span class="pre">SimDeepTuning</span></code> instance:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="n">tuning</span> <span class="o">=</span> <span class="n">SimDeepTuning</span><span class="p">(</span>
<span class="n">args_to_optimize</span><span class="o">=</span><span class="n">args_to_optimize</span><span class="p">,</span>
<span class="n">test_datasets</span><span class="o">=</span><span class="n">test_datasets</span><span class="p">,</span>
<span class="n">survival_tsv</span><span class="o">=</span><span class="n">SURVIVAL_TSV</span><span class="p">,</span>
<span class="n">training_tsv</span><span class="o">=</span><span class="n">TRAINING_TSV</span><span class="p">,</span>
<span class="n">path_data</span><span class="o">=</span><span class="n">PATH_DATA</span><span class="p">,</span>
<span class="n">project_name</span><span class="o">=</span><span class="n">PROJECT_NAME</span><span class="p">,</span>
<span class="n">path_results</span><span class="o">=</span><span class="n">PATH_DATA</span><span class="p">,</span>
<span class="p">)</span>
</pre></div>
</div>
<p>Finally, we fit the model using an objective metric that accounts for the test datasets:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">"log_test_pval"</span></code> maximizes the sum of the -log10(log-rank Cox-PH p-value) over the test datasets</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">"test_cindex"</span></code> maximizes the mean of the test C-indexes</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">"sum_log_pval"</span></code> maximizes the sum of the model -log10(log-rank Cox-PH p-value) and the -log10 p-values of all the test datasets</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">"mix_score"</span></code> maximizes the product of <code class="docutils literal notranslate"><span class="pre">"sum_log_pval"</span></code>, <code class="docutils literal notranslate"><span class="pre">"cluster_consistency"</span></code>, and <code class="docutils literal notranslate"><span class="pre">"test_fold_cindex"</span></code></p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="n">tuning</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="n">metric</span><span class="o">=</span><span class="s1">'log_test_pval'</span><span class="p">,</span>
<span class="n">num_samples</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="n">distribute_deepprog</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">max_concurrent</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="c1"># iterations is useful to account for the variability of the DL parameter fitting</span>
<span class="n">iterations</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">table</span> <span class="o">=</span> <span class="n">tuning</span><span class="o">.</span><span class="n">get_results_table</span><span class="p">()</span>
<span class="n">tuning</span><span class="o">.</span><span class="n">save_results_table</span><span class="p">()</span>
</pre></div>
</div>
</div>
<div class="section" id="results">
<h2>Results<a class="headerlink" href="#results" title="Permalink to this headline">¶</a></h2>
<p>The results will be generated in the <code class="docutils literal notranslate"><span class="pre">path_results</span></code> folder, with one results folder per experiment. A report of all the experiments and their metrics will be written to the results tables generated in the <code class="docutils literal notranslate"><span class="pre">path_results</span></code> folder. Once a model achieves satisfactory performance, it is possible to use the model directly by loading the generated labels with the <code class="docutils literal notranslate"><span class="pre">fit_on_pretrained_label_file</span></code> API (see the section <code class="docutils literal notranslate"><span class="pre">Save</span> <span class="pre">/</span> <span class="pre">load</span> <span class="pre">models</span> <span class="pre">from</span> <span class="pre">precomputed</span> <span class="pre">sample</span> <span class="pre">labels</span></code>), as sketched below.</p>
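<p>A minimal sketch of this reuse step is shown below. The <code class="docutils literal notranslate"><span class="pre">SimDeepBoosting</span></code> constructor arguments, the <code class="docutils literal notranslate"><span class="pre">labels_files</span></code> keyword, and the glob pattern are assumptions to be checked against the referenced section:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>
from glob import glob
from simdeep.simdeep_boosting import SimDeepBoosting

# SURVIVAL_TSV, TRAINING_TSV, PATH_DATA and PROJECT_NAME are the same
# constants used for SimDeepTuning above
boosting = SimDeepBoosting(
    survival_tsv=SURVIVAL_TSV,
    training_tsv=TRAINING_TSV,
    path_data=PATH_DATA,
    project_name=PROJECT_NAME,
    path_results=PATH_DATA,
)
# Hypothetical path: point the glob to the label files written by the tuning run
boosting.fit_on_pretrained_label_file(
    labels_files=glob('{0}/{1}/*.tsv'.format(PATH_DATA, PROJECT_NAME))
)
</pre></div>
</div>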
</div>
<div class="section" id="recommendation">
<h2>Recommendation<a class="headerlink" href="#recommendation" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li><p>Given the size N of the hyperparameter grid (i.e. the number of possible combinations), it is recommended to perform at least sqrt(N) experiments; running more experiments will always explore a larger part of the hyperparameter space and can increase performance (see the worked example after this list).</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">seed</span></code> is definitely a hyperparameter to screen, especially for a small number of models <code class="docutils literal notranslate"><span class="pre">nb_it</span></code> (fewer than 50). It is recommended to screen at least 8-10 different seeds when using <code class="docutils literal notranslate"><span class="pre">nb_it</span></code> &lt; 20.</p></li>
<li><p>Please test your configuration with a small <code class="docutils literal notranslate"><span class="pre">num_samples</span></code> first.</p></li>
</ul>
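<p>For instance, the grid in the first example above defines 4 seeds × 4 cluster numbers × 3 clustering algorithms × 2 × 2 = 192 combinations, so sqrt(192) ≈ 14 experiments (<code class="docutils literal notranslate"><span class="pre">num_samples=14</span></code>) is a reasonable minimal starting point.</p>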
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="LICENSE.html" class="btn btn-neutral float-right" title="License" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
<a href="case_study.html" class="btn btn-neutral float-left" title="Case study: Analyzing TCGA HCC dataset" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2019, Olivier Poirion.
</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>