--- a +++ b/docs/_build/html/usage.html @@ -0,0 +1,408 @@ + + +<!DOCTYPE html> +<html class="writer-html5" lang="en" > +<head> + <meta charset="utf-8" /> + + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + + <title>Tutorial: Simple DeepProg model — DeepProg documentation</title> + + + + <link rel="stylesheet" href="_static/css/theme.css" type="text/css" /> + <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> + + + + + + + + + + + <!--[if lt IE 9]> + <script src="_static/js/html5shiv.min.js"></script> + <![endif]--> + + + <script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script> + <script src="_static/jquery.js"></script> + <script src="_static/underscore.js"></script> + <script src="_static/doctools.js"></script> + <script src="_static/language_data.js"></script> + + <script type="text/javascript" src="_static/js/theme.js"></script> + + + <link rel="index" title="Index" href="genindex.html" /> + <link rel="search" title="Search" href="search.html" /> + <link rel="next" title="Tutorial: Ensemble of DeepProg model" href="usage_ensemble.html" /> + <link rel="prev" title="Installation" href="installation.html" /> +</head> + +<body class="wy-body-for-nav"> + + + <div class="wy-grid-for-nav"> + + <nav data-toggle="wy-nav-shift" class="wy-nav-side"> + <div class="wy-side-scroll"> + <div class="wy-side-nav-search" > + + + + <a href="index.html" class="icon icon-home"> DeepProg + + + + </a> + + + + + + + +<div role="search"> + <form id="rtd-search-form" class="wy-form" action="search.html" method="get"> + <input type="text" name="q" placeholder="Search docs" /> + <input type="hidden" name="check_keywords" value="yes" /> + <input type="hidden" name="area" value="default" /> + </form> +</div> + + + </div> + + + <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation"> + + + + + + + <ul class="current"> +<li 
class="toctree-l1"><a class="reference internal" href="installation.html">Installation</a></li> +<li class="toctree-l1 current"><a class="current reference internal" href="#">Tutorial: Simple DeepProg model</a><ul> +<li class="toctree-l2"><a class="reference internal" href="#input-parameters">Input parameters</a></li> +<li class="toctree-l2"><a class="reference internal" href="#input-matrices">Input matrices</a></li> +<li class="toctree-l2"><a class="reference internal" href="#creating-a-simple-deepprog-model-with-one-autoencoder-for-each-omic">Creating a simple DeepProg model with one autoencoder for each omic</a></li> +</ul> +</li> +<li class="toctree-l1"><a class="reference internal" href="usage_ensemble.html">Tutorial: Ensemble of DeepProg model</a></li> +<li class="toctree-l1"><a class="reference internal" href="usage_advanced.html">Tutorial: Advanced usage of DeepProg model</a></li> +<li class="toctree-l1"><a class="reference internal" href="case_study.html">Case study: Analyzing TCGA HCC dataset</a></li> +<li class="toctree-l1"><a class="reference internal" href="LICENSE.html">License</a></li> +<li class="toctree-l1"><a class="reference internal" href="api/simdeep.html">simdeep package</a></li> +</ul> + + + + </div> + + </div> + </nav> + + <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"> + + + <nav class="wy-nav-top" aria-label="top navigation"> + + <i data-toggle="wy-nav-top" class="fa fa-bars"></i> + <a href="index.html">DeepProg</a> + + </nav> + + + <div class="wy-nav-content"> + + <div class="rst-content"> + + + + + + + + + + + + + + + + + + + +<div role="navigation" aria-label="breadcrumbs navigation"> + + <ul class="wy-breadcrumbs"> + + <li><a href="index.html" class="icon icon-home"></a> »</li> + + <li>Tutorial: Simple DeepProg model</li> + + + <li class="wy-breadcrumbs-aside"> + + + <a href="_sources/usage.md.txt" rel="nofollow"> View page source</a> + + + </li> + + </ul> + + + <hr/> +</div> + <div role="main" class="document" 
itemscope="itemscope" itemtype="http://schema.org/Article"> + <div itemprop="articleBody"> + + <div class="section" id="tutorial-simple-deepprog-model"> +<h1>Tutorial: Simple DeepProg model<a class="headerlink" href="#tutorial-simple-deepprog-model" title="Permalink to this headline">¶</a></h1> +<p>The principle of DeepProg can be summarized as follows:</p> +<ul class="simple"> +<li><p>Loading multiple omic matrices (samples x features)</p></li> +<li><p>Preprocessing, normalisation, and sub-sampling of the input matrices</p></li> +<li><p>Matrix transformation using autoencoders</p></li> +<li><p>Detection of survival features</p></li> +<li><p>Survival feature agglomeration and clustering</p></li> +<li><p>Creation of supervised models to predict the labels of new samples</p></li> +</ul> +<div class="section" id="input-parameters"> +<h2>Input parameters<a class="headerlink" href="#input-parameters" title="Permalink to this headline">¶</a></h2> +<p>All default parameters are defined in the config file <code class="docutils literal notranslate"><span class="pre">./simdeep/config.py</span></code>, but they can also be passed dynamically. Three types of parameters must be defined:</p> +<ul class="simple"> +<li><p>The training dataset (omics + survival input files)</p> +<ul> +<li><p>In addition, the parameters of the test set, i.e.
the omic dataset and the survival file</p></li> +</ul> +</li> +<li><p>The parameters of the autoencoder (the default parameters work, but they can be fine-tuned)</p></li> +<li><p>The parameters of the classification procedures (the default values generally work well)</p></li> +</ul> +</div> +<div class="section" id="input-matrices"> +<h2>Input matrices<a class="headerlink" href="#input-matrices" title="Permalink to this headline">¶</a></h2> +<p>We include two example datasets:</p> +<ul class="simple"> +<li><p>A dummy example dataset in the <code class="docutils literal notranslate"><span class="pre">examples/data/</span></code> folder:</p></li> +</ul> +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>examples +├── data +│ ├── meth_dummy.tsv +│ ├── mir_dummy.tsv +│ ├── rna_dummy.tsv +│ ├── rna_test_dummy.tsv +│ ├── survival_dummy.tsv +│ └── survival_test_dummy.tsv +</pre></div> +</div> +<ul class="simple"> +<li><p>A real dataset in the <code class="docutils literal notranslate"><span class="pre">data</span></code> folder, derived from the TCGA HCC dataset. Its compressed matrices (<code class="docutils literal notranslate"><span class="pre">.tsv.gz</span></code>) must be decompressed before processing:</p></li> +</ul> +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>data +├── meth.tsv.gz +├── mir.tsv.gz +├── rna.tsv.gz +└── survival.tsv +</pre></div> +</div> +<p>An input matrix file is a tab-separated file with one sample per row and one feature per column:</p> +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>head mir_dummy.tsv + +Samples dummy_mir_0 dummy_mir_1 dummy_mir_2 dummy_mir_3 ... +sample_test_0 <span class="m">0</span>.469656032287 <span class="m">0</span>.347987447237 <span class="m">0</span>.706633335508 <span class="m">0</span>.440068758445 ... +sample_test_1 <span class="m">0</span>.0453108219657 <span class="m">0</span>.0234642968791 <span class="m">0</span>.593393816691 <span class="m">0</span>.981872970341 ...
+sample_test_2 <span class="m">0</span>.908784043793 <span class="m">0</span>.854397550009 <span class="m">0</span>.575879144667 <span class="m">0</span>.553333958713 ... +... +</pre></div> +</div> +<p>If multiple matrices are used as input, they must list the samples in the same order. For example:</p> +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>head rna_dummy.tsv + +Samples dummy_gene_0 dummy_gene_1 dummy_gene_2 dummy_gene_3 ... +sample_test_0 <span class="m">0</span>.69656032287 <span class="m">0</span>.47987447237 <span class="m">0</span>.06633335508 <span class="m">0</span>.40068758445 ... +sample_test_1 <span class="m">0</span>.53108219657 <span class="m">0</span>.234642968791 <span class="m">0</span>.93393816691 <span class="m">0</span>.81872970341 ... +sample_test_2 <span class="m">0</span>.8784043793 <span class="m">0</span>.54397550009 <span class="m">0</span>.75879144667 <span class="m">0</span>.53333958713 ... +... +</pre></div> +</div> +<p>The <code class="docutils literal notranslate"><span class="pre">training_tsv</span></code> and <code class="docutils literal notranslate"><span class="pre">path_data</span></code> arguments of the <code class="docutils literal notranslate"><span class="pre">extract_data</span></code> module are used to define the input matrices.</p> +<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Keys are omic names; values are the corresponding input matrix files (located in path_data)</span> +<span class="n">training_tsv</span> <span class="o">=</span> <span class="p">{</span> + <span class="s1">'GE'</span><span class="p">:</span> <span class="s1">'rna_dummy.tsv'</span><span class="p">,</span> + <span class="s1">'MIR'</span><span class="p">:</span> <span class="s1">'mir_dummy.tsv'</span><span class="p">,</span> + <span class="s1">'METH'</span><span class="p">:</span> <span class="s1">'meth_dummy.tsv'</span><span class="p">,</span> +<span
class="p">}</span> +</pre></div> +</div> +<p>A survival file must have this format:</p> +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>head survival_dummy.tsv + +barcode days recurrence +sample_test_0 <span class="m">134</span> <span class="m">1</span> +sample_test_1 <span class="m">291</span> <span class="m">0</span> +sample_test_2 <span class="m">125</span> <span class="m">1</span> +sample_test_3 <span class="m">43</span> <span class="m">0</span> +... +</pre></div> +</div> +<p>The column names holding the patient IDs, the survival time, and the event must be declared with the <code class="docutils literal notranslate"><span class="pre">survival_flag</span></code> argument:</p> +<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Default value</span> +<span class="n">survival_flag</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'patient_id'</span><span class="p">:</span> <span class="s1">'barcode'</span><span class="p">,</span> + <span class="s1">'survival'</span><span class="p">:</span> <span class="s1">'days'</span><span class="p">,</span> + <span class="s1">'event'</span><span class="p">:</span> <span class="s1">'recurrence'</span><span class="p">}</span> +</pre></div> +</div> +</div> +<div class="section" id="creating-a-simple-deepprog-model-with-one-autoencoder-for-each-omic"> +<h2>Creating a simple DeepProg model with one autoencoder for each omic<a class="headerlink" href="#creating-a-simple-deepprog-model-with-one-autoencoder-for-each-omic" title="Permalink to this headline">¶</a></h2> +<p>First, we build a model using the example dataset in <code class="docutils literal notranslate"><span class="pre">./examples/data/</span></code> (these example files are set as the defaults in the config.py file).
We use them to show how to build a single DeepProg model with one autoencoder for each omic:</p> +<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> +<span class="c1"># The SimDeep class builds one model with one autoencoder for each omic</span> +<span class="kn">from</span> <span class="nn">simdeep.simdeep_analysis</span> <span class="kn">import</span> <span class="n">SimDeep</span> +<span class="kn">from</span> <span class="nn">simdeep.extract_data</span> <span class="kn">import</span> <span class="n">LoadData</span> + +<span class="n">help</span><span class="p">(</span><span class="n">SimDeep</span><span class="p">)</span> <span class="c1"># to see all the functions</span> +<span class="n">help</span><span class="p">(</span><span class="n">LoadData</span><span class="p">)</span> <span class="c1"># to see all the functions related to loading datasets</span> + +<span class="c1"># Defining training datasets</span> +<span class="kn">from</span> <span class="nn">simdeep.config</span> <span class="kn">import</span> <span class="n">TRAINING_TSV</span> +<span class="kn">from</span> <span class="nn">simdeep.config</span> <span class="kn">import</span> <span class="n">SURVIVAL_TSV</span> +<span class="c1"># Location of the input matrices and survival file</span> +<span class="kn">from</span> <span class="nn">simdeep.config</span> <span class="kn">import</span> <span class="n">PATH_DATA</span> + +<span class="n">dataset</span> <span class="o">=</span> <span class="n">LoadData</span><span class="p">(</span><span class="n">training_tsv</span><span class="o">=</span><span class="n">TRAINING_TSV</span><span class="p">,</span> + <span class="n">survival_tsv</span><span class="o">=</span><span class="n">SURVIVAL_TSV</span><span class="p">,</span> + <span class="n">path_data</span><span class="o">=</span><span class="n">PATH_DATA</span><span class="p">)</span> + +<span class="c1"># Result path in which an output folder will be created</span> +<span class="n">PATH_RESULTS</span> <span class="o">=</span> <span class="s2">"./TEST_DUMMY/"</span> + +<span class="c1"># Instantiate the model with the dummy example training dataset defined in the config file</span> +<span class="n">simDeep</span> <span class="o">=</span> <span class="n">SimDeep</span><span class="p">(</span> + <span class="n">dataset</span><span class="o">=</span><span class="n">dataset</span><span class="p">,</span> + <span class="n">path_results</span><span class="o">=</span><span class="n">PATH_RESULTS</span><span class="p">,</span> + <span class="n">path_to_save_model</span><span class="o">=</span><span class="n">PATH_RESULTS</span><span class="p">,</span> <span class="c1"># Path where the autoencoders can be saved</span> + <span class="p">)</span> + +<span class="n">simDeep</span><span class="o">.</span><span class="n">load_training_dataset</span><span class="p">()</span> <span class="c1"># load the training dataset</span> +<span class="n">simDeep</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="c1"># fit the model</span> +</pre></div> +</div> +<p>At this point, the model is fitted and the following output files are available in the output folder:</p> +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>TEST_DUMMY +├── test_dummy_dataset_KM_plot_training_dataset.png +└── test_dummy_dataset_training_set_labels.tsv +</pre></div> +</div> +<p>The TSV file contains, for each sample, the assigned label and the label probability:</p> +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sample_test_0 <span class="m">1</span> <span class="m">7</span>.22678272919e-12 +sample_test_1 <span class="m">1</span> <span class="m">4</span>.48594196888e-09 +sample_test_4 <span class="m">1</span> <span class="m">1</span>.53363205571e-06 +sample_test_5 <span class="m">1</span> <span class="m">6</span>.72170409655e-08 +sample_test_6 <span class="m">0</span> <span
class="m">0</span>.9996581662 +sample_test_7 <span class="m">1</span> <span class="m">3</span>.38139255666e-08 +</pre></div> +</div> +<p>A Kaplan-Meier (KM) plot of the training labels is also generated:</p> +<p><img alt="KM plot" src="_images/test_dummy_dataset_KM_plot_training_dataset.png" /></p> +<p>We are now ready to load a test dataset and infer the class labels of its samples. +The test dataset does not need to contain the same omic matrices as the training dataset, nor the same features for a given omic. However, it must share at least some features with the training dataset.</p> +<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Defining test datasets</span> +<span class="kn">from</span> <span class="nn">simdeep.config</span> <span class="kn">import</span> <span class="n">TEST_TSV</span> +<span class="kn">from</span> <span class="nn">simdeep.config</span> <span class="kn">import</span> <span class="n">SURVIVAL_TSV_TEST</span> + +<span class="n">simDeep</span><span class="o">.</span><span class="n">load_new_test_dataset</span><span class="p">(</span> + <span class="n">TEST_TSV</span><span class="p">,</span> <span class="c1"># omic matrices of the test set</span> + <span class="s1">'dummy'</span><span class="p">,</span> <span class="c1"># fname_key: name used to label the test output files</span> + <span class="n">SURVIVAL_TSV_TEST</span><span class="p">,</span> <span class="c1"># [OPTIONAL] test survival file, used to compute accuracy metrics on the test dataset</span> + <span class="p">)</span> + +<span class="c1"># The test set is a dummy RNA expression matrix (randomly generated)</span> +<span class="nb">print</span><span class="p">(</span><span class="n">simDeep</span><span class="o">.</span><span class="n">dataset</span><span class="o">.</span><span class="n">test_tsv</span><span class="p">)</span> <span class="c1"># Defined in the config file</span> +<span class="c1"># The data type of the test set is also defined to match an existing type</span> +<span class="nb">print</span><span class="p">(</span><span
class="n">simDeep</span><span class="o">.</span><span class="n">dataset</span><span class="o">.</span><span class="n">data_type</span><span class="p">)</span> <span class="c1"># Defined in the config file</span> +<span class="n">simDeep</span><span class="o">.</span><span class="n">predict_labels_on_test_dataset</span><span class="p">()</span> <span class="c1"># Run the classification and label the test dataset</span> + +<span class="nb">print</span><span class="p">(</span><span class="n">simDeep</span><span class="o">.</span><span class="n">test_labels</span><span class="p">)</span> +<span class="nb">print</span><span class="p">(</span><span class="n">simDeep</span><span class="o">.</span><span class="n">test_labels_proba</span><span class="p">)</span> +</pre></div> +</div> +<p>The assigned classes and class probabilities for the test samples are now available in the output folder:</p> +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>TEST_DUMMY +├── test_dummy_dataset_dummy_KM_plot_test.png +├── test_dummy_dataset_dummy_test_labels.tsv +├── test_dummy_dataset_KM_plot_training_dataset.png +└── test_dummy_dataset_training_set_labels.tsv + +head test_dummy_dataset_dummy_test_labels.tsv +</pre></div> +</div> +<p>A KM plot is also constructed from the test labels:</p> +<p><img alt="KM plot test" src="_images/test_dummy_dataset_dummy_KM_plot_test.png" /></p> +<p>Finally, the Keras encoders can be saved:</p> +<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">simDeep</span><span class="o">.</span><span class="n">save_encoders</span><span class="p">(</span><span class="s1">'dummy_encoder.h5'</span><span class="p">)</span> +</pre></div> +</div> +</div> +</div> + + + </div> + + </div> + <footer> + <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation"> + <a href="usage_ensemble.html" class="btn btn-neutral float-right" title="Tutorial: Ensemble
of DeepProg model" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> + <a href="installation.html" class="btn btn-neutral float-left" title="Installation" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> + </div> + + <hr/> + + <div role="contentinfo"> + <p> + © Copyright 2019, Olivier Poirion. + + </p> + </div> + + + + Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a + + <a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a> + + provided by <a href="https://readthedocs.org">Read the Docs</a>. + +</footer> + </div> + </div> + + </section> + + </div> + + + <script type="text/javascript"> + jQuery(function () { + SphinxRtdTheme.Navigation.enable(true); + }); + </script> + + + + + + +</body> +</html> \ No newline at end of file