<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Survival Integration of Multi-omics using Deep-Learning (DeepProg) — DeepProg documentation</title>
<script type="text/javascript" src="_static/js/modernizr.min.js"></script>
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/language_data.js"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
</head>
<body class="wy-body-for-nav">

<div class="wy-grid-for-nav">

<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home"> DeepProg
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul>
<li class="toctree-l1"><a class="reference internal" href="installation.html">Installation</a></li>
</ul>
</div>
</div>
</nav>

<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">DeepProg</a>
</nav>

<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html">Docs</a> »</li>
<li>Survival Integration of Multi-omics using Deep-Learning (DeepProg)</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/README.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">

<div class="section" id="survival-integration-of-multi-omics-using-deep-learning-deepprog">
<h1>Survival Integration of Multi-omics using Deep-Learning (DeepProg)<a class="headerlink" href="#survival-integration-of-multi-omics-using-deep-learning-deepprog" title="Permalink to this headline">¶</a></h1>
<p>This package combines multi-omics data with survival information. Using autoencoders, the pipeline creates new features and identifies those linked with survival using Cox-PH regression.
The omic data used in the original study are RNA-Seq, miRNA, and methylation; however, the approach can be extended to any combination of omic data.</p>
<p>The current package contains the omic data used in the study and a copy of the computed model. It is also straightforward to build a new model from scratch using any combination of omic data.
The omic data and the survival files should be in TSV (tab-separated values) format; examples are provided. The deep-learning framework uses Keras, which runs on top of Theano, TensorFlow, or CNTK.</p>
<div class="section" id="requirements">
<h2>Requirements<a class="headerlink" href="#requirements" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li>Python 2 or 3</li>
<li><a class="reference external" href="http://deeplearning.net/software/theano/install.html">theano</a> (version 0.8.2 was used for the manuscript)</li>
<li><a class="reference external" href="https://www.tensorflow.org/">tensorflow</a>, a more robust alternative to theano</li>
<li><a class="reference external" href="https://github.com/microsoft/CNTK">cntk</a>, another deep-learning library that can offer some advantages over tensorflow or theano. See <a class="reference external" href="https://docs.microsoft.com/en-us/cognitive-toolkit/">https://docs.microsoft.com/en-us/cognitive-toolkit/</a></li>
<li>R</li>
<li>the R “survival” package</li>
<li>numpy, scipy</li>
<li>scikit-learn (>=0.18)</li>
<li>rpy2 2.8.6 (for Python 2, install with pip install rpy2==2.8.6; for Python 3, pip3 install rpy2==2.8.6). Newer versions of rpy2 may not work due to a bug (not tested)</li>
</ul>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip install theano --user <span class="c1"># Original backend used, OR</span>
pip install tensorflow --user <span class="c1"># Alternative, more efficient backend for keras</span>
pip install keras --user
pip install <span class="nv">rpy2</span><span class="o">==</span><span class="m">2</span>.8.6 --user

<span class="c1"># If you want to use theano or CNTK, edit the keras configuration file:</span>
nano ~/.keras/keras.json
</pre></div>
</div>
<ul class="simple">
<li>R installation</li>
</ul>
<div class="highlight-R notranslate"><div class="highlight"><pre><span></span><span class="nf">install.packages</span><span class="p">(</span><span class="s">"survival"</span><span class="p">)</span>
<span class="nf">install.packages</span><span class="p">(</span><span class="s">"glmnet"</span><span class="p">)</span>
<span class="nf">source</span><span class="p">(</span><span class="s">"https://bioconductor.org/biocLite.R"</span><span class="p">)</span>
<span class="nf">biocLite</span><span class="p">(</span><span class="s">"survcomp"</span><span class="p">)</span>
</pre></div>
</div>
<div class="section" id="support-for-cntk-tensorflow">
<h3>Support for CNTK / tensorflow<a class="headerlink" href="#support-for-cntk-tensorflow" title="Permalink to this headline">¶</a></h3>
<ul class="simple">
<li>We originally used Keras with theano as the backend. However, <a class="reference external" href="https://www.tensorflow.org/">Tensorflow</a> and <a class="reference external" href="https://docs.microsoft.com/en-us/cognitive-toolkit/">CNTK</a> are more recent deep-learning frameworks that can be faster or more stable than theano. Because Keras supports all three backends, they can be used as alternatives to theano. To change the backend, edit the <code class="docutils literal notranslate"><span class="pre">$HOME/.keras/keras.json</span></code> file (see the official instructions <a class="reference external" href="https://keras.io/backend/">here</a>).</li>
</ul>
<p>The default configuration file looks like this:</p>
<div class="highlight-json notranslate"><div class="highlight"><pre><span></span><span class="p">{</span>
    <span class="nt">"image_data_format"</span><span class="p">:</span> <span class="s2">"channels_last"</span><span class="p">,</span>
    <span class="nt">"epsilon"</span><span class="p">:</span> <span class="mf">1e-07</span><span class="p">,</span>
    <span class="nt">"floatx"</span><span class="p">:</span> <span class="s2">"float32"</span><span class="p">,</span>
    <span class="nt">"backend"</span><span class="p">:</span> <span class="s2">"tensorflow"</span>
<span class="p">}</span>
</pre></div>
</div>
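<p>If editing the file by hand is inconvenient, the same configuration can also be written programmatically. A minimal sketch, assuming only the configuration fields shown above; the helper function is illustrative, not part of DeepProg or Keras:</p>

```python
import json
import os

# Default Keras configuration, as shown above, with the backend switched.
# "theano" can be replaced by "tensorflow" or "cntk".
config = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano",
}

def write_keras_config(path, config):
    """Write a Keras backend configuration to the given keras.json path."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        json.dump(config, fh, indent=4)

# The real file lives at ~/.keras/keras.json:
# write_keras_config(os.path.expanduser("~/.keras/keras.json"), config)
```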
</div>
</div>
<div class="section" id="distributed-computation">
<h2>Distributed computation<a class="headerlink" href="#distributed-computation" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li>The python ray framework (<a class="reference external" href="https://github.com/ray-project/ray">https://github.com/ray-project/ray</a>) can be used to parallelise the computation of the multiple models. To use it, install it first: <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">ray</span> <span class="pre">--user</span></code></li>
<li>Alternatively, the models can be created one by one without the ray framework</li>
</ul>
</div>
<div class="section" id="visualisation-module-experimental">
<h2>Visualisation module (Experimental)<a class="headerlink" href="#visualisation-module-experimental" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li>To visualise test sets projected into the multi-omic survival space, the <code class="docutils literal notranslate"><span class="pre">mpld3</span></code> module is required: <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">mpld3</span> <span class="pre">--user</span></code></li>
<li>Note that the pip version of mpld3 presented a <a class="reference external" href="https://github.com/mpld3/mpld3/issues/434">bug</a>: <code class="docutils literal notranslate"><span class="pre">TypeError:</span> <span class="pre">array([1.])</span> <span class="pre">is</span> <span class="pre">not</span> <span class="pre">JSON</span> <span class="pre">serializable</span></code>. The <a class="reference external" href="https://github.com/mpld3/mpld3">newest</a> version of mpld3, available from GitHub, solves this issue, so installing it is recommended.</li>
</ul>
</div>
<div class="section" id="installation-local">
<h2>Installation (local)<a class="headerlink" href="#installation-local" title="Permalink to this headline">¶</a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git clone https://github.com/lanagarmire/SimDeep.git
<span class="nb">cd</span> SimDeep
pip install -r requirements.txt --user
</pre></div>
</div>
</div>
<div class="section" id="usage">
<h2>Usage<a class="headerlink" href="#usage" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li>Test that simdeep is functional (i.e. that all the software is correctly installed):</li>
</ul>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python test/test_dummy_boosting_stacking.py -v <span class="c1"># OR</span>
nosetests <span class="nb">test</span> -v <span class="c1"># Improved version of python unit testing</span>
</pre></div>
</div>
<ul class="simple">
<li>All the default parameters are defined in the config file <code class="docutils literal notranslate"><span class="pre">./simdeep/config.py</span></code> but can also be passed dynamically. Three types of parameters must be defined:<ul>
<li>The training dataset (omics + survival input files)<ul>
<li>In addition, the parameters of the test set, i.e. the omic dataset and the survival file</li>
</ul>
</li>
<li>The parameters of the autoencoder (the default parameters work but may be fine-tuned)</li>
<li>The parameters of the classification procedures (the defaults are good)</li>
</ul>
</li>
</ul>
</div>
<div class="section" id="example-datasets-and-scripts">
<h2>Example datasets and scripts<a class="headerlink" href="#example-datasets-and-scripts" title="Permalink to this headline">¶</a></h2>
<p>An omic .tsv file must have this format:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>head mir_dummy.tsv

Samples dummy_mir_0 dummy_mir_1 dummy_mir_2 dummy_mir_3 ...
sample_test_0 <span class="m">0</span>.469656032287 <span class="m">0</span>.347987447237 <span class="m">0</span>.706633335508 <span class="m">0</span>.440068758445 ...
sample_test_1 <span class="m">0</span>.0453108219657 <span class="m">0</span>.0234642968791 <span class="m">0</span>.593393816691 <span class="m">0</span>.981872970341 ...
sample_test_2 <span class="m">0</span>.908784043793 <span class="m">0</span>.854397550009 <span class="m">0</span>.575879144667 <span class="m">0</span>.553333958713 ...
...
</pre></div>
</div>
<p>A survival file must have this format:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>head survival_dummy.tsv

Samples days event
sample_test_0 <span class="m">134</span> <span class="m">1</span>
sample_test_1 <span class="m">291</span> <span class="m">0</span>
sample_test_2 <span class="m">125</span> <span class="m">1</span>
sample_test_3 <span class="m">43</span> <span class="m">0</span>
...
</pre></div>
</div>
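<p>The survival table above is plain tab-separated text and can be read with standard tools. A minimal parsing sketch; the helper name and the inlined example rows are illustrative, not part of the DeepProg API:</p>

```python
import csv
import io

# Two rows in the layout shown above (tab-separated, header "Samples days event").
survival_text = (
    "Samples\tdays\tevent\n"
    "sample_test_0\t134\t1\n"
    "sample_test_1\t291\t0\n"
)

def load_survival(handle):
    """Return {sample: (days, event)} from a survival TSV file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Samples"]: (int(row["days"]), int(row["event"]))
            for row in reader}

survival = load_survival(io.StringIO(survival_text))
print(survival["sample_test_0"])  # (134, 1)
```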
<p>As examples, we included two datasets:</p>
<ul class="simple">
<li>A dummy example dataset in the <code class="docutils literal notranslate"><span class="pre">example/data/</span></code> folder:</li>
</ul>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>examples
├── data
│   ├── meth_dummy.tsv
│   ├── mir_dummy.tsv
│   ├── rna_dummy.tsv
│   ├── rna_test_dummy.tsv
│   ├── survival_dummy.tsv
│   └── survival_test_dummy.tsv
</pre></div>
</div>
<ul class="simple">
<li>And a real dataset in the <code class="docutils literal notranslate"><span class="pre">data</span></code> folder. This dataset derives from the TCGA HCC cancer dataset and needs to be decompressed before processing:</li>
</ul>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>data
├── meth.tsv.gz
├── mir.tsv.gz
├── rna.tsv.gz
└── survival.tsv
</pre></div>
</div>
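<p>The compressed matrices can be inflated on the command line (e.g. gzip -d data/*.tsv.gz) or from Python. A short sketch using the standard library; the paths are illustrative:</p>

```python
import gzip
import shutil

def decompress(path_gz, path_out):
    """Inflate one gzip-compressed TSV to a plain-text .tsv file."""
    with gzip.open(path_gz, "rb") as src, open(path_out, "wb") as dst:
        shutil.copyfileobj(src, dst)

# Usage (adjust to wherever the repository's data/ folder lives):
#   for name in ("meth", "mir", "rna"):
#       decompress("data/%s.tsv.gz" % name, "data/%s.tsv" % name)
```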
</div>
<div class="section" id="creating-a-simple-deepprog-model-with-one-autoencoder-for-each-omic">
<h2>Creating a simple DeepProg model with one autoencoder for each omic<a class="headerlink" href="#creating-a-simple-deepprog-model-with-one-autoencoder-for-each-omic" title="Permalink to this headline">¶</a></h2>
<p>First, we will build a model using the example dataset from <code class="docutils literal notranslate"><span class="pre">./examples/data/</span></code> (these example files are set as defaults in the config.py file). We will use them to show how to construct a single DeepProg model inferring an autoencoder for each omic.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># The SimDeep class builds one model with one autoencoder for each omic</span>
<span class="kn">from</span> <span class="nn">simdeep.simdeep_analysis</span> <span class="kn">import</span> <span class="n">SimDeep</span>
<span class="kn">from</span> <span class="nn">simdeep.extract_data</span> <span class="kn">import</span> <span class="n">LoadData</span>

<span class="n">help</span><span class="p">(</span><span class="n">SimDeep</span><span class="p">)</span> <span class="c1"># to see all the functions</span>
<span class="n">help</span><span class="p">(</span><span class="n">LoadData</span><span class="p">)</span> <span class="c1"># to see all the functions related to loading datasets</span>

<span class="c1"># Defining training datasets</span>
<span class="kn">from</span> <span class="nn">simdeep.config</span> <span class="kn">import</span> <span class="n">TRAINING_TSV</span>
<span class="kn">from</span> <span class="nn">simdeep.config</span> <span class="kn">import</span> <span class="n">SURVIVAL_TSV</span>

<span class="n">dataset</span> <span class="o">=</span> <span class="n">LoadData</span><span class="p">(</span><span class="n">training_tsv</span><span class="o">=</span><span class="n">TRAINING_TSV</span><span class="p">,</span> <span class="n">survival_tsv</span><span class="o">=</span><span class="n">SURVIVAL_TSV</span><span class="p">)</span>

<span class="n">simDeep</span> <span class="o">=</span> <span class="n">SimDeep</span><span class="p">(</span><span class="n">dataset</span><span class="o">=</span><span class="n">dataset</span><span class="p">)</span> <span class="c1"># instantiate the model with the dummy example training dataset defined in the config file</span>
<span class="n">simDeep</span><span class="o">.</span><span class="n">load_training_dataset</span><span class="p">()</span> <span class="c1"># load the training dataset</span>
<span class="n">simDeep</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span> <span class="c1"># fit the model</span>

<span class="c1"># Defining test datasets</span>
<span class="kn">from</span> <span class="nn">simdeep.config</span> <span class="kn">import</span> <span class="n">TEST_TSV</span>
<span class="kn">from</span> <span class="nn">simdeep.config</span> <span class="kn">import</span> <span class="n">SURVIVAL_TSV_TEST</span>

<span class="n">simDeep</span><span class="o">.</span><span class="n">load_new_test_dataset</span><span class="p">(</span><span class="n">TEST_TSV</span><span class="p">,</span> <span class="n">SURVIVAL_TSV_TEST</span><span class="p">,</span> <span class="n">fname_key</span><span class="o">=</span><span class="s1">'dummy'</span><span class="p">)</span>

<span class="c1"># The test set is a dummy rna expression (generated randomly)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">simDeep</span><span class="o">.</span><span class="n">dataset</span><span class="o">.</span><span class="n">test_tsv</span><span class="p">)</span> <span class="c1"># Defined in the config file</span>
<span class="c1"># The data type of the test set is also defined to match an existing type</span>
<span class="nb">print</span><span class="p">(</span><span class="n">simDeep</span><span class="o">.</span><span class="n">dataset</span><span class="o">.</span><span class="n">data_type</span><span class="p">)</span> <span class="c1"># Defined in the config file</span>
<span class="n">simDeep</span><span class="o">.</span><span class="n">predict_labels_on_test_dataset</span><span class="p">()</span> <span class="c1"># Perform the classification analysis and label the test dataset</span>

<span class="nb">print</span><span class="p">(</span><span class="n">simDeep</span><span class="o">.</span><span class="n">test_labels</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">simDeep</span><span class="o">.</span><span class="n">test_labels_proba</span><span class="p">)</span>

<span class="n">simDeep</span><span class="o">.</span><span class="n">save_encoder</span><span class="p">(</span><span class="s1">'dummy_encoder.h5'</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="section" id="creating-a-deepprog-model-using-an-ensemble-of-submodels">
<h2>Creating a DeepProg model using an ensemble of submodels<a class="headerlink" href="#creating-a-deepprog-model-using-an-ensemble-of-submodels" title="Permalink to this headline">¶</a></h2>
<p>Secondly, we will build a more complex DeepProg model constituted of an ensemble of sub-models, each built from a subset of the data. For that purpose, we need to use the <code class="docutils literal notranslate"><span class="pre">SimDeepBoosting</span></code> class:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">simdeep.simdeep_boosting</span> <span class="kn">import</span> <span class="n">SimDeepBoosting</span>

<span class="n">help</span><span class="p">(</span><span class="n">SimDeepBoosting</span><span class="p">)</span>

<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">OrderedDict</span>

<span class="n">path_data</span> <span class="o">=</span> <span class="s2">"../examples/data/"</span>
<span class="c1"># Example tsv files</span>
<span class="n">tsv_files</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">([</span>
    <span class="p">(</span><span class="s1">'MIR'</span><span class="p">,</span> <span class="s1">'mir_dummy.tsv'</span><span class="p">),</span>
    <span class="p">(</span><span class="s1">'METH'</span><span class="p">,</span> <span class="s1">'meth_dummy.tsv'</span><span class="p">),</span>
    <span class="p">(</span><span class="s1">'RNA'</span><span class="p">,</span> <span class="s1">'rna_dummy.tsv'</span><span class="p">),</span>
<span class="p">])</span>

<span class="c1"># File with survival event</span>
<span class="n">survival_tsv</span> <span class="o">=</span> <span class="s1">'survival_dummy.tsv'</span>

<span class="n">project_name</span> <span class="o">=</span> <span class="s1">'stacked_TestProject'</span>
<span class="n">epochs</span> <span class="o">=</span> <span class="mi">10</span> <span class="c1"># Autoencoder epochs. Other hyperparameters can be fine-tuned. See the example files</span>
<span class="n">seed</span> <span class="o">=</span> <span class="mi">3</span> <span class="c1"># random seed used for reproducibility</span>
<span class="n">nb_it</span> <span class="o">=</span> <span class="mi">5</span> <span class="c1"># Number of submodels to be fitted, each using only a subset of the training data</span>
<span class="n">nb_threads</span> <span class="o">=</span> <span class="mi">2</span> <span class="c1"># Number of threads used to compute the survival functions</span>

<span class="n">boosting</span> <span class="o">=</span> <span class="n">SimDeepBoosting</span><span class="p">(</span>
    <span class="n">nb_threads</span><span class="o">=</span><span class="n">nb_threads</span><span class="p">,</span>
    <span class="n">nb_it</span><span class="o">=</span><span class="n">nb_it</span><span class="p">,</span>
    <span class="n">split_n_fold</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
    <span class="n">survival_tsv</span><span class="o">=</span><span class="n">survival_tsv</span><span class="p">,</span>
    <span class="n">training_tsv</span><span class="o">=</span><span class="n">tsv_files</span><span class="p">,</span>
    <span class="n">path_data</span><span class="o">=</span><span class="n">path_data</span><span class="p">,</span>
    <span class="n">project_name</span><span class="o">=</span><span class="n">project_name</span><span class="p">,</span>
    <span class="n">path_results</span><span class="o">=</span><span class="n">path_data</span><span class="p">,</span>
    <span class="n">epochs</span><span class="o">=</span><span class="n">epochs</span><span class="p">,</span>
    <span class="n">seed</span><span class="o">=</span><span class="n">seed</span><span class="p">)</span>

<span class="c1"># Fit the model</span>
<span class="n">boosting</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
<span class="c1"># Predict and write the labels</span>
<span class="n">boosting</span><span class="o">.</span><span class="n">predict_labels_on_full_dataset</span><span class="p">()</span>
<span class="c1"># Compute internal metrics</span>
<span class="n">boosting</span><span class="o">.</span><span class="n">compute_clusters_consistency_for_full_labels</span><span class="p">()</span>
<span class="c1"># Compute the feature importance</span>
<span class="n">boosting</span><span class="o">.</span><span class="n">compute_feature_scores_per_cluster</span><span class="p">()</span>
<span class="c1"># Write the feature importance</span>
<span class="n">boosting</span><span class="o">.</span><span class="n">write_feature_score_per_cluster</span><span class="p">()</span>

<span class="n">boosting</span><span class="o">.</span><span class="n">load_new_test_dataset</span><span class="p">(</span>
    <span class="p">{</span><span class="s1">'RNA'</span><span class="p">:</span> <span class="s1">'rna_dummy.tsv'</span><span class="p">},</span> <span class="c1"># OMIC file of the test set. It does not have to be the same as for training</span>
    <span class="s1">'survival_dummy.tsv'</span><span class="p">,</span> <span class="c1"># Survival file of the test set</span>
    <span class="s1">'TEST_DATA_1'</span><span class="p">,</span> <span class="c1"># Name of the test set</span>
<span class="p">)</span>

<span class="c1"># Predict the labels on the test dataset</span>
<span class="n">boosting</span><span class="o">.</span><span class="n">predict_labels_on_test_dataset</span><span class="p">()</span>
<span class="c1"># Compute the C-index</span>
<span class="n">boosting</span><span class="o">.</span><span class="n">compute_c_indexes_for_test_dataset</span><span class="p">()</span>
<span class="c1"># See cluster consistency</span>
<span class="n">boosting</span><span class="o">.</span><span class="n">compute_clusters_consistency_for_test_labels</span><span class="p">()</span>

<span class="c1"># [EXPERIMENTAL] method to plot the test dataset amongst the class kernel densities</span>
<span class="n">boosting</span><span class="o">.</span><span class="n">plot_supervised_kernel_for_test_sets</span><span class="p">()</span>
</pre></div>
</div>
</div>
<div class="section" id="creating-a-distributed-deepprog-model-using-an-ensemble-of-submodels">
<h2>Creating a distributed DeepProg model using an ensemble of submodels<a class="headerlink" href="#creating-a-distributed-deepprog-model-using-an-ensemble-of-submodels" title="Permalink to this headline">¶</a></h2>
<p>We can let DeepProg distribute the creation of each submodel across different clusters/nodes/CPUs using the ray framework.
The nodes/clusters or local CPUs to be used are configured when instantiating a new ray object with the ray <a class="reference external" href="https://ray.readthedocs.io/en/latest/">API</a>. It is quite straightforward to define the number of instances launched on a local machine, as in the example below, which uses 3 instances.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Instantiate a ray object that will create multiple workers</span>
<span class="kn">import</span> <span class="nn">ray</span>
<span class="n">ray</span><span class="o">.</span><span class="n">init</span><span class="p">(</span><span class="n">num_cpus</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="c1"># More options can be used (e.g. remote clusters, AWS, memory, ...)</span>
<span class="c1"># ray can be used locally to maximize the use of CPUs on the local machine</span>
<span class="c1"># See the ray API: https://ray.readthedocs.io/en/latest/index.html</span>

<span class="n">boosting</span> <span class="o">=</span> <span class="n">SimDeepBoosting</span><span class="p">(</span>
    <span class="o">...</span>
    <span class="n">distribute</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># Additional option to use the ray cluster scheduler</span>
    <span class="o">...</span>
<span class="p">)</span>
<span class="o">...</span>
<span class="c1"># Processing</span>
<span class="o">...</span>

<span class="c1"># Close the cluster and free memory</span>
<span class="n">ray</span><span class="o">.</span><span class="n">shutdown</span><span class="p">()</span>
</pre></div>
</div>
</div>
<div class="section" id="example-scripts">
<h2>Example scripts<a class="headerlink" href="#example-scripts" title="Permalink to this headline">¶</a></h2>
<p>Example scripts are available in ./examples/ to help you build a model from scratch with test and real data:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>examples
├── create_autoencoder_from_scratch.py <span class="c1"># Construct a simple DeepProg model on the dummy example dataset</span>
├── example_with_dummy_data_distributed.py <span class="c1"># Process the dummy example dataset using ray</span>
├── example_with_dummy_data.py <span class="c1"># Process the dummy example dataset</span>
└── load_3_omics_model.py <span class="c1"># Process the example HCC dataset</span>
</pre></div>
</div>
</div>
<div class="section" id="contact-and-credentials">
<h2>Contact and credentials<a class="headerlink" href="#contact-and-credentials" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li>Developer: Olivier Poirion (PhD)</li>
<li>Contact: opoirion@hawaii.edu, o.poirion@gmail.com</li>
</ul>
</div>
</div>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2019, Olivier Poirion
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
    SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>