# Installation

Here we describe how to install the DeepProg package. We assume a local installation, using the `--user` flag of pip. Alternatively, the package can be installed inside a virtual environment or globally with sudo. Either Python 2.7 or Python 3.6 (or higher) can be used. We tested the installation on Linux, OSX and Windows environments.

## Requirements
* Python 2 or 3 (Python 3 is recommended)
* Either theano, tensorflow or CNTK (tensorflow is recommended)
* [theano](http://deeplearning.net/software/theano/install.html) (the version used for the manuscript was 0.8.2)
* [tensorflow](https://www.tensorflow.org/) as a more robust alternative to theano
* [cntk](https://github.com/microsoft/CNTK): CNTK is another DL library that can present some advantages compared to tensorflow or theano. See [https://docs.microsoft.com/en-us/cognitive-toolkit/](https://docs.microsoft.com/en-us/cognitive-toolkit/)
* scikit-learn (>=0.18)
* numpy, scipy
* lifelines
* (if using Python 3) scikit-survival
* (for distributed computing) the ray framework (ray >= 0.8.4)
* (for hyperparameter tuning) scikit-optimize

## Tested python package versions
Python 3.8 (tested for Linux and OSX; for Windows, Visual C++ is required and LongPathsEnabled should be set to 1 in the Windows registry)
* tensorflow == 2.4.1 (2.4.1 currently doesn't seem to work with Python 3.9)
* keras == 2.4.3
* ray == 0.8.4
* scikit-learn == 0.23.2
* scikit-survival == 0.14.0 (currently doesn't seem to work with Python 3.9)
* lifelines == 0.25.5
* scikit-optimize == 0.8.1 (currently doesn't seem to work with Python 3.9)
* mpld3 == 0.5.1

Since ray and tensorflow are rapidly evolving libraries, the newest versions might unfortunately break DeepProg's API.
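A minimum-version requirement such as `scikit-learn (>=0.18)` comes down to a tuple comparison of version components. The sketch below uses hypothetical helpers (`parse_version`, `meets_minimum`) that are not part of DeepProg, and only handles plain numeric versions; for real code, `packaging.version` is more robust:

```python
def parse_version(version):
    """Turn a plain numeric version string like '0.23.2' into (0, 23, 2)."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())

def meets_minimum(installed, required):
    """True if the installed version satisfies the minimum requirement."""
    return parse_version(installed) >= parse_version(required)

# Check the pinned scikit-learn version against the stated minimum:
print(meets_minimum("0.23.2", "0.18"))  # → True
```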
To avoid dependency issues, we recommend working inside a Python 3 [virtual environment](https://docs.python.org/3/tutorial/venv.html) (`virtualenv`) and installing the tested package versions.

### Installation (local)

```bash
# The download can take a few minutes due to the size of the git project
git clone https://github.com/lanagarmire/DeepProg.git
cd DeepProg

# (RECOMMENDED) install with conda, using the tested python library versions
conda env create -n deepprog -f ./environment.yml python=3.8
conda activate deepprog
pip install -e . -r requirements_tested.txt

# Basic installation (under python3/pip3)
pip3 install -e . -r requirements.txt
# To install the distributed frameworks
pip3 install -e . -r requirements_distributed.txt
# Installing scikit-survival (python3 only)
pip3 install -r requirements_pip3.txt
# Install ALL required dependencies with the most up-to-date packages
pip install -e . -r requirements_all.txt


# **Ignore this if you are working under python3**
# Python 3 is highly preferred; DeepProg also works with python2/pip2,
# however there is no support for scikit-survival in python2
pip2 install -e . -r requirements.txt
pip2 install -e . -r requirements_distributed.txt
```

### Installation with docker
We have created a docker image (`opoirion/deepprog_docker:v1`) with all the dependencies already installed. For the docker (and singularity) instructions, please refer to the docker [tutorial](https://deepprog-garmires-lab.readthedocs.io/en/latest/usage_with_docker.html).
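Whichever installation route is used, a quick sanity check is to verify that the `simdeep` package (DeepProg's python package, used by the test script in the Usage section) can be found on the current path. A minimal sketch using the standard library:

```python
import importlib.util

def is_installed(module_name):
    """Return True if the named module can be found on the current path."""
    return importlib.util.find_spec(module_name) is not None

# After `pip install -e .`, DeepProg's `simdeep` package should be importable:
print("simdeep installed:", is_installed("simdeep"))
```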
## Alternative deep-learning packages installation

The required python packages can be installed using pip:

```bash
pip install theano --user # Original backend used OR
pip install tensorflow --user # Alternative backend for keras and the current default
pip install keras --user
```

## Alternative support for CNTK / theano / tensorflow
We originally used Keras with theano as the backend platform. However, [Tensorflow](https://www.tensorflow.org/) (currently the default backend DL framework) and [CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/) are more recent DL frameworks that can be faster or more stable than theano. Because keras supports these 3 backends, it is possible to use them as alternatives. To install CNTK, please refer to the official [guidelines](https://docs.microsoft.com/en-us/cognitive-toolkit/setup-cntk-on-your-machine). To change the backend, please configure the `$HOME/.keras/keras.json` file (see the official instructions [here](https://keras.io/backend/)).

The default configuration file `~/.keras/keras.json` looks like this:

```json
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
```

### R installation (Alternative to Python lifelines)

In its first implementation, DeepProg used the R survival toolkits to fit the survival functions (Cox-PH models) and compute the concordance indexes. These functions have been replaced with the python toolkits lifelines and scikit-survival for convenience and to avoid compatibility issues. However, differences exist in the c-indexes computed with the python and R libraries. To use the original R functions, it is necessary to install the following:

* R
* the R "survival" package
* rpy2 3.4.4 (for python2, rpy2 can be installed with `pip install rpy2==2.8.6`; for python3, `pip3 install rpy2==2.8.6`)
```R
install.packages("survival")
install.packages("glmnet")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("survcomp")
```

Then, when instantiating a `SimDeep` or a `SimDeepBoosting` object, the option `use_r_packages` needs to be set to `True`.


## Visualisation module (Experimental)
To visualise test sets projected into the multi-omic survival space, the `mpld3` module is required.
Note that the version of mpld3 installed with pip presented a [bug](https://github.com/mpld3/mpld3/issues/434): `TypeError: array([1.]) is not JSON serializable`. However, the [newest](https://github.com/mpld3/mpld3) version of mpld3, available from github, solves this issue. Rather than executing `pip install mpld3 --user`, it is therefore recommended to install the newest version directly from the github repository:

```bash
git clone https://github.com/mpld3/mpld3
cd mpld3
pip install -e . --user
```

### Distributed computation
* It is possible to use the python ray framework [https://github.com/ray-project/ray](https://github.com/ray-project/ray) to control the parallel computation of the multiple models. To use this framework, it is required to install it: `pip install ray`
* Alternatively, it is also possible to create the models one by one without the ray framework
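The reason ray helps is that the multiple models are independent of each other, so their fits can be dispatched to parallel workers. The sketch below illustrates that idea only, using the standard library's `concurrent.futures` as a stand-in (this is not ray's API), with a hypothetical `train_model` placeholder instead of a real DeepProg fit:

```python
from concurrent.futures import ThreadPoolExecutor

def train_model(seed):
    """Hypothetical placeholder for fitting one independent submodel."""
    return {"seed": seed, "score": seed * 2}

# Each submodel is independent, so the fits can run in parallel;
# ray dispatches them to workers in much the same way.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(train_model, range(4)))
print([r["seed"] for r in results])  # → [0, 1, 2, 3]
```

`map` preserves input order, so the results line up with the seeds even though the fits finish in arbitrary order.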
## Usage
* To test if simdeep is functional (i.e. all the software is correctly installed), go to the main folder (./DeepProg/) and run the following:

```bash
python3 test/test_simdeep.py -v
```

* All the default parameters are defined in the config file `./simdeep/config.py` but can be passed dynamically. Three types of parameters must be defined:
  * The training dataset (omics + survival input files)
    * In addition, the parameters of the test set, i.e. the omic dataset and the survival file
  * The parameters of the autoencoder (the default parameters work but they might need to be fine-tuned)
  * The parameters of the classification procedures (the defaults are still good)


## Example scripts

Example scripts are available in ./examples/, which will assist you in building a model from scratch with test and real data:

```bash
examples
├── example_hyperparameters_tuning.py
├── example_hyperparameters_tuning_with_test_dataset.py
├── example_with_dummy_data_distributed.py
├── example_with_dummy_data.py
└── load_3_omics_model.py
```