--- a
+++ b/README.md
@@ -0,0 +1,61 @@
+# janggu_usecases
+Examples for deep learning in genomics using Janggu
+
+## Requirements
+
+```
+jupyter
+bedtools
+pybedtools
+samtools
+dash
+janggu
+R
+rpy2
+tzlocal
+r-ggplot2
+r-ggrepel
+r-dplyr
+statsmodels
+pandas
+numpy
+```
+
+These can be installed via conda and pip.
+
+The notebook cells that install the requirements may be commented out.
+
+## Download the datasets
+To download the required datasets, enter the 00_preparation folder.
+It contains Jupyter notebooks that specify and control the data download.
+They also set up the regions of interest for model training and evaluation.
+
+## Note
+
+Some of the steps in the notebooks may be commented out or deactivated,
+including the invocation of time-consuming training steps,
+so that they are not re-run during evaluation. You may either activate them within the notebook
+or invoke the scripts on the command line if you wish to train the models from scratch.
+It may also be necessary to adapt the use of `CUDA_VISIBLE_DEVICES` (see the TensorFlow docs). In use case 2, the GPU device is selected via the `-dev` option.
+These settings were chosen for our specific setup with 8 GPUs. For example, if you only have access to one GPU, specify
+`CUDA_VISIBLE_DEVICES=0` before running the scripts.
+
+## JunD prediction
+
+Run the Jupyter notebook 'predicting_jund_binding.ipynb' to reproduce the results.
+You can control which GPU the models are trained on by setting the environment variable `CUDA_VISIBLE_DEVICES` (see the TensorFlow documentation).
+
+## DeepSEA and DanQ experiments
+
+To train and evaluate the DeepSEA and DanQ comparison, enter the '02_deepsea_danq_prediction' folder and launch the
+Jupyter notebook 'deepsea_danq_experiments.ipynb'.
+To activate model training, set the parameter `train_models = True`.
+Otherwise, the notebook merely evaluates the results.
+You may need to adapt `-dev` to select a specific GPU.
+
+
+## CAGE-tag prediction
+
+To reproduce the CAGE-tag prediction use case, enter '03_cage_prediction' and launch the 'predicting_cage_tags.ipynb' notebook.
+To run the cross-validation analysis, uncomment the respective command line invocations of the script 'cage_prediction.py'.
+You can control which GPU the models are trained on by setting the environment variable `CUDA_VISIBLE_DEVICES` (see the TensorFlow documentation).
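
The `CUDA_VISIBLE_DEVICES` setting described in the README can also be applied from within a Python session or notebook cell instead of the shell. The following is a minimal sketch and is not taken from the notebooks: `select_gpus` is a hypothetical helper introduced here for illustration only. It assumes the variable is set before the deep learning framework initialises CUDA, so the safe choice is to call it before the first `import tensorflow` in the process.

```python
import os


def select_gpus(device_ids):
    """Restrict CUDA-based frameworks to the given GPU indices.

    Hypothetical helper for illustration; not part of Janggu or the notebooks.
    """
    value = ",".join(str(i) for i in device_ids)
    # The variable is read when CUDA is initialised, so set it before
    # importing TensorFlow (or any other CUDA-backed library).
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Single-GPU machine: expose only device 0, as suggested in the README.
visible = select_gpus([0])
print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

On a multi-GPU machine the same call accepts several indices, for example `select_gpus([2, 5])`; inside the process the selected devices are then renumbered starting from 0.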