Diff of /README.rst [000000] .. [d7cf27]

Switch to side-by-side view

--- a
+++ b/README.rst
@@ -0,0 +1,162 @@
+=====================================
+Janggu - Deep learning for Genomics
+=====================================
+
+.. start-badges
+
+.. image:: https://readthedocs.org/projects/janggu/badge/?style=flat
+    :target: https://janggu.readthedocs.io/en/latest
+    :alt: Documentation Status
+
+.. image:: https://travis-ci.org/BIMSBbioinfo/janggu.svg?branch=master
+    :alt: Travis-CI Build Status
+    :target: https://travis-ci.org/BIMSBbioinfo/janggu
+
+.. image:: https://codecov.io/github/BIMSBbioinfo/janggu/coverage.svg?branch=master
+    :alt: Coverage Status
+    :target: https://codecov.io/github/BIMSBbioinfo/janggu
+
+.. image:: https://badge.fury.io/py/janggu.svg
+    :alt: PyPI Package latest release
+    :target: https://pypi.org/project/janggu
+
+.. image:: https://img.shields.io/pypi/l/janggu.svg?color=green
+    :alt: License
+    :target: https://pypi.org/project/janggu
+
+.. image:: https://img.shields.io/pypi/pyversions/janggu.svg
+    :alt: Supported Python Versions
+    :target: https://pypi.org/project/janggu/
+
+.. image:: https://pepy.tech/badge/janggu
+    :alt: Downloads
+    :target: https://pepy.tech/project/janggu
+
+.. end-badges
+
+.. image:: jangguhex.png
+   :width: 40%
+   :alt: Janggu logo
+   :align: center
+
+Janggu is a python package that facilitates deep learning in the context of
+genomics. The package is freely available under a GPL-3.0 license.
+
+.. image:: Janggu-visAbstract.png
+   :width: 50%
+   :alt: Janggu visual abstract
+   :align: center
+
+
+In particular, the package allows for easy access to
+typical **Genomics data formats**
+and **out-of-the-box evaluation** (for keras models specifically) so that you can concentrate
+on designing the neural network architecture for the purpose
+of quickly testing biological hypothesis.
+A comprehensive documentation is available `here <https://janggu.readthedocs.io/en/latest>`_.
+
+Hallmarks of Janggu:
+---------------------
+
+1. Janggu provides special **Genomics datasets** that allow you to access raw data in FASTA, BAM, BIGWIG, BED and GFF file format.
+2. Various **normalization** procedures are supported for dealing with of the genomics dataset, including 'TPM', 'zscore' or custom normalizers.
+3. Biological features can be represented in terms of higher-order sequence features, e.g. di-nucleotide based features.
+4. The dataset objects are directly consumable with neural networks for example implemented using `keras <https://keras.io>`_ or using `scikit-learn <https://scikit-learn.org/stable/index.html>`_ (see src/examples in this repository).
+5. Numpy format output of a keras model can be converted to represent genomic coverage tracks, which allows exporting the predictions as BIGWIG files and visualization of genome browser-like plots.
+6. Genomic datasets can be stored in various ways, including as numpy array, sparse dataset or in hdf5 format.
+7. Caching of Genomic datasets avoids time consuming preprocessing steps and facilitates fast reloading.
+8. Janggu provides a wrapper for `keras <https://keras.io>`_ models with built-in logging functionality and automatized result evaluation.
+9. Janggu supports input feature importance attribution using the integrated gradients method and variant effect prediction assessment.
+10. Janggu provides a utilities such as keras layer for scanning both DNA strands for motif occurrences.
+
+Getting started
+----------------
+
+Janggu makes it easy to access data from genomic file formats and utilize it for
+machine learning purposes.
+
+.. code-block:: python
+
+  dna = Bioseq.create_from_genome('dna', refgenome=<refgenome.fa>, roi=<roi.bed>)
+  labels = Cover.create_from_bed('labels', bedfiles=<labels.bed>, roi=<roi.bed>)
+
+  kerasmodel.fit(dna, labels)
+  
+A range of examples can be found in './src/examples' of this repository,
+which includes jupyter notebooks that illustrate Janggu's functionality
+and how it can be used with popular deep learning frameworks, including
+keras, sklearn or pytorch.
+
+Why the name Janggu?
+---------------------
+
+`Janggu <https://en.wikipedia.org/wiki/Janggu>`_ is a Korean percussion
+instrument that looks like an hourglass.
+
+Like the two ends of the instrument, the philosophy of the
+Janggu package is to help with the two ends of a
+deep learning application in genomics,
+namely data acquisition and evaluation.
+
+
+
+Installation
+============
+
+A list of python dependencies is defined in `setup.py`.
+Additionally, `bedtools <https://bedtools.readthedocs.io/>`_ is required for `pybedtools` which `janggu` depends on.
+
+Janggu depends on tensorflow and keras.
+To install janggu with tensorflow version 1 and 2 use
+
+::
+
+   # to install with tensorflow==1.14 and keras==2.2
+   pip install janggu[tf] # or janggu[tf_gpu] 
+
+   # to install with tensorflow==2.2 and keras==2.4.3
+   pip install janggu[tf2] # or janggu[tf2_gpu] 
+
+
+Depending on the pip version (e.g. 20.2.2),
+some package dependencies may fail to be resolved
+accurately such that incompatible package versions are installed.
+If this is the case, you could try using
+`pip install ... --use-feature=2020-resolver`
+or install the required package version manually.
+
+Alternatively, you can install tensorflow and keras via
+the conda environment using
+
+::
+
+   # tensorflow v1
+   conda install tensorflow==1.14 keras==2.2  # or tensorflow-gpu
+
+   # tensorflow v2
+   conda install tensorflow==2.2 keras==2.4.3  # or tensorflow-gpu
+
+Further information regarding the installation of tensorflow can be found on
+the official `tensorflow webpage <https://www.tensorflow.org>`_
+
+To verify that the installation works try to run the example contained in the
+janggu package as follows
+
+::
+
+   git clone https://github.com/BIMSBbioinfo/janggu
+   cd janggu
+   python ./src/examples/classify_fasta.py single
+
+A model is then trained to predict the class labels of two sets of toy sequencesby scanning the forward strand for sequence patterns and using an ordinary mono-nucleotide one-hot sequence encoding.
+The entire training process takes a few minutes on CPU backend.
+Eventually, some example prediction scores are shown for Oct4 and Mafk sequences. The accuracy should be around 85% and individual example prediction scores should tend to be higher for Oct4 than for Mafk.
+
+You may also try to rerun the training by evaluating sequences features on both
+strands and using higher-order sequence encoding using i.e. the command-line arguments: `dnaconv -order 2`.
+Accuracies and prediction scores for the individual example sequences should improve compared to the previous example.
+
+Citation
+========
+
+| Kopp, W., Monti, R., Tamburrini, A., Ohler, U., Akalin, A. Deep learning for genomics using Janggu. Nat Commun 11, 3488 (2020). https://doi.org/10.1038/s41467-020-17155-y