# BPNet
[![CircleCI](https://circleci.com/gh/kundajelab/bpnet.svg?style=svg&circle-token=f55c1cf580b05df76e260993f7645e35d5302e76)](https://circleci.com/gh/kundajelab/bpnet)

BPNet is a Python package with a CLI to train and interpret base-resolution deep neural networks trained on functional genomics data such as ChIP-nexus or ChIP-seq. It addresses the problem of pinpointing the regulatory elements in the genome:

<img src="./docs/theme_dir/bpnet/dna-words.png" alt="BPNet" style="width: 600px;"/>

Specifically, it aims to answer the following questions:
- What are the sequence motifs?
- Where are they located in the genome?
- How do they interact?

For more information, see the BPNet manuscript:

*Deep learning at base-resolution reveals motif syntax of the cis-regulatory code* (<http://dx.doi.org/10.1101/737981>).

## Overview

<img src="./docs/theme_dir/bpnet/overview.png" alt="BPNet" style="width: 400px;"/>

## Getting started

The main documentation of the bpnet package and an end-to-end example highlighting its main features are contained in the following Colab notebook: **<https://colab.research.google.com/drive/1VNsNBfugPJfJ02LBgvPwj-gPK0L_djsD>**. You can run this notebook yourself by clicking on '**Open in playground**'. Individual cells of this notebook can be executed by pressing the Shift+Enter keyboard shortcut.

<img src="./docs/theme_dir/bpnet/colab-header.png" alt="BPNet" style="width: 300px;"/>

To learn more about Colab, visit <https://colab.research.google.com> and follow the 'Welcome To Colaboratory' notebook.

## Main commands

Compute data statistics to inform hyper-parameter selection, such as the trade-off between the profile loss and the total count loss (`lambda` hyper-parameter):

```bash
bpnet dataspec-stats dataspec.yml
```

Train a model on BigWig tracks specified in [dataspec.yml](examples/chip-nexus/dataspec.yml) using an existing architecture [bpnet9](bpnet/premade/bpnet9-pyspec.gin) on 200 bp sequences with 6 dilated convolutional layers, writing the output to the current directory (`.`):

```bash
bpnet train --premade=bpnet9 dataspec.yml --override='seq_width=200;n_dil_layers=6' .
```
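The `--override` value in the command above is a semicolon-separated list of `name=value` pairs. When several training configurations are launched from a script, it can be convenient to assemble that string programmatically. The sketch below only reproduces the format shown above for numeric values; `build_override` and `train_command` are hypothetical helpers, not part of bpnet, and they do not validate parameter names.

```python
import shlex


def build_override(params):
    """Join parameters into the 'name=value;name=value' form used above."""
    return ";".join(f"{name}={value}" for name, value in params.items())


def train_command(dataspec, output_dir, premade, params):
    """Return a `bpnet train` invocation as an argument list (nothing is executed)."""
    return [
        "bpnet", "train",
        f"--premade={premade}",
        dataspec,
        f"--override={build_override(params)}",
        output_dir,
    ]


cmd = train_command("dataspec.yml", ".", "bpnet9",
                    {"seq_width": 200, "n_dil_layers": 6})
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would start the training run; building the list first makes it easy to log or review the exact command beforehand.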

Compute contribution scores for regions specified in the `dataspec.yml` file and store them in `contrib.scores.h5`:

```bash
bpnet contrib . --method=deeplift contrib.scores.h5
```

Export BigWig tracks containing model predictions and contribution scores:

```bash
bpnet export-bw . --regions=intervals.bed --scale-contribution bigwigs/
```

Discover motifs with TF-MoDISco using the contribution scores stored in `contrib.scores.h5` and the premade configuration [modisco-50k](bpnet/premade/modisco-50k.gin), restricting the number of seqlets per metacluster to 20k:

```bash
bpnet modisco-run contrib.scores.h5 --premade=modisco-50k --override='TfModiscoWorkflow.max_seqlets_per_metacluster=20000' modisco/
```

Determine motif instances with CWM scanning and store them in `modisco/motif-instances.tsv.gz`:

```bash
bpnet cwm-scan modisco/ --contrib-file=contrib.scores.h5 modisco/motif-instances.tsv.gz
```

Generate additional reports suitable for ChIP-nexus or ChIP-seq data:

```bash
bpnet chip-nexus-analysis modisco/
```

Note: these commands are also accessible as Python functions:
- `bpnet.cli.train.bpnet_train`
- `bpnet.cli.train.dataspec_stats`
- `bpnet.cli.contrib.bpnet_contrib`
- `bpnet.cli.export_bw.bpnet_export_bw`
- `bpnet.cli.modisco.bpnet_modisco_run`
- `bpnet.cli.modisco.cwm_scan`
- `bpnet.cli.modisco.chip_nexus_analysis`
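For scripted runs, the post-training steps of this section can be chained from Python. The sketch below is an illustration under stated assumptions: it shells out to the `bpnet` CLI with exactly the arguments shown in this section instead of calling the Python functions listed above, whose signatures are not documented here. `PIPELINE` and `run_pipeline` are hypothetical names, and with the default `dry_run=True` nothing is executed.

```python
import shlex
import subprocess

# Commands copied from the examples in this section, one argument list per step.
PIPELINE = [
    ["bpnet", "contrib", ".", "--method=deeplift", "contrib.scores.h5"],
    ["bpnet", "export-bw", ".", "--regions=intervals.bed",
     "--scale-contribution", "bigwigs/"],
    ["bpnet", "modisco-run", "contrib.scores.h5", "--premade=modisco-50k",
     "--override=TfModiscoWorkflow.max_seqlets_per_metacluster=20000",
     "modisco/"],
    ["bpnet", "cwm-scan", "modisco/", "--contrib-file=contrib.scores.h5",
     "modisco/motif-instances.tsv.gz"],
    ["bpnet", "chip-nexus-analysis", "modisco/"],
]


def run_pipeline(steps, dry_run=True):
    """Print each step and execute it only when dry_run is False."""
    printed = []
    for args in steps:
        line = " ".join(shlex.quote(arg) for arg in args)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step.
            subprocess.run(args, check=True)
    return printed


lines = run_pipeline(PIPELINE)
```

Calling `run_pipeline(PIPELINE, dry_run=False)` inside the activated `bpnet` environment would run the steps in order; each step reads the files written by the previous one, so the order matters.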

## Main python classes

- `bpnet.seqmodel.SeqModel` - Keras model container specified by implementing output 'heads' and a common 'body'. It contains methods to compute the contribution scores of the input sequence w.r.t. different output heads.
- `bpnet.BPNet.BPNetSeqModel` - Wrapper around `SeqModel` consolidating profile and total count predictions into a single output per task. It provides methods to export predictions and contribution scores to BigWig files as well as methods to simulate the spacing between two motifs.
- `bpnet.cli.contrib.ContribFile` - File handle to the HDF5 file containing the contribution scores.
- `bpnet.modisco.files.ModiscoFile` - File handle to the HDF5 file produced by TF-MoDISco.
  - `bpnet.modisco.core.Pattern` - Object containing the PFM, CWM and optionally the signal footprint.
  - `bpnet.modisco.core.Seqlet` - Object containing the seqlet coordinates.
  - `bpnet.modisco.core.StackedSeqletContrib` - Object containing the sequence, contribution scores and raw data at seqlet locations.
- `bpnet.dataspecs.DataSpec` - File handle to the `dataspec.yml` file.
- `dfi` - Frequently used alias for a pandas `DataFrame` containing motif instance coordinates produced by `bpnet cwm-scan`. See the [colab notebook](https://colab.research.google.com/drive/1VNsNBfugPJfJ02LBgvPwj-gPK0L_djsD) for the column description.
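Because the columns of the motif instance table are described in the notebook rather than here, a quick way to see what a given file contains is to read its header row. The following is a minimal sketch using only the standard library. It assumes that the file written by `bpnet cwm-scan` is a gzip-compressed, tab-separated table with a header row, which the `.tsv.gz` extension suggests but this README does not confirm. `peek_columns` is a hypothetical helper, not part of bpnet.

```python
import csv
import gzip


def peek_columns(path):
    """Return the fields of the first row of a gzip-compressed TSV file."""
    with gzip.open(path, "rt", newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))


# Example (requires the file produced by `bpnet cwm-scan`):
# print(peek_columns("modisco/motif-instances.tsv.gz"))
```

Under the same assumption, `pandas.read_csv(path, sep="\t")` should load the whole table as a `DataFrame`, since pandas infers gzip compression from the file extension by default.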

## Installation

The supported Python version is 3.6. After installing anaconda ([download page](https://www.anaconda.com/download/)) or miniconda ([download page](https://conda.io/miniconda.html)), create a new bpnet environment by executing the following code:

```bash
# Clone this repository
git clone git@github.com:kundajelab/bpnet.git
cd bpnet

# Create the 'bpnet' conda environment
conda env create -f conda-env.yml

# Disable HDF5 file locking to prevent issues with Keras (https://github.com/h5py/h5py/issues/1082)
echo 'export HDF5_USE_FILE_LOCKING=FALSE' >> ~/.bashrc

# Activate the conda environment
source activate bpnet
```

Alternatively, you could also start a fresh conda environment by running the following:

```bash
# 'conda create' (not 'conda env create') builds an environment from package specs
conda create -n bpnet python=3.6
source activate bpnet
conda install -c bioconda pybedtools bedtools pybigwig pysam genomelake
pip install git+https://github.com/kundajelab/DeepExplain.git
pip install tensorflow~=1.0 # or tensorflow-gpu if you are using a GPU
pip install bpnet
echo 'export HDF5_USE_FILE_LOCKING=FALSE' >> ~/.bashrc
```
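The `export` line appended to `~/.bashrc` only takes effect in shell sessions that read that file. Where it is not sourced (for example a notebook kernel or a batch job), one possible workaround is to set the variable from Python before any HDF5-backed library is imported. This is a sketch under that assumption, not a step taken from the BPNet documentation.

```python
import os

# Run this before importing h5py, Keras, or bpnet so that the HDF5 library
# sees the setting. setdefault keeps any value already set in the environment.
os.environ.setdefault("HDF5_USE_FILE_LOCKING", "FALSE")

print(os.environ["HDF5_USE_FILE_LOCKING"])
```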

When using bpnet from the command line, don't forget to activate the `bpnet` conda environment first:

```bash
# Activate the bpnet conda environment
source activate bpnet

# Run bpnet
bpnet <command> ...
```

### (Optional) Install `vmtouch` to use `bpnet train --vmtouch`

To use the `--vmtouch` option of the `bpnet train` command and thereby speed up data loading, install [vmtouch](https://hoytech.com/vmtouch/). vmtouch loads the bigWig files into the system memory cache, which allows multiple processes to access the bigWigs loaded into memory.

Here's how to build and install vmtouch:

```bash
# ~/bin = directory for locally compiled binaries
mkdir -p ~/bin
cd ~/bin
# Clone and build
git clone https://github.com/hoytech/vmtouch.git vmtouch_src
cd vmtouch_src
make
# Copy the binary to ~/bin
cp vmtouch ../
# Add ~/bin to $PATH
echo 'export PATH=$PATH:~/bin' >> ~/.bashrc
```
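Since `--vmtouch` depends on an external binary, a wrapper script may want to add the flag only when `vmtouch` is actually found on `PATH`. The sketch below shows one way to do that. `train_args` is a hypothetical helper, the other arguments are copied from the training example in 'Main commands', and placing the flag directly after `train` is an assumption about argument order.

```python
import shutil


def train_args(use_vmtouch_if_available=True, which=shutil.which):
    """Build `bpnet train` arguments, adding --vmtouch only if the binary is on PATH."""
    args = ["bpnet", "train", "--premade=bpnet9", "dataspec.yml", "."]
    if use_vmtouch_if_available and which("vmtouch") is not None:
        args.insert(2, "--vmtouch")
    return args


print(train_args())
```

Note that the `PATH` change written to `~/.bashrc` only applies to new shell sessions, so `shutil.which` will not find a freshly built `vmtouch` until the script is started from a shell that has picked up the updated `PATH`.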