# DeepDrug3D

DeepDrug3D is a tool that predicts whether a protein pocket binds ATP, Heme, or another type of ligand, given the protein structure and the binding residue numbers.

If you find this tool useful, please star this repo and cite our paper :)

Pu L, Govindaraj RG, Lemoine JM, Wu HC, Brylinski M (2019) DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLOS Computational Biology 15(2): e1006718. https://doi.org/10.1371/journal.pcbi.1006718

This README file is written by Limeng Pu.

<p align="center">
    <img width="400" height="400" src="./image/1a2sA.png">
</p>

An example of a generated binding grid (PDB ID: 1a2sA, atom type: C.ar). Red indicates low potentials and blue indicates high potentials.

# Change Log

**This is a newer version of the implementation. Since many people are interested in visualizing the output of the grid generation, as in the image above, I've decided to separate the data-generation module from the training/prediction module. Another reason for this iteration is that the dligand-linux program used for the potential calculation requires 32-bit Linux, while <em>PyTorch</em> requires 64-bit Linux. This causes conflicts and results in different errors depending on the order in which you install them. The deep learning library has also been changed from <em>Keras</em> to <em>PyTorch</em>.**

# Prerequisites

1. System requirement: Linux (the DFIRE potential calculation only runs on Linux; tested on <em>Red Hat Enterprise Linux 6</em>).
2. The data-generation module dependencies are listed in `./DataGeneration/environment.yml`. Please change line 9 of that file according to your system. To install all the dependencies, run `conda env create -f environment.yml`.
3. The learning module requires <em>PyTorch</em>. To install it, refer to https://pytorch.org/get-started/locally.

# Usage

The package provides data-generation, prediction, and training modules.

1. Data generation

This module generates data for training/prediction and provides intermediate results for visualization. All files are under `./DataGeneration`. The DFIRE potential calculation uses the module (`./DataGeneration/dligand-linux`) described in `A Knowledge-Based Energy Function for Protein−Ligand, Protein−Protein, and Protein−DNA Complexes by Zhang et al.`, since it is written in Fortran and is faster than our own implementation in Python.

To generate the binding grid data, run

<pre><code>python voxelization.py --f example.pdb --a example_aux.txt --o results --r 15 --n 31 --p</code></pre>

  - `--f` input pdb file path.
  - `--a` input auxiliary file path, with binding residue numbers and center of ligand (optional). An example of the auxiliary file is provided in `example_aux.txt`.
  - `--r` the radius of the spherical grid.
  - `--n` the number of points along each dimension of the spherical grid.
  - `--o` output folder path.
  - `--p` or `--s` whether to calculate the potential (`--p`) or not (`--s`). With `--s`, only the binary occupancy grid is returned, i.e., the shape of the grid only. Default: `--p`.

Several files will be saved, including `example_transformed.pdb` (coordinate-transformed pdb file), `example_transformed.mol2` (coordinate-transformed mol2 file for the calculation of the DFIRE potential), `example.grid` (grid representation of the binding pocket for visualization), and `example.h5` (numpy array of the voxel representation).

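To make the roles of `--r` and `--n` more concrete, here is a minimal sketch; it is not the code in `voxelization.py`. It assumes the grid is built by placing `n` evenly spaced points along each axis of a cube spanning `[-r, r]` and keeping only the points within distance `r` of the origin. The function name `spherical_grid` is hypothetical.

```python
import itertools
import math


def spherical_grid(r, n):
    """Return the (x, y, z) points of an n x n x n cubic lattice spanning
    [-r, r] on each axis that lie within distance r of the origin.

    Assumption: this only illustrates the idea behind --r and --n; the
    actual grid construction in voxelization.py may differ. Requires n >= 2.
    """
    step = 2.0 * r / (n - 1)                    # spacing between neighbouring points
    axis = [-r + i * step for i in range(n)]    # n evenly spaced coordinates
    return [
        (x, y, z)
        for x, y, z in itertools.product(axis, repeat=3)
        if math.sqrt(x * x + y * y + z * z) <= r + 1e-9  # small tolerance for rounding
    ]


# Same values as in the example command: --r 15 --n 31
points = spherical_grid(r=15, n=31)
print(len(points), "of", 31 ** 3, "lattice points fall inside the sphere")
```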

To visualize the output binding pocket grid, run

<pre><code>python visualization.py --i example.grid --c 0</code></pre>

  - `--i` input binding pocket grid file path.
  - `--c` channel to visualize. Note that if you passed `--s` in the previous step, the channel number `--c` has to be 0.

An output `example_grid.pdb` will be generated for visualization. Note that this pocket grid matches the transformed protein `example_transformed.pdb`.


2. Prediction

This module classifies the target binding pocket as an ATP-, Heme-, or other-type pocket, i.e., the type of ligand it tends to bind. The trained model is available at `https://osf.io/enz69/`. All files are under `./Learning`.

To use the prediction module, run

<pre><code>python predict.py --f example.h5 --m path_to_the_trained_model</code></pre>

  - `--f` input h5 file path.
  - `--m` path to the trained model weights.

The output will look something like

<pre><code>The probability of pocket provided binds with ATP ligands: 0.3000
The probability of pocket provided binds with Heme ligands: 0.2000
The probability of pocket provided binds with other ligands: 0.5000
</code></pre>
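The three probabilities in the sample output sum to 1, which is what a softmax over three class scores would produce. The sketch below illustrates that step under the assumption that the network emits one raw score per class. The scores, the class order, and the helper name `softmax` are hypothetical and are not taken from `predict.py`.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the three classes, in an assumed order.
classes = ["ATP", "Heme", "other"]
probs = softmax([0.4, 0.0, 0.9])

for name, p in zip(classes, probs):
    print(f"The probability of pocket provided binds with {name} ligands: {p:.4f}")
```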

3. Training

To train our model on your own dataset, first convert your PDB files to the voxel representation of the protein-ligand binding grid. The conversion procedure is described in the data generation step above. The module runs a random 5-fold cross validation. All related results, including loss, accuracy, and model weights, will be saved. All files are under `./Learning`.

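As an illustration of what a random 5-fold cross validation involves, here is a minimal sketch using only the standard library. It is not the splitting code in `train.py`; the function name `k_fold_splits`, the fixed seed, and the sample file names are assumptions made for the example.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Shuffle items once, then yield (train, validation) lists for each fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)        # random assignment of items to folds
    folds = [shuffled[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


# Hypothetical voxel file names standing in for a real dataset.
files = [f"pocket_{i}.h5" for i in range(10)]
for fold, (train, validation) in enumerate(k_fold_splits(files)):
    print(f"fold {fold}: {len(train)} training files, {len(validation)} validation files")
```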

The training module can be run as

<pre><code>python train.py --path path_to_your_data_folder --lpath path_to_your_label_file --bs batch_size --lr initial_learning_rate --epoch number_of_epochs --opath output_folder_path</code></pre>

  - `--path` path to the folder that contains all the voxel data.
  - `--lpath` label file path. The file should be a comma-separated file with no header. The first column is the filename and the second column is the class (starting from 0). An example is provided in `./Learning/labels`.
  - `--bs`, `--lr`, `--epoch` hyperparameters of the model: batch size, initial learning rate, and number of epochs. Recommended values are 64, 1e-5, and 50.
  - `--opath` output folder path. If no output location is provided, a `logs` folder will be created under the current working directory to store everything.

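The label file format described for `--lpath` can be read with Python's `csv` module. The following is a sketch under the stated format (two comma-separated columns, no header, classes starting from 0). The helper name `read_labels` and the sample rows are hypothetical, and `train.py` may read the file differently.

```python
import csv
import io


def read_labels(handle):
    """Parse a header-less CSV of (filename, class) rows into a dict."""
    labels = {}
    for row in csv.reader(handle):
        if not row:                      # skip blank lines
            continue
        filename, cls = row[0].strip(), int(row[1])
        if cls < 0:
            raise ValueError(f"class for {filename} must start from 0, got {cls}")
        labels[filename] = cls
    return labels


# Hypothetical file contents; a real label file would be opened with
# open(path, newline="") and passed to read_labels in the same way.
sample = io.StringIO("pocket_a,0\npocket_b,1\npocket_c,2\n")
labels = read_labels(sample)
print(labels)
```

Checking the label file this way before a long training run can catch a stray header row or a non-integer class early, since `int()` raises on either.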
# Dataset

We provide the dataset used for training at https://osf.io/enz69/. It contains the voxel representations of ATP, Heme, and other pockets, along with the class label file.