|
a/README.md |
|
b/README.md |
1 |
# DeepDrug3D |
1 |
# DeepDrug3D |
2 |
|
2 |
|
3 |
DeepDrug3D is a tool that predicts whether a protein pocket is ATP-, Heme-, or other-binding, given the binding residue numbers and the protein structure. |
3 |
DeepDrug3D is a tool that predicts whether a protein pocket is ATP-, Heme-, or other-binding, given the binding residue numbers and the protein structure. |
4 |
|
4 |
|
5 |
If you find this tool useful, please star this repo and cite our paper :) |
5 |
If you find this tool useful, please star this repo and cite our paper :) |
6 |
|
6 |
|
7 |
Pu L, Govindaraj RG, Lemoine JM, Wu HC, Brylinski M (2019) DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLOS Computational Biology 15(2): e1006718. https://doi.org/10.1371/journal.pcbi.1006718 |
7 |
Pu L, Govindaraj RG, Lemoine JM, Wu HC, Brylinski M (2019) DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLOS Computational Biology 15(2): e1006718. https://doi.org/10.1371/journal.pcbi.1006718 |
8 |
|
8 |
|
9 |
This README file is written by Limeng Pu. |
9 |
This README file is written by Limeng Pu. |
10 |
|
10 |
|
11 |
<p align="center"> |
11 |
<p align="center">
|
12 |
<img width="400" height="400" src="./image/1a2sA.png"> |
12 |
<img width="400" height="400" src="https://github.com/pulimeng/DeepDrug3D/blob/master/image/1a2sA.png?raw=true">
|
13 |
</p> |
13 |
</p> |
14 |
|
14 |
|
15 |
An example of a generated binding grid (PDB ID: 1a2sA, atom type: C.ar). Red indicates low potentials and blue indicates high potentials. |
15 |
An example of a generated binding grid (PDB ID: 1a2sA, atom type: C.ar). Red indicates low potentials and blue indicates high potentials. |
16 |
|
16 |
|
17 |
# Change Log |
17 |
# Change Log |
18 |
|
18 |
|
19 |
**This is a newer version of the implementation. Since many people are interested in visualizing the output of the grid generation, as in the image above, I've decided to separate the data-generation module from the training/prediction module. Another reason for this iteration is that the dligand-linux program used for the potential calculation requires 32-bit Linux, while <em>PyTorch</em> requires 64-bit Linux. This causes conflicts and results in different errors depending on the order in which you install them. The deep learning library has also been changed from <em>Keras</em> to <em>PyTorch</em>**. |
19 |
**This is a newer version of the implementation. Since many people are interested in visualizing the output of the grid generation, as in the image above, I've decided to separate the data-generation module from the training/prediction module. Another reason for this iteration is that the dligand-linux program used for the potential calculation requires 32-bit Linux, while <em>PyTorch</em> requires 64-bit Linux. This causes conflicts and results in different errors depending on the order in which you install them. The deep learning library has also been changed from <em>Keras</em> to <em>PyTorch</em>**. |
20 |
|
20 |
|
21 |
# Prerequisites |
21 |
# Prerequisites |
22 |
|
22 |
|
23 |
1. System requirement: Linux (DFIRE potential calculation only runs on Linux. Tested on <em>Red Hat Enterprise Linux 6</em>) |
23 |
1. System requirement: Linux (DFIRE potential calculation only runs on Linux. Tested on <em>Red Hat Enterprise Linux 6</em>)
|
24 |
2. The data-generation module dependencies are provided in `./DataGeneration/environment.yml`. Please change line 9 in the file according to your system. To install all the dependencies run `conda env create -f environment.yml`. |
24 |
2. The data-generation module dependencies are provided in `./DataGeneration/environment.yml`. Please change line 9 in the file according to your system. To install all the dependencies run `conda env create -f environment.yml`.
|
25 |
3. The learning module requires <em>PyTorch</em>. To install it, refer to https://pytorch.org/get-started/locally. |
25 |
3. The learning module requires <em>PyTorch</em>. To install it, refer to https://pytorch.org/get-started/locally. |
26 |
|
26 |
|
27 |
# Usage |
27 |
# Usage |
28 |
|
28 |
|
29 |
The package provides data-generation, prediction, and training modules. |
29 |
The package provides data-generation, prediction, and training modules. |
30 |
|
30 |
|
31 |
1. Data generation |
31 |
1. Data generation |
32 |
|
32 |
|
33 |
This module generates data for training/prediction while providing intermediate results for visualization. All files are under `./DataGeneration`. The DFIRE potential calculation uses the module (`./DataGeneration/dligand-linux`) described in `A Knowledge-Based Energy Function for Protein−Ligand, Protein−Protein, and Protein−DNA Complexes by Zhang et al.` since it is written in Fortran, which is faster than our own implementation in Python. |
33 |
This module generates data for training/prediction while providing intermediate results for visualization. All files are under `./DataGeneration`. The DFIRE potential calculation uses the module (`./DataGeneration/dligand-linux`) described in `A Knowledge-Based Energy Function for Protein−Ligand, Protein−Protein, and Protein−DNA Complexes by Zhang et al.` since it is written in Fortran, which is faster than our own implementation in Python. |
34 |
|
34 |
|
35 |
To generate the binding grid data, run |
35 |
To generate the binding grid data, run |
36 |
|
36 |
|
37 |
<pre><code>python voxelization.py --f example.pdb --a example_aux.txt --o results --r 15 --n 31 --p</code></pre> |
37 |
<pre><code>python voxelization.py --f example.pdb --a example_aux.txt --o results --r 15 --n 31 --p</code></pre> |
38 |
|
38 |
|
39 |
- `--f` input pdb file path. |
39 |
- `--f` input pdb file path.
|
40 |
- `--a` input auxiliary file path, with binding residue numbers and center of ligand (optional). An example of the auxiliary file is provided in `example_aux.txt`. |
40 |
- `--a` input auxiliary file path, with binding residue numbers and center of ligand (optional). An example of the auxiliary file is provided in `example_aux.txt`.
|
41 |
- `--r` the radius of the spherical grid. |
41 |
- `--r` the radius of the spherical grid.
|
42 |
- `--n` the number of points along each dimension of the spherical grid. |
42 |
- `--n` the number of points along each dimension of the spherical grid.
|
43 |
- `--o` output folder path. |
43 |
- `--o` output folder path.
|
44 |
- `--p` or `--s` whether to calculate the potential or not. If not (`--s`), only the binary occupancy grid will be returned, i.e., the shape of the grid only. The default is to calculate it (`--p`). |
44 |
- `--p` or `--s` whether to calculate the potential or not. If not (`--s`), only the binary occupancy grid will be returned, i.e., the shape of the grid only. The default is to calculate it (`--p`). |
45 |
|
45 |
|
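As a rough illustration of how `--r` and `--n` relate, the sketch below builds a cubic lattice and keeps only the points inside a sphere. This is one plausible reading of "spherical grid", not necessarily how `voxelization.py` constructs it; the function name `spherical_grid` and the lattice layout are assumptions.

```python
import itertools
import math


def spherical_grid(radius, n_points):
    """Return the lattice spacing and the lattice points inside a sphere.

    Assumption: the lattice spans [-radius, radius] on every axis with
    n_points evenly spaced samples, and points farther than `radius`
    from the origin are discarded.
    """
    spacing = 2.0 * radius / (n_points - 1)
    axis = [-radius + i * spacing for i in range(n_points)]
    points = [
        (x, y, z)
        for x, y, z in itertools.product(axis, repeat=3)
        if math.sqrt(x * x + y * y + z * z) <= radius
    ]
    return spacing, points


# Values taken from the example command: --r 15 --n 31
spacing, points = spherical_grid(radius=15, n_points=31)
print(f"spacing: {spacing}, points kept: {len(points)} of {31 ** 3}")
```

Under this reading, `--r 15 --n 31` corresponds to a spacing of one unit between neighboring grid points along each axis.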
46 |
Several files will be saved, including `example_transformed.pdb` (coordinate-transformed pdb file), `example_transformed.mol2` (coordinate-transformed mol2 file for the calculation of DFIRE potential), `example.grid` (grid representation of the binding pocket for visualization), and `example.h5` (numpy array of the voxel representation). |
46 |
Several files will be saved, including `example_transformed.pdb` (coordinate-transformed pdb file), `example_transformed.mol2` (coordinate-transformed mol2 file for the calculation of DFIRE potential), `example.grid` (grid representation of the binding pocket for visualization), and `example.h5` (numpy array of the voxel representation). |
47 |
|
47 |
|
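When processing many structures, it can help to check that each run produced the files listed above. The helper below is a minimal sketch; `expected_outputs` is a hypothetical name, and it assumes the outputs are named after the input file stem and written directly into the folder passed via `--o`.

```python
from pathlib import Path


def expected_outputs(pdb_path, out_dir):
    """Map an input PDB path to the output files listed above.

    Assumption: outputs are named after the input file stem and written
    directly into the output folder.
    """
    stem = Path(pdb_path).stem
    out = Path(out_dir)
    return {
        "transformed_pdb": out / f"{stem}_transformed.pdb",
        "transformed_mol2": out / f"{stem}_transformed.mol2",
        "grid": out / f"{stem}.grid",
        "voxels": out / f"{stem}.h5",
    }


# Report which of the expected files exist after a run.
for kind, path in expected_outputs("example.pdb", "results").items():
    status = "found" if path.exists() else "missing"
    print(f"{kind}: {path} ({status})")
```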
48 |
To visualize the output binding pocket grid, run |
48 |
To visualize the output binding pocket grid, run |
49 |
|
49 |
|
50 |
<pre><code>python visualization --i example.grid --c 0</code></pre> |
50 |
<pre><code>python visualization --i example.grid --c 0</code></pre> |
51 |
|
51 |
|
52 |
- `--i` input binding pocket grid file path. |
52 |
- `--i` input binding pocket grid file path.
|
53 |
- `--c` channel to visualize. Note that if you pass `--s` in the previous step, the channel number `--c` has to be 0. |
53 |
- `--c` channel to visualize. Note that if you pass `--s` in the previous step, the channel number `--c` has to be 0.
|
54 |
|
54 |
|
55 |
An output `example_grid.pdb` will be generated for visualization. Note this pocket grid matches the transformed protein `example_transformed.pdb`. |
55 |
An output `example_grid.pdb` will be generated for visualization. Note this pocket grid matches the transformed protein `example_transformed.pdb`. |
56 |
|
56 |
|
57 |
2. Prediction |
57 |
2. Prediction |
58 |
|
58 |
|
59 |
This module classifies the target binding pocket as an ATP-, Heme-, or other-type pocket, i.e., it predicts which type of ligand the pocket tends to bind. The trained model is available at `https://osf.io/enz69/`. All files are under `./Learning`. |
59 |
This module classifies the target binding pocket as an ATP-, Heme-, or other-type pocket, i.e., it predicts which type of ligand the pocket tends to bind. The trained model is available at `https://osf.io/enz69/`. All files are under `./Learning`. |
60 |
|
60 |
|
61 |
To use the prediction module, run |
61 |
To use the prediction module, run |
62 |
|
62 |
|
63 |
<pre><code>python predict.py --f example.h5 --m path_to_the_trained_model</code></pre> |
63 |
<pre><code>python predict.py --f example.h5 --m path_to_the_trained_model</code></pre> |
64 |
|
64 |
|
65 |
|
65 |
|
66 |
- `--f` input h5 file path. |
66 |
- `--f` input h5 file path.
|
67 |
- `--m` path to the trained model weights. |
67 |
- `--m` path to the trained model weights.
|
68 |
|
68 |
|
69 |
The output would be something like |
69 |
The output would be something like |
70 |
|
70 |
|
71 |
<pre><code>The probability of pocket provided binds with ATP ligands: 0.3000 |
71 |
<pre><code>The probability of pocket provided binds with ATP ligands: 0.3000
|
72 |
The probability of pocket provided binds with Heme ligands: 0.2000 |
72 |
The probability of pocket provided binds with Heme ligands: 0.2000
|
73 |
The probability of pocket provided binds with other ligands: 0.5000 |
73 |
The probability of pocket provided binds with other ligands: 0.5000
|
74 |
</code></pre> |
74 |
</code></pre>
|
75 |
|
75 |
|
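The three reported probabilities sum to 1. A common way to obtain such values from a classifier is a softmax over raw class scores; the sketch below illustrates that idea and formats the result like the output shown above. Whether `predict.py` computes its probabilities this way is an assumption, as are the class order and the example scores.

```python
import math

# Assumed class order; the README only states that class labels start from 0.
CLASS_NAMES = ["ATP", "Heme", "other"]


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def report(scores):
    """Format one line per class, mirroring the output shown above."""
    return [
        f"The probability of pocket provided binds with {name} ligands: {p:.4f}"
        for name, p in zip(CLASS_NAMES, softmax(scores))
    ]


# Hypothetical raw scores for a single pocket.
for line in report([0.2, -0.2, 0.7]):
    print(line)
```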
76 |
3. Training |
76 |
3. Training |
77 |
|
77 |
|
78 |
To train our model on your own dataset, you first have to convert the dataset, i.e., the pdb files, into the voxel representation of the protein-ligand binding grid. The data conversion procedure is described in the data generation step above. The module runs a random 5-fold cross-validation. All related results, including the loss, accuracy, and model weights, will be saved. All files are under `./Learning`. |
78 |
To train our model on your own dataset, you first have to convert the dataset, i.e., the pdb files, into the voxel representation of the protein-ligand binding grid. The data conversion procedure is described in the data generation step above. The module runs a random 5-fold cross-validation. All related results, including the loss, accuracy, and model weights, will be saved. All files are under `./Learning`. |
79 |
|
79 |
|
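For readers unfamiliar with the term, a random 5-fold cross-validation shuffles the samples, splits them into five folds, and uses each fold once for validation while training on the other four. The sketch below shows the splitting step only; `k_fold_indices`, the seed, and the sample count are hypothetical and not taken from `train.py`.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for a random k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(k_fold_indices(n_samples=23, k=5))
for fold_id, (train, val) in enumerate(splits):
    print(f"fold {fold_id}: {len(train)} training, {len(val)} validation samples")
```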
80 |
The training module can be run as |
80 |
The training module can be run as |
81 |
|
81 |
|
82 |
<pre><code>python train.py --path path_to_your_data_folder --lpath path_to_your_label_file --bs batch_size --lr initial_learning_rate --epoch number_of_epochs --opath output_folder_path</code></pre> |
82 |
<pre><code>python train.py --path path_to_your_data_folder --lpath path_to_your_label_file --bs batch_size --lr initial_learning_rate --epoch number_of_epochs --opath output_folder_path</code></pre> |
83 |
|
83 |
|
84 |
- `--path` path to the folder that contains all the voxel data. |
84 |
- `--path` path to the folder that contains all the voxel data.
|
85 |
- `--lpath` label file path. The file should be a comma-separated file with no header. The first column is the filename and the second column is the class (starts from 0). An example is provided in `./Learning/labels`. |
85 |
- `--lpath` label file path. The file should be a comma-separated file with no header. The first column is the filename and the second column is the class (starts from 0). An example is provided in `./Learning/labels`.
|
86 |
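- `--bs`, `--lr`, `--epoch` are the hyperparameters of the model (batch size, initial learning rate, and number of epochs). Recommended values are 64, 1e-5, and 50, respectively. |
86 |
- `--bs`, `--lr`, `--epoch` are the hyperparameters of the model (batch size, initial learning rate, and number of epochs). Recommended values are 64, 1e-5, and 50, respectively.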
|
87 |
- `--opath` output folder path. If no output location is provided, a `logs` folder will be created under the current working directory to store everything. |
87 |
- `--opath` output folder path. If no output location is provided, a `logs` folder will be created under the current working directory to store everything.
|
88 |
|
88 |
|
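The label file format described for `--lpath` can be checked before training with a few lines of Python. This is a minimal sketch: the file names are made up, the class order (0 = ATP, 1 = Heme, 2 = other) is an assumption, and whether the filename column includes an extension should be verified against the example in `./Learning/labels`.

```python
import csv
import io

# Hypothetical file names; the class indices follow an assumed order
# (0 = ATP, 1 = Heme, 2 = other).
EXAMPLE_LABELS = "pocket_a.h5,0\npocket_b.h5,1\npocket_c.h5,2\n"


def read_labels(handle):
    """Parse 'filename,class' rows (no header) into a dict."""
    labels = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        filename, class_index = row[0].strip(), int(row[1])
        if class_index < 0:
            raise ValueError(f"class index must start from 0: {row}")
        labels[filename] = class_index
    return labels


labels = read_labels(io.StringIO(EXAMPLE_LABELS))
print(labels)
```

To read a real label file, pass an open file handle instead of the `io.StringIO` object, e.g. `open(path, newline="")`.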
89 |
# Dataset |
89 |
# Dataset |
90 |
|
90 |
|
91 |
We provide the dataset used for training at https://osf.io/enz69/. It contains the voxel representations of ATP, Heme, and other pockets along with the class label file. |
91 |
We provide the dataset used for training at https://osf.io/enz69/. It contains the voxel representations of ATP, Heme, and other pockets along with the class label file.
|