# Resources:

+ README.md: this file.
+ data/davis/folds/test_fold_setting1.txt,train_fold_setting1.txt; data/davis/Y,ligands_can.txt,proteins.txt
  data/kiba/folds/test_fold_setting1.txt,train_fold_setting1.txt; data/kiba/Y,ligands_can.txt,proteins.txt
  These files were downloaded from https://github.com/hkmztrk/DeepDTA/tree/master/data

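The DeepDTA data files can be read with the standard library alone. The sketch below rests on assumptions about the file formats that are not stated in this README: ligands_can.txt and proteins.txt are taken to be JSON objects mapping IDs to SMILES strings and protein sequences, Y a pickled ligand-by-protein affinity matrix in which NaN marks an unmeasured pair, and the Davis values Kd in nM, converted here to pKd = -log10(Kd / 1e9). `load_deepdta_dataset` is a hypothetical helper, the fold files are not handled, and unpickling the real Y file may require numpy if the matrix was stored as a numpy array.

```python
import json
import math
import os
import pickle
from collections import OrderedDict


def load_deepdta_dataset(fpath, dataset):
    """Return (smiles, sequence, affinity) triples for every measured pair.

    Sketch only: the file formats are assumptions (see the lead-in).
    """
    with open(os.path.join(fpath, "ligands_can.txt")) as f:
        ligands = json.load(f, object_pairs_hook=OrderedDict)  # assumed: ID -> SMILES
    with open(os.path.join(fpath, "proteins.txt")) as f:
        proteins = json.load(f, object_pairs_hook=OrderedDict)  # assumed: ID -> sequence
    with open(os.path.join(fpath, "Y"), "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        affinity = pickle.load(f, encoding="latin1")

    smiles = list(ligands.values())
    sequences = list(proteins.values())
    triples = []
    for i, row in enumerate(affinity):  # assumed: rows index ligands
        for j, y in enumerate(row):  # assumed: columns index proteins
            y = float(y)
            if math.isnan(y):  # unmeasured pair
                continue
            if dataset == "davis":
                # Assumed Kd in nM; convert to pKd = -log10(Kd / 1e9).
                y = -math.log10(y / 1e9)
            triples.append((smiles[i], sequences[j], y))
    return triples
```

For example, `load_deepdta_dataset("data/davis", "davis")` would return one triple per measured drug-protein pair.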
### Source codes:

+ create_data.py: creates data in PyTorch format.
+ utils.py: includes TestbedDataset, which create_data.py uses to create the data, and the performance measures.
+ training.py: trains a GraphDTA model.
+ models/ginconv.py, gat.py, gat_gcn.py, and gcn.py: the proposed models GINConvNet, GATNet, GAT_GCN, and GCNNet, which take drugs represented as graphs as input.

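utils.py is described only as containing "performance measures". MSE is the selection criterion used in the training steps; the concordance index (CI) is, as far as we know, the other measure usually reported on Davis and KIBA. The following is a minimal pure-Python sketch of both, not the code in utils.py; the function names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted affinities."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are ordered like the truth.

    A pair (i, j) is comparable when y_true[i] > y_true[j]; it scores 1 if
    y_pred[i] > y_pred[j], 0.5 if the two predictions tie, and 0 otherwise.
    This is O(n^2): fine for a sketch, slow for large test sets.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have equal length")
    score = 0.0
    pairs = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                pairs += 1
                if y_pred[i] > y_pred[j]:
                    score += 1.0
                elif y_pred[i] == y_pred[j]:
                    score += 0.5
    if pairs == 0:
        raise ValueError("no comparable pairs (all true values are equal)")
    return score / pairs
```

A CI of 1.0 means the predictions rank every comparable pair correctly, and 0.5 is what random ordering would give on average.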
# Step-by-step running:

## 0. Install the required Python libraries

+ Install pytorch_geometric following the instructions at https://github.com/rusty1s/pytorch_geometric
+ Install rdkit: conda install -y -c conda-forge rdkit
+ Or run the following commands to install both pytorch_geometric and rdkit:
```sh
conda create -n geometric python=3
conda activate geometric
conda install -y -c conda-forge rdkit
conda install pytorch torchvision cudatoolkit -c pytorch
# The wheels below target PyTorch 1.4.0 built for CUDA 10.1 (cu101); adjust both to match the versions installed above.
pip install torch-scatter==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-sparse==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-cluster==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-spline-conv==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-geometric
```

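To confirm the installation before running anything else, a small script such as the one below can be run inside the activated geometric environment (the file name, e.g. check_env.py, is arbitrary). The module list is our assumption of the import names of the packages installed above; `importlib.util.find_spec` locates a module without importing it, so a missing package shows up as MISSING instead of raising ImportError.

```python
import importlib.util

# Assumed import names of the packages installed above (pip/conda names use hyphens).
REQUIRED = ("torch", "torchvision", "torch_geometric", "torch_scatter",
            "torch_sparse", "torch_cluster", "torch_spline_conv", "rdkit")


def check_environment(modules=REQUIRED):
    """Map each top-level module name to True if it can be located, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

"found" only means the package is present; a version mismatch between the extension wheels and PyTorch will still surface when the module is actually imported.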
## 1. Create data in PyTorch format

Run

```sh
conda activate geometric
python create_data.py
```

This produces kiba_train.csv, kiba_test.csv, davis_train.csv, and davis_test.csv in the data/ folder. These CSV files are in turn used to create the data in PyTorch format, stored in data/processed/ as kiba_train.pt, kiba_test.pt, davis_train.pt, and davis_test.pt.

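Besides the drug graphs, the protein sequences from proteins.txt also need a numeric form before they can be stored as tensors. The sketch below shows one common choice, a fixed-length label encoding; the 25-letter alphabet, the 1000-residue cutoff, and reserving 0 for padding are assumptions modelled on typical GraphDTA-style preprocessing, so check create_data.py for the values it actually uses. `encode_protein` is a hypothetical name.

```python
SEQ_VOCAB = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # assumed residue alphabet
SEQ_DICT = {ch: i + 1 for i, ch in enumerate(SEQ_VOCAB)}  # 0 is reserved for padding
MAX_SEQ_LEN = 1000  # assumed cutoff; longer sequences are truncated


def encode_protein(sequence, max_len=MAX_SEQ_LEN):
    """Map an amino-acid string to a fixed-length list of integer codes.

    Characters outside SEQ_VOCAB raise KeyError, so bad input is not silently dropped.
    """
    codes = [SEQ_DICT[ch] for ch in sequence[:max_len]]
    return codes + [0] * (max_len - len(codes))
```

For example, `encode_protein("MKV", max_len=5)` gives `[12, 10, 21, 0, 0]`. Keeping 0 for padding lets a downstream embedding layer or mask tell padding apart from real residues.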
## 2. Train a prediction model

To train a model on the training data and keep the one that achieves the best MSE on the testing data, run

```sh
conda activate geometric
python training.py 0 0 0
```

where the first argument is the dataset index, 0/1 for 'davis' or 'kiba', respectively;
the second argument is the model index, 0/1/2/3 for GINConvNet, GATNet, GAT_GCN, or GCNNet, respectively;
and the third argument is the CUDA device index, 0/1 for 'cuda:0' or 'cuda:1', respectively.
Note that the CUDA device names on your machine may differ, so change the following code in training.py accordingly:

```python
import sys

# Use the first GPU by default; an optional third argument selects another device index.
cuda_name = "cuda:0"
if len(sys.argv) > 3:
    cuda_name = "cuda:" + str(int(sys.argv[3]))
```

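The mapping from positional arguments to names can be sketched as follows, deriving the output file names in the pattern of the example given for this step (model_GATNet_davis.model, result_GATNet_davis.csv). `parse_training_args` is a hypothetical helper, not a function in training.py, and the script's own parsing may differ in detail.

```python
DATASETS = ["davis", "kiba"]  # dataset index 0/1
MODELS = ["GINConvNet", "GATNet", "GAT_GCN", "GCNNet"]  # model index 0/1/2/3


def parse_training_args(argv):
    """Turn argv like ['training.py', '0', '1', '0'] into names (hypothetical helper)."""
    if len(argv) < 3:
        raise SystemExit("usage: training.py DATASET_IDX MODEL_IDX [CUDA_IDX]")
    dataset = DATASETS[int(argv[1])]
    model_name = MODELS[int(argv[2])]
    # Same default as the training.py code above: cuda:0 unless a third argument is given.
    cuda_name = "cuda:0"
    if len(argv) > 3:
        cuda_name = "cuda:" + str(int(argv[3]))
    model_file = f"model_{model_name}_{dataset}.model"
    result_file = f"result_{model_name}_{dataset}.csv"
    return dataset, model_name, cuda_name, model_file, result_file
```

In PyTorch code, `torch.device(cuda_name if torch.cuda.is_available() else "cpu")` is a common way to fall back to the CPU when no GPU is visible.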
This saves the model and result files for the model that achieves the best MSE on the testing data during training.
For example, running GATNet on the Davis data produces two files: model_GATNet_davis.model and result_GATNet_davis.csv.

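The selection rule of this step ("keep the model with the best MSE on the testing data") amounts to tracking a running minimum across epochs. The framework-free sketch below illustrates that bookkeeping; `train_one_epoch`, `evaluate_mse`, and `save` are hypothetical callables standing in for the actual PyTorch training loop, evaluation, and `torch.save` call.

```python
def train_keep_best(num_epochs, train_one_epoch, evaluate_mse, save):
    """Train for num_epochs and save whenever the monitored MSE improves.

    train_one_epoch(epoch), evaluate_mse() -> float and save(epoch, mse) are
    hypothetical callables supplied by the caller.
    Returns (best_epoch, best_mse); best_epoch is -1 if num_epochs is 0.
    """
    best_mse = float("inf")
    best_epoch = -1
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        current = evaluate_mse()
        if current < best_mse:  # strict improvement only; ties keep the earlier model
            best_mse, best_epoch = current, epoch
            save(epoch, current)
    return best_epoch, best_mse
```

Saving only on improvement means the files on disk always correspond to the best epoch seen so far, even if training is interrupted.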
## 3. Train a prediction model with validation

In "2. Train a prediction model", a model is trained on the training data and chosen when it achieves the best MSE on the testing data.
This follows how models were chosen in https://github.com/hkmztrk/DeepDTA. The results of the two training approaches are comparable, though.

In this section, a model is trained on 80% of the training data and chosen when it achieves the best MSE on the validation data,
which is the remaining 20% of the training data. The chosen model is then used to predict affinity for the testing data.

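The 80/20 split can be sketched as a seeded random partition of the training indices. This is an illustration under assumptions: `split_train_valid` and the fixed seed are hypothetical, and training_validation.py may split differently (for example with `torch.utils.data.random_split`, which performs the same kind of random partition on a PyTorch dataset).

```python
import random


def split_train_valid(n_samples, train_frac=0.8, seed=0):
    """Randomly partition indices 0..n_samples-1 into training and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG: reproducible, no global side effects
    n_train = int(train_frac * n_samples)  # the remainder (about 20%) goes to validation
    return indices[:n_train], indices[n_train:]
```

Fixing the seed makes the split reproducible, so repeated runs select models against the same validation set.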
The same arguments as in "2. Train a prediction model" are used. For example, run

```sh
python training_validation.py 0 0 0
```

This saves the model that achieves the best MSE on the validation data during training, together with its performance results on the testing data.
For example, running GATNet on the Davis data produces two files: model_GATNet_davis.model and result_GATNet_davis.csv.
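What distinguishes this variant is that the selection signal (validation MSE) and the reported numbers (testing performance) come from different data. A minimal sketch of that bookkeeping, where the per-epoch MSE lists are hypothetical inputs rather than output of training_validation.py:

```python
def select_by_validation(valid_mses, test_mses):
    """Pick the epoch with the lowest validation MSE and report the test MSE there.

    valid_mses[e] and test_mses[e] are the MSEs measured after epoch e.
    The test values never influence which epoch is chosen; on ties the
    earliest epoch wins.
    """
    if not valid_mses or len(valid_mses) != len(test_mses):
        raise ValueError("need one validation and one test MSE per epoch")
    best_epoch = min(range(len(valid_mses)), key=valid_mses.__getitem__)
    return best_epoch, test_mses[best_epoch]
```

A training script would typically evaluate the test data only when the validation MSE improves, which yields the same selected epoch as this after-the-fact version.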