# Resources:

+ README.md: this file.
+ data/davis/folds/test_fold_setting1.txt,train_fold_setting1.txt; data/davis/Y,ligands_can.txt,proteins.txt
  data/kiba/folds/test_fold_setting1.txt,train_fold_setting1.txt; data/kiba/Y,ligands_can.txt,proteins.txt
  These files were downloaded from https://github.com/hkmztrk/DeepDTA/tree/master/data

### Source code:
+ create_data.py: creates data in PyTorch format.
+ utils.py: includes TestbedDataset, used by create_data.py to create data, and performance measures.
+ training.py: trains a GraphDTA model.
+ models/ginconv.py, gat.py, gat_gcn.py, and gcn.py: the proposed models GINConvNet, GATNet, GAT_GCN, and GCNNet, which receive drugs as graph input.

# Step-by-step running:

## 0. Install Python libraries needed
+ Install pytorch_geometric following the instructions at https://github.com/rusty1s/pytorch_geometric
+ Install rdkit: conda install -y -c conda-forge rdkit
+ Or run the following commands to install both pytorch_geometric and rdkit:
```sh
conda create -n geometric python=3
conda activate geometric
conda install -y -c conda-forge rdkit
conda install pytorch torchvision cudatoolkit -c pytorch
pip install torch-scatter==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-sparse==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-cluster==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-spline-conv==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-geometric
```
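After installing, it can help to confirm that the libraries are importable from the active environment. The following is a minimal sketch, not part of this repository: the function name `find_missing_modules` is hypothetical, and it only checks that the modules can be located, not that their versions are compatible.

```python
import importlib.util

# Import names of the libraries installed above.
REQUIRED_MODULES = ("torch", "torch_geometric", "rdkit")


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the `geometric` environment; any name it reports as missing needs to be installed before continuing.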

## 1. Create data in PyTorch format
Run:
```sh
conda activate geometric
python create_data.py
```
This creates kiba_train.csv, kiba_test.csv, davis_train.csv, and davis_test.csv in the data/ folder. These files are then used as input to create the data in PyTorch format,
stored in data/processed/ as kiba_train.pt, kiba_test.pt, davis_train.pt, and davis_test.pt.
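To check that this step produced everything listed above, a small script can look for the eight expected files. This is a sketch under the assumption that it is run from the repository root; the helper names `expected_outputs` and `missing_outputs` are hypothetical and not part of the repository.

```python
from pathlib import Path

DATASETS = ("davis", "kiba")
SPLITS = ("train", "test")


def expected_outputs(data_dir="data"):
    """List the CSV and .pt files that create_data.py is described as producing."""
    root = Path(data_dir)
    csv_files = [root / f"{d}_{s}.csv" for d in DATASETS for s in SPLITS]
    pt_files = [root / "processed" / f"{d}_{s}.pt" for d in DATASETS for s in SPLITS]
    return csv_files + pt_files


def missing_outputs(data_dir="data"):
    """Return the expected output files that do not exist yet."""
    return [path for path in expected_outputs(data_dir) if not path.exists()]


if __name__ == "__main__":
    for path in missing_outputs():
        print("missing:", path)
```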

## 2. Train a prediction model
Train a model on the training data. The model that achieves the best MSE on the testing data is kept.
Run:

```sh
conda activate geometric
python training.py 0 0 0
```

where the first argument is the dataset index, 0/1 for 'davis' or 'kiba', respectively;
the second argument is the model index, 0/1/2/3 for GINConvNet, GATNet, GAT_GCN, or GCNNet, respectively;
and the third argument is the CUDA device index, 0/1 for 'cuda:0' or 'cuda:1', respectively.
Note that your actual CUDA device name may differ from these, so please change the following code accordingly:
```python
# The optional third command-line argument selects the CUDA device.
cuda_name = "cuda:0"
if len(sys.argv)>3:
    cuda_name = "cuda:" + str(int(sys.argv[3]))
```
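The mapping from the three positional arguments to names can be sketched as follows. This is an illustration of the convention described above, not the actual contents of training.py: the function `parse_args` and the list names are hypothetical, and treating the first two arguments as required is an assumption.

```python
DATASETS = ["davis", "kiba"]
MODEL_NAMES = ["GINConvNet", "GATNet", "GAT_GCN", "GCNNet"]


def parse_args(argv):
    """Map `script.py <dataset> <model> [<cuda>]` to readable names."""
    if len(argv) < 3:
        # Assumption: dataset and model indices are both required.
        raise SystemExit("usage: training.py <dataset 0/1> <model 0-3> [<cuda index>]")
    dataset = DATASETS[int(argv[1])]
    model_name = MODEL_NAMES[int(argv[2])]
    cuda_name = "cuda:0"  # default device when no third argument is given
    if len(argv) > 3:
        cuda_name = "cuda:" + str(int(argv[3]))
    return dataset, model_name, cuda_name


if __name__ == "__main__":
    # Equivalent to: python training.py 1 3 1
    print(parse_args(["training.py", "1", "3", "1"]))
```

With this convention, `python training.py 0 0 0` selects the davis dataset, GINConvNet, and `cuda:0`.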

This saves the model and result files for the model achieving the best MSE on the testing data during training.
For example, running GATNet on the Davis data produces two files, model_GATNet_davis.model and result_GATNet_davis.csv.
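The "keep the best MSE" rule can be illustrated with plain Python. This is a simplified sketch, not the training loop in training.py: it assumes the model is evaluated once per epoch, and the names `mse` and `select_best_epoch` are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best_epoch(y_true, predictions_per_epoch):
    """Return (best_epoch, best_mse) over a sequence of per-epoch predictions."""
    best_epoch, best_mse = -1, float("inf")
    for epoch, y_pred in enumerate(predictions_per_epoch):
        current = mse(y_true, y_pred)
        if current < best_mse:
            # A training script would save the model and result files here.
            best_epoch, best_mse = epoch, current
    return best_epoch, best_mse


if __name__ == "__main__":
    labels = [1.0, 2.0]
    predictions = [[0.0, 0.0], [1.0, 2.0], [1.0, 3.0]]
    print(select_best_epoch(labels, predictions))
```

Only an improvement over the best MSE seen so far triggers a save, so the files left on disk at the end correspond to the best-scoring evaluation.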

## 3. Train a prediction model with validation

In "2. Train a prediction model", a model is trained on the training data and chosen when it achieves the best MSE on the testing data.
This follows how a model was chosen in https://github.com/hkmztrk/DeepDTA. The results of the two training approaches are comparable, though.

In this section, a model is trained on 80% of the training data and chosen when it achieves the best MSE on the validation data,
which is the remaining 20% of the training data. The chosen model is then used to predict affinity for the testing data.
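An 80/20 split of the training data can be sketched as below. Whether training_validation.py shuffles before splitting, and which seed it uses, is not stated here, so the shuffling, the seed, and the function name `split_train_valid` are assumptions made for illustration.

```python
import random


def split_train_valid(n_samples, valid_fraction=0.2, seed=0):
    """Split sample indices into (train, valid) with a seeded random shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: random split
    n_valid = int(n_samples * valid_fraction)
    return indices[n_valid:], indices[:n_valid]


if __name__ == "__main__":
    train_idx, valid_idx = split_train_valid(10)
    print(len(train_idx), len(valid_idx))
```

The two index lists are disjoint, so validation samples never contribute to the weight updates and the testing data stays untouched until the final prediction.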

The same arguments as in "2. Train a prediction model" are used. For example, run:

```sh
python training_validation.py 0 0 0
```

This saves the model achieving the best MSE on the validation data during training, along with the performance results of that model on the testing data.
For example, running GATNet on the Davis data produces two files, model_GATNet_davis.model and result_GATNet_davis.csv.