Here is the modified version of DeepDTA that enables the use of your own training and/or test datasets.
Two sample datasets are included as examples: DTC is used as the training set, and the mytest folder contains three example files showing how your test data should be formatted.
#prepare_new_data(FLAGS.train_path, test=False)  # Uncomment this if you also have new training data
mytest:
Y.tab: tab-separated binding affinity file (drugs x proteins matrix).
Each row corresponds to a drug and each column to a protein, so the number of rows equals the number of drugs and the number of columns equals the number of proteins. If you only want to predict binding affinities for unknown pairs, the matrix can be all 0s. Alternatively, you can fill in the known affinity value for each drug-protein pair and mark unknown interactions as 'nan'.
Example Y for predicting unknown protein-drug interactions
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
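As a minimal sketch, assuming NumPy is available and that Y.tab is laid out exactly as described above (one drug per row, one protein per column, tab-separated), an all-zero placeholder matrix could be generated like this. The helper name `write_placeholder_y` is hypothetical and not part of the repository.

```python
import numpy as np


def write_placeholder_y(path, n_drugs, n_proteins):
    """Write an all-zero drugs x proteins matrix as a tab-separated Y.tab file."""
    # One row per drug, one column per protein; the zeros are placeholder labels
    # for pairs whose affinity is unknown and should be predicted.
    y = np.zeros((n_drugs, n_proteins))
    # "%g" writes 0.0 as "0", matching the example above.
    np.savetxt(path, y, delimiter="\t", fmt="%g")
    return y
```

For example, `write_placeholder_y("data/mytest/Y.tab", n_drugs=3, n_proteins=6)` would write three tab-separated rows of six zeros.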
Example Y for predicting known + unknown protein-drug interactions
8.1 2 12 nan 15 5
4 4.3 5 14 nan nan
nan 2.2 5 8 12 nan
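The sketch below reads such a matrix back and lists the drug-protein pairs that carry a value. It assumes, as the original DeepDTA code appears to do, that only non-NaN entries are treated as labelled pairs; `load_affinities` is a hypothetical helper, not the repository's own loader.

```python
import numpy as np


def load_affinities(path):
    """Load a tab-separated drugs x proteins matrix in which 'nan' marks unknown pairs."""
    # np.loadtxt reads the literal string 'nan' as a float NaN;
    # ndmin=2 keeps a 2-D shape even for a single drug or protein.
    y = np.loadtxt(path, delimiter="\t", ndmin=2)
    # Row (drug) and column (protein) indices of every entry that is not NaN.
    rows, cols = np.where(~np.isnan(y))
    return y, rows, cols
```

With the second example above, 13 of the 18 pairs are returned; with the all-zero matrix, every pair is returned.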
ligands.tab: each line contains a ligand ID and its SMILES string, separated by a tab.
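A hypothetical reader for ligands.tab, assuming the file has no header line and that the ligand order matches the row order of Y.tab (neither is stated above, so check against the sample files):

```python
def read_ligands(path):
    """Read ligands.tab: one 'ID<TAB>SMILES' pair per line, keeping file order."""
    ligands = []
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # ignore blank lines, e.g. a trailing empty line
            parts = line.split("\t")
            if len(parts) != 2:
                raise ValueError(f"line {line_no}: expected 'ID<TAB>SMILES', got {line!r}")
            ligands.append((parts[0], parts[1]))
    return ligands
```

Returning a list of tuples rather than a dict keeps the file order, which matters if row i of Y.tab is meant to refer to the i-th ligand.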
You can modify these according to your own data.
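Before running, a quick consistency check can catch mismatched files. This sketch assumes that each row of Y.tab corresponds to one ligand in ligands.tab; `check_test_folder` is a hypothetical helper, and it does not inspect the protein file.

```python
import numpy as np


def check_test_folder(y_path, ligands_path):
    """Check that Y.tab has exactly one row per ligand listed in ligands.tab."""
    # np.loadtxt raises ValueError if rows have different numbers of columns.
    y = np.loadtxt(y_path, delimiter="\t", ndmin=2)
    with open(ligands_path) as f:
        n_ligands = sum(1 for line in f if line.strip())
    if y.shape[0] != n_ligands:
        raise ValueError(
            f"Y.tab has {y.shape[0]} rows but ligands.tab lists {n_ligands} ligands"
        )
    return y.shape  # (number of drugs, number of proteins)
```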
You'll need to install the following in order to run the code:
Place the "data" folder under the "source" directory, then run, for example:
python run_experiments.py --num_windows 32 \
--seq_window_lengths 8 12 \
--smi_window_lengths 4 8 \
--batch_size 256 \
--num_epoch 100 \
--max_seq_len 1000 \
--max_smi_len 100 \
--train_path 'data/DTC/' \
--test_path 'data/mytest/' \
--problem_type 1 \
--isLog 0 \
--log_dir 'logs/'
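Flags such as `--seq_window_lengths 8 12` take several space-separated values. The sketch below is a hypothetical re-creation of the parser (the repository's own argument definitions may differ in types, defaults and help text). Assuming the script tries every combination of filter count and window lengths, as the original DeepDTA grid search does, it enumerates the settings the command above would cover.

```python
import argparse
import itertools


def build_parser():
    # Hypothetical re-creation of the flags used in the command above.
    p = argparse.ArgumentParser()
    p.add_argument("--num_windows", type=int, nargs="+")
    p.add_argument("--seq_window_lengths", type=int, nargs="+")
    p.add_argument("--smi_window_lengths", type=int, nargs="+")
    p.add_argument("--batch_size", type=int)
    p.add_argument("--num_epoch", type=int)
    p.add_argument("--max_seq_len", type=int)
    p.add_argument("--max_smi_len", type=int)
    p.add_argument("--train_path", type=str)
    p.add_argument("--test_path", type=str)
    p.add_argument("--problem_type", type=int)
    p.add_argument("--isLog", type=int)
    p.add_argument("--log_dir", type=str)
    return p


def hyperparameter_grid(flags):
    # Every (filter count, SMILES window, protein window) combination.
    return list(itertools.product(
        flags.num_windows, flags.smi_window_lengths, flags.seq_window_lengths))


if __name__ == "__main__":
    # Same arguments as the shell command; the shell strips the quotes.
    argv = ("--num_windows 32 --seq_window_lengths 8 12 --smi_window_lengths 4 8 "
            "--batch_size 256 --num_epoch 100 --max_seq_len 1000 --max_smi_len 100 "
            "--train_path data/DTC/ --test_path data/mytest/ --problem_type 1 "
            "--isLog 0 --log_dir logs/").split()
    flags = build_parser().parse_args(argv)
    for combo in hyperparameter_grid(flags):
        print(combo)
```

Run as-is, it prints four tuples, (32, 4, 8), (32, 4, 12), (32, 8, 8) and (32, 8, 12): one configuration per pair of SMILES and protein window lengths.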