<p align="center"><img src="figs/logo_deeppurpose_horizontal.png" alt="logo" width="400px" /></p>

<h3 align="center">
<p> A Deep Learning Library for Compound and Protein Modeling <br>DTI, Drug Property, PPI, DDI, Protein Function Prediction<br></h3>
<h4 align="center">
<p> Applications in Drug Repurposing, Virtual Screening, QSAR, Side Effect Prediction and More </h4>

---

[![PyPI version](https://badge.fury.io/py/DeepPurpose.svg)](https://pypi.org/project/DeepPurpose/)
[![Downloads](https://pepy.tech/badge/deeppurpose/month)](https://pepy.tech/project/deeppurpose)
[![Downloads](https://pepy.tech/badge/deeppurpose)](https://pepy.tech/project/deeppurpose)
[![GitHub Repo stars](https://img.shields.io/github/stars/kexinhuang12345/DeepPurpose)](https://github.com/kexinhuang12345/DeepPurpose/stargazers)
[![GitHub Repo forks](https://img.shields.io/github/forks/kexinhuang12345/DeepPurpose)](https://github.com/kexinhuang12345/DeepPurpose/network/members)

This repository hosts DeepPurpose, a PyTorch-based deep learning toolkit for molecular modeling and prediction. It covers drug-target interaction (DTI) prediction, compound property prediction, protein-protein interaction (PPI) prediction, and protein function prediction. We focus on DTI and its applications in drug repurposing and virtual screening, but also support various other molecular encoding tasks. The library is designed to be very easy to use (only a few lines of code) to facilitate deep learning for life science research.

### News!
- [05/21] `0.1.2` supports 5 new graph neural network based models for compound encoding (DGL_GCN, DGL_NeuralFP, DGL_GIN_AttrMasking, DGL_GIN_ContextPred, DGL_AttentiveFP), implemented using [DGL Life Science](https://github.com/awslabs/dgl-lifesci)! An example is provided [here](DEMO/GNN_Models_Release_Example.ipynb)!
- [12/20] DeepPurpose is now supported by the TDC data loader, which contains a large collection of ML-for-therapeutics datasets, including many drug property and DTI datasets. Here is a [tutorial](https://github.com/mims-harvard/TDC/blob/master/tutorials/TDC_104_ML_Model_DeepPurpose.ipynb)!
- [12/20] DeepPurpose can now be installed via `pip`!
- [11/20] DeepPurpose is published in [Bioinformatics](https://doi.org/10.1093/bioinformatics/btaa1005)!
- [11/20] Added 5 more pretrained models on BindingDB IC50 units (around 1 million data points).
- [10/20] Google Colab installation instructions are provided [here](https://colab.research.google.com/drive/1eF60BwGX6PnB91vpx5dRxFa72e6-MYuZ?usp=sharing). Thanks to @hima111997!
- [10/20] Using DeepPurpose, we made a human-in-the-loop molecular design web UI. Check it out! \[[Website](http://deeppurpose.sunlab.org/), [paper](https://arxiv.org/abs/2010.03951)\]
- [09/20] DeepPurpose now supports three more tasks: DDI, PPI, and protein function prediction! You can simply call `from DeepPurpose import DDI/PPI/ProteinPred` to use them; check out the examples below!
- [07/20] A simple web UI for DTI prediction can be created in under 10 lines using [Gradio](https://github.com/gradio-app/gradio)! A demo is provided [here](https://github.com/kexinhuang12345/DeepPurpose/blob/master/DEMO/web_ui_gradio.ipynb).
- [07/20] A [blog](https://towardsdatascience.com/drug-discovery-with-deep-learning-under-10-lines-of-codes-742ee306732a) is posted on the Towards Data Science Medium column. Check it out!
- [07/20] Two tutorials are online that walk through DeepPurpose's framework for drug-target interaction prediction and drug property prediction ([DTI](Tutorial_1_DTI_Prediction.ipynb), [Drug Property](Tutorial_2_Drug_Property_Pred_Assay_Data.ipynb)).
- [05/20] Support drug property prediction for screening data that does not have target proteins, such as bacterial assays! An example using RDKit2D with DNN for training and repurposing for *Pseudomonas aeruginosa* (MIT AI Cures' [open task](https://www.aicures.mit.edu/data)) is provided as a [demo](DEMO/Drug_Property_Prediction_Bacterial_Activity-RDKit2D_MIT_AiCures.ipynb).
- [05/20] Now supports hyperparameter tuning via Bayesian optimization through the [Ax platform](https://ax.dev/)! A demo is provided [here](DEMO/Drug_Property_Pred-Ax-Hyperparam-Tune.ipynb).

### Features

- 15+ powerful encodings for drugs and proteins, ranging from deep neural networks on classic cheminformatics fingerprints, CNNs, and transformers to message-passing graph neural networks, with 50+ combined models! Most of these encoding combinations have not yet appeared in existing work. All of this takes under 10 lines of code while staying flexible. Switching encodings is as simple as changing the encoding names!

- Realistic and user-friendly design:
    - supports DTI, DDI, PPI, molecular property prediction, and protein function prediction.
    - automatically identifies whether the task is drug-target binding affinity prediction (regression) or drug-target interaction prediction (binary).
    - supports cold target and cold drug settings for robust model evaluation, and supports a single-target high-throughput screening assay data setup.
    - many dataset loading/downloading/unzipping scripts to ease tedious preprocessing, including antiviral, COVID-19 targets, BindingDB, DAVIS, KIBA, ...
    - many pretrained checkpoints.
    - easy monitoring of the training process with detailed training metrics output, such as test set figures (AUCs) and tables; early stopping is also supported.
    - detailed output records, such as a ranked list for repurposing results.
    - various evaluation metrics: ROC-AUC, PR-AUC, and F1 for binary tasks; MSE, R-squared, and Concordance Index for regression tasks.
    - label unit conversion for skewed label distributions such as Kd.
    - time reference for computationally expensive encodings.
    - PyTorch based; supports CPU, GPU, and multi-GPU.

*NOTE: We are actively looking for constructive advice, user feedback, and experience reports on using DeepPurpose! Please open an issue or [contact us](mailto:kexinhuang@hsph.harvard.edu).*

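
Of the regression metrics listed above, the concordance index is probably the least familiar. The sketch below is a plain-Python illustration of the standard definition: the fraction of comparable pairs whose predicted order matches the true order, with tied predictions counted as half. It is not DeepPurpose's own implementation, and the function name `concordance_index` is chosen here purely for illustration.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs that y_pred ranks in the same order as y_true."""
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # pairs with equal true labels are not comparable
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # tied predictions count as half
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("need at least two distinct true labels")
    return concordant / comparable


y_true = [5.0, 6.2, 7.1, 8.4]
print(concordance_index(y_true, [5.1, 6.0, 7.5, 8.0]))  # same ordering as y_true
print(concordance_index(y_true, [8.0, 7.5, 6.0, 5.1]))  # fully reversed ordering
```

A perfectly ordered prediction scores 1.0 and a fully reversed one scores 0.0, so the metric rewards correct ranking even when the absolute values are off.
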
## Cite Us

If you found this package useful, please cite [our paper](https://doi.org/10.1093/bioinformatics/btaa1005):
```
@article{huang2020deeppurpose,
  title={DeepPurpose: A Deep Learning Library for Drug-Target Interaction Prediction},
  author={Huang, Kexin and Fu, Tianfan and Glass, Lucas M and Zitnik, Marinka and Xiao, Cao and Sun, Jimeng},
  journal={Bioinformatics},
  year={2020}
}
```

## Installation
Try it on [Binder](https://mybinder.org)! Binder is a cloud Jupyter Notebook interface that installs our environment dependencies for you.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/kexinhuang12345/DeepPurpose/master)

[Video tutorial](https://www.youtube.com/watch?v=ghUyZknxq5o) on getting started with Binder.

We recommend installing DeepPurpose locally, since a Binder session has to be refreshed on every launch. For a local install, we recommend `pip`:

### `pip`

```bash
conda create -n DeepPurpose python=3.6
conda activate DeepPurpose
conda install -c conda-forge notebook
pip install git+https://github.com/bp-kelley/descriptastorus
pip install DeepPurpose
```

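
After installing, a quick check that the main packages can be found may save some debugging later. The sketch below uses only the standard library. The module names it looks for (`DeepPurpose`, `torch`, `rdkit`, `descriptastorus`) are assumptions about what a working environment should provide, and `missing_modules` is a hypothetical helper, so adjust the list to your setup.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed top-level modules for a working DeepPurpose environment.
required = ["DeepPurpose", "torch", "rdkit", "descriptastorus"]
missing = missing_modules(required)
print("All modules found." if not missing else f"Missing modules: {missing}")
```

Run it inside the activated conda environment; anything it reports as missing points at the install step to repeat.
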
### Build from Source

First time:
```bash
git clone https://github.com/kexinhuang12345/DeepPurpose.git ## Download code repository
cd DeepPurpose ## Change directory to DeepPurpose
conda env create -f environment.yml  ## Build virtual environment with all packages installed using conda
conda activate DeepPurpose ## Activate conda environment (use "source activate DeepPurpose" for anaconda 4.4 or earlier)
jupyter notebook ## Open the Jupyter notebook with the conda env

## Run our code, e.g. click a file in the DEMO folder
## ... ...

conda deactivate ## When done, exit conda environment
```

On subsequent runs:
```bash
cd DeepPurpose ## Change directory to DeepPurpose
conda activate DeepPurpose ## Activate conda environment
jupyter notebook ## Open the Jupyter notebook with the conda env

## Run our code, e.g. click a file in the DEMO folder
## ... ...

conda deactivate ## When done, exit conda environment
```

[Video tutorial](https://youtu.be/bqinehjnWvE) on installing locally from source.

## Example

### Case Study 1(a): A Framework for Drug Target Interaction Prediction, in Fewer Than 10 Lines of Code
In addition to DTI prediction, we also provide repurposing and virtual screening functions to rapidly generate predictions.

<details>
  <summary>Click here for the code!</summary>

```python
import os

from DeepPurpose import DTI as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *

SAVE_PATH = './saved_path'
os.makedirs(SAVE_PATH, exist_ok=True)

# Load data: an array of SMILES strings for drugs, an array of amino acid sequences for targets,
# and an array of binding values or 0/1 labels.
# e.g. ['Cc1ccc(CNS(=O)(=O)c2ccc(s2)S(N)(=O)=O)cc1', ...], ['MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTH...', ...], [0.46, 0.49, ...]
# In this example, BindingDB with Kd binding scores is used.
X_drug, X_target, y = process_BindingDB(download_BindingDB(SAVE_PATH),
                                        y = 'Kd',
                                        binary = False,
                                        convert_to_log = True)

# Type in the encoding names for drug/protein.
drug_encoding, target_encoding = 'CNN', 'Transformer'

# Data processing; here we select the cold protein split setup.
train, val, test = data_process(X_drug, X_target, y,
                                drug_encoding, target_encoding,
                                split_method='cold_protein',
                                frac=[0.7,0.1,0.2])

# Generate a new model using default parameters; model tuning is also possible via input parameters.
config = generate_config(drug_encoding, target_encoding, transformer_n_layer_target = 8)
net = models.model_initialize(**config)

# Train the new model.
# Detailed output, including a tidy table of validation loss and metrics and AUC curve figures,
# is stored in the ./result folder.
net.train(train, val, test)

# Or simply load a pretrained model from a model directory path or by a reproduced model name such as DeepDTA.
# MODEL_PATH_DIR and MODEL_NAME are placeholders for your own values:
# net = models.model_pretrained(MODEL_PATH_DIR)  # or models.model_pretrained(model = MODEL_NAME)

# Repurpose using the trained or pretrained model.
# In this example, the repurposing dataset is the Broad Repurposing Hub and the target is SARS-CoV 3CL Protease.
X_repurpose, drug_name, drug_cid = load_broad_repurposing_hub(SAVE_PATH)
target, target_name = load_SARS_CoV_Protease_3CL()

_ = models.repurpose(X_repurpose, target, net, drug_name, target_name)

# Virtual screening using the trained or pretrained model.
# The lists below are truncated for display ('...'); supply your own complete SMILES, names, and sequences.
X_repurpose = ['CCCCCCCOc1cccc(c1)C([O-])=O', ...]
drug_name = ['16007391', ...]
target = ['MLARRKPVLPALTINPTIAEGPSPTSEGASEANLVDLQKKLEEL...', ...]
target_name = ['P36896', 'P00374']

_ = models.virtual_screening(X_repurpose, target, net, drug_name, target_name)

```

</details>

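
The `cold_protein` split used in Case Study 1(a) is meant to keep test-set proteins unseen during training. As a rough illustration of the idea, and not DeepPurpose's actual implementation, the sketch below assigns whole proteins rather than individual drug-target pairs to the train, validation, and test sets. The helper name `cold_split` and the toy data are hypothetical.

```python
import random


def cold_split(pairs, key_index, frac=(0.7, 0.1, 0.2), seed=1):
    """Split (drug, target, label) tuples so that no key (e.g. a target) is shared across splits."""
    keys = sorted({pair[key_index] for pair in pairs})
    random.Random(seed).shuffle(keys)
    n_train = int(len(keys) * frac[0])
    n_val = int(len(keys) * frac[1])
    train_keys = set(keys[:n_train])
    val_keys = set(keys[n_train:n_train + n_val])
    seen_keys = train_keys | val_keys
    train = [p for p in pairs if p[key_index] in train_keys]
    val = [p for p in pairs if p[key_index] in val_keys]
    test = [p for p in pairs if p[key_index] not in seen_keys]  # remaining keys go to test
    return train, val, test


# Toy data: 10 hypothetical targets, 3 hypothetical drugs each.
pairs = [(f"drug{d}", f"target{t}", float(d + t)) for t in range(10) for d in range(3)]
train, val, test = cold_split(pairs, key_index=1)
print(len(train), len(val), len(test))
```

Passing `key_index=0` instead would give the analogous cold-drug split, where no drug appears in more than one subset.
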
### Case Study 1(b): A Framework for Drug Property Prediction, in Fewer Than 10 Lines of Code
Many datasets are in the form of high-throughput screening data, which contain only drugs and their activity scores. Such data can be formulated as a drug property prediction task. We also provide a repurposing function to predict over a large space of drugs.

<details>
  <summary>Click here for the code!</summary>

```python
import os

from DeepPurpose import CompoundPred as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *

SAVE_PATH = './saved_path'
os.makedirs(SAVE_PATH, exist_ok=True)

# Load AID1706 assay data (the target sequence is not needed for property prediction).
X_drugs, _, y = load_AID1706_SARS_CoV_3CL()

drug_encoding = 'rdkit_2d_normalized'
train, val, test = data_process(X_drug = X_drugs, y = y,
                                drug_encoding = drug_encoding,
                                split_method='random',
                                random_seed = 1)

config = generate_config(drug_encoding = drug_encoding,
                         cls_hidden_dims = [512],
                         train_epoch = 20,
                         LR = 0.001,
                         batch_size = 128,
                        )
model = models.model_initialize(**config)
model.train(train, val, test)

# Rank the Broad Repurposing Hub library with the trained model.
X_repurpose, drug_name, drug_cid = load_broad_repurposing_hub(SAVE_PATH)

_ = models.repurpose(X_repurpose, model, drug_name)

```

</details>

### Case Study 1(c): A Framework for Drug-Drug Interaction Prediction, in Fewer Than 10 Lines of Code
DDI prediction is very important for drug safety profiling and the success of clinical trials. This framework predicts interactions based on the chemical structures of a drug pair.

<details>
  <summary>Click here for the code!</summary>

```python
from DeepPurpose import DDI as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *

# Load DB binary data
X_drugs, X_drugs_, y = read_file_training_dataset_drug_drug_pairs("toy_data/ddi.txt")

drug_encoding = 'rdkit_2d_normalized'
train, val, test = data_process(X_drug = X_drugs, X_drug_ = X_drugs_, y = y,
                                drug_encoding = drug_encoding,
                                split_method='random',
                                random_seed = 1)

config = generate_config(drug_encoding = drug_encoding,
                         cls_hidden_dims = [512],
                         train_epoch = 20,
                         LR = 0.001,
                         batch_size = 128,
                        )

model = models.model_initialize(**config)
model.train(train, val, test)

```

</details>

### Case Study 1(d): A Framework for Protein-Protein Interaction Prediction, in Fewer Than 10 Lines of Code
PPI prediction is important for studying the relations among targets.

<details>
  <summary>Click here for the code!</summary>

```python
from DeepPurpose import PPI as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *

# Load DB binary data
X_targets, X_targets_, y = read_file_training_dataset_protein_protein_pairs("toy_data/ppi.txt")

target_encoding = 'CNN'
train, val, test = data_process(X_target = X_targets, X_target_ = X_targets_, y = y,
                                target_encoding = target_encoding,
                                split_method='random',
                                random_seed = 1)

config = generate_config(target_encoding = target_encoding,
                         cls_hidden_dims = [512],
                         train_epoch = 20,
                         LR = 0.001,
                         batch_size = 128,
                        )

model = models.model_initialize(**config)
model.train(train, val, test)

```

</details>

### Case Study 1(e): A Framework for Protein Function Prediction, in Fewer Than 10 Lines of Code
Protein function prediction helps predict various useful properties such as GO terms and structural classification. It is also useful for screening biologic drugs.

<details>
  <summary>Click here for the code!</summary>

```python
from DeepPurpose import ProteinPred as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *

# Load protein function data.
# PATH is a placeholder for your own data file, in the format described in the Data section.
X_targets, y = read_file_protein_function(PATH)

target_encoding = 'CNN'
train, val, test = data_process(X_target = X_targets, y = y,
                                target_encoding = target_encoding,
                                split_method='random',
                                random_seed = 1)

config = generate_config(target_encoding = target_encoding,
                         cls_hidden_dims = [512],
                         train_epoch = 20,
                         LR = 0.001,
                         batch_size = 128,
                        )

model = models.model_initialize(**config)
model.train(train, val, test)

```

</details>

### Case Study 2(a): Antiviral Drugs Repurposing for SARS-CoV2 3CLPro, Using One Line
Given a new target sequence (e.g., SARS-CoV2 3CL Protease), retrieve a list of repurposing drugs from a curated drug library of 81 antiviral drugs. The binding score is the predicted Kd value. Results are aggregated from five models pretrained on the BindingDB dataset. (Caution: this is currently for educational purposes. The pretrained DTI models only cover a small dataset and thus cannot generalize to every new unseen protein. For the best use case, train your own model with customized data.)

<details>
  <summary>Click here for the code!</summary>

```python
from DeepPurpose import oneliner
from DeepPurpose.dataset import *
oneliner.repurpose(*load_SARS_CoV2_Protease_3CL(), *load_antiviral_drugs(no_cid = True))
```
```
----output----
Drug Repurposing Result for SARS-CoV2 3CL Protease
+------+----------------------+------------------------+---------------+
| Rank |      Drug Name       |      Target Name       | Binding Score |
+------+----------------------+------------------------+---------------+
|  1   |      Sofosbuvir      | SARS-CoV2 3CL Protease |     190.25    |
|  2   |     Daclatasvir      | SARS-CoV2 3CL Protease |     214.58    |
|  3   |      Vicriviroc      | SARS-CoV2 3CL Protease |     315.70    |
|  4   |      Simeprevir      | SARS-CoV2 3CL Protease |     396.53    |
|  5   |      Etravirine      | SARS-CoV2 3CL Protease |     409.34    |
|  6   |      Amantadine      | SARS-CoV2 3CL Protease |     419.76    |
|  7   |      Letermovir      | SARS-CoV2 3CL Protease |     460.28    |
|  8   |     Rilpivirine      | SARS-CoV2 3CL Protease |     470.79    |
|  9   |      Darunavir       | SARS-CoV2 3CL Protease |     472.24    |
|  10  |      Lopinavir       | SARS-CoV2 3CL Protease |     473.01    |
|  11  |      Maraviroc       | SARS-CoV2 3CL Protease |     474.86    |
|  12  |    Fosamprenavir     | SARS-CoV2 3CL Protease |     487.45    |
|  13  |      Ritonavir       | SARS-CoV2 3CL Protease |     492.19    |
....
```

</details>

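
The ranking in Case Study 2(a) combines predictions from several pretrained models. How `oneliner.repurpose` does this internally is not shown here, so the sketch below only illustrates two common choices under stated assumptions: averaging the per-model scores, or keeping the most optimistic one (for Kd, where lower means tighter binding, that is the minimum). The function names, the `mode` values, and all numbers are made up for illustration.

```python
from statistics import mean


def aggregate_scores(per_model_scores, mode="mean", lower_is_better=True):
    """Combine each drug's scores from several models into a single score."""
    aggregated = {}
    for drug, scores in per_model_scores.items():
        if mode == "mean":
            aggregated[drug] = mean(scores)
        elif mode == "best":
            aggregated[drug] = min(scores) if lower_is_better else max(scores)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return aggregated


def rank(aggregated, lower_is_better=True):
    """Return drug names ordered from most to least promising."""
    return sorted(aggregated, key=aggregated.get, reverse=not lower_is_better)


# Hypothetical Kd predictions from three models for three placeholder drugs.
scores = {
    "drug_a": [200.0, 180.0, 220.0],
    "drug_b": [150.0, 400.0, 350.0],
    "drug_c": [500.0, 450.0, 480.0],
}
print(rank(aggregate_scores(scores, mode="mean")))
print(rank(aggregate_scores(scores, mode="best")))
```

The two modes can disagree: `drug_b` has one very favorable prediction, so it ranks first under `best` but second under `mean`. That sensitivity is one reason to inspect ranked lists critically.
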
### Case Study 2(b): Repurposing Using Customized Training Data, with One Line
Given a new target sequence (e.g., SARS-CoV 3CL Pro), train on new data (AID1706 Bioassay) and then retrieve a list of repurposing drugs from a proprietary library (e.g., antiviral drugs). The model can be trained from scratch or fine-tuned from a pretrained checkpoint!

<details>
  <summary>Click here for the code!</summary>

```python
from DeepPurpose import oneliner
from DeepPurpose.dataset import *

oneliner.repurpose(*load_SARS_CoV_Protease_3CL(), *load_antiviral_drugs(no_cid = True), *load_AID1706_SARS_CoV_3CL(),
                   split='HTS', convert_y = False, frac=[0.8,0.1,0.1], pretrained = False, agg = 'max_effect')
```
```
----output----
Drug Repurposing Result for SARS-CoV 3CL Protease
+------+----------------------+-----------------------+-------------+-------------+
| Rank |      Drug Name       |      Target Name      | Interaction | Probability |
+------+----------------------+-----------------------+-------------+-------------+
|  1   |      Remdesivir      | SARS-CoV 3CL Protease |     YES     |     0.99    |
|  2   |      Efavirenz       | SARS-CoV 3CL Protease |     YES     |     0.98    |
|  3   |      Vicriviroc      | SARS-CoV 3CL Protease |     YES     |     0.98    |
|  4   |      Tipranavir      | SARS-CoV 3CL Protease |     YES     |     0.96    |
|  5   |     Methisazone      | SARS-CoV 3CL Protease |     YES     |     0.94    |
|  6   |      Letermovir      | SARS-CoV 3CL Protease |     YES     |     0.88    |
|  7   |     Idoxuridine      | SARS-CoV 3CL Protease |     YES     |     0.77    |
|  8   |       Loviride       | SARS-CoV 3CL Protease |     YES     |     0.76    |
|  9   |      Baloxavir       | SARS-CoV 3CL Protease |     YES     |     0.74    |
|  10  |     Ibacitabine      | SARS-CoV 3CL Protease |     YES     |     0.70    |
|  11  |     Taribavirin      | SARS-CoV 3CL Protease |     YES     |     0.65    |
|  12  |      Indinavir       | SARS-CoV 3CL Protease |     YES     |     0.62    |
|  13  |   Podophyllotoxin    | SARS-CoV 3CL Protease |     YES     |     0.60    |
....
```
</details>

## Demos
Check out 10+ demos and tutorials to get started:

| Name | Description |
|-----------------|-------------|
| [Dataset Tutorial](DEMO/load_data_tutorial.ipynb) | Tutorial on how to use the dataset loader and read customized data |
| [Drug Repurposing for 3CLPro](DEMO/case-study-I-Drug-Repurposing-for-3CLPro.ipynb) | Example of one-liner repurposing for 3CLPro |
| [Drug Repurposing with Customized Data](DEMO/case-study-III-Drug-Repurposing-with-Customized-Data.ipynb) | Example of one-liner repurposing with AID1706 Bioassay Data, training from scratch |
| [Virtual Screening for BindingDB IC50](DEMO/case-study-II-Virtual-Screening-for-BindingDB-IC50.ipynb) | Example of one-liner virtual screening |
| [Reproduce DeepDTA](DEMO/case-study-IV-Reproduce_DeepDTA.ipynb) | Reproduce [DeepDTA](https://arxiv.org/abs/1801.10193) with the DAVIS dataset and show how to use the 10-line framework |
| [Virtual Screening for DAVIS and Correlation Plot](DEMO/Make-DAVIS-Correlation-Figure.ipynb) | Example of one-liner virtual screening, evaluated on an unseen dataset by plotting correlation |
| [Binary Classification for DAVIS using CNNs](DEMO/CNN-Binary-Example-DAVIS.ipynb) | Binary classification for the DAVIS dataset using CNN encodings with the 10-line framework |
| [Pretraining Model Tutorial](DEMO/load_pretraining_models_tutorial.ipynb) | Tutorial on how to load pretrained models |

and more in the [DEMO](https://github.com/kexinhuang12345/DeepPurpose/tree/master/DEMO) folder!

## Contact
Please contact kexinhuang@hsph.harvard.edu or tfu42@gatech.edu for help or submit an issue.

## Encodings
Currently, we support the following encodings:

| Drug Encodings  | Description |
|-----------------|-------------|
| Morgan | Extended-Connectivity Fingerprints |
| Pubchem | Pubchem Substructure-based Fingerprints |
| Daylight | Daylight-type fingerprints |
| rdkit_2d_normalized | Normalized Descriptastorus |
| ESPF | Explainable Substructure Partition Fingerprint |
| ErG | 2D pharmacophore descriptions for scaffold hopping |
| CNN | Convolutional Neural Network on SMILES |
| CNN_RNN | A GRU/LSTM on top of a CNN on SMILES |
| Transformer | Transformer Encoder on ESPF |
| MPNN | Message-passing neural network |
| DGL_GCN | Graph Convolutional Network |
| DGL_NeuralFP | Neural Fingerprint |
| DGL_GIN_AttrMasking | Pretrained GIN with Attribute Masking |
| DGL_GIN_ContextPred | Pretrained GIN with Context Prediction |
| DGL_AttentiveFP | Attentive FP, Xiong et al. 2020 |

| Target Encodings  | Description |
|-----------------|-------------|
| AAC | Amino acid composition up to 3-mers |
| PseudoAAC | Pseudo amino acid composition |
| Conjoint_triad | Conjoint triad features |
| Quasi-seq | Quasi-sequence order descriptor |
| ESPF | Explainable Substructure Partition Fingerprint |
| CNN | Convolutional Neural Network on target seq |
| CNN_RNN | A GRU/LSTM on top of a CNN on target seq |
| Transformer | Transformer Encoder on ESPF |

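
To make the simplest target encoding in the table more concrete, the sketch below computes amino acid composition for 1-, 2-, and 3-mers over the 20 standard residues in plain Python. It illustrates the general descriptor, not the routine DeepPurpose calls internally; the function name and the per-k frequency normalization are assumptions. The input is the visible prefix of the example sequence shown in Case Study 1(a).

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence, max_k=3):
    """Return k-mer frequencies for k = 1..max_k as one flat feature vector."""
    features = []
    for k in range(1, max_k + 1):
        n_windows = len(sequence) - k + 1
        counts = Counter(sequence[i:i + k] for i in range(n_windows))
        total = max(n_windows, 1)  # avoid division by zero for very short sequences
        for kmer in product(AMINO_ACIDS, repeat=k):
            features.append(counts["".join(kmer)] / total)
    return features


vector = kmer_composition("MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTH")
print(len(vector))  # 20 + 400 + 8000 = 8420 features
```

The vector has a fixed length regardless of sequence length, which is what lets a plain feed-forward network consume proteins of different sizes.
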
## Data
DeepPurpose currently supports the following dataset loaders, and more will be added:

*Public Drug-Target Binding Benchmark Dataset*

| Data  | Function |
|-------|----------|
|[BindingDB](https://www.bindingdb.org/bind/index.jsp)| ```download_BindingDB()``` to download the data and ```process_BindingDB()``` to process the data|
|[DAVIS](http://staff.cs.utu.fi/~aatapa/data/DrugTarget/)|```load_process_DAVIS()``` to download and process the data|
|[KIBA](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0209-z)|```load_process_KIBA()``` to download and process the data|

*Repurposing Dataset*

| Data  | Function |
|-------|----------|
|[Curated Antiviral Drugs Library](https://en.wikipedia.org/wiki/List_of_antiviral_drugs)|```load_antiviral_drugs()``` to load and process the data|
|[Broad Repurposing Hub](https://www.broadinstitute.org/drug-repurposing-hub)|```load_broad_repurposing_hub()``` to download and process the data|

*Bioassay Data for COVID-19*
(Thanks to [MIT AI Cures](https://www.aicures.mit.edu/data))

| Data  | Function |
|-------|----------|
|[AID1706](https://pubchem.ncbi.nlm.nih.gov/bioassay/1706)|```load_AID1706_SARS_CoV_3CL()``` to load and process the data|

*COVID-19 Targets*

| Data  | Function |
|-------|----------|
|SARS-CoV 3CL Protease|```load_SARS_CoV_Protease_3CL()```|
|SARS-CoV2 3CL Protease|```load_SARS_CoV2_Protease_3CL()```|
|SARS-CoV2 RNA Polymerase|```load_SARS_CoV2_RNA_polymerase()```|
|SARS-CoV2 Helicase|```load_SARS_CoV2_Helicase()```|
|SARS-CoV2 3to5_exonuclease|```load_SARS_CoV2_3to5_exonuclease()```|
|SARS-CoV2 endoRNAse|```load_SARS_CoV2_endoRNAse()```|

DeepPurpose also supports reading from user-provided txt files. It assumes the following data formats.

<details>
  <summary>Click here for the format expected!</summary>

For drug-target pairs:
```
Drug1_SMILES Target1_Seq Score/Label
Drug2_SMILES Target2_Seq Score/Label
....
```
Then, use

```python
from DeepPurpose import dataset
X_drug, X_target, y = dataset.read_file_training_dataset_drug_target_pairs(PATH)
```

For bioassay training data:
```
Target_Seq
Drug1_SMILES Score/Label
Drug2_SMILES Score/Label
....
```

Then, use

```python
from DeepPurpose import dataset
X_drug, X_target, y = dataset.read_file_training_dataset_bioassay(PATH)
```

For drug property prediction training data:
```
Drug1_SMILES Score/Label
Drug2_SMILES Score/Label
....
```

Then, use

```python
from DeepPurpose import dataset
X_drug, y = dataset.read_file_compound_property(PATH)
```

For protein function prediction training data:
```
Target1_Seq Score/Label
Target2_Seq Score/Label
....
```

Then, use

```python
from DeepPurpose import dataset
X_target, y = dataset.read_file_protein_function(PATH)
```

For drug-drug pairs:
```
Drug1_SMILES Drug1_SMILES_ Score/Label
Drug2_SMILES Drug2_SMILES_ Score/Label
....
```
Then, use

```python
from DeepPurpose import dataset
X_drug, X_drug_, y = dataset.read_file_training_dataset_drug_drug_pairs(PATH)
```

For protein-protein pairs:
```
Target1_Seq Target1_Seq_ Score/Label
Target2_Seq Target2_Seq_ Score/Label
....
```
Then, use

```python
from DeepPurpose import dataset
X_target, X_target_, y = dataset.read_file_training_dataset_protein_protein_pairs(PATH)
```

For a drug repurposing library:
```
Drug1_Name Drug1_SMILES
Drug2_Name Drug2_SMILES
....
```
Then, use

```python
from DeepPurpose import dataset
X_drug, X_drug_names = dataset.read_file_repurposing_library(PATH)
```

For a target sequence to be repurposed:
```
Target_Name Target_seq
```
Then, use

```python
from DeepPurpose import dataset
Target_seq, Target_name = dataset.read_file_target_sequence(PATH)
```

For a virtual screening library:
```
Drug1_SMILES Drug1_Name Target1_Seq Target1_Name
Drug2_SMILES Drug2_Name Target2_Seq Target2_Name
....
```
Then, use

```python
from DeepPurpose import dataset
X_drug, X_target, X_drug_names, X_target_names = dataset.read_file_virtual_screening_drug_target_pairs(PATH)
```
</details>

Check out the [Dataset Tutorial](DEMO/load_data_tutorial.ipynb).

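
As a concrete illustration of the drug-target pair format, the sketch below writes a tiny file and parses it back with plain Python. It assumes whitespace-separated columns and numeric labels, and it does not call DeepPurpose: `parse_drug_target_pairs` is a hypothetical stand-in that mimics the three lists a reader such as `read_file_training_dataset_drug_target_pairs` is shown to return. The first SMILES string and the sequence fragment come from the example in Case Study 1(a); the second row and the file name are toy values.

```python
import os
import tempfile


def parse_drug_target_pairs(path):
    """Parse lines of 'SMILES SEQUENCE LABEL' into three parallel lists."""
    X_drug, X_target, y = [], [], []
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            smiles, sequence, label = line.split()
            X_drug.append(smiles)
            X_target.append(sequence)
            y.append(float(label))
    return X_drug, X_target, y


rows = [
    "Cc1ccc(CNS(=O)(=O)c2ccc(s2)S(N)(=O)=O)cc1 MSHHWGYGKHNGPEHWHKDF 0.46",
    "CCO MSHHWGYGKHNGPEHWHKDF 0.49",
]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dti.txt")
    with open(path, "w") as handle:
        handle.write("\n".join(rows) + "\n")
    X_drug, X_target, y = parse_drug_target_pairs(path)

print(X_drug, y)
```

Because SMILES strings and sequences contain no spaces, a plain `split()` is enough here; drug or target names with spaces would need a different delimiter.
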
## Pretrained models
We provide more than 10 pretrained models. Please see the [Pretraining Model Tutorial](DEMO/load_pretraining_models_tutorial.ipynb) on how to load them. It is as simple as

```python
from DeepPurpose import DTI as models
net = models.model_pretrained(model = 'MPNN_CNN_DAVIS')
# or, from a local model directory (FILE_PATH is a placeholder):
net = models.model_pretrained(FILE_PATH)
```
The list of available pretrained models:

Each model name consists of the drug encoding, then the target encoding, and then the training dataset.

Note that the DTI models for BindingDB and DAVIS are trained on the log scale. DeepPurpose lets you control the conversion between the log scale (e.g., pIC50) and the original scale through the `convert_y` parameter.

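
For readers unfamiliar with the log-scale convention, the sketch below shows the usual relationship between an affinity in nM and its p-scale value, p = -log10(value in molar). It is a standalone illustration rather than the code path behind `convert_y`, and the helper names `nm_to_p` and `p_to_nm` are hypothetical.

```python
import math


def nm_to_p(value_nm):
    """Convert an affinity in nanomolar (e.g. Kd or IC50) to the p-scale."""
    return -math.log10(value_nm * 1e-9)


def p_to_nm(p_value):
    """Convert a p-scale value back to nanomolar."""
    return 10 ** (-p_value) / 1e-9


for kd_nm in (1.0, 100.0, 10000.0):
    p = nm_to_p(kd_nm)
    print(f"{kd_nm:>8.1f} nM -> p = {p:.2f} -> {p_to_nm(p):.1f} nM")
```

On this scale a 10-fold change in affinity is a change of 1 unit, which compresses the heavy right tail of raw Kd or IC50 values and is why skewed labels are often log-transformed before training.
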
<details>
  <summary>Click here for the models supported!</summary>

|Model Name|
|------|
|CNN_CNN_BindingDB_IC50|
|Morgan_CNN_BindingDB_IC50|
|Morgan_AAC_BindingDB_IC50|
|MPNN_CNN_BindingDB_IC50|
|Daylight_AAC_BindingDB_IC50|
|CNN_CNN_DAVIS|
|CNN_CNN_BindingDB|
|Morgan_CNN_BindingDB|
|Morgan_CNN_KIBA|
|Morgan_CNN_DAVIS|
|MPNN_CNN_BindingDB|
|MPNN_CNN_KIBA|
|MPNN_CNN_DAVIS|
|Transformer_CNN_BindingDB|
|Daylight_AAC_DAVIS|
|Daylight_AAC_KIBA|
|Daylight_AAC_BindingDB|
|Morgan_AAC_BindingDB|
|Morgan_AAC_KIBA|
|Morgan_AAC_DAVIS|

</details>

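
The naming convention for pretrained models can be unpacked mechanically. The sketch below splits a model name into its drug encoding, target encoding, and dataset. It assumes that neither encoding name contains an underscore, which holds for the pretrained names listed in this section but not for encodings such as `CNN_RNN`. The helper `parse_model_name` is hypothetical and not part of DeepPurpose.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    drug_encoding: str
    target_encoding: str
    dataset: str


def parse_model_name(name):
    """Split '<drug>_<target>_<dataset>' into its three parts."""
    parts = name.split("_", 2)  # at most two splits, so the dataset may keep underscores
    if len(parts) != 3:
        raise ValueError(f"unexpected model name: {name}")
    return ModelName(*parts)


for name in ("MPNN_CNN_DAVIS", "Morgan_AAC_BindingDB_IC50"):
    print(parse_model_name(name))
```

Limiting the split to two separators keeps dataset suffixes such as `BindingDB_IC50` intact.
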
## Documentation
The documentation at https://deeppurpose.readthedocs.io is under active development.

## Disclaimer
The output list should be inspected manually by experts before proceeding to wet-lab validation. Our work is still under active development and has limitations; please do not directly use the drugs.