a/README.md b/README.md
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/PaccMann/paccmann_predictor/actions/workflows/build.yml/badge.svg)](https://github.com/PaccMann/paccmann_predictor/actions/workflows/build.yml)

# paccmann_predictor

Drug interaction prediction with PaccMann.

`paccmann_predictor` is a package for drug interaction prediction, with examples of
anticancer drug sensitivity prediction and drug target affinity prediction. Please see our papers:

- [_Toward explainable anticancer compound sensitivity prediction via multimodal attention-based convolutional encoders_](https://doi.org/10.1021/acs.molpharmaceut.9b00520) (*Molecular Pharmaceutics*, 2019). This is the original paper on IC50 prediction using drug properties and tissue-specific cell line information (gene expression profiles). While the original code was written in `tensorflow` and is available [here](https://github.com/drugilsberg/paccmann), this is the `pytorch` implementation of the best PaccMann architecture (multiscale convolutional encoder).

**PaccMann for affinity prediction:**
- [Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2](https://iopscience.iop.org/article/10.1088/2632-2153/abe808) (_Machine Learning: Science and Technology_, 2021). There, we propose a slightly modified version that predicts drug-target binding affinities from protein sequences and SMILES.

![Graphical abstract](https://github.com/PaccMann/paccmann_predictor/blob/master/assets/paccmann.png?raw=true "Graphical abstract")

## Installation
The library itself has few dependencies (see [setup.py](setup.py)) with loose requirements.
First, set up the environment as follows:
```sh
conda env create -f examples/IC50/conda.yml
conda activate paccmann_predictor
pip install -e .
```

## Evaluate pretrained drug sensitivity model on your own data
First, please consider using our public [PaccMann webservice](https://ibm.biz/paccmann-aas) as described in the [NAR paper](https://academic.oup.com/nar/article/48/W1/W502/5836770).

To use our pretrained model, please download it from https://ibm.biz/paccmann-data (you only need `models/single_pytorch_model`).
For example, assuming that you:
1. Set up your conda environment as described above;
2. Downloaded the model linked above into a directory called `single_pytorch_model`; and
3. Downloaded the data from https://ibm.box.com/v/paccmann-pytoda-data into folders `data` and `splitted_data`;

then the following command should work:
```console
(paccmann_predictor) $ python examples/IC50/test_paccmann.py \
splitted_data/gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv \
data/gene_expression/gdsc-rnaseq_gene-expression.csv \
data/smiles/gdsc.smi \
data/2128_genes.pkl \
single_pytorch_model/smiles_language \
single_pytorch_model/weights/best_mse_paccmann_v2.pt \
results \
single_pytorch_model/model_params.json
```
*NOTE*: If you bring your own data, please make sure to provide the omic data for the 2128 genes specified in `data/2128_genes.pkl`. Your omic data (here, `data/gene_expression/gdsc-rnaseq_gene-expression.csv`) can contain more columns, and the columns do not need to follow the order of the pickled gene list. Please don't change this pickle file. Also note that this is PaccMannV2, which is slightly improved compared to the paper version (context attention on both modalities).

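Before running the evaluation on your own omic data, it can help to check that every gene from the pickle has a matching column. The following is a minimal sketch, not part of `paccmann_predictor`: the function names are hypothetical, and it assumes the pickle holds a plain list of gene names and that the omic CSV has one column per gene with the gene name as the header.

```python
import csv
import pickle


def load_gene_list(gene_filepath):
    """Load the pickled gene list (assumed to be a plain list of gene names)."""
    with open(gene_filepath, "rb") as handle:
        return list(pickle.load(handle))


def read_csv_columns(omic_filepath):
    """Return the header row of a CSV file without loading the full table."""
    with open(omic_filepath, newline="") as handle:
        return next(csv.reader(handle))


def find_missing_genes(gene_filepath, omic_filepath):
    """Return the genes from the pickle that have no column in the omic CSV.

    Extra columns and a different column order are fine; only absent genes
    are reported, in the order they appear in the pickled list.
    """
    available = set(read_csv_columns(omic_filepath))
    return [gene for gene in load_gene_list(gene_filepath) if gene not in available]
```

Calling `find_missing_genes("data/2128_genes.pkl", "my_expression.csv")` (where `my_expression.csv` is a placeholder for your own file) should return an empty list if all required genes are present.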
## Finetuning on your own data
You can also **finetune** our pretrained model on your data instead of training a model from scratch. To do so, please follow the instructions below for training from scratch and just set:
- `model_path` --> directory where the `single_pytorch_model` is stored
- `training_name` --> this should be `single_pytorch_model`
- `params_filepath` --> `base_path/single_pytorch_model/model_params.json`

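The three settings above can be combined with the positional arguments of `train_paccmann.py` roughly as follows. This is only a sketch under stated assumptions: `build_finetune_command` is a hypothetical helper, the data file names in the usage note are placeholders, and reusing `single_pytorch_model/smiles_language` as the SMILES language path is an assumption carried over from the evaluation example.

```python
from pathlib import Path

# Name of the downloaded pretrained model directory (from the README).
PRETRAINED_NAME = "single_pytorch_model"


def build_finetune_command(base_path, train_csv, test_csv, gep_csv, smi_file, gene_pkl):
    """Assemble the argument list for finetuning with train_paccmann.py.

    `base_path` is the directory that contains `single_pytorch_model`.
    The argument order follows the usage string printed by `-h`.
    """
    model_dir = Path(base_path) / PRETRAINED_NAME
    return [
        "python",
        "examples/IC50/train_paccmann.py",
        str(train_csv),                        # train_sensitivity_filepath
        str(test_csv),                         # test_sensitivity_filepath
        str(gep_csv),                          # gep_filepath
        str(smi_file),                         # smi_filepath
        str(gene_pkl),                         # gene_filepath
        str(model_dir / "smiles_language"),    # smiles_language_filepath (assumption)
        str(base_path),                        # model_path
        str(model_dir / "model_params.json"),  # params_filepath
        PRETRAINED_NAME,                       # training_name
    ]
```

The returned list could then be passed to `subprocess.run(command, check=True)` from the repository root, for example with `build_finetune_command("models", "my_train.csv", "my_test.csv", "my_expression.csv", "my_drugs.smi", "data/2128_genes.pkl")`.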
## Training a model from scratch
To run the example training script, we provide environment files under `examples/IC50/`.
The `examples` directory contains a training script, [train_paccmann.py](./examples/IC50/train_paccmann.py), that makes use
of `paccmann_predictor`.

```console
(paccmann_predictor) $ python examples/IC50/train_paccmann.py -h
usage: train_paccmann.py [-h]
                         train_sensitivity_filepath test_sensitivity_filepath
                         gep_filepath smi_filepath gene_filepath
                         smiles_language_filepath model_path params_filepath
                         training_name

positional arguments:
  train_sensitivity_filepath
                        Path to the drug sensitivity (IC50) data.
  test_sensitivity_filepath
                        Path to the drug sensitivity (IC50) data.
  gep_filepath          Path to the gene expression profile data.
  smi_filepath          Path to the SMILES data.
  gene_filepath         Path to a pickle object containing list of genes.
  smiles_language_filepath
                        Path to a pickle object a SMILES language object.
  model_path            Directory where the model will be stored.
  params_filepath       Path to the parameter file.
  training_name         Name for the training.

optional arguments:
  -h, --help            show this help message and exit
```

`params_filepath` could point to [examples/IC50/example_params.json](examples/IC50/example_params.json); examples for the other files can be downloaded from [here](https://ibm.box.com/v/paccmann-pytoda-data).

## References

If you use `paccmann_predictor` in your projects, please cite the following:

```bib
@article{manica2019paccmann,
  title={Toward explainable anticancer compound sensitivity prediction via multimodal attention-based convolutional encoders},
  author={Manica, Matteo and Oskooei, Ali and Born, Jannis and Subramanian, Vigneshwari and S{\'a}ez-Rodr{\'\i}guez, Julio and Mart{\'\i}nez, Mar{\'\i}a Rodr{\'\i}guez},
  journal={Molecular pharmaceutics},
  volume={16},
  number={12},
  pages={4797--4806},
  year={2019},
  publisher={ACS Publications},
  doi = {10.1021/acs.molpharmaceut.9b00520},
  note = {PMID: 31618586}
}

@article{born2021datadriven,
  author = {Born, Jannis and Manica, Matteo and Cadow, Joris and Markert, Greta and Mill, Nil Adell and Filipavicius, Modestas and Janakarajan, Nikita and Cardinale, Antonio and Laino, Teodoro and {Rodr{\'{i}}guez Mart{\'{i}}nez}, Mar{\'{i}}a},
  doi = {10.1088/2632-2153/abe808},
  issn = {2632-2153},
  journal = {Machine Learning: Science and Technology},
  number = {2},
  pages = {025024},
  title = {{Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2}},
  url = {https://iopscience.iop.org/article/10.1088/2632-2153/abe808},
  volume = {2},
  year = {2021}
}
```