|
a/README.md |
|
b/README.md |
1 |
# DiffSBDD: Structure-based Drug Design with Equivariant Diffusion Models |
1 |
# DiffSBDD: Structure-based Drug Design with Equivariant Diffusion Models |
2 |
|
2 |
|
3 |
Official implementation of **DiffSBDD**, an equivariant diffusion model for structure-based drug design, by Arne Schneuing, Charles Harris, Yuanqi Du, Kieran Didi, Arian Jamasb, Ilia Igashov, Weitao Du, Carla Gomes, Tom Blundell, Pietro Lio, Max Welling, Michael Bronstein & Bruno Correia. |
3 |
Official implementation of **DiffSBDD**, an equivariant diffusion model for structure-based drug design, by Arne Schneuing, Charles Harris, Yuanqi Du, Kieran Didi, Arian Jamasb, Ilia Igashov, Weitao Du, Carla Gomes, Tom Blundell, Pietro Lio, Max Welling, Michael Bronstein & Bruno Correia. |
4 |
|
4 |
|
5 |
[DOI: 10.1038/s43588-024-00737-x](https://doi.org/10.1038/s43588-024-00737-x) |
5 |
[DOI: 10.1038/s43588-024-00737-x](https://doi.org/10.1038/s43588-024-00737-x)
|
6 |
[arXiv:2210.13695](http://arxiv.org/abs/2210.13695) |
6 |
[arXiv:2210.13695](http://arxiv.org/abs/2210.13695)
|
7 |
[Open in Colab](https://colab.research.google.com/github/arneschneuing/DiffSBDD/blob/main/colab/DiffSBDD.ipynb) |
7 |
[Open in Colab](https://colab.research.google.com/github/arneschneuing/DiffSBDD/blob/main/colab/DiffSBDD.ipynb) |
8 |
|
8 |
|
9 |
> [!TIP] |
9 |
[!TIP]
|
10 |
> You can also try out our new 3D generative models for drug design at https://github.com/LPDI-EPFL/DrugFlow. |
10 |
You can also try out our new 3D generative models for drug design at https://github.com/LPDI-EPFL/DrugFlow. |
11 |
|
11 |
|
12 |
 |
12 |
|
13 |
|
13 |
|
14 |
1. [Dependencies](#dependencies) |
14 |
1. [Dependencies](#dependencies)
|
15 |
1. [Conda environment](#conda-environment) |
15 |
1. [Conda environment](#conda-environment)
|
16 |
3. [Pre-trained models](#pre-trained-models) |
16 |
3. [Pre-trained models](#pre-trained-models)
|
17 |
2. [Step-by-step examples](#step-by-step-examples) |
17 |
2. [Step-by-step examples](#step-by-step-examples)
|
18 |
1. [De novo design](#de-novo-design) |
18 |
1. [De novo design](#de-novo-design)
|
19 |
2. [Substructure inpainting](#substructure-inpainting) |
19 |
2. [Substructure inpainting](#substructure-inpainting)
|
20 |
3. [Molecular optimization](#molecular-optimization) |
20 |
3. [Molecular optimization](#molecular-optimization)
|
21 |
3. [Benchmarks](#benchmarks) |
21 |
3. [Benchmarks](#benchmarks)
|
22 |
1. [CrossDocked Benchmark](#crossdocked) |
22 |
1. [CrossDocked Benchmark](#crossdocked)
|
23 |
2. [Binding MOAD](#binding-moad) |
23 |
2. [Binding MOAD](#binding-moad)
|
24 |
3. [Sampled molecules](#sampled-molecules) |
24 |
3. [Sampled molecules](#sampled-molecules)
|
25 |
4. [Training](#training) |
25 |
4. [Training](#training)
|
26 |
5. [Inference](#inference) |
26 |
5. [Inference](#inference)
|
27 |
1. [Sample molecules for a given pocket](#sample-molecules-for-a-given-pocket) |
27 |
1. [Sample molecules for a given pocket](#sample-molecules-for-a-given-pocket)
|
28 |
2. [Test set sampling](#sample-molecules-for-all-pockets-in-the-test-set) |
28 |
2. [Test set sampling](#sample-molecules-for-all-pockets-in-the-test-set)
|
29 |
3. [Fix substructures](#fix-substructures) |
29 |
3. [Fix substructures](#fix-substructures)
|
30 |
4. [Metrics](#metrics) |
30 |
4. [Metrics](#metrics)
|
31 |
6. [Citation](#citation) |
31 |
6. [Citation](#citation) |
32 |
|
32 |
|
33 |
## Dependencies |
33 |
## Dependencies |
34 |
|
34 |
|
35 |
### Conda environment |
35 |
### Conda environment
|
36 |
```bash |
36 |
```bash
|
37 |
conda create -n sbdd-env |
37 |
conda create -n sbdd-env
|
38 |
conda activate sbdd-env |
38 |
conda activate sbdd-env
|
39 |
conda install pytorch cudatoolkit=10.2 -c pytorch |
39 |
conda install pytorch cudatoolkit=10.2 -c pytorch
|
40 |
conda install -c conda-forge pytorch-lightning |
40 |
conda install -c conda-forge pytorch-lightning
|
41 |
conda install -c conda-forge wandb |
41 |
conda install -c conda-forge wandb
|
42 |
conda install -c conda-forge rdkit |
42 |
conda install -c conda-forge rdkit
|
43 |
conda install -c conda-forge biopython |
43 |
conda install -c conda-forge biopython
|
44 |
conda install -c conda-forge imageio |
44 |
conda install -c conda-forge imageio
|
45 |
conda install -c anaconda scipy |
45 |
conda install -c anaconda scipy
|
46 |
conda install -c pyg pytorch-scatter |
46 |
conda install -c pyg pytorch-scatter
|
47 |
conda install -c conda-forge openbabel |
47 |
conda install -c conda-forge openbabel
|
48 |
conda install seaborn |
48 |
conda install seaborn
|
49 |
``` |
49 |
``` |
50 |
|
50 |
|
51 |
The code was tested with the following versions |
51 |
The code was tested with the following versions
|
52 |
| Software | Version | |
52 |
| Software | Version |
|
53 |
|-------------------|-----------| |
53 |
|-------------------|-----------|
|
54 |
| Python | 3.10.4 | |
54 |
| Python | 3.10.4 |
|
55 |
| CUDA | 10.2.89 | |
55 |
| CUDA | 10.2.89 |
|
56 |
| PyTorch | 1.12.1 | |
56 |
| PyTorch | 1.12.1 |
|
57 |
| PyTorch Lightning | 1.7.4 | |
57 |
| PyTorch Lightning | 1.7.4 |
|
58 |
| WandB | 0.13.1 | |
58 |
| WandB | 0.13.1 |
|
59 |
| RDKit | 2022.03.2 | |
59 |
| RDKit | 2022.03.2 |
|
60 |
| BioPython | 1.79 | |
60 |
| BioPython | 1.79 |
|
61 |
| imageio | 2.21.2 | |
61 |
| imageio | 2.21.2 |
|
62 |
| SciPy | 1.7.3 | |
62 |
| SciPy | 1.7.3 |
|
63 |
| PyTorch Scatter | 2.0.9 | |
63 |
| PyTorch Scatter | 2.0.9 |
|
64 |
| OpenBabel | 3.1.1 | |
64 |
| OpenBabel | 3.1.1 | |
65 |
|
65 |
|
66 |
### Pre-trained models |
66 |
### Pre-trained models
|
67 |
Pre-trained models can be downloaded from [Zenodo](https://zenodo.org/record/8183747). |
67 |
Pre-trained models can be downloaded from [Zenodo](https://zenodo.org/record/8183747).
|
68 |
- [CrossDocked, conditional $`C_\alpha`$ model](https://zenodo.org/record/8183747/files/crossdocked_ca_cond.ckpt?download=1) |
68 |
- [CrossDocked, conditional $`C_\alpha`$ model](https://zenodo.org/record/8183747/files/crossdocked_ca_cond.ckpt?download=1)
|
69 |
- [CrossDocked, joint $`C_\alpha`$ model](https://zenodo.org/record/8183747/files/crossdocked_ca_joint.ckpt?download=1) |
69 |
- [CrossDocked, joint $`C_\alpha`$ model](https://zenodo.org/record/8183747/files/crossdocked_ca_joint.ckpt?download=1)
|
70 |
- [CrossDocked, conditional full-atom model](https://zenodo.org/record/8183747/files/crossdocked_fullatom_cond.ckpt?download=1) |
70 |
- [CrossDocked, conditional full-atom model](https://zenodo.org/record/8183747/files/crossdocked_fullatom_cond.ckpt?download=1)
|
71 |
- [CrossDocked, joint full-atom model](https://zenodo.org/record/8183747/files/crossdocked_fullatom_joint.ckpt?download=1) |
71 |
- [CrossDocked, joint full-atom model](https://zenodo.org/record/8183747/files/crossdocked_fullatom_joint.ckpt?download=1)
|
72 |
- [Binding MOAD, conditional $`C_\alpha`$ model](https://zenodo.org/record/8183747/files/moad_ca_cond.ckpt?download=1) |
72 |
- [Binding MOAD, conditional $`C_\alpha`$ model](https://zenodo.org/record/8183747/files/moad_ca_cond.ckpt?download=1)
|
73 |
- [Binding MOAD, joint $`C_\alpha`$ model](https://zenodo.org/record/8183747/files/moad_ca_joint.ckpt?download=1) |
73 |
- [Binding MOAD, joint $`C_\alpha`$ model](https://zenodo.org/record/8183747/files/moad_ca_joint.ckpt?download=1)
|
74 |
- [Binding MOAD, conditional full-atom model](https://zenodo.org/record/8183747/files/moad_fullatom_cond.ckpt?download=1) |
74 |
- [Binding MOAD, conditional full-atom model](https://zenodo.org/record/8183747/files/moad_fullatom_cond.ckpt?download=1)
|
75 |
- [Binding MOAD, joint full-atom model](https://zenodo.org/record/8183747/files/moad_fullatom_joint.ckpt?download=1) |
75 |
- [Binding MOAD, joint full-atom model](https://zenodo.org/record/8183747/files/moad_fullatom_joint.ckpt?download=1) |
76 |
|
76 |
|
77 |
## Step-by-step examples |
77 |
## Step-by-step examples |
78 |
|
78 |
|
79 |
These simple step-by-step examples provide an easy entry point to generating molecules with DiffSBDD. |
79 |
These simple step-by-step examples provide an easy entry point to generating molecules with DiffSBDD.
|
80 |
More details about training and sampling scripts are provided below. |
80 |
More details about training and sampling scripts are provided below. |
81 |
|
81 |
|
82 |
Before we run the sampling scripts we need to download a model checkpoint: |
82 |
Before we run the sampling scripts we need to download a model checkpoint:
|
83 |
```bash |
83 |
```bash
|
84 |
wget -P checkpoints/ https://zenodo.org/record/8183747/files/crossdocked_fullatom_cond.ckpt |
84 |
wget -P checkpoints/ https://zenodo.org/record/8183747/files/crossdocked_fullatom_cond.ckpt
|
85 |
``` |
85 |
```
|
86 |
It will be stored in the `./checkpoints` folder. |
86 |
It will be stored in the `./checkpoints` folder. |
87 |
|
87 |
|
88 |
### De novo design |
88 |
### De novo design |
89 |
|
89 |
|
90 |
Using the trained model weights, we can sample new ligands with a single command. In this example, we use the protein with PDB ID `3RFM` that can be found in the example folder. |
90 |
Using the trained model weights, we can sample new ligands with a single command. In this example, we use the protein with PDB ID `3RFM` that can be found in the example folder.
|
91 |
The PDB file contains a reference ligand in chain A at residue number 330 that we can use to specify the designated binding pocket. |
91 |
The PDB file contains a reference ligand in chain A at residue number 330 that we can use to specify the designated binding pocket.
|
92 |
The following command will generate 20 samples and save them in a file called `3rfm_mol.sdf` in the `./example` folder. |
92 |
The following command will generate 20 samples and save them in a file called `3rfm_mol.sdf` in the `./example` folder.
|
93 |
```bash |
93 |
```bash
|
94 |
python generate_ligands.py checkpoints/crossdocked_fullatom_cond.ckpt --pdbfile example/3rfm.pdb --outfile example/3rfm_mol.sdf --ref_ligand A:330 --n_samples 20 |
94 |
python generate_ligands.py checkpoints/crossdocked_fullatom_cond.ckpt --pdbfile example/3rfm.pdb --outfile example/3rfm_mol.sdf --ref_ligand A:330 --n_samples 20
|
95 |
``` |
95 |
```
|
96 |
Instead of specifying the chain and residue number we can also provide an SDF file with the reference ligand: |
96 |
Instead of specifying the chain and residue number we can also provide an SDF file with the reference ligand:
|
97 |
```bash |
97 |
```bash
|
98 |
python generate_ligands.py checkpoints/crossdocked_fullatom_cond.ckpt --pdbfile example/3rfm.pdb --outfile example/3rfm_mol.sdf --ref_ligand example/3rfm_B_CFF.sdf --n_samples 20 |
98 |
python generate_ligands.py checkpoints/crossdocked_fullatom_cond.ckpt --pdbfile example/3rfm.pdb --outfile example/3rfm_mol.sdf --ref_ligand example/3rfm_B_CFF.sdf --n_samples 20
|
99 |
``` |
99 |
```
|
100 |
If no reference ligand is known, the binding pocket can also be specified as a list of residues as described [below](#sample-molecules-for-a-given-pocket). |
100 |
If no reference ligand is known, the binding pocket can also be specified as a list of residues as described [below](#sample-molecules-for-a-given-pocket). |
101 |
|
101 |
|
102 |
### Substructure inpainting |
102 |
### Substructure inpainting |
103 |
|
103 |
|
104 |
To design molecules around fixed substructures (scaffold elaboration, fragment linking etc.) you can run the `inpaint.py` script. |
104 |
To design molecules around fixed substructures (scaffold elaboration, fragment linking etc.) you can run the `inpaint.py` script.
|
105 |
Here, we demonstrate its usage with a fragment linking example. Similar to `generate_ligands.py`, the inpainting script allows us to define pockets based on a reference ligand in SDF format |
105 |
Here, we demonstrate its usage with a fragment linking example. Similar to `generate_ligands.py`, the inpainting script allows us to define pockets based on a reference ligand in SDF format
|
106 |
or with a chain and residue identifier (if it is in the PDB). |
106 |
or with a chain and residue identifier (if it is in the PDB).
|
107 |
The easiest way to fix substructures is to provide them in a separate SDF file using the `--fix_atoms` flag. |
107 |
The easiest way to fix substructures is to provide them in a separate SDF file using the `--fix_atoms` flag.
|
108 |
However, the script also accepts a list of atom names which must correspond to the atoms of the reference ligand in the PDB file, e.g. `--fix_atoms C1 N6 C5 C12`. |
108 |
However, the script also accepts a list of atom names which must correspond to the atoms of the reference ligand in the PDB file, e.g. `--fix_atoms C1 N6 C5 C12`.
|
109 |
```bash |
109 |
```bash
|
110 |
python inpaint.py checkpoints/crossdocked_fullatom_cond.ckpt --pdbfile example/5ndu.pdb --outfile example/5ndu_linked_mols.sdf --ref_ligand example/5ndu_C_8V2.sdf --fix_atoms example/fragments.sdf --center ligand --add_n_nodes 10 |
110 |
python inpaint.py checkpoints/crossdocked_fullatom_cond.ckpt --pdbfile example/5ndu.pdb --outfile example/5ndu_linked_mols.sdf --ref_ligand example/5ndu_C_8V2.sdf --fix_atoms example/fragments.sdf --center ligand --add_n_nodes 10
|
111 |
``` |
111 |
```
|
112 |
Note that the `--center ligand` option tells DiffSBDD to sample the additional atoms near the center of mass of the fixed substructure, which is not always ideal or desired. |
112 |
Note that the `--center ligand` option tells DiffSBDD to sample the additional atoms near the center of mass of the fixed substructure, which is not always ideal or desired.
|
113 |
For instance, the inputs could be two fragments with very different sizes, in which case the random noise will be sampled very close to the larger fragment. |
113 |
For instance, the inputs could be two fragments with very different sizes, in which case the random noise will be sampled very close to the larger fragment.
|
114 |
We currently also support sampling in the pocket center (`--center pocket`) but in some cases neither of these two options might be suitable and a problem-specific solution is warranted to avoid bad results. |
114 |
We currently also support sampling in the pocket center (`--center pocket`) but in some cases neither of these two options might be suitable and a problem-specific solution is warranted to avoid bad results. |
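
The effect of unequal fragment sizes on `--center ligand` can be illustrated with a small sketch in plain Python. It is not DiffSBDD code: it assumes an unweighted mean of atom coordinates as the center, the fragment coordinates are made up, and `centroid` is a hypothetical helper.

```python
import math

def centroid(points):
    """Unweighted mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# Hypothetical fragments: 12 atoms near the origin and 3 atoms about 10 units away.
large_fragment = [(0.1 * i, 0.0, 0.0) for i in range(12)]
small_fragment = [(10.0 + 0.1 * i, 0.0, 0.0) for i in range(3)]

# Center of all fixed atoms, as a stand-in for the point around which noise is sampled.
center = centroid(large_fragment + small_fragment)

d_large = math.dist(center, centroid(large_fragment))
d_small = math.dist(center, centroid(small_fragment))
print(f"distance to large fragment: {d_large:.2f}")
print(f"distance to small fragment: {d_small:.2f}")
```

In this toy setup the combined center sits much closer to the larger fragment, which is the situation where a problem-specific choice of center may be preferable.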
115 |
|
115 |
|
116 |
Another important parameter is `--add_n_nodes` which determines how many new atoms will be added. If it is not provided, a random number will be sampled. |
116 |
Another important parameter is `--add_n_nodes` which determines how many new atoms will be added. If it is not provided, a random number will be sampled. |
117 |
|
117 |
|
118 |
### Molecular optimization |
118 |
### Molecular optimization |
119 |
|
119 |
|
120 |
You can use DiffSBDD to optimize existing molecules for given properties via the `optimize.py` script. |
120 |
You can use DiffSBDD to optimize existing molecules for given properties via the `optimize.py` script. |
121 |
|
121 |
|
122 |
```bash |
122 |
```bash
|
123 |
python optimize.py --checkpoint checkpoints/crossdocked_fullatom_cond.ckpt --pdbfile example/5ndu.pdb --outfile output.sdf --ref_ligand example/5ndu_C_8V2.sdf --objective sa --population_size 100 --evolution_steps 10 --top_k 10 --timesteps 100 |
123 |
python optimize.py --checkpoint checkpoints/crossdocked_fullatom_cond.ckpt --pdbfile example/5ndu.pdb --outfile output.sdf --ref_ligand example/5ndu_C_8V2.sdf --objective sa --population_size 100 --evolution_steps 10 --top_k 10 --timesteps 100
|
124 |
``` |
124 |
``` |
125 |
|
125 |
|
126 |
Important parameters in the evolutionary algorithm are: |
126 |
Important parameters in the evolutionary algorithm are:
|
127 |
- `--checkpoint`: The checkpoint to use for the noising-denoising model. |
127 |
- `--checkpoint`: The checkpoint to use for the noising-denoising model.
|
128 |
- `--objective`: The optimization objective. Currently supports 'qed' for Quantitative Estimate of Drug-likeness and 'sa' for Synthetic Accessibility. Custom objectives can be implemented within the code. |
128 |
- `--objective`: The optimization objective. Currently supports 'qed' for Quantitative Estimate of Drug-likeness and 'sa' for Synthetic Accessibility. Custom objectives can be implemented within the code.
|
129 |
- `--population_size`: The size of the molecule population to maintain across the optimization generations. |
129 |
- `--population_size`: The size of the molecule population to maintain across the optimization generations.
|
130 |
- `--evolution_steps`: The number of evolutionary steps (generations) to perform during the optimization process. |
130 |
- `--evolution_steps`: The number of evolutionary steps (generations) to perform during the optimization process.
|
131 |
- `--top_k`: The number of top-scoring molecules to select from one generation to the next. |
131 |
- `--top_k`: The number of top-scoring molecules to select from one generation to the next.
|
132 |
- `--timesteps`: The number of noise-denoise steps to use in the optimization algorithm. Defaults to 100 (out of T=500). |
132 |
- `--timesteps`: The number of noise-denoise steps to use in the optimization algorithm. Defaults to 100 (out of T=500). |
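
How `--population_size`, `--evolution_steps` and `--top_k` interact can be sketched with a generic top-k evolutionary loop. This is a toy illustration and not the code in `optimize.py`: molecules are replaced by plain floats, `toy_mutate` stands in for the noising-denoising step, `toy_score` stands in for an objective such as QED or SA, and keeping the survivors unchanged between generations is an assumption.

```python
import random

def evolve(seed, score, mutate, population_size=100, evolution_steps=10,
           top_k=10, rng=None):
    """Generic top-k evolutionary loop (illustrative sketch only)."""
    rng = rng or random.Random(0)
    # Initial population: perturbed copies of the seed candidate.
    population = [mutate(seed, rng) for _ in range(population_size)]
    for _ in range(evolution_steps):
        # Select the top_k highest-scoring candidates of this generation.
        survivors = sorted(population, key=score, reverse=True)[:top_k]
        # Refill the population by perturbing randomly chosen survivors.
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - top_k)]
        # Assumption: survivors are carried over so the best score never drops.
        population = survivors + children
    return sorted(population, key=score, reverse=True)[:top_k]

# Toy stand-ins: a "molecule" is a float and the objective peaks at 3.0.
def toy_score(x):
    return -(x - 3.0) ** 2

def toy_mutate(x, rng):
    return x + rng.gauss(0.0, 0.5)

best = evolve(0.0, toy_score, toy_mutate)
print(f"best candidate: {best[0]:.3f}, score: {toy_score(best[0]):.3f}")
```

A larger `top_k` keeps more diverse candidates per generation, while a smaller one focuses the search on the current best scorers.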
133 |
|
133 |
|
134 |
|
134 |
|
135 |
|
135 |
|
136 |
|
136 |
|
137 |
## Benchmarks |
137 |
## Benchmarks
|
138 |
### CrossDocked |
138 |
### CrossDocked |
139 |
|
139 |
|
140 |
#### Data preparation |
140 |
#### Data preparation
|
141 |
Download and extract the dataset as described by the authors of Pocket2Mol: https://github.com/pengxingang/Pocket2Mol/tree/main/data |
141 |
Download and extract the dataset as described by the authors of Pocket2Mol: https://github.com/pengxingang/Pocket2Mol/tree/main/data |
142 |
|
142 |
|
143 |
Process the raw data using |
143 |
Process the raw data using
|
144 |
```bash |
144 |
```bash
|
145 |
python process_crossdock.py <crossdocked_dir> --no_H |
145 |
python process_crossdock.py <crossdocked_dir> --no_H
|
146 |
``` |
146 |
``` |
147 |
|
147 |
|
148 |
### Binding MOAD |
148 |
### Binding MOAD
|
149 |
#### Data preparation |
149 |
#### Data preparation
|
150 |
Download the dataset |
150 |
Download the dataset
|
151 |
```bash |
151 |
```bash
|
152 |
wget http://www.bindingmoad.org/files/biou/every_part_a.zip |
152 |
wget http://www.bindingmoad.org/files/biou/every_part_a.zip
|
153 |
wget http://www.bindingmoad.org/files/biou/every_part_b.zip |
153 |
wget http://www.bindingmoad.org/files/biou/every_part_b.zip
|
154 |
wget http://www.bindingmoad.org/files/csv/every.csv |
154 |
wget http://www.bindingmoad.org/files/csv/every.csv |
155 |
|
155 |
|
156 |
unzip every_part_a.zip |
156 |
unzip every_part_a.zip
|
157 |
unzip every_part_b.zip |
157 |
unzip every_part_b.zip
|
158 |
``` |
158 |
```
|
159 |
Process the raw data using |
159 |
Process the raw data using
|
160 |
``` bash |
160 |
``` bash
|
161 |
python -W ignore process_bindingmoad.py <bindingmoad_dir> |
161 |
python -W ignore process_bindingmoad.py <bindingmoad_dir>
|
162 |
``` |
162 |
```
|
163 |
Add the `--ca_only` flag to create a dataset with $C_\alpha$ pocket representation. |
163 |
Add the `--ca_only` flag to create a dataset with $C_\alpha$ pocket representation. |
164 |
|
164 |
|
165 |
### Sampled molecules |
165 |
### Sampled molecules
|
166 |
Sampled molecules can be found on [Zenodo](https://zenodo.org/record/8239058). |
166 |
Sampled molecules can be found on [Zenodo](https://zenodo.org/record/8239058). |
167 |
|
167 |
|
168 |
## Training |
168 |
## Training
|
169 |
Starting a new training run: |
169 |
Starting a new training run:
|
170 |
```bash |
170 |
```bash
|
171 |
python -u train.py --config <config>.yml |
171 |
python -u train.py --config <config>.yml
|
172 |
``` |
172 |
``` |
173 |
|
173 |
|
174 |
Resuming a previous run: |
174 |
Resuming a previous run:
|
175 |
```bash |
175 |
```bash
|
176 |
python -u train.py --config <config>.yml --resume <checkpoint>.ckpt |
176 |
python -u train.py --config <config>.yml --resume <checkpoint>.ckpt
|
177 |
``` |
177 |
``` |
178 |
|
178 |
|
179 |
## Inference |
179 |
## Inference |
180 |
|
180 |
|
181 |
### Sample molecules for a given pocket |
181 |
### Sample molecules for a given pocket
|
182 |
To sample small molecules for a given pocket with a trained model use the following command: |
182 |
To sample small molecules for a given pocket with a trained model use the following command:
|
183 |
```bash |
183 |
```bash
|
184 |
python generate_ligands.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outfile <output_file> --resi_list <list_of_pocket_residue_ids> |
184 |
python generate_ligands.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outfile <output_file> --resi_list <list_of_pocket_residue_ids>
|
185 |
``` |
185 |
```
|
186 |
For example: |
186 |
For example:
|
187 |
```bash |
187 |
```bash
|
188 |
python generate_ligands.py last.ckpt --pdbfile 1abc.pdb --outfile results/1abc_mols.sdf --resi_list A:1 A:2 A:3 A:4 A:5 A:6 A:7 |
188 |
python generate_ligands.py last.ckpt --pdbfile 1abc.pdb --outfile results/1abc_mols.sdf --resi_list A:1 A:2 A:3 A:4 A:5 A:6 A:7
|
189 |
``` |
189 |
```
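
For longer pockets, the `--resi_list` argument can be assembled programmatically. The following sketch only builds the command shown above from a chain ID and residue numbers; `build_generate_command` is a hypothetical helper and not part of the repository.

```python
import shlex

def build_generate_command(checkpoint, pdbfile, outfile, chain, residue_ids):
    """Assemble a generate_ligands.py call with a <chain>:<resi> residue list."""
    resi_list = [f"{chain}:{resi}" for resi in residue_ids]
    return ["python", "generate_ligands.py", checkpoint,
            "--pdbfile", pdbfile,
            "--outfile", outfile,
            "--resi_list", *resi_list]

# Residues 1 to 7 of chain A, matching the example command.
cmd = build_generate_command("last.ckpt", "1abc.pdb",
                             "results/1abc_mols.sdf", "A", range(1, 8))
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.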
|
190 |
Alternatively, the binding pocket can also be specified based on a reference ligand in the same PDB file: |
190 |
Alternatively, the binding pocket can also be specified based on a reference ligand in the same PDB file:
|
191 |
```bash |
191 |
```bash
|
192 |
python generate_ligands.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outfile <output_file> --ref_ligand <chain>:<resi> |
192 |
python generate_ligands.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outfile <output_file> --ref_ligand <chain>:<resi>
|
193 |
``` |
193 |
```
|
194 |
or with a separate SDF file: |
194 |
or with a separate SDF file:
|
195 |
```bash |
195 |
```bash
|
196 |
python generate_ligands.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outfile <output_file> --ref_ligand <ref_ligand>.sdf |
196 |
python generate_ligands.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outfile <output_file> --ref_ligand <ref_ligand>.sdf
|
197 |
``` |
197 |
``` |
198 |
|
198 |
|
199 |
Optional flags: |
199 |
Optional flags:
|
200 |
| Flag | Description | |
200 |
| Flag | Description |
|
201 |
|------|-------------| |
201 |
|------|-------------|
|
202 |
| `--n_samples` | Number of sampled molecules | |
202 |
| `--n_samples` | Number of sampled molecules |
|
203 |
| `--num_nodes_lig` | Size of sampled molecules | |
203 |
| `--num_nodes_lig` | Size of sampled molecules |
|
204 |
| `--timesteps` | Number of denoising steps for inference | |
204 |
| `--timesteps` | Number of denoising steps for inference |
|
205 |
| `--all_frags` | Keep all disconnected fragments | |
205 |
| `--all_frags` | Keep all disconnected fragments |
|
206 |
| `--sanitize` | Sanitize molecules (invalid molecules will be removed if this flag is present) | |
206 |
| `--sanitize` | Sanitize molecules (invalid molecules will be removed if this flag is present) |
|
207 |
| `--relax` | Relax generated structure in force field (does not consider the protein and might introduce clashes) | |
207 |
| `--relax` | Relax generated structure in force field (does not consider the protein and might introduce clashes) |
|
208 |
| `--resamplings` | Inpainting parameter (doesn't apply if conditional model is used) | |
208 |
| `--resamplings` | Inpainting parameter (doesn't apply if conditional model is used) |
|
209 |
| `--jump_length` | Inpainting parameter (doesn't apply if conditional model is used) | |
209 |
| `--jump_length` | Inpainting parameter (doesn't apply if conditional model is used) | |
210 |
|
210 |
|
211 |
### Sample molecules for all pockets in the test set |
211 |
### Sample molecules for all pockets in the test set
|
212 |
`test.py` can be used to sample molecules for the entire testing set: |
212 |
`test.py` can be used to sample molecules for the entire testing set:
|
213 |
```bash |
213 |
```bash
|
214 |
python test.py <checkpoint>.ckpt --test_dir <bindingmoad_dir>/processed_noH/test/ --outdir <output_dir> --sanitize |
214 |
python test.py <checkpoint>.ckpt --test_dir <bindingmoad_dir>/processed_noH/test/ --outdir <output_dir> --sanitize
|
215 |
``` |
215 |
```
|
216 |
There are different ways to determine the size of sampled molecules. |
216 |
There are different ways to determine the size of sampled molecules.
|
217 |
- `--fix_n_nodes`: generates ligands with the same number of nodes as the reference molecule |
217 |
- `--fix_n_nodes`: generates ligands with the same number of nodes as the reference molecule
|
218 |
- `--n_nodes_bias <int>`: samples the number of nodes randomly and adds this bias |
218 |
- `--n_nodes_bias <int>`: samples the number of nodes randomly and adds this bias
|
219 |
- `--n_nodes_min <int>`: samples the number of nodes randomly but clamps it at this value |
219 |
- `--n_nodes_min <int>`: samples the number of nodes randomly but clamps it at this value |
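
The combined effect of `--n_nodes_bias` and `--n_nodes_min` can be sketched as follows. This is an illustration under assumptions, not the code in `test.py`: the order (add the bias first, then clamp from below) is assumed, `choose_n_nodes` is a hypothetical helper, and the size distribution is made up.

```python
import random

def choose_n_nodes(sample_size, n_nodes_bias=0, n_nodes_min=0):
    """Draw a ligand size, shift it by the bias, then clamp it from below."""
    return max(sample_size() + n_nodes_bias, n_nodes_min)

rng = random.Random(0)

def toy_size_sampler():
    # Hypothetical size distribution, used only for this sketch.
    return rng.randint(8, 30)

sizes = [choose_n_nodes(toy_size_sampler, n_nodes_bias=5, n_nodes_min=15)
         for _ in range(5)]
print(sizes)
```

With these settings every sampled size is shifted up by 5 atoms and no ligand has fewer than 15 atoms.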
220 |
|
220 |
|
221 |
Other optional flags are analogous to `generate_ligands.py`. |
221 |
Other optional flags are analogous to `generate_ligands.py`. |
222 |
|
222 |
|
223 |
### Fix substructures |
223 |
### Fix substructures
|
224 |
`inpaint.py` can be used for partial ligand redesign with the conditionally trained model, e.g.: |
224 |
`inpaint.py` can be used for partial ligand redesign with the conditionally trained model, e.g.:
|
225 |
```bash |
225 |
```bash
|
226 |
python inpaint.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outfile <output_file> --ref_ligand <chain>:<resi> --fix_atoms C1 N6 C5 C12 |
226 |
python inpaint.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outfile <output_file> --ref_ligand <chain>:<resi> --fix_atoms C1 N6 C5 C12
|
227 |
``` |
227 |
```
|
228 |
`--add_n_nodes` controls the number of newly generated nodes. Other options are the same as before. |
228 |
`--add_n_nodes` controls the number of newly generated nodes. Other options are the same as before. |
229 |
|
229 |
|
230 |
### Metrics |
230 |
### Metrics
|
231 |
For assessing basic molecular properties create an instance of the `MoleculeProperties` class and run its `evaluate` method: |
231 |
For assessing basic molecular properties create an instance of the `MoleculeProperties` class and run its `evaluate` method:
|
232 |
```python |
232 |
```python
|
233 |
from analysis.metrics import MoleculeProperties |
233 |
from analysis.metrics import MoleculeProperties
|
234 |
mol_metrics = MoleculeProperties() |
234 |
mol_metrics = MoleculeProperties()
|
235 |
all_qed, all_sa, all_logp, all_lipinski, per_pocket_diversity = \ |
235 |
all_qed, all_sa, all_logp, all_lipinski, per_pocket_diversity = \
|
236 |
mol_metrics.evaluate(pocket_mols) |
236 |
mol_metrics.evaluate(pocket_mols)
|
237 |
``` |
237 |
```
|
238 |
`evaluate()` expects a list of lists where the inner list contains all RDKit molecules generated for one pocket. |
238 |
`evaluate()` expects a list of lists where the inner list contains all RDKit molecules generated for one pocket. |
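
The expected nesting can be sketched with a small grouping helper. It assumes a hypothetical layout with one folder per pocket, and `group_by_pocket` and `stub_loader` are illustrative names that are not part of the repository. In practice `load_molecules` could wrap an SDF reader such as RDKit's `Chem.SDMolSupplier`; here a stub loader returns placeholder strings so the sketch runs without RDKit.

```python
from collections import defaultdict
from pathlib import Path

def group_by_pocket(sdf_paths, load_molecules):
    """Return one inner list of molecules per pocket (parent folder name)."""
    grouped = defaultdict(list)
    for path in sdf_paths:
        # Assumption: the parent directory name identifies the pocket.
        grouped[Path(path).parent.name].extend(load_molecules(path))
    return [grouped[name] for name in sorted(grouped)]

# Stub loader with placeholder "molecules" instead of RDKit Mol objects.
fake_files = {
    "pocket_1/a.sdf": ["mol_a1", "mol_a2"],
    "pocket_2/b.sdf": ["mol_b1"],
    "pocket_1/c.sdf": ["mol_c1"],
}

def stub_loader(path):
    return fake_files[path]

pocket_mols = group_by_pocket(list(fake_files), stub_loader)
print(pocket_mols)
```

With real RDKit molecules in place of the placeholder strings, `pocket_mols` has the list-of-lists shape described above.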
239 |
|
239 |
|
240 |
## Citation |
240 |
## Citation
|
241 |
``` |
241 |
```
|
242 |
@article{schneuing2024diffsbdd, |
242 |
@article{schneuing2024diffsbdd,
|
243 |
title={Structure-based drug design with equivariant diffusion models}, |
243 |
title={Structure-based drug design with equivariant diffusion models},
|
244 |
author={Schneuing, Arne and Harris, Charles and Du, Yuanqi and Didi, Kieran and Jamasb, Arian and Igashov, Ilia and Du, Weitao and Gomes, Carla and Blundell, Tom L and Lio, Pietro and Welling, Max and Bronstein, Michael and Correia, Bruno}, |
244 |
author={Schneuing, Arne and Harris, Charles and Du, Yuanqi and Didi, Kieran and Jamasb, Arian and Igashov, Ilia and Du, Weitao and Gomes, Carla and Blundell, Tom L and Lio, Pietro and Welling, Max and Bronstein, Michael and Correia, Bruno},
|
245 |
journal={Nature Computational Science}, |
245 |
journal={Nature Computational Science},
|
246 |
year={2024}, |
246 |
year={2024},
|
247 |
month={Dec}, |
247 |
month={Dec},
|
248 |
day={01}, |
248 |
day={01},
|
249 |
volume={4}, |
249 |
volume={4},
|
250 |
number={12}, |
250 |
number={12},
|
251 |
pages={899-909}, |
251 |
pages={899-909},
|
252 |
issn={2662-8457}, |
252 |
issn={2662-8457},
|
253 |
doi={10.1038/s43588-024-00737-x}, |
253 |
doi={10.1038/s43588-024-00737-x},
|
254 |
url={https://doi.org/10.1038/s43588-024-00737-x} |
254 |
url={https://doi.org/10.1038/s43588-024-00737-x}
|
255 |
} |
255 |
}
|
256 |
``` |
256 |
```
|