DrugGEN / Git / [d33920] /results/README.md

Models:

Amanda-D/

DrugGEN

Downloads: 1

[d33920]: / results / README.md

History

Download this file

46 lines (35 with data), 3.7 kB

DrugGEN Results

SMILES notations of 10,000 de novo generated molecules from DrugGEN model can be downloaded from here for AKT1 and from here for CDK2.
We conducted a molecular docking analysis on the de novo molecules generated from DrugGEN and other target-based generation models, including RELATION, TRIOMHPHE-BOA, ResGen, TargetDiff and Pocket2Mol as well as on real AKT1 and CDK2 inhibitors, using the crystal structure of AKT1 and CDK2, respectively. The top-performing 10% of molecules from docking analyses were compared across target-based methods for evaluation. The SMILES notations of these molecules and their docking scores are available here.
Finally, de novo molecules to effectively target the AKT1 protein are selected via expert curation from the dataset of molecules with binding free energies lower than -8 kcal/mol and predicted as active by DEEPScreen against the AKT1 protein (SMILES notations of the expert selected de novo AKT1 inhibitor molecules).

Evaluation Script

This script takes four arguments:
- gen: A list of SMILES strings representing the de novo generated molecules. Molecules should be found under a column named "SMILES".
- ref1: A list of SMILES strings representing the reference molecules for novelty calculation. (e.g. ChEMBL molecules)
- ref2(optional): A list of SMILES strings representing the reference molecules for novelty calculation. (e.g. selected inhibitors)
- output (optional, default: results): The output file where the computed metrics will be saved.

The following is a generic example of how to use the evaluation script:

python evaluate.py --gen "[SMILES FILE]" --ref1 "[TRAINING SET FILE]" --ref2 "[TEST SET FILE]" --output "[PERFORMANCE RESULTS FILE]"

To evaluate the AKT1 targeted generated molecules used in the paper, run:

python evaluate.py --gen "generated_molecules/DrugGEN_generated_molecules_AKT1.csv" --ref1 "../data/chembl_train.smi" --ref2 "../data/akt_train.smi" --output "results_akt1"

To evaluate the CDK2 targeted generated molecules used in the paper, run:

python evaluate.py --gen "generated_molecules/DrugGEN_generated_molecules_CDK2.csv" --ref1 "../data/chembl_train.smi" --ref2 "../data/cdk2_train.smi" --output "results_cdk2.csv"

The script calculates the following metrics:
- Validity: The fraction of valid molecules in the generated set.
- Uniqueness: The fraction of unique molecules in the generated set.
- Novelty: The fraction of molecules in the generated set that are not present in the reference sets.
- Internal Diversity: The average Tanimoto similarity between all pairs of molecules in the generated set.
- QED: The average QED score of the molecules in the generated set.
- SA: The average SA score of the molecules in the generated set.
- FCD: The average FCD score of the molecules in the generated set against both reference sets.
- Fragment Similarity: The average fragment similarity score of the molecules in the generated set against both reference sets.
- Scaffold Similarity: The average scaffold similarity score of the molecules in the generated set against both reference sets.
- Lipinski: The fraction of molecules in the generated set that pass the Lipinski filter.
- Veber: The fraction of molecules in the generated set that pass the Veber filter.
- PAINS: The fraction of molecules in the generated set that pass the PAINS filter.