# Machine Learning Modelling to Predict the Efficacy of Cancer Treatment Drugs

>>List of Changeable Script Parameters
- input_dir: Path to the directory that contains the input files; output files are written to the same directory.
- models: List of regression models to be evaluated.
- num_sp: Number of splits for cross-validation.
- num_rep: Number of repetitions for cross-validation.

Other hyperparameters within the models and training functions can also be changed.
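The parameters listed above might be set up along these lines. This is a minimal sketch, assuming the script uses scikit-learn: the placeholder path, the choice of `LinearRegression` alongside `RandomForestRegressor`, the synthetic `X` and `y`, and the `r2` scoring are all illustrative assumptions and are not taken from the script itself.

```python
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Directory holding the input files; outputs are written here as well.
input_dir = Path("data")  # placeholder path, adjust to your setup

# Regression models to evaluate (illustrative choices only).
models = [
    LinearRegression(),
    RandomForestRegressor(n_estimators=30, random_state=0),
]

num_sp = 5   # number of cross-validation splits
num_rep = 3  # number of cross-validation repetitions

# num_sp splits repeated num_rep times gives num_sp * num_rep fits per model.
cv = RepeatedKFold(n_splits=num_sp, n_repeats=num_rep, random_state=0)

# Synthetic stand-in data so the sketch runs without the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=60)

# One array of per-fold scores for each model, keyed by class name.
scores = {
    type(model).__name__: cross_val_score(model, X, y, cv=cv, scoring="r2")
    for model in models
}
```

Each entry in `scores` holds one value per fold, so raising `num_sp` or `num_rep` increases both the number of scores and the run time.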

>>Input Files

Option 1: Read in the provided CSV file

The following raw data file can be read in by setting the input_dir variable to the directory that contains it.
- raw_data_erbb1_ic50.csv: CSV file containing data on EGFR protein inhibitors, including canonical SMILES and IC50 values.
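As a rough sketch of Option 1, the file can be located relative to input_dir and read with the standard library. The helper name `load_raw_data` and the column names in the demo rows are hypothetical; the real file may use different headers, and the script itself may well read the file another way (for example with pandas).

```python
import csv
import tempfile
from pathlib import Path

RAW_FILE = "raw_data_erbb1_ic50.csv"  # file name given in this README


def load_raw_data(input_dir):
    """Return the rows of the raw CSV found in input_dir as a list of dicts."""
    path = Path(input_dir) / RAW_FILE
    if not path.is_file():
        raise FileNotFoundError(f"{RAW_FILE} not found in {input_dir}")
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


# Demo with a throwaway directory and made-up rows. The column names
# below are hypothetical and may not match the real file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / RAW_FILE
    demo.write_text("canonical_smiles,ic50_nM\nCCO,1000\nc1ccccc1,250\n")
    rows = load_raw_data(tmp)
```

Raising `FileNotFoundError` early makes a wrong input_dir obvious before any later processing step runs.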

Option 2: Fetch data from ChEMBL
Alternatively, setting the fetch_chembl variable to TRUE retrieves the data directly from the ChEMBL database.

>>Output Files Generated

The following files are written directly to the input_dir that was set at the beginning of the script. Files highlighted in orange are intermediate outputs that are used in subsequent steps within the script.
- erbb1_bothassay_neglog10_ic50.csv: Processed dataset with transformed IC50 values.
- cb_pb_fingerprints.csv: Molecular fingerprints data.
- df_pb_cb_for_model_building.csv: Final dataset used for model training.
- evaluations_with_cv.csv: Evaluation metrics from cross-validation.
- test_results.csv: Final test results for the optimized model.
- final_feature_importance.csv: Feature importance from the optimized RandomForest model.

>>List of Custom Functions
1. logm: Converts IC50 values from nM to -log10(M).
2. mol_descriptors: Generates molecular descriptors from SMILES.
3. morgan_fpts: Generates Morgan fingerprints from SMILES.
4. train_evaluate_model_with_cv: Trains each model in the list of models and evaluates it with cross-validation.
5. plot_learning_curve: Plots the learning curve for each model in the list of models.
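To make the unit conversion in logm concrete, here is a minimal sketch of what such a function could look like. Only the name and the nM to -log10(M) behaviour come from the list above; the body, the argument name, and the input check are assumptions and may differ from the script's actual implementation.

```python
import math


def logm(ic50_nm):
    """Convert an IC50 given in nM to -log10 of the molar concentration."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) simplifies to 9 - log10(x).
    return 9.0 - math.log10(ic50_nm)
```

Under this definition an IC50 of 1000 nM (1 µM) maps to 6, and smaller IC50 values (more potent compounds) map to larger numbers.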

>>Supplementary File (Drawing fingerprint bits of interest)
- Open DrawFingerprints.ipynb (requires for_fingerprint_visualization.csv to be in the same folder).
- Run the notebook to see visualizations of bits 343 and 1366.