# MOSA - Multi-omic Synthetic Augmentation

This repository presents a bespoke Variational Autoencoder (VAE) that integrates all molecular and phenotypic data sets available for cancer cell lines.
## Installation

### Instructions
1. Clone this repository
2. Create a Python 3.10 environment, e.g. `conda create -n mosa python=3.10`
3. Activate the environment: `conda activate mosa`
4. Run `pip install -r requirements.txt`
5. Install `shap` from `https://github.com/ZhaoxiangSimonCai/shap`, which is customised to support the data format used in MOSA.
6. Run `pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118`
### Typical installation time

Installation time depends largely on internet speed, since the packages are downloaded during installation. Typically, installation takes less than 10 minutes.
## Demo

### Instructions

1. Download the data files from the figshare repository (see links in the manuscript)
2. Configure the paths of the data files in `reports/vae/files/hyperparameters.json`
3. Run MOSA with `python PhenPred/vae/Main.py`
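The exact schema of `hyperparameters.json` is defined by the repository, so the snippet below is purely illustrative (every key name here is hypothetical, not taken from MOSA); it only shows the general shape the path configuration in step 2 might take:

```json
{
  "data_paths": {
    "transcriptomics": "data/transcriptomics.csv",
    "proteomics": "data/proteomics.csv",
    "drug_response": "data/drug_response.csv"
  }
}
```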
### Expected output

The expected output, including the latent space matrix and the reconstructed data matrices, can be downloaded from the figshare repository as described in the paper.

### Expected runtime

As a deep learning-based method, MOSA's runtime depends on whether a GPU is available for training. On the DepMap dataset, MOSA took 52 minutes to train and generate the results using a V100 GPU.
## Instructions for using MOSA with custom data

Although MOSA was designed specifically for analysing the DepMap dataset, the model can be adapted to any multi-omic dataset. To use MOSA with custom datasets:

1. Prepare the custom dataset following the formats of the DepMap data, which can be downloaded from the figshare repositories described in the manuscript.
2. Configure the paths of the data files in `reports/vae/files/hyperparameters.json`. At least two omic datasets are required.
3. Run MOSA with `python PhenPred/vae/Main.py`
4. If certain benchmark analyses cannot be run, set `skip_benchmarks=true` in `hyperparameters.json` so that MOSA only saves the output data, which includes the integrated latent space matrix and the reconstructed data for each omic layer.
5. To further customise data pre-processing, create your own dataset class following the style of `PhenPred/vae/DatasetDepMap23Q2.py`, and then use the custom dataset class in `Main.py`.
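The dataset contract expected by MOSA is defined in `PhenPred/vae/DatasetDepMap23Q2.py` and is not reproduced here, so the following is only a minimal sketch of what a custom dataset class could look like, assuming a PyTorch-style `__len__`/`__getitem__` interface; the class name, constructor signature, and the NumPy stand-ins for real omic matrices are all hypothetical:

```python
import numpy as np

class CustomMultiOmicDataset:
    """Hypothetical multi-omic dataset sketch (not the real MOSA interface).

    Each omic layer is a (n_samples, n_features) matrix sharing the same
    sample order; __getitem__ returns one sample across all layers.
    """

    def __init__(self, omics):
        # omics: dict mapping layer name -> 2D numpy array
        sizes = {name: mat.shape[0] for name, mat in omics.items()}
        assert len(set(sizes.values())) == 1, f"sample counts differ: {sizes}"
        self.omics = omics
        self.n_samples = next(iter(sizes.values()))

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # Return one sample as {layer_name: feature_vector}
        return {name: mat[idx] for name, mat in self.omics.items()}

# Toy example: two omic layers for 5 samples
rng = np.random.default_rng(0)
dataset = CustomMultiOmicDataset({
    "transcriptomics": rng.normal(size=(5, 100)),
    "proteomics": rng.normal(size=(5, 50)),
})
sample = dataset[0]
```

A real implementation would also carry the pre-processing (normalisation, feature filtering) that step 5 refers to; check the actual `DatasetDepMap23Q2` class for the methods `Main.py` calls.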
## Reproduction instructions

### To reproduce the benchmark results

1. Download the data from [figshare](https://doi.org/10.6084/m9.figshare.24562765)
2. Place the downloaded files in `reports/vae/files/`
3. In `Main.py`, configure MOSA to run from the pre-computed data: `hyperparameters = Hypers.read_hyperparameters(timestamp="20231023_092657")`.
### To reproduce from scratch

1. Directly run MOSA with the default configurations, as described above.

## Instructions for Integrating Disentanglement Learning into MOSA

To incorporate disentanglement learning, two additional terms are included in the loss function, following the Disentangled Inferred Prior Variational Autoencoder (DIP-VAE) approach described by [Kumar et al. (2018)](https://arxiv.org/abs/1711.00848):
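The equation image from the original README is not reproduced here; as a sketch based on the DIP-VAE paper itself (not on this repository's code), the two added terms penalise the deviation of the covariance of the inferred latent representation from the identity matrix:

```latex
% DIP-VAE regulariser added to the standard VAE objective (Kumar et al., 2018).
% Cov_q(z) is the covariance of the inferred latent code over the data:
% DIP-VAE-I computes it from the posterior means mu(x), DIP-VAE-II from z itself.
\mathcal{L} = \mathcal{L}_{\mathrm{VAE}}
  + \lambda_{od} \sum_{i \neq j} \big[ \mathrm{Cov}_{q}(z) \big]_{ij}^{2}
  + \lambda_{d} \sum_{i} \Big( \big[ \mathrm{Cov}_{q}(z) \big]_{ii} - 1 \Big)^{2}
```

The off-diagonal term encourages decorrelated (disentangled) latent dimensions, while the diagonal term keeps each dimension's variance near one; they are weighted by the `lambda_od` and `lambda_d` hyperparameters.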
To use this, update the `hyperparameters.json` file by specifying `dip_vae_type` as either `"i"` or `"ii"` (type ii is recommended), and define the parameters `lambda_d` and `lambda_od` as float values, which control the diagonal and off-diagonal regularization, respectively.
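Concretely, the three settings just described might be added to `hyperparameters.json` along these lines (the key names `dip_vae_type`, `lambda_d`, and `lambda_od` come from the text above, but the numeric values are placeholders, not recommended defaults):

```json
{
  "dip_vae_type": "ii",
  "lambda_d": 1.0,
  "lambda_od": 1.0
}
```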
## Pre-trained models

The pre-trained models can be downloaded from the Hugging Face model hub: [MOSA](https://huggingface.co/QuantitativeBiology/MOSA_pretrained)
## Citation

Cai, Z. et al., Synthetic multi-omics augmentation of cancer cell lines using unsupervised deep learning, 2023.