
# MOSA - Multi-omic Synthetic Augmentation

This repository presents a bespoke Variational Autoencoder (VAE) that integrates all molecular and phenotypic data sets available for cancer cell lines.

![MOSA Overview](./figure/MOSA_Overview.png)

## Installation

### Instructions

1. Clone this repository
2. Create a Python 3.10 environment, e.g. `conda create -n mosa python=3.10`
3. Activate the environment: `conda activate mosa`
4. Run `pip install -r requirements.txt`
5. Install shap from `https://github.com/ZhaoxiangSimonCai/shap`, which is customised to support the data format used in MOSA.
6. Run `pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118`
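
Taken together, the installation steps look like the following shell session. This is a sketch: the clone URL is a placeholder, and installing the customised shap fork with `pip install git+...` is one assumed way to install it.

```bash
# Clone this repository and enter it (URL is a placeholder)
git clone <this-repository-url>
cd MOSA

# Create and activate a Python 3.10 environment
conda create -n mosa python=3.10
conda activate mosa

# Install the Python dependencies
pip install -r requirements.txt

# Install the customised shap fork (one possible method)
pip install git+https://github.com/ZhaoxiangSimonCai/shap.git

# Install the CUDA 11.8 builds of PyTorch
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118
```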

### Typical installation time

The installation time largely depends on internet speed, as the packages need to be downloaded over the internet. Typically, the installation should take less than 10 minutes.

## Demo

### Instructions

1. Download the data files from the figshare repository (see links in the manuscript)
2. Configure the paths of the data files in `reports/vae/files/hyperparameters.json`
3. Run MOSA with `python PhenPred/vae/Main.py`
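
With the data files downloaded and their paths configured, the demo run is a single command from the repository root:

```bash
# Activate the environment created during installation, then run MOSA
conda activate mosa
python PhenPred/vae/Main.py
```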

### Expected output

The expected output, including the latent space matrix and reconstructed data matrices, can be downloaded from the figshare repository as described in the paper.

### Expected runtime

As a deep learning-based method, the runtime of MOSA depends on whether a GPU is available for training. MOSA took 52 minutes to train and generate the results using a V100 GPU on the DepMap dataset.

## Instructions for using MOSA with custom data

Although MOSA is specifically designed for analysing the DepMap dataset, the model can be adapted to any multi-omic dataset. To use MOSA with custom datasets:

1. Prepare the custom dataset following the formats of the DepMap data, which can be downloaded from the figshare repositories as described in the manuscript.
2. Configure the paths of the data files in `reports/vae/files/hyperparameters.json`. At least two omic datasets are required.
3. Run MOSA with `python PhenPred/vae/Main.py`
4. If certain benchmark analyses cannot be run properly, MOSA can be run with `skip_benchmarks=true` set in `hyperparameters.json`, so that it only saves the output data, which includes the integrated latent space matrix and the reconstructed data for each omic layer.
5. To further customise data pre-processing, create your own dataset class following the style of `PhenPred/vae/DatasetDepMap23Q2.py`, then use the custom dataset class in `Main.py` (see the sketch below).
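
As a rough illustration of step 5, a custom dataset class might look like the sketch below. The class name, constructor arguments, and the assumption that it behaves like a standard PyTorch `Dataset` are all hypothetical; the actual conventions to follow are those of `PhenPred/vae/DatasetDepMap23Q2.py`.

```python
# Hypothetical sketch of a custom multi-omic dataset; the real interface
# to imitate is PhenPred/vae/DatasetDepMap23Q2.py.
import pandas as pd
import torch
from torch.utils.data import Dataset


class DatasetCustom(Dataset):  # hypothetical class name
    def __init__(self, omics_paths):
        # omics_paths: dict mapping omic name -> CSV path (samples x features)
        self.dfs = {name: pd.read_csv(path, index_col=0) for name, path in omics_paths.items()}
        # Keep only the samples present in every omic layer
        common = sorted(set.intersection(*(set(df.index) for df in self.dfs.values())))
        self.dfs = {name: df.loc[common] for name, df in self.dfs.items()}
        self.samples = common

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Return one float tensor per omic layer for the idx-th sample
        return [torch.tensor(df.iloc[idx].values, dtype=torch.float32) for df in self.dfs.values()]
```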

## Reproduction instructions

### To reproduce the benchmark results

1. Download the data from [figshare](https://doi.org/10.6084/m9.figshare.24562765)
2. Place the downloaded files in `reports/vae/files/`
3. In `Main.py`, configure MOSA to run from the pre-computed data (see below): `hyperparameters = Hypers.read_hyperparameters(timestamp="20231023_092657")`
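
Concretely, step 3 amounts to the following line in `Main.py` (where `Hypers` is already imported); the timestamp identifies the pre-computed run downloaded from figshare:

```python
# Load the hyperparameters of the pre-computed run instead of training anew
hyperparameters = Hypers.read_hyperparameters(timestamp="20231023_092657")
```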

### To reproduce from scratch

1. Directly run MOSA with the default configurations as described above.

## Instructions for Integrating Disentanglement Learning into MOSA

To incorporate disentanglement learning, two additional terms are included in the loss function, following the Disentangled Inferred Prior Variational Autoencoder (DIP-VAE) approach, as described by [Kumar et al. (2018)](https://arxiv.org/abs/1711.00848):

![DIP-VAE loss term](./figure/dipvae_lossterm.png)
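
For reference alongside the figure, the DIP-VAE regulariser from Kumar et al. (2018) has the following form (notation simplified here): it drives the covariance of the aggregated posterior toward the identity matrix, with $\lambda_{od}$ weighting the off-diagonal entries and $\lambda_{d}$ the diagonal ones.

$$
\lambda_{od} \sum_{i \neq j} \left[ \mathrm{Cov}[z] \right]_{ij}^{2} \;+\; \lambda_{d} \sum_{i} \left( \left[ \mathrm{Cov}[z] \right]_{ii} - 1 \right)^{2}
$$

DIP-VAE-I computes the covariance over the encoder means $\mu_\phi(x)$, while DIP-VAE-II computes it over the full approximate posterior.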

To use this, update the `hyperparameters.json` file by specifying `dip_vae_type` as either `"i"` or `"ii"` (type ii is recommended), and define the parameters `lambda_d` and `lambda_od` as float values, which control the diagonal and off-diagonal regularization, respectively.
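
For example, the relevant entries in `hyperparameters.json` could look like the fragment below; the values shown are placeholders rather than recommended settings.

```json
{
  "dip_vae_type": "ii",
  "lambda_d": 1.0,
  "lambda_od": 1.0
}
```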

## Pre-trained models

The pre-trained models can be downloaded from the Hugging Face model hub: [MOSA](https://huggingface.co/QuantitativeBiology/MOSA_pretrained)
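
The files can also be fetched programmatically, assuming the `huggingface_hub` package is installed; how the downloaded weights are then loaded into MOSA depends on the files shipped in that repository.

```python
# Download the pre-trained MOSA files from the Hugging Face hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="QuantitativeBiology/MOSA_pretrained")
print(f"Model files downloaded to: {local_dir}")
```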

## Citation

Cai, Z. et al., Synthetic multi-omics augmentation of cancer cell lines using unsupervised deep learning, 2023