
# MOSA - Multi-omic Synthetic Augmentation

This repository presents a bespoke Variational Autoencoder (VAE) that integrates all molecular and phenotypic data sets available for cancer cell lines.

![MOSA Overview](./figure/MOSA_Overview.png)

## Installation

### Instructions

1. Clone this repository
2. Create a Python 3.10 environment, e.g. `conda create -n mosa python=3.10`
3. Activate the environment: `conda activate mosa`
4. Run `pip install -r requirements.txt`
5. Install shap from `https://github.com/ZhaoxiangSimonCai/shap`, which is customised to support the data format used in MOSA.
6. Run `pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118`
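
Taken together, the installation steps look like the following shell session. This is a sketch: the clone URL is a placeholder, and installing the customised shap fork with `pip install git+...` is one assumed way to install it.

```bash
# Clone this repository and enter it (URL is a placeholder)
git clone <this-repository-url>
cd MOSA

# Create and activate a Python 3.10 environment
conda create -n mosa python=3.10
conda activate mosa

# Install the Python dependencies
pip install -r requirements.txt

# Install the customised shap fork (one possible method)
pip install git+https://github.com/ZhaoxiangSimonCai/shap.git

# Install the CUDA 11.8 builds of PyTorch
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118
```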

### Typical installation time

The installation time largely depends on internet speed, as the packages need to be downloaded over the internet. Typically, the installation should take less than 10 minutes.

## Demo

### Instructions

1. Download the data files from the figshare repository (see links in the manuscript)
2. Configure the paths of the data files in `reports/vae/files/hyperparameters.json`
3. Run MOSA with `python PhenPred/vae/Main.py`
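
With the data files downloaded and their paths configured, the demo run is a single command from the repository root:

```bash
# Activate the environment created during installation, then run MOSA
conda activate mosa
python PhenPred/vae/Main.py
```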

### Expected output

The expected output, including the latent space matrix and reconstructed data matrices, can be downloaded from the figshare repository as described in the paper.

### Expected runtime

As a deep learning-based method, the runtime of MOSA depends on whether a GPU is available for training. MOSA took 52 minutes to train and generate the results using a V100 GPU on the DepMap dataset.

## Instructions for using MOSA with custom data

Although MOSA is specifically designed for analysing the DepMap dataset, the model can be adapted to any multi-omic dataset. To use MOSA with custom datasets:

1. Prepare the custom dataset following the formats of the DepMap data, which can be downloaded from the figshare repositories as described in the manuscript.
2. Configure the paths of the data files in `reports/vae/files/hyperparameters.json`. At least two omic datasets are required.
3. Run MOSA with `python PhenPred/vae/Main.py`
4. If certain benchmark analyses cannot be run properly, MOSA can be run with `skip_benchmarks=true` set in `hyperparameters.json`, so that it only saves the output data, which includes the integrated latent space matrix and the reconstructed data for each omic layer.
5. To further customise data pre-processing, create your own dataset class following the style of `PhenPred/vae/DatasetDepMap23Q2.py`, then use the custom dataset class in `Main.py` (see the sketch below).
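
As a rough illustration of step 5, a custom dataset class might look like the sketch below. The class name, constructor arguments, and the assumption that it behaves like a standard PyTorch `Dataset` are all hypothetical; the actual conventions to follow are those of `PhenPred/vae/DatasetDepMap23Q2.py`.

```python
# Hypothetical sketch of a custom multi-omic dataset; the real interface
# to imitate is PhenPred/vae/DatasetDepMap23Q2.py.
import pandas as pd
import torch
from torch.utils.data import Dataset


class DatasetCustom(Dataset):  # hypothetical class name
    def __init__(self, omics_paths):
        # omics_paths: dict mapping omic name -> CSV path (samples x features)
        self.dfs = {name: pd.read_csv(path, index_col=0) for name, path in omics_paths.items()}
        # Keep only the samples present in every omic layer
        common = sorted(set.intersection(*(set(df.index) for df in self.dfs.values())))
        self.dfs = {name: df.loc[common] for name, df in self.dfs.items()}
        self.samples = common

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Return one float tensor per omic layer for the idx-th sample
        return [torch.tensor(df.iloc[idx].values, dtype=torch.float32) for df in self.dfs.values()]
```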

## Reproduction instructions

### To reproduce the benchmark results

1. Download the data from [figshare](https://doi.org/10.6084/m9.figshare.24562765)
2. Place the downloaded files in `reports/vae/files/`
3. In `Main.py`, configure MOSA to run from the pre-computed data (see below): `hyperparameters = Hypers.read_hyperparameters(timestamp="20231023_092657")`
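
Concretely, step 3 amounts to the following line in `Main.py` (where `Hypers` is already imported); the timestamp identifies the pre-computed run downloaded from figshare:

```python
# Load the hyperparameters of the pre-computed run instead of training anew
hyperparameters = Hypers.read_hyperparameters(timestamp="20231023_092657")
```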

### To reproduce from scratch

1. Directly run MOSA with the default configurations as described above.

## Instructions for Integrating Disentanglement Learning into MOSA

To incorporate disentanglement learning, two additional terms are included in the loss function, following the Disentangled Inferred Prior Variational Autoencoder (DIP-VAE) approach, as described by [Kumar et al. (2018)](https://arxiv.org/abs/1711.00848):

![DIP-VAE loss term](./figure/dipvae_lossterm.png)
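
For reference alongside the figure, the DIP-VAE regulariser from Kumar et al. (2018) has the following form (notation simplified here): it drives the covariance of the aggregated posterior toward the identity matrix, with $\lambda_{od}$ weighting the off-diagonal entries and $\lambda_{d}$ the diagonal ones.

$$
\lambda_{od} \sum_{i \neq j} \left[ \mathrm{Cov}[z] \right]_{ij}^{2} \;+\; \lambda_{d} \sum_{i} \left( \left[ \mathrm{Cov}[z] \right]_{ii} - 1 \right)^{2}
$$

DIP-VAE-I computes the covariance over the encoder means $\mu_\phi(x)$, while DIP-VAE-II computes it over the full approximate posterior.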

To use this, update the `hyperparameters.json` file by specifying `dip_vae_type` as either `"i"` or `"ii"` (type ii is recommended), and define the parameters `lambda_d` and `lambda_od` as float values, which control the diagonal and off-diagonal regularization, respectively.
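
For example, the relevant entries in `hyperparameters.json` could look like the fragment below; the values shown are placeholders rather than recommended settings.

```json
{
  "dip_vae_type": "ii",
  "lambda_d": 1.0,
  "lambda_od": 1.0
}
```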

## Pre-trained models

The pre-trained models can be downloaded from the Hugging Face model hub: [MOSA](https://huggingface.co/QuantitativeBiology/MOSA_pretrained)
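
The files can also be fetched programmatically, assuming the `huggingface_hub` package is installed; how the downloaded weights are then loaded into MOSA depends on the files shipped in that repository.

```python
# Download the pre-trained MOSA files from the Hugging Face hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="QuantitativeBiology/MOSA_pretrained")
print(f"Model files downloaded to: {local_dir}")
```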

## Citation

Cai, Z. et al., Synthetic multi-omics augmentation of cancer cell lines using unsupervised deep learning, 2023