Models: AlyssaS/MOVE
Downloads: 1

Data: Tabular, Time Series
Specialty: Endocrinology
Laboratory: Blood Tests
EHR: Demographics, Diagnoses, Medications
Omics: Genomics, Multi-omics, Transcriptomics
Wearable: Activity
Clinical Purpose: Treatment Response Assessment
Task: Biomarker Discovery

Diff of /README.md [c23b31] .. [5ef06f]


# MOVE (Multi-Omics Variational autoEncoder)

[![PyPI version](https://badge.fury.io/py/move-dl.svg)](https://badge.fury.io/py/move-dl)
[![Documentation Status](https://readthedocs.org/projects/move-dl/badge/?version=latest)](https://move-dl.readthedocs.io/?badge=latest)

The code in this repository can be used to run our Multi-Omics Variational
autoEncoder (MOVE) framework for integration of omics and clinical variables
spanning both categorical and continuous data. Our approach includes training
ensemble VAE models and using *in silico* perturbation experiments to identify
cross-omics associations. The manuscript has been published in Nature
Biotechnology:

Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. *et al*. Discovery of
drug–omics associations in type 2 diabetes with generative deep-learning
models. *Nat Biotechnol* (2023). https://doi.org/10.1038/s41587-022-01520-x

We developed the method based on a Type 2 Diabetes cohort from the IMI DIRECT
project containing 789 newly diagnosed T2D patients. The cohort and data
creation are described in
[Koivula et al.](https://dx.doi.org/10.1007%2Fs00125-019-4906-1) and
[Wesolowska-Andersen et al.](https://doi.org/10.1016/j.xcrm.2021.100477). For
the analysis we included the following data:

Multi-omics data sets:
```
Genomics
Transcriptomics
Proteomics
Metabolomics
Metagenomics
```

Other data sets:
```
Clinical data (blood measurements, imaging data, ...)
Questionnaire data (diet, etc.)
Accelerometer data
Medication data
```

# Installation

## Installing the MOVE package

MOVE is written in Python and can be installed using `pip`:

```bash
pip install move-dl
```
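
To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. The sketch below relies only on the standard library (`importlib.metadata`). The helper name `installed_version` is ours for illustration and is not part of MOVE.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version("move-dl")
    print(f"move-dl version: {found}" if found else "move-dl is not installed")
```

If the script reports that `move-dl` is not installed, check that `pip` and the Python interpreter you are running belong to the same environment.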

## Requirements

MOVE should run in any environment where Python is available. The variational
autoencoder architecture is implemented in PyTorch.

The VAEs can be trained on CPUs alone or with GPU acceleration, so a powerful
GPU is not required. For instance, the tutorial data set consisting of
simulated drug, metabolomics and proteomics data for 500 individuals runs fine
on a standard MacBook.

Note: The pip installation of `move-dl` does not set up your local GPU automatically.
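
Since GPU setup is left to you, it can help to check what PyTorch sees before training. The following is a minimal sketch, not part of MOVE: `pick_device` is a hypothetical helper, and the only PyTorch call it uses is `torch.cuda.is_available()`. How MOVE itself selects a device is not covered here; see the documentation for that.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing to accelerate.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"PyTorch would run on: {pick_device()}")
```

A result of `cpu` on a machine with a GPU usually means that the installed PyTorch build lacks CUDA support or that the GPU drivers are missing.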

# The MOVE pipeline

MOVE has five or six steps; step 06 applies only when both association methods in step 05 are run:

```
01. Encode the data into a format that can be read by MOVE
02. Find the right architecture of the network, focusing on reconstruction accuracy
03. Find the right architecture of the network, focusing on stability of the model
04. Use the model determined in steps 02-03 to create and analyze the latent space
05. Identify associations between a categorical and a continuous dataset
05a. Using an ensemble of VAEs with the t-test approach
05b. Using an ensemble of VAEs with the Bayesian decision theory approach
06. If both 05a and 05b were run, select the overlap between them
```
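
Step 06 amounts to a set intersection over the associations reported by the two methods. The sketch below illustrates the idea under the assumption that each association can be represented as a (categorical feature, continuous feature) pair. The feature names are made up, and the `overlap` helper is hypothetical rather than part of the MOVE API; the format in which MOVE writes its results is described in the documentation.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: (perturbed categorical feature, continuous feature).
Association = Tuple[str, str]


def overlap(
    ttest_hits: Iterable[Association],
    bayes_hits: Iterable[Association],
) -> Set[Association]:
    """Keep only the associations reported by both methods."""
    return set(ttest_hits) & set(bayes_hits)


# Made-up example results standing in for the outputs of steps 05a and 05b.
ttest_hits = [
    ("drug_A", "metabolite_1"),
    ("drug_A", "protein_7"),
    ("drug_B", "metabolite_3"),
]
bayes_hits = [
    ("drug_A", "metabolite_1"),
    ("drug_B", "metabolite_3"),
    ("drug_C", "protein_2"),
]

consensus = overlap(ttest_hits, bayes_hits)
print(sorted(consensus))
# [('drug_A', 'metabolite_1'), ('drug_B', 'metabolite_3')]
```

Taking the intersection trades sensitivity for confidence: an association survives only if both statistical approaches support it.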

## How to run MOVE

Please refer to our [**documentation**](https://move-dl.readthedocs.io/) for
examples and [tutorials](https://move-dl.readthedocs.io/tutorial/index.html)
on how to run MOVE.

Additionally, you can copy
[this notebook](https://colab.research.google.com/drive/1RFWNsuGymCmppPsElBvDuA9zRbGskKmi?usp=sharing)
and follow its instructions to get familiar with our pipeline.

# Data sets

## DIRECT data set

The data used in the notebooks are not available for testing due to the informed
consent given by study participants, the various national ethical approvals for
the study, and the European General Data Protection Regulation (GDPR).
Therefore, individual-level clinical and omics data cannot be transferred from
the centralized IMI-DIRECT repository. Requests for access to summary statistics
of IMI-DIRECT data, including those presented here, can be made to
DIRECTdataaccess@Dundee.ac.uk. Requesters will be informed of how summary-level
data can be accessed via the DIRECT secure analysis platform following
submission of an appropriate application. The IMI-DIRECT data access policy is
available [here](https://directdiabetes.org).

## Simulated and publicly available data sets

We have therefore provided two datasets to test the workflow: a simulated
dataset and a publicly available maize rhizosphere microbiome data set.

# Citation

To cite MOVE, use the following information:

Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. *et al*. Discovery of
drug–omics associations in type 2 diabetes with generative deep-learning models.
*Nat Biotechnol* (2023). https://doi.org/10.1038/s41587-022-01520-x

--- a/README.md
+++ b/README.md
@@ -13,3 +13,3 @@
-> Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. et al. Discovery of
-> drug–omics associations in type 2 diabetes with generative deep-learning
-> models. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-022-01520-x
+Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. et al. Discovery of
+drug–omics associations in type 2 diabetes with generative deep-learning
+models. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-022-01520-x
@@ -61 +61 @@
-> Note: The pip installation of `move-dl` does not setup your local GPU automatically
+Note: The pip installation of `move-dl` does not setup your local GPU automatically