# MOVE (Multi-Omics Variational autoEncoder)

[![PyPI version](https://badge.fury.io/py/move-dl.svg)](https://badge.fury.io/py/move-dl)
[![Documentation Status](https://readthedocs.org/projects/move-dl/badge/?version=latest)](https://move-dl.readthedocs.io/?badge=latest)

The code in this repository can be used to run our Multi-Omics Variational
autoEncoder (MOVE) framework for integration of omics and clinical variables
spanning both categorical and continuous data. Our approach includes training
ensemble VAE models and using *in silico* perturbation experiments to identify
cross-omics associations. The manuscript has been published in Nature
Biotechnology:

> Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. *et al*. Discovery of
> drug–omics associations in type 2 diabetes with generative deep-learning
> models. *Nat Biotechnol* (2023). https://doi.org/10.1038/s41587-022-01520-x

We developed the method based on a Type 2 Diabetes cohort from the IMI DIRECT
project containing 789 newly diagnosed T2D patients. The cohort and data
creation are described in
[Koivula et al.](https://dx.doi.org/10.1007%2Fs00125-019-4906-1) and
[Wesolowska-Andersen et al.](https://doi.org/10.1016/j.xcrm.2021.100477). For
the analysis we included the following data:

Multi-omics data sets:
```
Genomics
Transcriptomics
Proteomics
Metabolomics
Metagenomics
```

Other data sets:
```
Clinical data (blood measurements, imaging data, ...)
Questionnaire data (diet, etc.)
Accelerometer data
Medication data
```

# Installation

## Installing MOVE package

MOVE is written in Python and can be installed using `pip`:

```bash
pip install move-dl
```

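To confirm that the installation succeeded, one option is to ask Python for the installed distribution's version. The snippet below is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name chosen for illustration and is not part of MOVE.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "move-dl"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"move-dl version: {found}" if found else "move-dl is not installed")
```
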
## Requirements

MOVE should run in any environment where Python is available. The variational
autoencoder architecture is implemented in PyTorch.

The VAEs can be trained on CPUs alone or with GPU acceleration, so a powerful
GPU is not required. For instance, the tutorial data set, consisting of
simulated drug, metabolomics, and proteomics data for 500 individuals, runs
fine on a standard MacBook.

> Note: The pip installation of `move-dl` does not set up your local GPU automatically.

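Because GPU setup is left to you, it can help to check what PyTorch actually sees before training. The following is a minimal sketch, not MOVE's own device-selection logic: `pick_device` is a hypothetical helper, and it falls back to the CPU when PyTorch is missing or reports no CUDA device.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch  # imported lazily so the check above can run without it

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Training would run on: {pick_device()}")
```

If this reports `cpu` on a machine with a GPU, the usual cause is a CPU-only PyTorch build or missing drivers rather than anything in MOVE.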
# The MOVE pipeline

MOVE has five or six steps, depending on whether both association approaches in step 05 are run:

```
01. Encode the data into a format that can be read by MOVE
02. Find the right architecture of the network, focusing on reconstruction accuracy
03. Find the right architecture of the network, focusing on stability of the model
04. Use the model determined in steps 02-03 to create and analyze the latent space
05. Identify associations between categorical and continuous datasets
    05a. Using an ensemble of VAEs with the t-test approach
    05b. Using an ensemble of VAEs with the Bayesian decision theory approach
06. If both 05a and 05b were run, select the overlap between them
```

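Step 06 amounts to a set intersection of the associations reported by the two approaches. The sketch below illustrates the idea on made-up data; the `overlap` helper, the tuple layout (perturbed categorical feature, continuous feature), and the feature names are assumptions for illustration and do not reflect MOVE's actual output format.

```python
def overlap(ttest_hits, bayes_hits):
    """Keep associations reported by both methods, sorted for stable output."""
    return sorted(set(ttest_hits) & set(bayes_hits))


# Made-up (categorical feature, continuous feature) pairs from each approach
ttest_hits = [
    ("drug_A", "metabolite_1"),
    ("drug_A", "protein_7"),
    ("drug_B", "protein_2"),
]
bayes_hits = [
    ("drug_A", "protein_7"),
    ("drug_B", "protein_2"),
    ("drug_C", "metabolite_9"),
]

consensus = overlap(ttest_hits, bayes_hits)
print(consensus)  # [('drug_A', 'protein_7'), ('drug_B', 'protein_2')]
```

Keeping only the overlap trades sensitivity for confidence: an association has to be supported by both approaches to survive.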
## How to run MOVE

Please refer to our [**documentation**](https://move-dl.readthedocs.io/) for
examples and [tutorials](https://move-dl.readthedocs.io/tutorial/index.html)
on how to run MOVE.

Additionally, you can copy
[this notebook](https://colab.research.google.com/drive/1RFWNsuGymCmppPsElBvDuA9zRbGskKmi?usp=sharing)
and follow its instructions to get familiar with our pipeline.

# Data sets

## DIRECT data set

The data used in notebooks are not available for testing due to the informed
consent given by study participants, the various national ethical approvals for
the study, and the European General Data Protection Regulation (GDPR).
Therefore, individual-level clinical and omics data cannot be transferred from
the centralized IMI-DIRECT repository. Requests for access to summary statistics
of IMI-DIRECT data, including those presented here, can be made to
DIRECTdataaccess@Dundee.ac.uk. Requesters will be informed of how summary-level
data can be accessed via the DIRECT secure analysis platform following
submission of an appropriate application. The IMI-DIRECT data access policy is
available [here](https://directdiabetes.org).

## Simulated and publicly available data sets

We have therefore provided two datasets to test the workflow: a simulated
dataset and a publicly available maize rhizosphere microbiome dataset.

# Citation

To cite MOVE, use the following information:

Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. *et al*. Discovery of
drug–omics associations in type 2 diabetes with generative deep-learning models.
*Nat Biotechnol* (2023). https://doi.org/10.1038/s41587-022-01520-x