# MOVE (Multi-Omics Variational autoEncoder)

[![PyPI version](https://badge.fury.io/py/move-dl.svg)](https://badge.fury.io/py/move-dl)
[![Documentation Status](https://readthedocs.org/projects/move-dl/badge/?version=latest)](https://move-dl.readthedocs.io/?badge=latest)

The code in this repository can be used to run our Multi-Omics Variational
autoEncoder (MOVE) framework for integration of omics and clinical variables
spanning both categorical and continuous data. Our approach includes training
ensemble VAE models and using *in silico* perturbation experiments to identify
cross-omics associations. The manuscript has been published in Nature
Biotechnology:

> Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. *et al*. Discovery of
> drug–omics associations in type 2 diabetes with generative deep-learning
> models. *Nat Biotechnol* (2023). https://doi.org/10.1038/s41587-022-01520-x

We developed the method based on a Type 2 Diabetes (T2D) cohort from the IMI
DIRECT project containing 789 newly diagnosed T2D patients. The cohort and data
creation are described in
[Koivula et al.](https://dx.doi.org/10.1007%2Fs00125-019-4906-1) and
[Wesolowska-Andersen et al.](https://doi.org/10.1016/j.xcrm.2021.100477). For
the analysis we included the following data:

Multi-omics data sets:
```
Genomics
Transcriptomics
Proteomics
Metabolomics
Metagenomics
```

Other data sets:
```
Clinical data (blood measurements, imaging data, ...)
Questionnaire data (diet etc)
Accelerometer data
Medication data
```

# Installation
## Installing MOVE package
MOVE is written in Python and can be installed using `pip`:

```bash
pip install move-dl
```
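
After installing, it can be useful to confirm that the package is visible to the Python interpreter you intend to use. The sketch below relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is ours for illustration and is not part of MOVE.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    found = installed_version("move-dl")
    print(f"move-dl version: {found}" if found else "move-dl is not installed")
```

If this reports that `move-dl` is not installed, `pip` probably installed it into a different environment than the one running the script.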
## Requirements

MOVE should run in any environment where Python is available. The variational
autoencoder architecture is implemented in PyTorch.

The VAEs can be trained on CPUs alone or with GPU acceleration, so a powerful
GPU is not required. For instance, the tutorial data set consisting of simulated
drug, metabolomics and proteomics data for 500 individuals runs fine on a
standard MacBook.

> Note: The pip installation of `move-dl` does not set up your local GPU automatically.
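
Because the GPU is not configured for you, a quick check of what PyTorch can see may save some confusion. This is a minimal sketch that only queries PyTorch through `torch.cuda.is_available()`; the helper name `pick_device` is hypothetical, and the sketch says nothing about how MOVE itself selects a device.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is importable and sees a CUDA GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"PyTorch would run on: {pick_device()}")
```

A result of `cpu` on a machine with a GPU usually means the installed PyTorch build has no CUDA support or the drivers are missing.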
# The MOVE pipeline

MOVE has five or six steps, depending on whether both association approaches in step 05 are run:

```
01. Encode the data into a format that can be read by MOVE
02. Find the right architecture of the network, focusing on reconstruction accuracy
03. Find the right architecture of the network, focusing on stability of the model
04. Use the model determined in steps 02-03 to create and analyze the latent space
05. Identify associations between a categorical and a continuous dataset
05a. Using an ensemble of VAEs with the t-test approach
05b. Using an ensemble of VAEs with the Bayesian decision theory approach
06. If both 05a and 05b were run, select the overlap between them
```
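
Step 06 amounts to a set intersection. The following is a minimal sketch, assuming each approach yields pairs of (perturbed categorical feature, continuous feature). The feature names and the function `overlapping_associations` are made up for illustration and do not reflect MOVE's actual output format or API.

```python
from typing import Iterable, List, Set, Tuple

# (perturbed categorical feature, continuous feature)
Association = Tuple[str, str]


def overlapping_associations(
    ttest_hits: Iterable[Association],
    bayes_hits: Iterable[Association],
) -> List[Association]:
    """Return associations reported by both approaches, sorted for stable output."""
    shared: Set[Association] = set(ttest_hits) & set(bayes_hits)
    return sorted(shared)


# Hypothetical results from steps 05a and 05b.
ttest_hits = [("drug_A", "metabolite_1"), ("drug_A", "protein_7"), ("drug_B", "protein_2")]
bayes_hits = [("drug_A", "protein_7"), ("drug_B", "protein_2"), ("drug_C", "metabolite_4")]

print(overlapping_associations(ttest_hits, bayes_hits))
# [('drug_A', 'protein_7'), ('drug_B', 'protein_2')]
```

Keeping only the pairs found by both approaches trades some sensitivity for a more conservative list of associations.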
## How to run MOVE

Please refer to our [**documentation**](https://move-dl.readthedocs.io/) for
examples and [tutorials](https://move-dl.readthedocs.io/tutorial/index.html)
on how to run MOVE.

Additionally, you can copy
[this notebook](https://colab.research.google.com/drive/1RFWNsuGymCmppPsElBvDuA9zRbGskKmi?usp=sharing)
and follow its instructions to get familiar with our pipeline.

# Data sets
## DIRECT data set

The data used in the notebooks are not available for testing due to the informed
consent given by study participants, the various national ethical approvals for
the study, and the European General Data Protection Regulation (GDPR).
Therefore, individual-level clinical and omics data cannot be transferred from
the centralized IMI-DIRECT repository. Requests for access to summary statistics
of IMI-DIRECT data, including those presented here, can be made to
DIRECTdataaccess@Dundee.ac.uk. Requesters will be informed of how summary-level
data can be accessed via the DIRECT secure analysis platform following
submission of an appropriate application. The IMI-DIRECT data access policy is
available [here](https://directdiabetes.org).

## Simulated and publicly available data sets

We have therefore provided two data sets to test the workflow: a simulated
data set and a publicly available maize rhizosphere microbiome data set.

# Citation

To cite MOVE, use the following information:

Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. *et al*. Discovery of
drug–omics associations in type 2 diabetes with generative deep-learning models.
*Nat Biotechnol* (2023). https://doi.org/10.1038/s41587-022-01520-x