# integration_analysis_scripts
Scripts for multi-omics integration

## Unsupervised analysis: *integration_unsupervised.R*

This script performs unsupervised analyses (clustering) of transformed expression data (e.g., log FPKM) and methylation beta values.

### Prerequisites
This R script requires the following packages:
- iClusterPlus
- gplots
- lattice
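
A quick way to check that these packages are available before running the script is sketched below. It assumes that iClusterPlus is installed from Bioconductor (through BiocManager) and the other two from CRAN; the variable names `required` and `missing_pkgs` are illustrative and not part of the script.

```r
# Packages listed in the prerequisites above
required <- c("iClusterPlus", "gplots", "lattice")

# requireNamespace() returns FALSE (without an error) when a package is absent
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Assumed installation routes (uncomment as needed):
  # install.packages(c("gplots", "lattice"))   # CRAN
  # install.packages("BiocManager")            # CRAN
  # BiocManager::install("iClusterPlus")       # Bioconductor
} else {
  message("All required packages are installed.")
}
```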

### Usage
```bash
Rscript integration_unsupervised.R [options]
```

| **PARAMETER** | **DEFAULT** | **DESCRIPTION** |
|-----------|--------------:|-------------|
| *-d* | NULL | File with somatic mutation data |
| *-C* | NULL | File with copy number variation data |
| *-r* | NULL | File with expression data |
| *-m* | NULL | File with methylation data (beta values) |
| *-k* | 2 | Minimum number of clusters |
| *-K* | 6 | Maximum number of clusters |
| *-c* | 2 | Number of cores |
| *-o* | out | Output prefix |
| *-h* | | Show help message and exit |

For example, to run the script on an expression matrix only:
```bash
Rscript integration_unsupervised.R -r expression_matrix.txt -o output/
```

### Details
The script involves 4 steps:
- **Data transformation** of methylation beta values, using the logit function
- **Clustering** with iClusterPlus across a range of LASSO lambda penalties, for each number of clusters *K*
- **Selection** of the best lambda value (lowest BIC) for each *K*, and a plot of R^2 as a function of *K* to help choose *K*
- **Selection of the top features** differentiating the clusters
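
The transformation and lambda-selection steps can be illustrated with a minimal base R sketch. It is not the script's actual code: the function names `logit_beta` and `pick_best_lambda`, the clamping constant `eps`, the use of the natural logarithm, and the toy BIC matrix are all assumptions made for illustration.

```r
# Logit-transform beta values, which lie between 0 and 1
logit_beta <- function(beta, eps = 1e-6) {
  # Clamp away from 0 and 1 so the logit stays finite (eps is an assumption)
  beta <- pmin(pmax(beta, eps), 1 - eps)
  log(beta / (1 - beta))
}

# For each K, pick the lambda setting with the lowest BIC.
# bic: matrix with one row per lambda setting and one column per K
pick_best_lambda <- function(bic) {
  apply(bic, 2, which.min)
}

# Toy beta values: low, intermediate, and high methylation
beta <- c(0.1, 0.5, 0.9)
m_values <- logit_beta(beta)
print(m_values)

# Toy BIC values for 3 lambda settings and 2 values of K (filled by column)
bic <- matrix(c(120, 100, 110,
                 90,  95,  80),
              nrow = 3,
              dimnames = list(paste0("lambda", 1:3), c("K=2", "K=3")))
best <- pick_best_lambda(bic)
print(best)
```

The logit maps beta values from the unit interval onto the whole real line, which is why it is a common choice before fitting a Gaussian model to methylation data.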

### Output
- A figure with R^2 as a function of *K*, and the cluster membership of each sample as a function of *K*

In addition, for each value of *K*:
- an .RData file with the clustering results
- a heatmap of the top features for each dataset
- a .txt file with the names of the top features for each dataset

## Regression analysis for unsupervised analysis: *PCA_regression.R*

This script provides functions to perform regression analysis between variables (e.g., batch or clinical variables) and latent factors obtained by PCA or group factor analysis.
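
As a rough illustration of this kind of analysis, the base R sketch below regresses PCA latent factors on a batch variable and reports the R^2 and F-test p-value for each factor. It does not use the functions in *PCA_regression.R*, whose names and arguments are not described here; the simulated data and all object names (`x`, `batch`, `factors`, `assoc`) are hypothetical.

```r
set.seed(1)

# Simulated data (illustrative): 20 samples x 50 features, two batches
n <- 20
batch <- factor(rep(c("A", "B"), each = n / 2))
x <- matrix(rnorm(n * 50), nrow = n)
# Add a batch shift to the first 10 features so a latent factor captures it
x[batch == "B", 1:10] <- x[batch == "B", 1:10] + 3

# Latent factors from PCA (samples in rows)
pca <- prcomp(x, center = TRUE, scale. = FALSE)
factors <- pca$x[, 1:5]

# Regress each latent factor on the variable; keep R^2 and the F-test p-value
assoc <- t(apply(factors, 2, function(f) {
  fit <- summary(lm(f ~ batch))
  fstat <- fit$fstatistic
  c(r2 = fit$r.squared,
    p  = unname(pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)))
}))
print(assoc)
```

A factor with a high R^2 for a batch variable suggests that the corresponding axis of variation is technical rather than biological.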