# integration_analysis_scripts
Scripts for multi-omics integration.

## Unsupervised analysis: *integration_unsupervised.R*

This script performs unsupervised analysis (clustering) of transformed expression data (e.g., log FPKM) and methylation beta values.

### Prerequisites
This R script requires the following packages:
- iClusterPlus
- gplots
- lattice
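
One way to confirm that these packages are available before running the script is sketched below. The helper name `missing_packages` is hypothetical and not part of the script. The install hints in the comments assume the usual sources: iClusterPlus is distributed through Bioconductor, gplots through CRAN, and lattice normally ships with R.

```r
# Hypothetical helper (not part of integration_unsupervised.R):
# returns the names of the packages in `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("iClusterPlus", "gplots", "lattice")
not_installed <- missing_packages(required)

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  # Typical installation commands:
  #   install.packages(c("gplots", "lattice"))   # CRAN
  #   install.packages("BiocManager")            # CRAN
  #   BiocManager::install("iClusterPlus")       # Bioconductor
} else {
  message("All required packages are installed.")
}
```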
### Usage
```bash
Rscript integration_unsupervised.R [options]
```

| **PARAMETER** | **DEFAULT** | **DESCRIPTION** |
|-----------|--------------:|-------------|
| *-d* | NULL | File with somatic mutation data |
| *-C* | NULL | File with copy number variation data |
| *-r* | NULL | File with expression data |
| *-m* | NULL | File with methylation data (beta values) |
| *-k* | 2 | Minimum number of clusters |
| *-K* | 6 | Maximum number of clusters |
| *-c* | 2 | Number of cores |
| *-o* | out | Output prefix |
| *-h* |  | Show help message and exit |

For example, to run the analysis on expression data alone, one can type
```bash
Rscript integration_unsupervised.R -r expression_matrix.txt -o output/
```

### Details
The script involves 4 steps:
- **Data transformation** of methylation beta values, using the logit function
- **Clustering** across a range of LASSO lambda penalties and for each number of clusters *K*, using iClusterPlus
- **Selection** of the best lambda value (by BIC) for each *K*, and a plot of R^2 as a function of *K* to help choose *K*
- **Selection of the top features** differentiating the clusters
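
The data transformation step above can be sketched in base R as follows. The logit of a beta value is log(beta / (1 - beta)). The function name `logit_beta`, the use of the natural logarithm, and the clamping constant `eps` (which keeps beta values of exactly 0 or 1 finite) are assumptions made for illustration; the script may handle these details differently.

```r
# Map beta values in [0, 1] to the real line with the logit function.
# `eps` is an assumed clamping constant that keeps values of exactly
# 0 or 1 finite; the script may treat such values differently.
logit_beta <- function(beta, eps = 1e-6) {
  beta <- pmin(pmax(beta, eps), 1 - eps)
  log(beta / (1 - beta))
}

# Toy beta values (illustrative only, not real data):
# rows are samples, columns are CpG sites.
beta <- cbind(cg1 = c(0.10, 0.50),
              cg2 = c(0.90, 0.00),
              cg3 = c(1.00, 0.25))
rownames(beta) <- c("s1", "s2")

beta_logit <- logit_beta(beta)
print(round(beta_logit, 2))
```

The transformed values are unbounded and symmetric around 0 (a beta value of 0.5 maps to 0), which suits Gaussian-type models better than raw proportions.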
### Output
- A figure showing R^2 as a function of *K*, and the cluster membership of each sample as a function of *K*

In addition, for each value of *K*:
- an .RData file with the clustering results
- a heatmap of the top features for each dataset
- a .txt file with the names of the top features for each dataset
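
The per-*K* lambda choice and the figure of R^2 against *K* can be illustrated with a small base-R sketch. It assumes that BIC and R^2 values are already available as matrices with one row per candidate lambda setting and one column per *K*. With iClusterPlus, such matrices can be obtained with helpers like `getBIC()` and `getDevR()`, although whether the script uses them, and whether its R^2 corresponds to the deviance ratio, is an assumption here. All numbers are toy values and the output file name is hypothetical.

```r
# Toy inputs (illustrative only): rows are candidate lambda settings,
# columns are values of K.
k_values <- 2:4
bic <- cbind(K2 = c(120, 110, 115),
             K3 = c(105,  98, 101),
             K4 = c(102,  99,  97))
r2 <- cbind(K2 = c(0.30, 0.35, 0.33),
            K3 = c(0.45, 0.52, 0.50),
            K4 = c(0.50, 0.51, 0.55))
rownames(bic) <- rownames(r2) <- paste0("lambda", 1:3)

# For each K, pick the lambda setting with the lowest BIC ...
best_lambda <- apply(bic, 2, which.min)
# ... and read off the R^2 of that fit.
best_r2 <- r2[cbind(best_lambda, seq_along(k_values))]

# Plot R^2 at the BIC-selected lambda as a function of K
# (written to a temporary PDF; the file name is hypothetical).
pdf(file.path(tempdir(), "r2_vs_K.pdf"))
plot(k_values, best_r2, type = "b",
     xlab = "K", ylab = "R^2 at minimum BIC")
dev.off()
```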
## Regression analysis for unsupervised analysis: *PCA_regression.R*
This script provides functions to perform regression analysis between variables (e.g., batch or clinical variables) and latent factors obtained by PCA or group factor analysis.
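
As a rough illustration of this kind of analysis, the base-R sketch below regresses the first few principal components on a single covariate and reports the R^2 and the overall F-test p-value of each fit. The helper `pc_regression` is hypothetical and does not reproduce the functions defined in *PCA_regression.R*; the data are simulated.

```r
# Hypothetical helper (not the functions defined in PCA_regression.R):
# regress each of the first `n_pc` principal components on one covariate
# and report the R^2 and overall F-test p-value of each fit.
pc_regression <- function(x, covariate, n_pc = 3) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  scores <- pca$x[, seq_len(n_pc), drop = FALSE]
  res <- lapply(colnames(scores), function(pc) {
    fit <- summary(lm(scores[, pc] ~ covariate))
    f <- fit$fstatistic
    data.frame(
      PC = pc,
      r_squared = fit$r.squared,
      p_value = unname(pf(f["value"], f["numdf"], f["dendf"],
                          lower.tail = FALSE))
    )
  })
  do.call(rbind, res)
}

# Simulated data (illustrative only): 20 samples x 50 features,
# with a batch shift added to the first 10 features.
set.seed(1)
batch <- factor(rep(c("A", "B"), each = 10))
x <- matrix(rnorm(20 * 50), nrow = 20)
x[batch == "B", 1:10] <- x[batch == "B", 1:10] + 3

result <- pc_regression(x, batch)
print(result)
```

A component with a high R^2 and a small p-value for a batch variable suggests that the corresponding latent factor captures technical rather than biological variation.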