# integration_analysis_scripts
Scripts for multi-omics integration.

## Unsupervised analysis: *integration_unsupervised.R*

This script performs unsupervised analyses (clustering) of up to four data types: somatic mutations, copy number variation, transformed expression data (e.g., log FPKM), and methylation beta values.

### Prerequisites
This R script requires the following packages:
- iClusterPlus
- gplots
- lattice

### Usage
```bash
Rscript integration_unsupervised.R [options]
```

| **PARAMETER** | **DEFAULT** | **DESCRIPTION** |
|---------------|------------:|-----------------|
| *-d* | NULL | File with somatic mutation data |
| *-C* | NULL | File with copy number variation data |
| *-r* | NULL | File with expression data |
| *-m* | NULL | File with methylation data (beta values) |
| *-k* | 2 | Minimum number of clusters |
| *-K* | 6 | Maximum number of clusters |
| *-c* | 2 | Number of cores |
| *-o* | out | Output prefix |
| *-h* | | Show help message and exit |

For example, to cluster expression data only:
```bash
Rscript integration_unsupervised.R -r expression_matrix.txt -o output/
```

### Details
The script performs the following steps:
- **Data transformation** of methylation beta values using the logit function
- **Clustering** with iClusterPlus for each number of clusters *K* and across a range of LASSO penalties (lambda)
- **Selection** of the best lambda for each *K* using the Bayesian information criterion (BIC), and a plot of R² as a function of *K* to help choose *K*
- **Selection of the top features** that differentiate the clusters

### Output
- A figure showing R² as a function of *K*, together with the cluster membership of each sample as a function of *K*

In addition, for each value of *K*:
- an .RData file with the clustering results
- a heatmap of the top features for each dataset
- a .txt file with the names of the top features for each dataset

## Regression analysis of latent factors: *PCA_regression.R*

This script provides functions to perform regression analyses between variables (e.g., batch variables or clinical variables) and latent factors obtained by principal component analysis (PCA) or group factor analysis.
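The README does not document the functions inside *PCA_regression.R*, so the following is only a minimal Python sketch of the general idea, not the script's implementation. It computes sample scores on the first principal component by power iteration and reports the R² of a least-squares regression of those scores on a covariate. The helper names `first_pc_scores` and `r_squared`, the toy data, and the choice to treat string labels (such as batch names) as categorical levels are all illustrative assumptions.

```python
import math
import random


def center(matrix):
    """Subtract the column (feature) means from a samples x features matrix."""
    n = len(matrix)
    means = [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def first_pc_scores(matrix, iterations=500, seed=0):
    """Hypothetical helper: sample scores on the first principal component.

    Power iteration on X^T X (X = centered data) converges to the leading
    eigenvector, i.e. the PC1 loading vector; the scores are X times that vector.
    The sign of PC1 is arbitrary, which does not affect R^2.
    """
    x = center(matrix)
    n, p = len(x), len(x[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # random positive start vector
    for _ in range(iterations):
        xv = [sum(a * b for a, b in zip(row, v)) for row in x]           # X v
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(p)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]


def r_squared(factor, covariate):
    """Hypothetical helper: R^2 of an OLS fit factor ~ covariate (with intercept).

    Numeric covariates (e.g. age) enter as a single regressor. String labels
    (e.g. batch names) are treated as categories: R^2 = between-group SS /
    total SS, which equals the R^2 of a regression on dummy-coded levels.
    """
    n = len(factor)
    mean_f = sum(factor) / n
    ss_tot = sum((f - mean_f) ** 2 for f in factor)
    if all(isinstance(c, str) for c in covariate):
        groups = {}
        for f, c in zip(factor, covariate):
            groups.setdefault(c, []).append(f)
        ss_model = sum(len(g) * (sum(g) / len(g) - mean_f) ** 2
                       for g in groups.values())
    else:
        mean_c = sum(covariate) / n
        sxy = sum((c - mean_c) * (f - mean_f) for c, f in zip(covariate, factor))
        sxx = sum((c - mean_c) ** 2 for c in covariate)
        ss_model = sxy * sxy / sxx  # regression sum of squares
    return ss_model / ss_tot


if __name__ == "__main__":
    # Toy data (made up): 6 samples x 3 features; batch "B" samples are shifted.
    data = [
        [1.0, 2.0, 0.5], [1.2, 2.1, 0.4], [0.9, 1.9, 0.6],
        [4.0, 5.0, 0.5], [4.1, 5.2, 0.4], [3.9, 4.9, 0.6],
    ]
    batch = ["A", "A", "A", "B", "B", "B"]
    age = [50, 61, 47, 55, 58, 49]
    pc1 = first_pc_scores(data)
    print("R^2 of PC1 ~ batch:", round(r_squared(pc1, batch), 3))
    print("R^2 of PC1 ~ age:  ", round(r_squared(pc1, age), 3))
```

In this toy example the batch shift dominates PC1, so the batch R² comes out close to 1, which is the kind of signal such a regression is meant to flag. Plain power iteration only keeps the sketch dependency-free; in R one would more typically use `prcomp()` for the PCA and `summary(lm(...))$r.squared` for the fit.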