integration_analysis_scripts

Scripts for multi-omics integration

Unsupervised analysis: integration_unsupervised.R

This script performs unsupervised analyses (clustering) of transformed expression data (e.g., log FPKM) and methylation beta values.

Prerequisites

This R script requires the following packages:
- iClusterPlus
- gplots
- lattice
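
As a convenience, the following sketch checks which of the required packages are missing from the current R library. The helper name `missing_packages` is hypothetical and is not part of the scripts. The install commands are left as comments so that nothing is installed without you choosing to run them.

```r
# Return the subset of `pkgs` that cannot be loaded from the current library.
# NOTE: `missing_packages` is a hypothetical helper, not part of this repository.
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

to_install <- missing_packages(c("iClusterPlus", "gplots", "lattice"))
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # gplots and lattice are on CRAN:
  #   install.packages(c("gplots", "lattice"))
  # iClusterPlus is distributed through Bioconductor:
  #   install.packages("BiocManager"); BiocManager::install("iClusterPlus")
}
```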

Usage

Rscript integration_unsupervised.R [options]
PARAMETER  DEFAULT  DESCRIPTION
-d         NULL     File with somatic mutation data
-C         NULL     File with copy number variation data
-r         NULL     File with expression data
-m         NULL     File with methylation data (beta values)
-k         2        Minimum number of clusters
-K         6        Maximum number of clusters
-c         2        Number of cores
-o         out      Output prefix
-h                  Show help message and exit

For example, to cluster on expression data only:

Rscript integration_unsupervised.R -r expression_matrix.txt -o output/

Details

The script involves 4 steps:
- Data transformation of methylation beta values, using the logit function
- Clustering with iClusterPlus across a range of LASSO penalties (lambda) for each number of clusters K
- Selection of the best lambda value (by BIC) for each K, and a plot of R^2 as a function of K to help choose K
- Selection of the top features differentiating the clusters
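
The first and third steps can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the code of integration_unsupervised.R. The function names `logit_beta` and `select_by_bic`, the clamping constant `eps`, and the toy BIC and R^2 matrices are all hypothetical. The sketch does not call iClusterPlus; it only assumes that a BIC and an R^2 value are available for every (lambda, K) combination.

```r
# Logit transform of methylation beta values: log(beta / (1 - beta)).
# ASSUMPTION: `eps` is an illustrative clamp so that beta values of exactly
# 0 or 1 do not map to -Inf / Inf; the script may handle this differently.
logit_beta <- function(beta, eps = 1e-6) {
  b <- pmin(pmax(beta, eps), 1 - eps)
  log(b / (1 - b))
}

# For each K, pick the lambda setting with the smallest BIC and report
# the R^2 obtained at that setting.
# ASSUMPTION: `bic` and `r2` are hypothetical matrices with one row per
# lambda setting and one column per value in `k_values`.
select_by_bic <- function(bic, r2, k_values) {
  best <- apply(bic, 2, which.min)
  data.frame(
    K = k_values,
    best_lambda_row = unname(best),
    r2 = r2[cbind(best, seq_len(ncol(bic)))]
  )
}

# Toy beta values, including the boundary cases 0 and 1.
beta <- c(0, 0.25, 0.5, 0.75, 1)
m <- logit_beta(beta)

# Toy BIC and R^2 values (made up for illustration): 3 lambda settings, K = 2 and 3.
bic <- matrix(c(10, 8, 9,
                7, 6, 6.5), nrow = 3)
r2 <- matrix(c(0.40, 0.45, 0.42,
               0.55, 0.60, 0.58), nrow = 3)
sel <- select_by_bic(bic, r2, k_values = c(2, 3))
print(sel)
```

Plotting `sel$r2` against `sel$K` gives the kind of curve described in the third step, where a flattening of R^2 suggests that additional clusters add little.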

Output

- A figure with R^2 as a function of K, and the cluster membership of each sample as a function of K

In addition, for each value of K:
- an .RData file with clustering results
- a heatmap with the top features for each dataset
- a .txt file with the names of the top features for each dataset

Regression analysis for unsupervised analysis: PCA_regression.R

This script provides functions to perform regression analysis between variables (e.g., batch variables or clinical variables) and latent factors as obtained by PCA or group factor analysis.
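
To illustrate the idea, here is a minimal sketch that regresses the leading principal components on a batch variable, using simulated data. It assumes samples are in rows and features in columns. The helper `factor_regression` is hypothetical and is not one of the functions defined in PCA_regression.R, whose actual interface may differ.

```r
# Regress each latent factor (column of `factors`) on one covariate and
# collect the R^2 and the overall F-test p-value of each fit.
# NOTE: `factor_regression` is a hypothetical helper written for this example.
factor_regression <- function(factors, covariate) {
  res <- lapply(seq_len(ncol(factors)), function(j) {
    fit <- summary(lm(factors[, j] ~ covariate))
    f <- fit$fstatistic
    data.frame(
      factor = colnames(factors)[j],
      r2 = fit$r.squared,
      p_value = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
    )
  })
  do.call(rbind, res)
}

# Simulated data: 40 samples (rows) x 50 features (columns),
# with the first 10 features shifted in batch "B".
set.seed(1)
n <- 40
batch <- factor(rep(c("A", "B"), each = n / 2))
x <- matrix(rnorm(n * 50), nrow = n)
x[batch == "B", 1:10] <- x[batch == "B", 1:10] + 3

# Latent factors from PCA: keep the first three principal components.
pcs <- prcomp(x, center = TRUE, scale. = FALSE)$x[, 1:3]

out <- factor_regression(pcs, batch)
print(out)
```

A latent factor with a high R^2 and a small p-value for a batch variable is largely explained by that variable, which is worth knowing before interpreting clusters built on it.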