Switch to unified view

a b/examples/walkthrough_mESC_dataset.Rmd
1
2
---
3
title: "Walkthrough with mESC dataset"
4
author: "Suoqin Jin, Lihua Zhang"
5
output: html_document
6
mainfont: Arial
7
vignette: >
8
  %\VignetteIndexEntry{Integrative analysis of single cell multi-omics data using scAI}
9
  %\VignetteEngine{knitr::rmarkdown}
10
  %\VignetteEncoding{UTF-8}
11
---
12
13
```{r setup, include = FALSE}
14
knitr::opts_chunk$set(
15
  collapse = TRUE,
16
  comment = "#>",
17
  root.dir = './'
18
)
19
```
20
21
This walkthrough will perform integrative analysis of paired single cell RNA-seq and DNA methylation data of mouse embryonic development.
22
This data describes mouse embryonic stem cells that are cultured in "2i" and ''serum" conditions, including 77 cells profiled by parallel single cell methylation and transcriptome sequencing technique scM&T-seq.
23
24
25
Load the required libraries
26
```{r message=FALSE,warning=FALSE}
27
library(scAI)
28
library(dplyr)
29
library(cowplot)
30
```
31
32
## Load data
33
The algorithm takes a list of two digital data matrices as input. Genes/loci should be in rows and cells in columns. rownames and colnames should be included. Before running the scAI model, we need to normalize the data to account for library size and select highly variable features. Our preprocessed data matrices after normalization and feature selection were provided for this walkthrough.
34
35
```{r}
36
load("/Users/suoqinjin/Documents/scAI/data/data_mESC.rda")
37
X <- data_mESC$data # List of data matrix
38
labels <- data_mESC$labels # the collected time of cells, which is used for validation
39
```
40
41
## Create a scAI object
42
```{r}
43
scAI_outs <- create_scAIobject(raw.data = X)
44
```
45
46
## Preprocess data
47
Perform quality control to remove low-quality cells and genes, and normalize the data.
48
Since this is a preprocessed data, we do not need to normalize the data. Thus we set `assay = NULL`.
49
50
```{r, results='asis'}
51
scAI_outs <- preprocessing(scAI_outs, assay = NULL, minFeatures = 200, minCells = 1,
52
                           libararyflag = F, logNormalize = F)
53
54
```
55
56
Add cell information into *pData* slot of the object
57
```{r}
58
scAI_outs <- addpData(scAI_outs, pdata = labels, pdata.name = "Conditions")
59
```
60
61
## Run scAI model
62
As depending on the random initilization the results might differ, we run scAI multiple times (e.g. nrun = 5) and output the best result. User can also output results from all runs by setting *keep_all = TRUE*. The key parameters here are the number of component/clusters (k). The `selectK` function can aid in selecting k. A suitable k is the one at which the magnitude of cophenetic correlation begins to fall.
63
```{r}
64
scAI_outs <- run_scAI(scAI_outs, K = 3, nrun = 5,alpha = 0.01,lambda = 1000, gamma = 100000)
65
66
```
67
68
## Identify cell clusters
69
We can also identify cell clusters based on the inferred cell loading matrix using Leiden algorithm.
70
```{r}
71
scAI_outs <- identifyClusters(scAI_outs, resolution = 1)
72
levels(scAI_outs@identity)
73
74
```
75
76
## Visualize cells onto the low-dimensional space
77
We can visualize cells onto the low-dimensional space generated by t-SNE, FIt-sne or UMAP.
78
Here, we perform UMAP dimension reduction. Cells are colored by either the published cell labels or the clustering inferred by scAI.
79
```{r, fig.width=7,fig.height = 3,  fig.wide = TRUE, fig.align = "center"}
80
scAI_outs <- reducedDims(scAI_outs, method = "umap",do.scale = F)
81
gg1 <- cellVisualization(scAI_outs, scAI_outs@embed$umap, color.by = "Conditions",show.legend = T, title = "Conditions")
82
gg2 <- cellVisualization(scAI_outs, scAI_outs@embed$umap, color.by = "cluster", ylabel = NULL, title = "scAI clusters")
83
cowplot::plot_grid(gg1, gg2)
84
85
```
86
87
Here, we perform comparison of the visualization of raw DNA-seq data with the aggregated data. Cells are colored by the collected time.
88
```{r, fig.width=9,fig.height = 3,  fig.wide = TRUE, fig.align = "center"}
89
cell_coords.RNA <- reducedDims(scAI_outs, data.use = scAI_outs@norm.data$RNA, do.scale = T, method = "pca", return.object = F)
90
cell_coords.DNA <- reducedDims(scAI_outs, data.use = scAI_outs@norm.data$DNA, do.scale = T, method = "pca", return.object = F)
91
cell_coords.DNAagg <- reducedDims(scAI_outs, data.use = scAI_outs@agg.data, do.scale = T, method = "pca", return.object = F)
92
93
gg1 <- cellVisualization(scAI_outs, cell_coords.RNA, color.by = "cluster",  show.legend = F, title = "scRNA-seq",xlabel = "PCA1", ylabel = "PCA2")
94
gg2 <- cellVisualization(scAI_outs, cell_coords.DNA, color.by = "cluster",show.legend = F, xlabel = "PCA1",ylabel = NULL,title = "scDNA-seq")
95
gg3 <- cellVisualization(scAI_outs, cell_coords.DNAagg, color.by = "cluster", xlabel = "PCA1",ylabel = NULL, title = "Aggregated scDNA-seq")
96
cowplot::plot_grid(gg1, gg2, gg3, ncol = 3)
97
98
```
99
## Feature plot
100
We can overlay the expression of features, or the cell loading values onto the low-dimensional space, e.g., VscAI, tsne, umap
101
```{r, fig.width=9, fig.height=2.5,  fig.wide = TRUE, fig.align = "center"}
102
featureScoreVisualization(scAI_outs, feature.scores = t(scAI_outs@fit$H), feature.use = c('factor1','factor2','factor3'),  method = "umap", nCol = 3, cell.size = 0.1, show.legend = T, show.legend.combined = F)
103
```
104
105
### Ranking the features (genes/loci) and highlighting the important markers in each factor
106
```{r, fig.width=6, fig.height=3,  fig.wide = TRUE, fig.align = "center"}
107
# show the markers of GR activation
108
feature_genes = c('Zfp42','Esrrb','Morc1','Fbxo15','Jam2','Klf4','Tcl1','Tbx3',
109
                  'Tex19.1','Krt8','Cald1','Anxa5','Tagln','Ahnak','Dsp','Anxa3','Krt19','Fgf5');
110
111
featureRankingPlot(scAI_outs, assay = 'RNA', feature.show = feature_genes, top.p = 0.5, ylabel = "Gene score")
112
```
113
### Identify differentially expressed features of each cell cluster
114
```{r}
115
markers.RNA.cluster <- identifyClusterMarkers(scAI_outs, assay = "RNA")
116
markers.DNA.cluster <- identifyClusterMarkers(scAI_outs, assay = 'DNA')
117
118
```
119
### Generate a heatmap to show the expression patterns of top features in cell clusters
120
```{r, fig.width=10,fig.height = 4, fig.align = "center"}
121
n.top = 10
122
markers.RNA.clusterTop <- markers.RNA.cluster %>% group_by(clusters) %>% top_n(n.top, logFC) %>% slice(1:n.top)
123
featureHeatmap(scAI_outs, assay = "RNA", feature.use = markers.RNA.clusterTop$features, group.by = "cluster")
124
markers.DNA.clusterTop <- markers.DNA.cluster %>% group_by(clusters) %>% top_n(n.top, logFC) %>% slice(1:n.top)
125
featureHeatmap(scAI_outs, assay = "DNA", feature.use = markers.DNA.clusterTop$features, group.by = "cluster")
126
```
127
128
## Embedding cells, genes, loci and factors into 2D-dimensions using our new visualization method VscAI
129
130
```{r message=FALSE,warning=FALSE}
131
scAI_outs <- getEmbeddings(scAI_outs)
132
133
```
134
135
### Visualization of the embedded cells, genes, loci and factors
136
User can provide a vector of the features (e.g., key marker genes/loci) to explore the biological meaning of the cell groups and enhance the interpretation of the data. 
137
```{r, fig.width=5.5,fig.height=4, fig.align = "center"}
138
VscAIplot(scAI_outs, gene.use = feature_genes, loci.use = NULL, loci.use.names = NULL, color.by = "cluster")
139
```
140
141
## Feature plot
142
We can overlay the expression of features onto the low-dimensional space, e.g., VscAI, tsne, umap
143
```{r, fig.width=9, fig.height=2.5,  fig.wide = TRUE, fig.align = "center"}
144
featureVisualization(scAI_outs, assay = "RNA", feature.use = c('Tcl1','Krt19','Dsp','Fgf5'),  method = "umap", nCol = 4, cell.size = 0.1, show.legend = F, show.legend.combined = F)
145
```
146
147