Switch to unified view

a b/examples/walkthrough_simulation_dataset8.Rmd
1
2
---
3
title: "Walkthrough with the simulated dataset 8"
4
author: "Suoqin Jin, Lihua Zhang"
5
output: html_document
6
mainfont: Arial
7
vignette: >
8
  %\VignetteIndexEntry{Integrative analysis of single cell multi-omics data using scAI}
9
  %\VignetteEngine{knitr::rmarkdown}
10
  %\VignetteEncoding{UTF-8}
11
---
12
13
  ```{r setup, include = FALSE}
14
knitr::opts_chunk$set(
15
  collapse = TRUE,
16
  comment = "#>",
17
  root.dir = './'
18
)
19
```
20
21
22
This walkthrough outlines the key steps of scAI using the simulated dataset 8. This simulated data consist of paired single-cell RNA-seq and ATAC-seq data, including five imbalanced cell clusters with five clusters in scRNA-seq data and three clusters in scATAC-seq data.
23
24
Load the required libraries
25
```{r message=FALSE,warning=FALSE}
26
library(scAI)
27
library(dplyr)
28
library(cowplot)
29
library(ggplot2)
30
```
31
32
## Load data
33
The algorithm takes a list of two digital data matrices as input. Genes/loci should be in rows and cells in columns. rownames and colnames should be included. Before running the scAI model, we need to normalize the data to account for library size and select highly variable features.
34
35
```{r}
36
load("/Users/suoqinjin/Documents/scAI/data/data_simulation8.rda")
37
X <- data_simulation8$data # List of data matrix
38
labels.RNA <- data_simulation8$labels.RNA # the true labels of cells, five clusters in scRNA-seq data
39
labels.ATAC <- data_simulation8$labels.ATAC # the true labels of cells, three clusters in scATAC-seq data
40
```
41
42
## Create a scAI object
43
```{r}
44
scAI_outs <- create_scAIobject(raw.data = X)
45
```
46
## Preprocess data
47
Perform quality control to remove low-quality cells and genes, and normalize the data.
48
Since this is a simulated data, we do not need to normalize the data. Thus we set `assay = NULL`.
49
50
```{r, results='asis'}
51
scAI_outs <- preprocessing(scAI_outs, assay = NULL, minFeatures = 200, minCells = 1,
52
                           libararyflag = F, logNormalize = F)
53
```
54
Add cell information into *pData* slot of the object
55
```{r}
56
scAI_outs <- addpData(scAI_outs, pdata = cbind(labels.RNA,labels.ATAC), pdata.name = c("labels.RNA","labels.ATAC"))
57
```
58
59
## Run scAI model
60
As depending on the random initilization the results might differ, we run scAI multiple times (e.g. nrun = 5) and output the best result. User can also output results from all runs by setting *keep_all = TRUE*. The key parameters here are the number of factors/clusters (k). The `selectK` function can aid in selecting k. A suitable k is the one at which the magnitude of cophenetic correlation begins to fall.
61
```{r}
62
scAI_outs <- run_scAI(scAI_outs, K = 5, nrun = 2)
63
```
64
65
## Visualize the inferred biologically relevant components
66
We plot the heatmap of the three learned low-rank matrices using hierarchical clustering. The ground truth labels of cells are used for validation (not necessary).
67
```{r, fig.width=7,fig.height = 8, fig.wide = TRUE, fig.align = "center"}
68
lmHeatmap(scAI_outs, color.by = "labels.RNA")
69
```
70
71
## Visualize cells onto the low-dimensional space
72
We can visualize cells onto the low-dimensional space using t-SNE, FIt-sne or UMAP.
73
Here, we perform comparison of the visualization of raw ATAC-seq data with the aggregated data. Cells are colored by the true labels.
74
```{r, fig.width=7,fig.height = 3.5,  fig.wide = TRUE, fig.align = "center"}
75
cell_coords.ori <- reducedDims(scAI_outs, data.use = scAI_outs@norm.data$ATAC, do.scale = F, method = "umap", return.object = F)
76
cell_coords.agg <- reducedDims(scAI_outs, data.use = scAI_outs@agg.data, do.scale = F, method = "umap", return.object = F)
77
78
gg1 <- cellVisualization(scAI_outs, cell_coords.ori, color.by = "labels.ATAC",show.legend = F, title = "scATAC-seq")
79
gg2 <- cellVisualization(scAI_outs, cell_coords.agg, color.by = "labels.ATAC", ylabel = NULL, title = "Aggregated scATAC-seq")
80
cowplot::plot_grid(gg1, gg2)
81
```
82
83
## Identify enriched features in each factor
84
```{r}
85
markers_RNA <- identifyFactorMarkers(scAI_outs, assay = 'RNA', n.top = 5)
86
markers_ATAC <- identifyFactorMarkers(scAI_outs, assay = 'ATAC',  n.top = 5)
87
```
88
89
### Ranking the features (genes/loci) and show the top markers in each factor
90
```{r, fig.width=8, fig.height=3,  fig.wide = TRUE, fig.align = "center"}
91
featureRankingPlot(scAI_outs, assay = 'RNA', feature.show = markers_RNA$markers.top$features, top.p = 0.1, ylabel = "Gene score", ncol = 5)
92
featureRankingPlot(scAI_outs, assay = 'ATAC', feature.show = markers_ATAC$markers.top$features, top.p = 0.1, ylabel = "Locus score", ncol = 5)
93
94
```
95
96
## Embedding cells, genes, loci and factors into 2D-dimensions using our new visualization method VscAI
97
98
```{r message=FALSE,warning=FALSE}
99
scAI_outs <- getEmbeddings(scAI_outs)
100
101
```
102
103
### Visualization of the embedding using VscAI
104
User can provide a vector of the features (e.g., key marker genes/loci) to explore the biological meaning of the cell groups and enhance the interpretation of the data. Here, we select the top two features of each factor.
105
```{r, fig.width=10,fig.height=4, fig.align = "center"}
106
genes.embed <- markers_RNA$markers.top %>% group_by(factors) %>% slice(1:2)
107
genes.embed <- as.character(genes.embed$features)
108
loci.embed <- markers_ATAC$markers.top %>% group_by(factors) %>% slice(1:2)
109
loci.embed <- as.character(loci.embed$features)
110
111
gg1 <- VscAIplot(scAI_outs, gene.use = genes.embed, loci.use = NULL, loci.use.names = NULL, color.by = "labels.RNA")
112
gg2 <- VscAIplot(scAI_outs, gene.use = NULL, loci.use = loci.embed, loci.use.names = loci.embed, color.by = "labels.ATAC")
113
cowplot::plot_grid(gg1, gg2)
114
115
```
116
117
## Feature plot
118
We can overlay the expression of features, or the cell loading values onto the low-dimensional space, e.g., VscAI, tsne, umap
119
```{r, fig.width=9, fig.height=5,  fig.wide = TRUE, fig.align = "center"}
120
featureScoreVisualization(scAI_outs, feature.scores = t(scAI_outs@fit$H), feature.use = c('factor1','factor2','factor3','factor4','factor5'),  method = "VscAI", nCol = 3, cell.size = 0.1, show.legend = T, show.legend.combined = F)
121
```
122
123
## Identify cell clusters
124
We can also identify cell clusters based on the inferred cell loading matrix using Leiden algorithm.
125
```{r}
126
scAI_outs <- identifyClusters(scAI_outs, resolution = 0.05)
127
128
```
129
## Visualize cells onto the low-dimensional space
130
We can visualize cells onto the low-dimensional space generated by t-SNE, FIt-sne or UMAP.
131
Here, we perform UMAP dimension reduction. Cells are colored by the clustering inferred by scAI.
132
```{r, fig.width=4.5,fig.height = 3.5,  fig.wide = TRUE, fig.align = "center"}
133
scAI_outs <- reducedDims(scAI_outs, method = "umap")
134
cellVisualization(scAI_outs, scAI_outs@embed$umap, color.by = "cluster")
135
136
```
137