Diff of /Preprocessing.Rmd [000000] .. [c9fae8]

Switch to unified view

a b/Preprocessing.Rmd
1
---
2
title: "Preprocessing"
3
output:
4
  html_document:
5
    df_print: paged
6
editor_options: 
7
  chunk_output_type: console
8
---
9
10
# Preprocessing Data (Following OSTA Analysis Pipeline)
11
12
In this markdown, analysis and quality control steps are performed following chapters 8 and 9 of the book 'Orchestrating Spatially-Resolved Transcriptomics Analysis with Bioconductor' <https://lmweber.org/OSTA-book/>. Data is fetched and plotted, then QC metrics are calculated and data is filtered based on QC thresholds for a single sample.
13
14
### Overview of Data
15
16
Data is sourced from spatialLIBD (<http://spatial.libd.org/spatialLIBD/>), a project containing LIBD human dorsolateral pre-frontal cortex (DLPFC) spatial transcriptomics data from the 10x Genomics Visium Platform (<https://10xgenomics.com/products/spatial-gene-expression>). The data contains 12 lowres samples each with a scale factor of 0.0450045.
17
18
### Importing Libraries
19
20
```{r Import Libraries, message=FALSE}
21
library(spatialLIBD)
22
library(ggspavis)
23
library(scater)
24
library(ggplot2)
25
```
26
27
### Helper Functions for QC
28
29
The following functions are called by the driver code to perform plotting and QC operations.
30
31
```{r Helper Functions for QC, message=FALSE}
32
#' Plots a SpatialExperiment object by transforming the image into a rasterGrob object and calling ggplot
33
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
34
#' @return None
35
plot_image <- function(spe_sub) {
36
  # get histology image
37
  img <- SpatialExperiment::imgRaster(spe_sub)
38
  
39
  ## Transform to a rasterGrob object
40
  grob <- grid::rasterGrob(img, width = grid::unit(1, "npc"), height = grid::unit(1, "npc"))
41
  
42
  # Create df for plotting
43
  sample_df <- as.data.frame(colData(spe_sub), optional = TRUE)
44
  
45
  ## Make a plot using geom_spatial
46
  p <- ggplot2::ggplot(
47
      sample_df,
48
      ggplot2::aes(
49
          x = pxl_col_in_fullres * SpatialExperiment::scaleFactors(spe_sub),
50
          y = pxl_row_in_fullres * SpatialExperiment::scaleFactors(spe_sub),
51
      )
52
  ) +
53
      geom_spatial(
54
          data = tibble::tibble(grob = list(grob)),
55
          ggplot2::aes(grob = grob),
56
          x = 0.5,
57
          y = 0.5
58
      )
59
  
60
  ## Show the plot
61
  print(p)
62
  p
63
}
64
65
#' Identifies mitochondrial genes, and adds per spot QC metrics including mitochondrial genes as a feature subset
66
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
67
#' @return spe_sub Spatial Experiment object with added QC metrics
68
add_QC_metrics <- function(spe_sub) {
69
  # identify mitochondrial genes
70
  is_mito <- grepl("(^MT-)|(^mt-)", rowData(spe_sub)$gene_name)
71
  
72
  # add per spot QC metrics, including mitochondrial genes as a feature subset
73
  spe_sub <- addPerCellQC(spe_sub, subsets = list(mito = is_mito))
74
  spe_sub
75
}
76
77
#' Sets a lower filtering threshold for library size, plots QC plot and spatial pattern of spots
78
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
79
#' @param threshold lower limit on library size
80
#' @return spe_sub Spatial Experiment object with added col data after lib size thresholding
81
threshold_lib_size <- function(spe_sub, threshold) {
82
  # continuous variable containing total number of genes at each spot
83
  hist(colData(spe_sub)$sum, breaks = 20)
84
  
85
  # set filtering threshold for library size
86
  print(plotQC(spe_sub, type = "scatter", 
87
         metric_x = "cell_count", metric_y = "sum", 
88
         threshold_y = threshold))
89
  
90
  # check number of spots below threshold
91
  qc_lib_size <- colData(spe_sub)$sum < threshold
92
  print(table(qc_lib_size))
93
  
94
  # look for spatial pattern in discarded spots
95
  colData(spe_sub)$qc_lib_size <- qc_lib_size
96
  
97
  # plot spots
98
  threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_lib_size,][,'array_row']
99
  threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_lib_size,][,'array_col']
100
  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point() + ggtitle("Discarded Spots (Low Library Size)"))
101
  
102
  spe_sub
103
}
104
105
#' Sets a lower filtering threshold for number of expressed genes, plots QC plot and spatial pattern of spots
106
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
107
#' @param threshold lower limit on number of expressed genes
108
#' @return spe_sub Spatial Experiment object with added col data after expressed gene thresholding
109
threshold_expressed_genes <- function(spe_sub, threshold) {
110
  # continuous variable containing total number of genes at each spot
111
  hist(colData(spe_sub)$detected, breaks = 20)
112
  
113
  # set filtering threshold for library size at 500
114
  print(plotQC(spe_sub, type = "scatter", 
115
         metric_x = "cell_count", metric_y = "detected", 
116
         threshold_y = threshold))
117
  
118
  # check number of spots below threshold
119
  qc_detected <- colData(spe_sub)$detected < threshold
120
  print(table(qc_lib_size))
121
  
122
  # look for spatial pattern in discarded spots
123
  colData(spe_sub)$qc_detected <- qc_detected
124
  
125
  # plot spots
126
  threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_detected,][,'array_row']
127
  threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_detected,][,'array_col']
128
  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point() + ggtitle("Discarded Spots (Low Expressed Genes)"))
129
  
130
  spe_sub
131
}
132
133
#' Sets an upper threshold on the proportion of mitochondrial reads
134
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
135
#' @param threshold upper limit on proportional of mitochondrial reads
136
#' @return spe_sub Spatial Experiment object with added col data after mitochondrial read thresholding
137
threshold_mito_reads <- function(spe_sub, threshold) {
138
  # histogram of mitochondrial read proportions
139
  hist(colData(spe_sub)$subsets_mito_percent, breaks = 20)
140
  
141
  # plot mitochondrial read proportion vs. number of cells per spot
142
  print(plotQC(spe_sub, type = "scatter", 
143
         metric_x = "cell_count", metric_y = "subsets_mito_percent", 
144
         threshold_y = threshold))
145
  
146
  # select QC threshold for mitochondrial read proportion
147
  qc_mito <- colData(spe_sub)$subsets_mito_percent > threshold
148
  print(table(qc_mito))
149
  
150
  # look for spatial pattern in discarded spots
151
  colData(spe_sub)$qc_mito <- qc_mito
152
  
153
  # plot spots
154
  threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_mito,][,'array_row']
155
  threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_mito,][,'array_col']
156
  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point() + ggtitle("Discarded Spots (High Mitochondrial Reads)"))
157
  
158
  spe_sub
159
}
160
161
#' Sets an upper threshold on the number of cells per spot
162
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
163
#' @param threshold upper limit on number of cells per spot
164
#' @return spe_sub Spatial Experiment object with added col data after cell number thresholding
165
threshold_num_cells <- function(spe_sub, threshold) {
166
  # histogram of number of cells per spot
167
  hist(colData(spe_sub)$cell_count, breaks = 20)
168
  
169
  # distribution of cells per spot
170
  tbl_cells_per_spot <- table(colData(spe_sub)$cell_count)
171
  
172
  # plot number of expressed genes vs. number of cells per spot
173
  print(plotQC(spe_sub, type = "scatter", 
174
         metric_x = "cell_count", metric_y = "detected", 
175
         threshold_x = threshold))
176
  
177
  # select QC threshold for number of cells per spot
178
  qc_cell_count <- colData(spe_sub)$cell_count > threshold
179
  print(table(qc_cell_count))
180
  colData(spe_sub)$qc_cell_count <- qc_cell_count
181
  
182
  # plot spots
183
  threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_cell_count,][,'array_row']
184
  threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_cell_count,][,'array_col']
185
  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point())
186
  
187
  spe_sub
188
}
189
190
#' Discard spots based on previous QC metrics
191
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
192
#' @return spe_sub Spatial Experiment object with low quality spots removed
193
discard_spots <- function(spe_sub) {
194
  # get discarded spots from each QC operation
195
  qc_lib_size <- colData(spe_sub)$qc_lib_size
196
  qc_detected <- colData(spe_sub)$qc_detected
197
  qc_mito <- colData(spe_sub)$qc_mito
198
  qc_cell_count <- colData(spe_sub)$qc_cell_count
199
  
200
  # number of discarded spots for each metric
201
  apply(cbind(qc_lib_size, qc_detected, qc_mito, qc_cell_count), 2, sum)
202
  
203
  # combined set of discarded spots
204
  discard <- qc_lib_size | qc_detected | qc_mito | qc_cell_count
205
  print(table(discard))
206
  
207
  # store in object
208
  colData(spe_sub)$discard <- discard
209
  
210
  # check spatial pattern of combined set of discarded spots
211
  # plot spots
212
  threshold_row <- colData(spe_sub)[colData(spe_sub)$discard,][,'array_row']
213
  threshold_col <- colData(spe_sub)[colData(spe_sub)$discard,][,'array_col']
214
  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point())
215
  
216
  spe_sub <- spe_sub[, !colData(spe_sub)$discard]
217
  print(dim(spe_sub))
218
  spe_sub
219
}
220
221
#' Discard uninformative genes from the Spatial Experiment object
222
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
223
#' @param var_threshold lower threshold for amount of variance willing to tolerate among the spots for a gene
224
#' @return spe_sub Spatial Experiment object with uninformative genes removed
225
discard_uninformative_genes <- function(spe_sub, var_threshold=1e-4) {
226
  logcounts <- t(logcounts(spe_sub))
227
  mito <- (rowData(spe_sub)$gene_name %in% rowData(spe_sub)$gene_name[is_mito])
228
  zero_expression <- apply(logcounts, 2, function(x) length(unique(x)) == 1)
229
  low_var <- apply(logcounts, 2, function(x) var(x) < 1e-4) 
230
  discard <- zero_expression | low_var | mito
231
  spe_sub <- spe_sub[!discard,]
232
  spe_sub
233
}
234
```
235
236
### Driver Code
237
238
In the following block, data is first fetched from SpatialLIBD and filtered to only include a single sample. Then, the sample is plotted and QC metrics are added to the SpatialExperiment object.
239
240
Then, the following four QC operations are performed to get rid of low quality spots. Note that all spots are already over tissue.
241
242
1.  Set a lower threshold over library size. We want to remove spots with a low number of UMIs counts per spot.
243
2.  Set a lower threshold on the number of expressed features (i.e. genes) per spot.
244
3.  Set an upper threshold on the mumber of mitochondrial reads per spot. A high proportion of mitochondrial reads can indicate cell damage.
245
4.  Set an upper threshold on the number of cells per spot. Spots with high cell counts have a low number of expressed genes, meaning that experiments failed.
246
247
After performing QC operations, low quality spots are removed from the SpatialExperiment object.
248
249
```{r QC, message=FALSE}
250
# Fetch data from SpatialLIBD
251
spe <- fetch_data(type='spe')
252
253
# Filter spatial experiment object to only include sample ID
254
sample_id <- "151673"
255
spe_sub <- spe[, spe$sample_id == sample_id]
256
257
# Plot image and add QC metrics to spe object
258
img <- plot_image(spe_sub)
259
spe_sub <- add_QC_metrics(spe_sub)
260
261
# Perform QC operations
262
spe_sub <- threshold_lib_size(spe_sub, 800)
263
spe_sub <- threshold_expressed_genes(spe_sub, 550)
264
spe_sub <- threshold_mito_reads(spe_sub, 28)
265
spe_sub <- threshold_num_cells(spe_sub, 18)
266
267
# Discard low quality spots determined by QC operations
268
spe_sub <- discard_spots(spe_sub)
269
spe_sub <- discard_uninformative_genes(spe_sub)
270
```