271 lines (214 with data), 11.4 kB
---
title: "Preprocessing"
output:
html_document:
df_print: paged
editor_options:
chunk_output_type: console
---
# Preprocessing Data (Following OSTA Analysis Pipeline)
In this markdown, analysis and quality control steps are performed following chapters 8 and 9 of the book 'Orchestrating Spatially-Resolved Transcriptomics Analysis with Bioconductor' <https://lmweber.org/OSTA-book/>. Data is fetched and plotted, then QC metrics are calculated and data is filtered based on QC thresholds for a single sample.
### Overview of Data
Data is sourced from spatialLIBD (<http://spatial.libd.org/spatialLIBD/>), a project containing LIBD human dorsolateral pre-frontal cortex (DLPFC) spatial transcriptomics data from the 10x Genomics Visium Platform (<https://10xgenomics.com/products/spatial-gene-expression>). The data contains 12 lowres samples each with a scale factor of 0.0450045.
### Importing Libraries
```{r Import Libraries, message=FALSE}
library(spatialLIBD)
library(ggspavis)
library(scater)
library(ggplot2)
```
### Helper Functions for QC
The following functions are called by the driver code to perform plotting and QC operations.
```{r Helper Functions for QC, message=FALSE}
#' Plots a SpatialExperiment object by transforming the image into a rasterGrob object and calling ggplot
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
#' @return None
plot_image <- function(spe_sub) {
# get histology image
img <- SpatialExperiment::imgRaster(spe_sub)
## Transform to a rasterGrob object
grob <- grid::rasterGrob(img, width = grid::unit(1, "npc"), height = grid::unit(1, "npc"))
# Create df for plotting
sample_df <- as.data.frame(colData(spe_sub), optional = TRUE)
## Make a plot using geom_spatial
p <- ggplot2::ggplot(
sample_df,
ggplot2::aes(
x = pxl_col_in_fullres * SpatialExperiment::scaleFactors(spe_sub),
y = pxl_row_in_fullres * SpatialExperiment::scaleFactors(spe_sub),
)
) +
geom_spatial(
data = tibble::tibble(grob = list(grob)),
ggplot2::aes(grob = grob),
x = 0.5,
y = 0.5
)
## Show the plot
print(p)
p
}
#' Identifies mitochondrial genes, and adds per spot QC metrics including mitochondrial genes as a feature subset
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
#' @return spe_sub Spatial Experiment object with added QC metrics
add_QC_metrics <- function(spe_sub) {
# identify mitochondrial genes
is_mito <- grepl("(^MT-)|(^mt-)", rowData(spe_sub)$gene_name)
# add per spot QC metrics, including mitochondrial genes as a feature subset
spe_sub <- addPerCellQC(spe_sub, subsets = list(mito = is_mito))
spe_sub
}
#' Sets a lower filtering threshold for library size, plots QC plot and spatial pattern of spots
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
#' @param threshold lower limit on library size
#' @return spe_sub Spatial Experiment object with added col data after lib size thresholding
threshold_lib_size <- function(spe_sub, threshold) {
# continuous variable containing total number of genes at each spot
hist(colData(spe_sub)$sum, breaks = 20)
# set filtering threshold for library size
print(plotQC(spe_sub, type = "scatter",
metric_x = "cell_count", metric_y = "sum",
threshold_y = threshold))
# check number of spots below threshold
qc_lib_size <- colData(spe_sub)$sum < threshold
print(table(qc_lib_size))
# look for spatial pattern in discarded spots
colData(spe_sub)$qc_lib_size <- qc_lib_size
# plot spots
threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_lib_size,][,'array_row']
threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_lib_size,][,'array_col']
print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point() + ggtitle("Discarded Spots (Low Library Size)"))
spe_sub
}
#' Sets a lower filtering threshold for number of expressed genes, plots QC plot and spatial pattern of spots
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
#' @param threshold lower limit on number of expressed genes
#' @return spe_sub Spatial Experiment object with added col data after expressed gene thresholding
threshold_expressed_genes <- function(spe_sub, threshold) {
# continuous variable containing total number of genes at each spot
hist(colData(spe_sub)$detected, breaks = 20)
# set filtering threshold for library size at 500
print(plotQC(spe_sub, type = "scatter",
metric_x = "cell_count", metric_y = "detected",
threshold_y = threshold))
# check number of spots below threshold
qc_detected <- colData(spe_sub)$detected < threshold
print(table(qc_lib_size))
# look for spatial pattern in discarded spots
colData(spe_sub)$qc_detected <- qc_detected
# plot spots
threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_detected,][,'array_row']
threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_detected,][,'array_col']
print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point() + ggtitle("Discarded Spots (Low Expressed Genes)"))
spe_sub
}
#' Sets an upper threshold on the proportion of mitochondrial reads
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
#' @param threshold upper limit on proportional of mitochondrial reads
#' @return spe_sub Spatial Experiment object with added col data after mitochondrial read thresholding
threshold_mito_reads <- function(spe_sub, threshold) {
# histogram of mitochondrial read proportions
hist(colData(spe_sub)$subsets_mito_percent, breaks = 20)
# plot mitochondrial read proportion vs. number of cells per spot
print(plotQC(spe_sub, type = "scatter",
metric_x = "cell_count", metric_y = "subsets_mito_percent",
threshold_y = threshold))
# select QC threshold for mitochondrial read proportion
qc_mito <- colData(spe_sub)$subsets_mito_percent > threshold
print(table(qc_mito))
# look for spatial pattern in discarded spots
colData(spe_sub)$qc_mito <- qc_mito
# plot spots
threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_mito,][,'array_row']
threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_mito,][,'array_col']
print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point() + ggtitle("Discarded Spots (High Mitochondrial Reads)"))
spe_sub
}
#' Sets an upper threshold on the number of cells per spot
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
#' @param threshold upper limit on number of cells per spot
#' @return spe_sub Spatial Experiment object with added col data after cell number thresholding
threshold_num_cells <- function(spe_sub, threshold) {
# histogram of number of cells per spot
hist(colData(spe_sub)$cell_count, breaks = 20)
# distribution of cells per spot
tbl_cells_per_spot <- table(colData(spe_sub)$cell_count)
# plot number of expressed genes vs. number of cells per spot
print(plotQC(spe_sub, type = "scatter",
metric_x = "cell_count", metric_y = "detected",
threshold_x = threshold))
# select QC threshold for number of cells per spot
qc_cell_count <- colData(spe_sub)$cell_count > threshold
print(table(qc_cell_count))
colData(spe_sub)$qc_cell_count <- qc_cell_count
# plot spots
threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_cell_count,][,'array_row']
threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_cell_count,][,'array_col']
print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point())
spe_sub
}
#' Discard spots based on previous QC metrics
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
#' @return spe_sub Spatial Experiment object with low quality spots removed
discard_spots <- function(spe_sub) {
# get discarded spots from each QC operation
qc_lib_size <- colData(spe_sub)$qc_lib_size
qc_detected <- colData(spe_sub)$qc_detected
qc_mito <- colData(spe_sub)$qc_mito
qc_cell_count <- colData(spe_sub)$qc_cell_count
# number of discarded spots for each metric
apply(cbind(qc_lib_size, qc_detected, qc_mito, qc_cell_count), 2, sum)
# combined set of discarded spots
discard <- qc_lib_size | qc_detected | qc_mito | qc_cell_count
print(table(discard))
# store in object
colData(spe_sub)$discard <- discard
# check spatial pattern of combined set of discarded spots
# plot spots
threshold_row <- colData(spe_sub)[colData(spe_sub)$discard,][,'array_row']
threshold_col <- colData(spe_sub)[colData(spe_sub)$discard,][,'array_col']
print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point())
spe_sub <- spe_sub[, !colData(spe_sub)$discard]
print(dim(spe_sub))
spe_sub
}
#' Discard uninformative genes from the Spatial Experiment object
#' @param spe_sub Spatial Experiment object filtered to only contain one sample
#' @param var_threshold lower threshold for amount of variance willing to tolerate among the spots for a gene
#' @return spe_sub Spatial Experiment object with uninformative genes removed
discard_uninformative_genes <- function(spe_sub, var_threshold=1e-4) {
logcounts <- t(logcounts(spe_sub))
mito <- (rowData(spe_sub)$gene_name %in% rowData(spe_sub)$gene_name[is_mito])
zero_expression <- apply(logcounts, 2, function(x) length(unique(x)) == 1)
low_var <- apply(logcounts, 2, function(x) var(x) < 1e-4)
discard <- zero_expression | low_var | mito
spe_sub <- spe_sub[!discard,]
spe_sub
}
```
### Driver Code
In the following block, data is first fetched from SpatialLIBD and filtered to only include a single sample. Then, the sample is plotted and QC metrics are added to the SpatialExperiment object.
Then, the following four QC operations are performed to get rid of low quality spots. Note that all spots are already over tissue.
1. Set a lower threshold over library size. We want to remove spots with a low number of UMIs counts per spot.
2. Set a lower threshold on the number of expressed features (i.e. genes) per spot.
3. Set an upper threshold on the mumber of mitochondrial reads per spot. A high proportion of mitochondrial reads can indicate cell damage.
4. Set an upper threshold on the number of cells per spot. Spots with high cell counts have a low number of expressed genes, meaning that experiments failed.
After performing QC operations, low quality spots are removed from the SpatialExperiment object.
```{r QC, message=FALSE}
# Fetch data from SpatialLIBD
spe <- fetch_data(type='spe')
# Filter spatial experiment object to only include sample ID
sample_id <- "151673"
spe_sub <- spe[, spe$sample_id == sample_id]
# Plot image and add QC metrics to spe object
img <- plot_image(spe_sub)
spe_sub <- add_QC_metrics(spe_sub)
# Perform QC operations
spe_sub <- threshold_lib_size(spe_sub, 800)
spe_sub <- threshold_expressed_genes(spe_sub, 550)
spe_sub <- threshold_mito_reads(spe_sub, 28)
spe_sub <- threshold_num_cells(spe_sub, 18)
# Discard low quality spots determined by QC operations
spe_sub <- discard_spots(spe_sub)
spe_sub <- discard_uninformative_genes(spe_sub)
```