Diff of /Preprocessing.Rmd [000000] .. [c9fae8]

Switch to side-by-side view

--- a
+++ b/Preprocessing.Rmd
@@ -0,0 +1,270 @@
+---
+title: "Preprocessing"
+output:
+  html_document:
+    df_print: paged
+editor_options: 
+  chunk_output_type: console
+---
+
+# Preprocessing Data (Following OSTA Analysis Pipeline)
+
+In this markdown, analysis and quality control steps are performed following chapters 8 and 9 of the book 'Orchestrating Spatially-Resolved Transcriptomics Analysis with Bioconductor' <https://lmweber.org/OSTA-book/>. Data is fetched and plotted, then QC metrics are calculated and data is filtered based on QC thresholds for a single sample.
+
+### Overview of Data
+
+Data is sourced from spatialLIBD (<http://spatial.libd.org/spatialLIBD/>), a project containing LIBD human dorsolateral pre-frontal cortex (DLPFC) spatial transcriptomics data from the 10x Genomics Visium Platform (<https://10xgenomics.com/products/spatial-gene-expression>). The data contains 12 lowres samples each with a scale factor of 0.0450045.
+
+### Importing Libraries
+
+```{r Import Libraries, message=FALSE}
+library(spatialLIBD)
+library(ggspavis)
+library(scater)
+library(ggplot2)
+```
+
+### Helper Functions for QC
+
+The following functions are called by the driver code to perform plotting and QC operations.
+
+```{r Helper Functions for QC, message=FALSE}
+#' Plots a SpatialExperiment object by transforming the image into a rasterGrob object and calling ggplot
+#' @param spe_sub Spatial Experiment object filtered to only contain one sample
+#' @return None
+plot_image <- function(spe_sub) {
+  # get histology image
+  img <- SpatialExperiment::imgRaster(spe_sub)
+  
+  ## Transform to a rasterGrob object
+  grob <- grid::rasterGrob(img, width = grid::unit(1, "npc"), height = grid::unit(1, "npc"))
+  
+  # Create df for plotting
+  sample_df <- as.data.frame(colData(spe_sub), optional = TRUE)
+  
+  ## Make a plot using geom_spatial
+  p <- ggplot2::ggplot(
+      sample_df,
+      ggplot2::aes(
+          x = pxl_col_in_fullres * SpatialExperiment::scaleFactors(spe_sub),
+          y = pxl_row_in_fullres * SpatialExperiment::scaleFactors(spe_sub),
+      )
+  ) +
+      geom_spatial(
+          data = tibble::tibble(grob = list(grob)),
+          ggplot2::aes(grob = grob),
+          x = 0.5,
+          y = 0.5
+      )
+  
+  ## Show the plot
+  print(p)
+  p
+}
+
+#' Identifies mitochondrial genes, and adds per spot QC metrics including mitochondrial genes as a feature subset
+#' @param spe_sub Spatial Experiment object filtered to only contain one sample
+#' @return spe_sub Spatial Experiment object with added QC metrics
+add_QC_metrics <- function(spe_sub) {
+  # identify mitochondrial genes
+  is_mito <- grepl("(^MT-)|(^mt-)", rowData(spe_sub)$gene_name)
+  
+  # add per spot QC metrics, including mitochondrial genes as a feature subset
+  spe_sub <- addPerCellQC(spe_sub, subsets = list(mito = is_mito))
+  spe_sub
+}
+
+#' Sets a lower filtering threshold for library size, plots QC plot and spatial pattern of spots
+#' @param spe_sub Spatial Experiment object filtered to only contain one sample
+#' @param threshold lower limit on library size
+#' @return spe_sub Spatial Experiment object with added col data after lib size thresholding
+threshold_lib_size <- function(spe_sub, threshold) {
+  # continuous variable containing total number of genes at each spot
+  hist(colData(spe_sub)$sum, breaks = 20)
+  
+  # set filtering threshold for library size
+  print(plotQC(spe_sub, type = "scatter", 
+         metric_x = "cell_count", metric_y = "sum", 
+         threshold_y = threshold))
+  
+  # check number of spots below threshold
+  qc_lib_size <- colData(spe_sub)$sum < threshold
+  print(table(qc_lib_size))
+  
+  # look for spatial pattern in discarded spots
+  colData(spe_sub)$qc_lib_size <- qc_lib_size
+  
+  # plot spots
+  threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_lib_size,][,'array_row']
+  threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_lib_size,][,'array_col']
+  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point() + ggtitle("Discarded Spots (Low Library Size)"))
+  
+  spe_sub
+}
+
+#' Sets a lower filtering threshold for number of expressed genes, plots QC plot and spatial pattern of spots
+#' @param spe_sub Spatial Experiment object filtered to only contain one sample
+#' @param threshold lower limit on number of expressed genes
+#' @return spe_sub Spatial Experiment object with added col data after expressed gene thresholding
+threshold_expressed_genes <- function(spe_sub, threshold) {
+  # continuous variable containing total number of genes at each spot
+  hist(colData(spe_sub)$detected, breaks = 20)
+  
+  # set filtering threshold for library size at 500
+  print(plotQC(spe_sub, type = "scatter", 
+         metric_x = "cell_count", metric_y = "detected", 
+         threshold_y = threshold))
+  
+  # check number of spots below threshold
+  qc_detected <- colData(spe_sub)$detected < threshold
+  print(table(qc_lib_size))
+  
+  # look for spatial pattern in discarded spots
+  colData(spe_sub)$qc_detected <- qc_detected
+  
+  # plot spots
+  threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_detected,][,'array_row']
+  threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_detected,][,'array_col']
+  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point() + ggtitle("Discarded Spots (Low Expressed Genes)"))
+  
+  spe_sub
+}
+
+#' Sets an upper threshold on the proportion of mitochondrial reads
+#' @param spe_sub Spatial Experiment object filtered to only contain one sample
+#' @param threshold upper limit on proportional of mitochondrial reads
+#' @return spe_sub Spatial Experiment object with added col data after mitochondrial read thresholding
+threshold_mito_reads <- function(spe_sub, threshold) {
+  # histogram of mitochondrial read proportions
+  hist(colData(spe_sub)$subsets_mito_percent, breaks = 20)
+  
+  # plot mitochondrial read proportion vs. number of cells per spot
+  print(plotQC(spe_sub, type = "scatter", 
+         metric_x = "cell_count", metric_y = "subsets_mito_percent", 
+         threshold_y = threshold))
+  
+  # select QC threshold for mitochondrial read proportion
+  qc_mito <- colData(spe_sub)$subsets_mito_percent > threshold
+  print(table(qc_mito))
+  
+  # look for spatial pattern in discarded spots
+  colData(spe_sub)$qc_mito <- qc_mito
+  
+  # plot spots
+  threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_mito,][,'array_row']
+  threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_mito,][,'array_col']
+  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point() + ggtitle("Discarded Spots (High Mitochondrial Reads)"))
+  
+  spe_sub
+}
+
+#' Sets an upper threshold on the number of cells per spot
+#' @param spe_sub Spatial Experiment object filtered to only contain one sample
+#' @param threshold upper limit on number of cells per spot
+#' @return spe_sub Spatial Experiment object with added col data after cell number thresholding
+threshold_num_cells <- function(spe_sub, threshold) {
+  # histogram of number of cells per spot
+  hist(colData(spe_sub)$cell_count, breaks = 20)
+  
+  # distribution of cells per spot
+  tbl_cells_per_spot <- table(colData(spe_sub)$cell_count)
+  
+  # plot number of expressed genes vs. number of cells per spot
+  print(plotQC(spe_sub, type = "scatter", 
+         metric_x = "cell_count", metric_y = "detected", 
+         threshold_x = threshold))
+  
+  # select QC threshold for number of cells per spot
+  qc_cell_count <- colData(spe_sub)$cell_count > threshold
+  print(table(qc_cell_count))
+  colData(spe_sub)$qc_cell_count <- qc_cell_count
+  
+  # plot spots
+  threshold_row <- colData(spe_sub)[colData(spe_sub)$qc_cell_count,][,'array_row']
+  threshold_col <- colData(spe_sub)[colData(spe_sub)$qc_cell_count,][,'array_col']
+  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point())
+  
+  spe_sub
+}
+
+#' Discard spots based on previous QC metrics
+#' @param spe_sub Spatial Experiment object filtered to only contain one sample
+#' @return spe_sub Spatial Experiment object with low quality spots removed
+discard_spots <- function(spe_sub) {
+  # get discarded spots from each QC operation
+  qc_lib_size <- colData(spe_sub)$qc_lib_size
+  qc_detected <- colData(spe_sub)$qc_detected
+  qc_mito <- colData(spe_sub)$qc_mito
+  qc_cell_count <- colData(spe_sub)$qc_cell_count
+  
+  # number of discarded spots for each metric
+  apply(cbind(qc_lib_size, qc_detected, qc_mito, qc_cell_count), 2, sum)
+  
+  # combined set of discarded spots
+  discard <- qc_lib_size | qc_detected | qc_mito | qc_cell_count
+  print(table(discard))
+  
+  # store in object
+  colData(spe_sub)$discard <- discard
+  
+  # check spatial pattern of combined set of discarded spots
+  # plot spots
+  threshold_row <- colData(spe_sub)[colData(spe_sub)$discard,][,'array_row']
+  threshold_col <- colData(spe_sub)[colData(spe_sub)$discard,][,'array_col']
+  print(ggplot(data.frame(threshold_row, threshold_col), aes(x=threshold_row, y=threshold_col)) + geom_point())
+  
+  spe_sub <- spe_sub[, !colData(spe_sub)$discard]
+  print(dim(spe_sub))
+  spe_sub
+}
+
+#' Discard uninformative genes from the Spatial Experiment object
+#' @param spe_sub Spatial Experiment object filtered to only contain one sample
+#' @param var_threshold lower threshold for amount of variance willing to tolerate among the spots for a gene
+#' @return spe_sub Spatial Experiment object with uninformative genes removed
+discard_uninformative_genes <- function(spe_sub, var_threshold=1e-4) {
+  logcounts <- t(logcounts(spe_sub))
+  mito <- (rowData(spe_sub)$gene_name %in% rowData(spe_sub)$gene_name[is_mito])
+  zero_expression <- apply(logcounts, 2, function(x) length(unique(x)) == 1)
+  low_var <- apply(logcounts, 2, function(x) var(x) < 1e-4) 
+  discard <- zero_expression | low_var | mito
+  spe_sub <- spe_sub[!discard,]
+  spe_sub
+}
+```
+
+### Driver Code
+
+In the following block, data is first fetched from SpatialLIBD and filtered to only include a single sample. Then, the sample is plotted and QC metrics are added to the SpatialExperiment object.
+
+Then, the following four QC operations are performed to get rid of low quality spots. Note that all spots are already over tissue.
+
+1.  Set a lower threshold over library size. We want to remove spots with a low number of UMIs counts per spot.
+2.  Set a lower threshold on the number of expressed features (i.e. genes) per spot.
+3.  Set an upper threshold on the mumber of mitochondrial reads per spot. A high proportion of mitochondrial reads can indicate cell damage.
+4.  Set an upper threshold on the number of cells per spot. Spots with high cell counts have a low number of expressed genes, meaning that experiments failed.
+
+After performing QC operations, low quality spots are removed from the SpatialExperiment object.
+
+```{r QC, message=FALSE}
+# Fetch data from SpatialLIBD
+spe <- fetch_data(type='spe')
+
+# Filter spatial experiment object to only include sample ID
+sample_id <- "151673"
+spe_sub <- spe[, spe$sample_id == sample_id]
+
+# Plot image and add QC metrics to spe object
+img <- plot_image(spe_sub)
+spe_sub <- add_QC_metrics(spe_sub)
+
+# Perform QC operations
+spe_sub <- threshold_lib_size(spe_sub, 800)
+spe_sub <- threshold_expressed_genes(spe_sub, 550)
+spe_sub <- threshold_mito_reads(spe_sub, 28)
+spe_sub <- threshold_num_cells(spe_sub, 18)
+
+# Discard low quality spots determined by QC operations
+spe_sub <- discard_spots(spe_sub)
+spe_sub <- discard_uninformative_genes(spe_sub)
+```