[6df130]: / notebooks / GSEA_analysis.Rmd

Download this file

139 lines (100 with data), 3.8 kB

---
title: "Gene set enrichment analysis"
date: "Last modified: `r format(Sys.time(), '%B %d, %Y')`"
tags: [scRNA-seq, seurat, melanoma, immunotherapy, PBMCs] 
output:
  html_document:
    theme: flatly
    highlight: zenburn
    toc: true
    number_sections: false
    toc_depth: 2
    toc_float:
      collapsed: false
      smooth_scroll: true
    code_folding: hide
    self_contained: yes
---

# Notebook Description

This notebook compiles the code and outputs for gene set enrichment analysis (GSEA) for the set of differentially expressed genes (DEGs) previously identified with the DE analysis. GSEA is performed using [enrichR](https://cran.r-project.org/web/packages/enrichR/vignettes/enrichR.html). 

Of note, we use the set of DEGs identified using the [MAST](https://bioconductor.org/packages/release/bioc/html/MAST.html) method.  

# Initialize environment

Install required packages.

```{r install_packages, results='hide', message=F, warning=F, error=F}

# Define packages to install
pkg.list = c('svDialogs', 'dplyr', 'openxlsx', 'enrichR', 'UpSetR')

# Define packages not already installed
pkg.install <- pkg.list[!(pkg.list %in% installed.packages()[, 'Package'])]

# Install uninstalled packages
if (length(pkg.install) > 0) {
  install.packages(pkg.install)
}

```

Load installed packages.  

```{r load_packages, results="hide", message=F, warning=F, error=F}

# Load packages
library(svDialogs)    # for prompting user-input
library(dplyr)        # for data processing
library(openxlsx)     # to write data to excel
library(enrichR)      # to perform GSEA
library(UpSetR)       # to visualize multiple set comparisons

```

Load pre-processed data. 

```{r load_data, warning=F, message=F}

# Load saved DE data
de <- list()
de$ttest <- import_list('results/DE_ttest.xlsx')
de$mast <- import_list('results/DE_mast.xlsx')

```

# GSEA analyses

**Description**: determine pathways enriched by a given set of differentially expressed genes (DEGs).  

Total runtime: ~2 minutes.  

```{r GSEA_analysis, warning=F, message=T}

# Start timer
t.start <- Sys.time()

# Instantiate GSEA variable
gsea <- list()

# Ensure comparison to human genes
setEnrichrSite('Enrichr')

# Select databases to inquire
dbs <- dlg_list(sort(listEnrichrDbs()$libraryName), multiple=T)$res

# Loop through all DE fields
f <- names(de$mast)
for (i in 1:length(f)) {
  x <- de$mast[[f[i]]]
  y <- enrichr(x$gene[x$p_val_adj < 0.05], dbs)
  z <- y$GO_Biological_Process_2021
  n <- sum(z$Adjusted.P.value < 0.05)
  gsea[[f[i]]] <- z[z$Adjusted.P.value < 0.05, ]
  cat(sprintf('No. of enriched pathways for %s: %d\n', f[i], n))
  plotEnrich(z, orderBy='Adjusted.P.value') + 
    ggtitle(sprintf('Case: %s (Total = %d)', f[i], n))
  ggsave(paste0('plots/GSEA_', f[i], '.pdf'), width=7, height=5, units='in')
}

# End timer + log time elapsed
t.end <- Sys.time()
t.end - t.start

```

Visualize intersecting pathways between comparisons.  

```{r GSEA_intersect, warning=F, message=T}

# Define plots data
df <- list('tumor'=gsea$tumor$Term, 
           'immune'=gsea$immune$Term, 
           'B'=gsea$B$Term, 
           'CD8'=gsea$CD8$Term, 
           'Macrophage'=gsea$Macrophage$Term, 
           'Memory'=gsea$Memory$Term, 
           'Activated T'=gsea$Activated$Term, 
           'response'=gsea$response$Term)

# Generate and save UpSet plot
pdf(file='plots/GSEA_intersect.pdf', width=11, height=8)
upset(fromList(df), sets=names(df), order.by='freq')

```

# Save results (if prompted)

```{r save_outputs, warning=F, message=T}

save.data <- dlgInput('Save all results? (T/F)', F)$res %>% as.logical(.)
if (save.data) { 
  # GSEA results
  write.xlsx(gsea, 'results/GSEA.xlsx', row.names=F)
}

```