
CD45+ cells from human bladder cancer specimens

Joshua Wong
This README file was generated on 19/10/2023

  1. Collaboration with the Surgical Oncology Unit at the Princess Alexandra Hospital for the Bladder Cancer project.
    Associated manuscript, in revision at eBioMedicine: "TGF-B signalling limits effector function capacity of NK cell anti-tumour immunity in human bladder tumours" (EBIOM-D-23-03549).
  2. Date of data collection: 2022-2023
  3. Geographic location of data collection: Brisbane, Australia
  4. Author Information
    A. Principal Investigator Contact Information
       Name: A/Prof. Fernando Guimaraes
       Institution: The University of Queensland, AUS
       Email: f.guimaraes@uq.edu.au
    B. Associate or Co-investigator Contact Information
       Name: Joshua Wong
       Institution: The University of Queensland, AUS
       Email: joshua.wong@uq.net.au

DATA & FILE OVERVIEW

  1. File List:

A) B3_DGE.mtx
B) B3_all_genes.csv
C) B3_cell_metadata.csv
D) B4_DGE.mtx
E) B4_all_genes.csv
F) B4_cell_metadata.csv
G) B5_DGE.mtx
H) B5_all_genes.csv
I) B5_cell_metadata.csv
J) T3_DGE.mtx
K) T3_all_genes.csv
L) T3_cell_metadata.csv
M) T4_DGE.mtx
N) T4_all_genes.csv
O) T4_cell_metadata.csv
P) T5_DGE.mtx
Q) T5_all_genes.csv
R) T5_cell_metadata.csv
S) scRNAseq_analysis_FINAL.R

DGE.mtx

is a sparse matrix of cell-gene counts. Each row corresponds to a cell and each column corresponds to a gene (see the "all_genes.csv" file for the name/gene ID of each column).

all_genes.csv

contains the gene name, gene ID, and genome for each column in DGE.mtx. This file is the same as the file in the reference genome directory.

cell_metadata.csv

contains information about each cell, including the cell barcode, species, sample, well in each round of barcoding, and number of transcripts/genes detected.
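
As a quick sanity check of how the three file types relate, the sketch below builds a tiny synthetic dataset in a temporary directory and confirms that the matrix has one row per line of the cell metadata and one column per line of the gene table. The toy file names and the `bc_wells` and `gene_name` column names are illustrative assumptions, not taken from the deposited files.

```r
library(Matrix)  # readMM() and writeMM() come from the Matrix package

# Temporary folder for the synthetic files (hypothetical names).
tmp_dir <- tempfile("toy_DGE_")
dir.create(tmp_dir)

# Toy counts: 4 cells (rows) x 3 genes (columns), same orientation as DGE.mtx.
toy_counts <- sparseMatrix(
  i = c(1, 2, 2, 4),   # cell (row) indices
  j = c(1, 1, 3, 2),   # gene (column) indices
  x = c(5, 1, 2, 7),   # counts
  dims = c(4, 3)
)
writeMM(toy_counts, file.path(tmp_dir, "toy_DGE.mtx"))

# Toy companion tables; the column names are assumptions for illustration.
write.csv(data.frame(bc_wells = paste0("cell", 1:4)),
          file.path(tmp_dir, "toy_cell_metadata.csv"), row.names = FALSE)
write.csv(data.frame(gene_name = c("geneA", "geneB", "geneC")),
          file.path(tmp_dir, "toy_all_genes.csv"), row.names = FALSE)

# Read everything back the same way the real files are read.
mat <- readMM(file.path(tmp_dir, "toy_DGE.mtx"))
cell_meta <- read.delim(file.path(tmp_dir, "toy_cell_metadata.csv"),
                        stringsAsFactors = FALSE, sep = ",")
genes <- read.delim(file.path(tmp_dir, "toy_all_genes.csv"),
                    stringsAsFactors = FALSE, sep = ",")

# Rows of the matrix pair with cells, columns pair with genes.
stopifnot(nrow(mat) == nrow(cell_meta), ncol(mat) == nrow(genes))
```

Running the same two comparisons on a real sample (for example B3) is a cheap way to catch a mismatched or truncated download before any analysis.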

  2. Relationship between files, if important:

The B3, B4 and B5 matrix files are CD45+ samples from patient blood and must be integrated together.

The T3, T4 and T5 matrix files are CD45+ samples from patient tumour and must be integrated together.
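
The grouping above can be written down as a small sample sheet. This is only a sketch in base R: the `samples` data frame and its column names are hypothetical helpers, and the file names are derived from the file list in this README.

```r
# Sample IDs and tissue of origin, as described in this README.
samples <- data.frame(
  sample   = c("B3", "B4", "B5", "T3", "T4", "T5"),
  location = rep(c("blood", "tumour"), each = 3),
  stringsAsFactors = FALSE
)

# Derive the three file names that belong to each sample.
samples$matrix_file   <- paste0(samples$sample, "_DGE.mtx")
samples$genes_file    <- paste0(samples$sample, "_all_genes.csv")
samples$metadata_file <- paste0(samples$sample, "_cell_metadata.csv")

# The two groups of samples that are integrated separately.
integration_groups <- split(samples$sample, samples$location)
print(integration_groups)
```

Keeping the blood and tumour groups in one table makes it harder to mix samples from the two tissues by accident when looping over files.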

  3. Additional related data collected that was not included in the current data package: None
  4. Are there multiple versions of the dataset? No
    A. If yes, name of file(s) that was updated: NA
    i. Why was the file updated? NA
    ii. When was the file updated? NA

METHOD

Single-cell RNA-seq data were pre-processed using the ParseBiosciences-Pipeline. Data normalization, unsupervised cell clustering, and differential expression analysis were carried out with the Seurat R package.

How to use this Script

R version 4.1.0
Seurat Version 4.3.0

1. Reading in Parse matrix and gene count files into R

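  1. Use the function readMM (from the Matrix package) to read in the .mtx file.

Example

library(Matrix)  # provides readMM(); B3_DGE_folder is the path to the B3 files and must end with "/"
B3_mat <- readMM(paste0(B3_DGE_folder, "B3_DGE.mtx"))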

  2. Use read.delim to read in the CSV files.

Examples

B3_cell_meta <- read.delim(paste0(B3_DGE_folder, "B3_cell_metadata.csv"),
                           stringsAsFactors = FALSE, sep = ",")
B3_genes <- read.delim(paste0(B3_DGE_folder, "B3_all_genes.csv"),
                       stringsAsFactors = FALSE, sep = ",")

  3. Label each sample as blood or tumour so the cells can be subset by location later:

location <- "blood"

B3_cell_meta["location"] <- location

head(B3_cell_meta)

Please refer to the following tutorial for assistance: https://support.parsebiosciences.com/hc/en-us/articles/360053078092-Seurat-Tutorial-65k-PBMCs
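
Because DGE.mtx stores cells in rows and genes in columns, while Seurat expects genes in rows and cells in columns, the matrix needs to be named and transposed before building a Seurat object. The sketch below shows that step on toy stand-ins for B3_mat, B3_cell_meta and B3_genes. The `bc_wells` and `gene_name` column names are assumptions; check the headers of the real CSV files.

```r
library(Matrix)

# Toy stand-ins (3 cells x 2 genes); the real objects come from readMM/read.delim.
toy_mat <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 2), x = c(4, 1, 6),
                        dims = c(3, 2))
toy_cell_meta <- data.frame(bc_wells = c("01_01_01", "01_01_02", "01_01_02"),
                            stringsAsFactors = FALSE)
toy_genes <- data.frame(gene_name = c("PTPRC", "NCAM1"),
                        stringsAsFactors = FALSE)

# Dimnames must be unique, so de-duplicate barcodes and gene names first.
toy_cell_meta$bc_wells <- make.unique(toy_cell_meta$bc_wells, sep = "_")
toy_genes$gene_name <- make.unique(toy_genes$gene_name, sep = "_")

# Rows are cells and columns are genes, matching the file layout.
rownames(toy_mat) <- toy_cell_meta$bc_wells
colnames(toy_mat) <- toy_genes$gene_name

# Transpose to genes x cells, the orientation Seurat expects.
toy_counts <- t(toy_mat)

# Row names of the metadata must match the cell names of the count matrix.
toy_cell_meta$location <- "blood"
rownames(toy_cell_meta) <- toy_cell_meta$bc_wells
```

A matrix and metadata table prepared this way can then be passed to Seurat's `CreateSeuratObject(counts = ..., meta.data = ...)`.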

2. Seurat_setup.R

Cells with low quality metrics, such as high mitochondrial gene content (> 5%) and a low number of genes detected (< 200), were removed. Cells with transcripts from both hg38 were removed as doublets. RNA counts were log-normalized using the standard Seurat workflow (14). To visualize cells based on an unsupervised transcriptomic analysis, we first ran PCA using 2,000 variable genes: the integrated sample counts were scaled, and the variable features were used for principal-component analysis (PCA). The top 50 principal components from this analysis were then used as input for dimensionality reduction by Uniform Manifold Approximation and Projection (UMAP). Shared-nearest-neighbour clustering on the top 50 principal components was used to generate clusters at a resolution of 1.4. After clustering, we further filtered out the NK cell cluster using the ‘subset’ function. The top highly variable genes were again selected and PCA was used to find principal components. The top 15 PCs were visualized using UMAP. The signature genes identifying each cluster were found using FindAllMarkers.
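
To make the two quality-control thresholds concrete, the sketch below computes them by hand on a toy genes x cells matrix using only the Matrix package. It is not the code from the analysis script. The gene names and counts are invented, mitochondrial genes are assumed to carry the "MT-" prefix, and the gene-count cutoff is lowered to 3 so the toy data can show both filters.

```r
library(Matrix)

# Toy genes x cells count matrix: 4 genes, 3 cells (invented values).
counts <- sparseMatrix(
  i = c(1, 2, 3, 4, 1, 4, 2, 3),
  j = c(1, 1, 1, 1, 2, 2, 3, 3),
  x = c(1, 10, 10, 10, 5, 5, 3, 3),
  dims = c(4, 3),
  dimnames = list(c("MT-CO1", "PTPRC", "NKG7", "GNLY"),
                  c("cellA", "cellB", "cellC"))
)

# Assumption: mitochondrial genes are identified by the "MT-" prefix.
is_mito <- grepl("^MT-", rownames(counts))

# Per-cell metrics: number of genes detected and percent mitochondrial counts.
n_genes  <- colSums(counts > 0)
pct_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)

# Thresholds: the README uses 200 genes and 5% mitochondrial content;
# the gene cutoff is lowered here only because the toy matrix has 4 genes.
min_genes    <- 3
max_pct_mito <- 5

keep <- n_genes >= min_genes & pct_mito <= max_pct_mito
filtered <- counts[, keep, drop = FALSE]
```

In this toy example cellB fails both filters (half of its counts are mitochondrial and only two genes are detected), cellC fails only the gene-count cutoff, and cellA is kept.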

This was done following the standard Seurat workflow: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
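
The steps described above map onto Seurat functions roughly as follows. This is a minimal sketch, not the contents of scRNAseq_analysis_FINAL.R. It assumes Seurat 4.3.0 is installed, that `obj` is a Seurat object holding the already combined blood or tumour samples (the integration step itself is not shown), and that mitochondrial genes carry the "MT-" prefix. The function name `cluster_cd45_cells` is hypothetical.

```r
# Hypothetical wrapper around the standard Seurat 4 workflow.
# Seurat functions are called with the Seurat:: prefix, so the package
# only needs to be installed when the function is actually run.
cluster_cd45_cells <- function(obj, n_pcs = 50, resolution = 1.4) {
  # Quality control: flag mitochondrial content, then filter cells.
  obj[["percent.mt"]] <- Seurat::PercentageFeatureSet(obj, pattern = "^MT-")
  obj <- subset(obj, subset = nFeature_RNA >= 200 & percent.mt <= 5)

  # Log-normalize and pick 2,000 variable genes.
  obj <- Seurat::NormalizeData(obj, normalization.method = "LogNormalize")
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst",
                                      nfeatures = 2000)

  # Scale, then run PCA on the variable features.
  obj <- Seurat::ScaleData(obj)
  obj <- Seurat::RunPCA(obj, npcs = n_pcs)

  # UMAP and shared-nearest-neighbour clustering on the top PCs.
  obj <- Seurat::RunUMAP(obj, dims = seq_len(n_pcs))
  obj <- Seurat::FindNeighbors(obj, dims = seq_len(n_pcs))
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  obj
}

# Example use (requires a real Seurat object, so it is left commented out):
# clustered <- cluster_cd45_cells(blood_obj)
# markers   <- Seurat::FindAllMarkers(clustered)
```

The defaults of 50 principal components and a resolution of 1.4 are the values stated in this README; the second pass on the NK cell subset would call the same steps with 15 PCs.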

3. Single sample GSEA

Single sample gene set enrichment analysis (ssGSEA) was performed using the escape package (v1.8.0). Hallmark gene sets were retrieved from the Molecular Signatures Database (MSigDB).

This was done following the standard workflow: https://bioconductor.org/packages/devel/bioc/vignettes/escape/inst/doc/vignette.html
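
A rough sketch of the ssGSEA step is given below. It assumes the escape 1.x interface (`getGeneSets` and `enrichIt`), which matches the v1.8.0 release named above; newer escape releases expose different functions, so check the installed version before running it. The wrapper name `score_hallmark_ssgsea` and its arguments are hypothetical, and `obj` is assumed to be a Seurat object.

```r
# Hypothetical wrapper; assumes escape 1.x and Seurat are installed.
score_hallmark_ssgsea <- function(obj, cores = 2) {
  # Hallmark ("H") gene sets from MSigDB.
  hallmark_sets <- escape::getGeneSets(library = "H")

  # Per-cell enrichment scores computed with the ssGSEA method.
  scores <- escape::enrichIt(
    obj       = obj,
    gene.sets = hallmark_sets,
    method    = "ssGSEA",
    groups    = 1000,   # cells processed per batch
    cores     = cores
  )

  # Attach the scores to the object's cell-level metadata.
  Seurat::AddMetaData(obj, scores)
}

# Example use (requires a real Seurat object, so it is left commented out):
# nk_scored <- score_hallmark_ssgsea(nk_obj)
```

Storing the scores in the metadata lets them be plotted or compared between clusters with the usual Seurat plotting functions.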