Joshua Wong
This README file was generated on 19/10/2023
A) B3_DGE.mtx
B) B3_all_genes.csv
C) B3_cell_metadata.csv
D) B4_DGE.mtx
E) B4_all_genes.csv
F) B4_cell_metadata.csv
G) B5_DGE.mtx
H) B5_all_genes.csv
I) B5_cell_metadata.csv
J) T3_DGE.mtx
K) T3_all_genes.csv
L) T3_cell_metadata.csv
M) T4_DGE.mtx
N) T4_all_genes.csv
O) T4_cell_metadata.csv
P) T5_DGE.mtx
Q) T5_all_genes.csv
R) T5_cell_metadata.csv
S) scRNAseq_analysis_FINAL.R
*_DGE.mtx is a sparse matrix with cell-gene counts. Each row corresponds to a cell and each column corresponds to a gene (see the "*_all_genes.csv" file for the name/gene id of each column).
*_all_genes.csv contains the gene name, gene id, and genome for each column in DGE.mtx. This file is the same as the file in the reference genome dir.
*_cell_metadata.csv contains information about each cell, including the cell barcode, species, sample, well in each round of barcoding, and number of transcripts/genes detected.
The B3, B4, and B5 matrix files are CD45+ samples from patient blood and must be integrated together.
The T3, T4, and T5 matrix files are CD45+ samples from patient tumour and must be integrated together.
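This README does not say which integration method was used. As a rough sketch, assuming the anchor-based integration available in Seurat 4, the three samples from one tissue could be combined as follows. The helper name `integrate_samples` and the `nfeatures` default are illustrative, not taken from the analysis script.

```r
# Sketch only: anchor-based integration is an assumed choice, not confirmed by this README.
# `objs` is a list of Seurat objects, e.g. list(B3, B4, B5) or list(T3, T4, T5).
integrate_samples <- function(objs, nfeatures = 2000) {
  # Normalize and pick variable genes independently for each sample
  objs <- lapply(objs, function(x) {
    x <- Seurat::NormalizeData(x, verbose = FALSE)
    Seurat::FindVariableFeatures(x, selection.method = "vst",
                                 nfeatures = nfeatures, verbose = FALSE)
  })
  # Choose genes that are repeatedly variable across samples
  features <- Seurat::SelectIntegrationFeatures(object.list = objs,
                                                nfeatures = nfeatures)
  # Find anchors between samples and merge them into an "integrated" assay
  anchors <- Seurat::FindIntegrationAnchors(object.list = objs,
                                            anchor.features = features)
  Seurat::IntegrateData(anchorset = anchors)
}
```

Blood and tumour samples would each be passed as a separate list, so that each tissue yields its own integrated object.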
Single-cell RNA-seq data were pre-processed using the ParseBiosciences-Pipeline. Data normalization, unsupervised cell clustering, and differential expression analysis were carried out with the Seurat R package.
R version 4.1.0
Seurat Version 4.3.0
Example:
library(Matrix)  # provides readMM()
# Read the cell-by-gene count matrix for sample B3
B3_mat <- readMM(paste0(B3_DGE_folder, "B3_DGE.mtx"))
# Read the cell metadata and gene annotation tables
B3_cell_meta <- read.delim(paste0(B3_DGE_folder, "B3_cell_metadata.csv"),
                           stringsAsFactors = FALSE, sep = ",")
B3_genes <- read.delim(paste0(B3_DGE_folder, "B3_all_genes.csv"),
                       stringsAsFactors = FALSE, sep = ",")
# Tag every cell with its tissue of origin
location <- "blood"
B3_cell_meta["location"] <- location
head(B3_cell_meta)
Please refer to the following tutorial for assistance: https://support.parsebiosciences.com/hc/en-us/articles/360053078092-Seurat-Tutorial-65k-PBMCs
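Because each row of DGE.mtx is a cell and each column is a gene, the matrix has to be transposed before it is handed to Seurat, which expects genes in rows. The helper below is a minimal sketch of that step. The function name `read_parse_sample` is hypothetical, and the column names `bc_wells` and `gene_name` are assumed from the Parse Biosciences tutorial, so check them against the headers of your own files.

```r
library(Matrix)  # provides readMM()

# Hypothetical helper: read one sample and return a genes x cells count matrix
# plus its cell metadata. Column names `bc_wells` and `gene_name` are assumed
# from the Parse Biosciences tutorial and may differ between pipeline versions.
read_parse_sample <- function(folder, prefix, location) {
  mat <- readMM(file.path(folder, paste0(prefix, "_DGE.mtx")))
  cell_meta <- read.delim(file.path(folder, paste0(prefix, "_cell_metadata.csv")),
                          stringsAsFactors = FALSE, sep = ",")
  genes <- read.delim(file.path(folder, paste0(prefix, "_all_genes.csv")),
                      stringsAsFactors = FALSE, sep = ",")

  # Make barcodes and gene names unique so they can serve as dimnames
  cell_meta$bc_wells <- make.unique(cell_meta$bc_wells, sep = "_dup")
  rownames(cell_meta) <- cell_meta$bc_wells
  genes$gene_name <- make.unique(genes$gene_name, sep = "_dup")
  cell_meta$location <- location

  # DGE.mtx is cells x genes; Seurat expects genes x cells, so transpose
  rownames(mat) <- rownames(cell_meta)
  colnames(mat) <- genes$gene_name
  counts <- t(mat)
  # Drop genes with an empty name, if any
  counts <- counts[rownames(counts) != "", , drop = FALSE]

  list(counts = counts, meta = cell_meta)
}
```

With Seurat installed, the result could then be wrapped as `b3 <- read_parse_sample(B3_DGE_folder, "B3", "blood")` followed by `Seurat::CreateSeuratObject(b3$counts, meta.data = b3$meta)`.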
Cells with poor quality metrics, such as high mitochondrial gene content (> 5%) or a low number of detected genes (< 200), were removed. Cells with transcripts from both hg38 were removed as doublets. RNA counts were log-normalized using the standard Seurat workflow (14). To visualize cells based on an unsupervised transcriptomic analysis, we first ran PCA using 2,000 variable genes. The integrated sample counts were scaled, and the variable features were used for principal-component analysis (PCA). The top 50 principal components from this analysis were then used as input for dimensionality reduction by Uniform Manifold Approximation and Projection (UMAP). Shared-nearest-neighbour clustering on the top 50 principal components was used to generate clusters at a resolution of 1.4. After clustering cells, we further filtered out the NK cell cluster using the ‘subset’ function. The top highly variable genes were again selected and PCA was used to find principal components. The top 15 PCs were visualized using UMAP. The signature genes identifying each cluster were found using FindAllMarkers.
This was done following the standard Seurat workflow: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
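The quality-control and clustering steps described above can be sketched roughly as follows. The thresholds (5% mitochondrial content, 200 genes, 50 PCs, resolution 1.4) come from the text. The helper names, the `^MT-` pattern for human mitochondrial gene symbols, and the split into two functions are assumptions made for illustration, so this is not necessarily how scRNAseq_analysis_FINAL.R is organised.

```r
# Hypothetical helpers; thresholds and parameters are taken from the text above.
qc_filter <- function(obj, mt_pattern = "^MT-") {
  # Percentage of counts from mitochondrial genes (human symbols assumed)
  obj[["percent.mt"]] <- Seurat::PercentageFeatureSet(obj, pattern = mt_pattern)
  # Drop cells with > 5% mitochondrial content or < 200 detected genes
  subset(obj, subset = percent.mt <= 5 & nFeature_RNA >= 200)
}

cluster_cells <- function(obj, npcs = 50, resolution = 1.4) {
  # Assumes `obj` is already normalized and has variable features set
  obj <- Seurat::ScaleData(obj, verbose = FALSE)
  obj <- Seurat::RunPCA(obj, npcs = npcs, verbose = FALSE)
  obj <- Seurat::RunUMAP(obj, dims = 1:npcs, verbose = FALSE)
  obj <- Seurat::FindNeighbors(obj, dims = 1:npcs, verbose = FALSE)
  Seurat::FindClusters(obj, resolution = resolution, verbose = FALSE)
}
```

After the NK cell cluster has been removed and variable genes re-selected, the same helper could be rerun with `npcs = 15`, and `Seurat::FindAllMarkers()` applied to the clustered object to list the signature genes.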
Single-sample gene set enrichment analysis (ssGSEA) was performed using the escape package (v1.8.0). Hallmark gene sets were retrieved from the Molecular Signatures Database (MSigDB).
This was done following the standard workflow: https://bioconductor.org/packages/devel/bioc/vignettes/escape/inst/doc/vignette.html
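A minimal sketch of the ssGSEA step, assuming the `getGeneSets()` and `enrichIt()` interface of the escape 1.x series (newer releases may expose a different interface). The wrapper name `run_hallmark_ssgsea` and the `groups` and `cores` values are illustrative choices, not settings confirmed by this README.

```r
# Hypothetical wrapper around the escape 1.x workflow.
# `obj` is a Seurat object holding human data; scores are added as cell metadata.
run_hallmark_ssgsea <- function(obj, groups = 1000, cores = 2) {
  # Retrieve the Hallmark ("H") collection from MSigDB
  hallmark <- escape::getGeneSets(species = "Homo sapiens", library = "H")
  # Compute one enrichment score per cell and gene set
  scores <- escape::enrichIt(obj = obj, gene.sets = hallmark,
                             method = "ssGSEA", groups = groups, cores = cores)
  # Attach the scores to the Seurat object for plotting and comparison
  Seurat::AddMetaData(obj, scores)
}
```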