Diff of /README.md [000000] .. [f2f289]

a b/README.md
1
### CD45+ cells from human bladder cancer specimens
2
3
Joshua Wong
4
This README file was generated on 19/10/2023
5
6
1. Collaboration with the Surgical Oncology Unit at the Princess Alexandra Hospital for the Bladder Cancer project.
7
    Manuscript in revision at eBioMedicine: [TGF-B signalling limits effector function capacity of NK cell anti-tumour immunity in human bladder tumours] (EBIOM-D-23-03549).
8
2. Date of data collection: 2022-2023
9
3. Geographic location of data collection: Brisbane, Australia
10
4. Author Information A. Principal Investigator Contact Information Name: A/Prof. Fernando Guimaraes
11
    Institution: The University of Queensland, AUS Email: f.guimaraes@uq.edu.au
12
5. Associate or Co-investigator Contact Information Name: Joshua Wong Institution: The University of Queensland, AUS Email: joshua.wong@uq.net.au
13
14
### DATA & FILE OVERVIEW
15
16
1. File List:
17
18
A) B3\_DGE.mtx
19
B) B3\_all\_genes.csv
20
C) B3\_cell\_metadata.csv
21
D) B4\_DGE.mtx
22
E) B4\_all\_genes.csv
23
F) B4\_cell\_metadata.csv
24
G) B5\_DGE.mtx
25
H) B5\_all\_genes.csv
26
I) B5\_cell\_metadata.csv
27
J) T3\_DGE.mtx
28
K) T3\_all\_genes.csv
29
L) T3\_cell\_metadata.csv
30
M) T4\_DGE.mtx
31
N) T4\_all\_genes.csv
32
O) T4\_cell\_metadata.csv
33
P) T5\_DGE.mtx
34
Q) T5\_all\_genes.csv
35
R) T5\_cell\_metadata.csv
36
S) scRNAseq\_analysis\_FINAL.R
37
38
#### DGE.mtx
39
40
is a sparse matrix with cell-gene counts. Each row corresponds to a cell and each column corresponds to a gene (see the "all\_genes.csv" file for the name and gene ID of each column).
42
43
#### all\_genes.csv
44
45
contains the gene name, gene ID, and genome for each column in DGE.mtx. This file is the same as the file in the reference genome directory.
46
47
#### cell\_metadata.csv
48
49
contains information about each cell, including the cell barcode, species, sample, well in each round of barcoding, and number of transcripts/genes detected.
50
51
2. Relationship between files, if important:
52
53
The B3, B4 and B5 matrix files are CD45+ samples from patient blood and must be integrated together.
54
55
The T3, T4 and T5 matrix files are CD45+ samples from patient tumour and must be integrated together.
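
The grouping described above can be written down as a small sample sheet. This is only a sketch in base R: the names `sample_sheet` and `integration_groups` are hypothetical, and it assumes the blood/tumour split can be read from the first letter of each file prefix, as the file list suggests.

```r
# Hypothetical sample sheet built from the file prefixes in the file list.
samples <- c("B3", "B4", "B5", "T3", "T4", "T5")

sample_sheet <- data.frame(
  sample   = samples,
  # Assumption: prefix "B" = blood, prefix "T" = tumour.
  location = ifelse(startsWith(samples, "B"), "blood", "tumour"),
  mtx      = paste0(samples, "_DGE.mtx"),
  genes    = paste0(samples, "_all_genes.csv"),
  meta     = paste0(samples, "_cell_metadata.csv"),
  stringsAsFactors = FALSE
)

# Samples that belong in the same integration, keyed by location.
integration_groups <- split(sample_sheet$sample, sample_sheet$location)
print(integration_groups)
```

Keeping the file names and the tissue label in one table makes it harder to integrate a blood sample with the tumour set by accident.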
56
57
3. Additional related data collected that was not included in the current data package: None
58
4. Are there multiple versions of the dataset? No
59
    A. If yes, name of file(s) that was updated: NA
60
    i. Why was the file updated? NA
61
    ii. When was the file updated? NA
62
63
### METHOD
64
65
Single-cell RNA-seq data were pre-processed using the ParseBiosciences-Pipeline. Data normalization, unsupervised cell clustering, and differential expression analysis were carried out with the Seurat R package.
66
67
### How to use this script
68
69
R version 4.1.0
70
Seurat Version 4.3.0
71
72
#### 1\. Reading in Parse matrix and gene count files into R
73
74
1. Use the function readMM (from the Matrix package) to read in the .mtx file.
75
76
Example
77
78
B3\_mat <- readMM(paste0(B3\_DGE\_folder, "B3\_DGE.mtx"))
79
80
2. Use read.delim to read in CSV files.
81
82
Examples
83
84
B3\_cell\_meta <- read.delim(paste0(B3\_DGE\_folder, "B3\_cell\_metadata.csv"), stringsAsFactors = FALSE, sep = ",")
86
B3\_genes <- read.delim(paste0(B3\_DGE\_folder, "B3\_all\_genes.csv"), stringsAsFactors = FALSE, sep = ",")
88
89
3. Label each sample with its tissue of origin (blood or tumour):
90
91
location <- "blood"
92
93
B3\_cell\_meta["location"] <- location
94
95
head(B3\_cell\_meta)
96
97
Please refer to the following tutorial for assistance: https://support.parsebiosciences.com/hc/en-us/articles/360053078092-Seurat-Tutorial-65k-PBMCs
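
Steps 1 to 3 can be tried end to end on a toy sample before pointing them at the real files. The sketch below is an illustration, not the published script: it writes a made-up 2-cell by 3-gene sample to a temporary folder, reads it back, checks that the three files agree in shape, and transposes the matrix to genes x cells. The column names `bc_wells` and `gene_name` are assumptions about the Parse output headers, the toy values are invented, and `file.path` is used instead of `paste0` so the folder does not need a trailing slash.

```r
library(Matrix)  # provides readMM() and writeMM()

# --- Toy stand-in for one sample folder (invented values, 2 cells x 3 genes) ---
B3_DGE_folder <- file.path(tempdir(), "B3_demo")
dir.create(B3_DGE_folder, showWarnings = FALSE)
toy <- Matrix(c(1, 0, 2,
                0, 3, 0), nrow = 2, byrow = TRUE, sparse = TRUE)
writeMM(toy, file.path(B3_DGE_folder, "B3_DGE.mtx"))
write.csv(data.frame(bc_wells = c("01_01_01", "01_01_02"), sample = "B3"),
          file.path(B3_DGE_folder, "B3_cell_metadata.csv"), row.names = FALSE)
write.csv(data.frame(gene_id = c("G1", "G2", "G3"),
                     gene_name = c("PTPRC", "NKG7", "GZMB"),
                     genome = "hg38"),
          file.path(B3_DGE_folder, "B3_all_genes.csv"), row.names = FALSE)

# --- Steps 1 and 2: read the matrix and the two CSV files ---
B3_mat <- readMM(file.path(B3_DGE_folder, "B3_DGE.mtx"))
B3_cell_meta <- read.delim(file.path(B3_DGE_folder, "B3_cell_metadata.csv"),
                           stringsAsFactors = FALSE, sep = ",")
B3_genes <- read.delim(file.path(B3_DGE_folder, "B3_all_genes.csv"),
                       stringsAsFactors = FALSE, sep = ",")

# Rows of the matrix are cells and columns are genes, so the shapes must agree.
stopifnot(nrow(B3_mat) == nrow(B3_cell_meta),
          ncol(B3_mat) == nrow(B3_genes))

# --- Step 3: tag every cell with its tissue of origin ---
B3_cell_meta["location"] <- "blood"

# Name rows and columns, then transpose to genes x cells, the layout Seurat expects.
rownames(B3_cell_meta) <- B3_cell_meta$bc_wells
dimnames(B3_mat) <- list(B3_cell_meta$bc_wells, make.unique(B3_genes$gene_name))
B3_counts <- as(t(B3_mat), "CsparseMatrix")
```

`make.unique` guards against repeated gene names, which would otherwise give duplicate row names in the count matrix.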
98
99
#### 2\. Seurat\_setup\.R
100
101
Low-quality cells, defined by high mitochondrial gene content (> 5%) or a low number of detected genes (< 200), were removed. Cells with transcripts from both hg38 were removed as doublets. RNA counts were log-normalized using the standard Seurat workflow (14). To visualize cells based on an unsupervised transcriptomic analysis, the integrated sample counts were scaled and the 2,000 most variable genes were used for principal-component analysis (PCA). The top 50 principal components were then used as input for dimensionality reduction by Uniform Manifold Approximation and Projection (UMAP). Shared-nearest-neighbour clustering on the top 50 principal components was used to generate clusters at a resolution of 1.4. After clustering, we further filtered out the NK cell cluster using the ‘subset’ function. The top highly variable genes were again selected and PCA was used to find principal components. The top 15 PCs were visualized using UMAP. The signature genes identifying each cluster were found using FindAllMarkers.
102
103
This was done following the standard Seurat workflow: https://satijalab.org/seurat/articles/pbmc3k\_tutorial.html
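
The steps described in this section map onto Seurat v4 calls roughly as follows. This is a sketch under stated assumptions, not the contents of scRNAseq\_analysis\_FINAL.R: the function names and the `nk_clusters` argument are hypothetical, the `^MT-` pattern assumes human mitochondrial gene naming, sample integration and doublet removal are not shown, and the second function assumes the NK cluster was extracted for re-analysis.

```r
# Sketch only: `obj` is assumed to be a Seurat object holding the combined counts.
run_seurat_sketch <- function(obj, n_pcs = 50, resolution = 1.4) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))

  # QC: keep cells with >= 200 detected genes and <= 5% mitochondrial content.
  # Assumption: mitochondrial genes carry the human "MT-" prefix.
  obj <- Seurat::PercentageFeatureSet(obj, pattern = "^MT-",
                                      col.name = "percent.mt")
  obj <- subset(obj, subset = nFeature_RNA >= 200 & percent.mt <= 5)

  obj <- Seurat::NormalizeData(obj)                       # log-normalization
  obj <- Seurat::FindVariableFeatures(obj, nfeatures = 2000)
  obj <- Seurat::ScaleData(obj)
  obj <- Seurat::RunPCA(obj, npcs = n_pcs)
  obj <- Seurat::RunUMAP(obj, dims = 1:n_pcs)
  obj <- Seurat::FindNeighbors(obj, dims = 1:n_pcs)       # shared-nearest-neighbour graph
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  obj
}

# Re-analysis of the NK cluster. `nk_clusters` is hypothetical:
# the README does not say which cluster IDs were NK cells.
reanalyse_nk_sketch <- function(obj, nk_clusters, n_pcs = 15) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))
  nk <- subset(obj, idents = nk_clusters)
  nk <- Seurat::FindVariableFeatures(nk, nfeatures = 2000)
  nk <- Seurat::ScaleData(nk)
  nk <- Seurat::RunPCA(nk)
  nk <- Seurat::RunUMAP(nk, dims = 1:n_pcs)
  nk
}

# Example use (not run here; needs Seurat and a real object):
# clustered <- run_seurat_sketch(combined)
# markers   <- Seurat::FindAllMarkers(clustered)
```

Both functions only wrap existing Seurat calls, so the thresholds from the text (200 genes, 5% mitochondrial content, 50 and 15 PCs, resolution 1.4) stay visible in one place.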
104
105
#### 3\. Single sample GSEA
106
107
Single-sample gene set enrichment analysis (ssGSEA) was performed using the escape package (v1.8.0). Hallmark gene sets were retrieved from the Molecular Signatures Database (MSigDB).
108
109
This was done following standard workflow: https://bioconductor.org/packages/devel/bioc/vignettes/escape/inst/doc/vignette.html
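
A minimal sketch of what this step could look like with the escape 1.x interface is given below. It is an assumption-laden illustration rather than the authors' code: `ssgsea_sketch` is a hypothetical wrapper, the argument values are the ones commonly shown in the escape 1.x vignette, and later major releases of escape changed this interface, so the calls should be checked against the installed version.

```r
# Sketch only: `obj` is assumed to be a Seurat object with normalized counts.
ssgsea_sketch <- function(obj, cores = 2) {
  stopifnot(requireNamespace("escape", quietly = TRUE),
            requireNamespace("Seurat", quietly = TRUE))

  # Hallmark collection ("H") from MSigDB for human.
  hallmark <- escape::getGeneSets(species = "Homo sapiens", library = "H")

  # Per-cell enrichment scores; one column per gene set.
  scores <- escape::enrichIt(obj = obj, gene.sets = hallmark,
                             method = "ssGSEA", groups = 1000,
                             cores = cores, min.size = 5)

  # Attach the scores to the cell metadata for plotting and comparison.
  Seurat::AddMetaData(obj, metadata = scores)
}
```

Storing the scores in the object's metadata lets them be compared between clusters or between blood and tumour cells with the usual Seurat plotting functions.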