|
a |
|
b/2-Transcriptome/README.md |
|
|
1 |
|
|
|
2 |
|
|
|
3 |
## 1. Software and database required |
|
|
4 |
|
|
|
5 |
**Software:** |
|
|
6 |
|
|
|
7 |
- Cutadapt (v2.5): https://cutadapt.readthedocs.io/en/stable/ |
|
|
8 |
- Hisat2 (v2.1.0): https://daehwankimlab.github.io/hisat2/ |
|
|
9 |
- RSEM (v1.3.3): https://deweylab.github.io/RSEM/ |
|
|
10 |
- DESeq2 (v1.20): https://bioconductor.org/packages/release/bioc/html/DESeq2.html |
|
|
11 |
|
|
|
12 |
**Database:** |
|
|
13 |
|
|
|
14 |
- RSEM reference database(GRch38)https://www.gencodegenes.org/human/ |
|
|
15 |
|
|
|
16 |
## 2. Data preparation |
|
|
17 |
|
|
|
18 |
**Create directories and copy raw sequencing files:** |
|
|
19 |
|
|
|
20 |
```shell |
|
|
21 |
mkdir 00_rawdata 01_fastp 02_cutadapt 03_rsem |
|
|
22 |
mv *.fastq 00_rawdata |
|
|
23 |
cd 00_rawdata/ |
|
|
24 |
ls *_1.fastq.gz > ../filelist |
|
|
25 |
cd .. |
|
|
26 |
``` |
|
|
27 |
|
|
|
28 |
## 3. Quality filtering |
|
|
29 |
|
|
|
30 |
**Adapter sequence detection:** |
|
|
31 |
|
|
|
32 |
```shell |
|
|
33 |
for i in `cat filelist` |
|
|
34 |
do |
|
|
35 |
j=${i/_1./_2.} |
|
|
36 |
fastp \ |
|
|
37 |
--detect_adapter_for_pe \ |
|
|
38 |
-i 00_rawdata/$i \ |
|
|
39 |
-I 00_rawdata/$j \ |
|
|
40 |
-o 01_fastp/$i \ |
|
|
41 |
-O 01_fastp/$j \ |
|
|
42 |
-z 4 -q 20 -u 40 -n 6 |
|
|
43 |
done |
|
|
44 |
``` |
|
|
45 |
|
|
|
46 |
**Remove adapter:** |
|
|
47 |
|
|
|
48 |
```shell |
|
|
49 |
for i in `cat filelist` |
|
|
50 |
do |
|
|
51 |
j=${i/_1./_2.} |
|
|
52 |
cutadapt \ |
|
|
53 |
--discard-trimmed \ |
|
|
54 |
-O 18 \ |
|
|
55 |
-j 20 \ |
|
|
56 |
-b AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \ |
|
|
57 |
-B AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \ |
|
|
58 |
-o 02_cutadapt/$i \ |
|
|
59 |
-p 02_cutadapt/$j \ |
|
|
60 |
00_rawdata/$i \ |
|
|
61 |
00_rawdata/$j |
|
|
62 |
done |
|
|
63 |
``` |
|
|
64 |
|
|
|
65 |
## 4. Sequence alignment and gene expression calculation |
|
|
66 |
|
|
|
67 |
**Install RSEM:** |
|
|
68 |
|
|
|
69 |
```shell |
|
|
70 |
wget https://github.com/deweylab/RSEM/archive/v1.3.3.tar.gz |
|
|
71 |
tar -zxvf v1.3.3.tar.gz |
|
|
72 |
cd rsem |
|
|
73 |
make install prefix=/software/biosoft/rsem/ |
|
|
74 |
echo 'PATH=$PATH:/software/biosoft/rsem/bin/' >> ~/.bashrc |
|
|
75 |
source ~/.bashrc |
|
|
76 |
``` |
|
|
77 |
|
|
|
78 |
**Build index:** |
|
|
79 |
|
|
|
80 |
```shell |
|
|
81 |
rsem-prepare-reference \ |
|
|
82 |
-p 20 \ |
|
|
83 |
--gtf \ |
|
|
84 |
./gencode.v39.annotation.gtf \ |
|
|
85 |
--hisat2-hca \ |
|
|
86 |
./GRCh38.primary_assembly.genome.fa \ |
|
|
87 |
./grch38 |
|
|
88 |
``` |
|
|
89 |
|
|
|
90 |
**Align RNA-seq against human genome and calculate gene expression:** |
|
|
91 |
|
|
|
92 |
```shell |
|
|
93 |
for i in `cat filelist` |
|
|
94 |
do |
|
|
95 |
j=${i/_1./_2.} |
|
|
96 |
k=${i%%_*} |
|
|
97 |
rsem-calculate-expression \ |
|
|
98 |
--paired-end \ |
|
|
99 |
-p 20 \ |
|
|
100 |
--hisat2-hca \ |
|
|
101 |
02_cutadapt/$i \ |
|
|
102 |
02_cutadapt/$j \ |
|
|
103 |
/software/biosoft/rsem/database/grch38/grch38 \ |
|
|
104 |
03_rsem/$k |
|
|
105 |
done |
|
|
106 |
``` |
|
|
107 |
|
|
|
108 |
The count and FPKM data of all genes in each sample should be present in [SampleID].genes.results |
|
|
109 |
|
|
|
110 |
Concatenate the results of each sample to generate the file count.txt |
|
|
111 |
|
|
|
112 |
## 5. Variance stabilizing transformation of count data |
|
|
113 |
|
|
|
114 |
```R |
|
|
115 |
data<-read.table("count.txt",row.names=1,sep="\t",header=T) |
|
|
116 |
group<-read.table("group.txt",sep="\t",header=T) ## optional metadata |
|
|
117 |
dds <- DESeqDataSetFromMatrix(countData = data, colData = group, design= ~ Group) |
|
|
118 |
vst<-assay(varianceStabilizingTransformation(dds)) |
|
|
119 |
write.table(vst,"transcriptome.txt",append=FALSE,sep="\t",quote=FALSE) |
|
|
120 |
``` |
|
|
121 |
|