Diff of /2-Transcriptome/README.md [000000] .. [16eabd]

Switch to unified view

a b/2-Transcriptome/README.md
1
2
3
## 1. Software and database required
4
5
**Software:**
6
7
- Cutadapt (v2.5): https://cutadapt.readthedocs.io/en/stable/
8
- Hisat2 (v2.1.0): https://daehwankimlab.github.io/hisat2/
9
- RSEM (v1.3.3): https://deweylab.github.io/RSEM/
10
- DESeq2 (v1.20): https://bioconductor.org/packages/release/bioc/html/DESeq2.html
11
12
**Database:**
13
14
- RSEM reference database(GRch38)https://www.gencodegenes.org/human/
15
16
## 2. Data preparation
17
18
**Create directories and copy raw sequencing files:**
19
20
```shell
21
mkdir 00_rawdata 01_fastp 02_cutadapt 03_rsem
22
mv *.fastq 00_rawdata
23
cd 00_rawdata/
24
ls *_1.fastq.gz > ../filelist
25
cd ..
26
```
27
28
## 3. Quality filtering
29
30
**Adapter sequence detection:**
31
32
```shell
33
for i in `cat filelist`
34
do
35
    j=${i/_1./_2.}
36
    fastp \
37
    --detect_adapter_for_pe \
38
    -i 00_rawdata/$i \
39
    -I 00_rawdata/$j \
40
    -o 01_fastp/$i \
41
    -O 01_fastp/$j \
42
    -z 4 -q 20 -u 40 -n 6
43
done
44
```
45
46
**Remove adapter:**
47
48
```shell
49
for i in `cat filelist`
50
do
51
    j=${i/_1./_2.}
52
    cutadapt \
53
    --discard-trimmed \
54
    -O 18 \
55
    -j 20 \
56
    -b AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
57
    -B AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
58
    -o 02_cutadapt/$i \
59
    -p 02_cutadapt/$j \
60
    00_rawdata/$i \
61
    00_rawdata/$j 
62
done
63
```
64
65
## 4. Sequence alignment and gene expression calculation
66
67
**Install RSEM:**
68
69
```shell
70
wget https://github.com/deweylab/RSEM/archive/v1.3.3.tar.gz
71
tar -zxvf v1.3.3.tar.gz
72
cd rsem
73
make install prefix=/software/biosoft/rsem/ 
74
echo 'PATH=$PATH:/software/biosoft/rsem/bin/' >> ~/.bashrc
75
source ~/.bashrc
76
```
77
78
**Build index:**
79
80
```shell
81
rsem-prepare-reference \
82
-p 20 \
83
--gtf \
84
./gencode.v39.annotation.gtf \
85
--hisat2-hca \
86
./GRCh38.primary_assembly.genome.fa \
87
./grch38
88
```
89
90
**Align RNA-seq against human genome and calculate gene expression:**
91
92
```shell
93
for i in `cat filelist`
94
do
95
    j=${i/_1./_2.}
96
    k=${i%%_*}
97
    rsem-calculate-expression \
98
    --paired-end \
99
    -p 20 \
100
    --hisat2-hca \
101
    02_cutadapt/$i \
102
    02_cutadapt/$j \
103
    /software/biosoft/rsem/database/grch38/grch38 \
104
    03_rsem/$k
105
done
106
```
107
108
The count and FPKM data of all genes in each sample should be present in [SampleID].genes.results
109
110
Concatenate the results of each sample to generate the file count.txt
111
112
## 5. Variance stabilizing transformation of count data
113
114
```R
115
data<-read.table("count.txt",row.names=1,sep="\t",header=T)
116
group<-read.table("group.txt",sep="\t",header=T) ## optional metadata
117
dds <- DESeqDataSetFromMatrix(countData = data, colData = group, design= ~ Group)
118
vst<-assay(varianceStabilizingTransformation(dds))
119
write.table(vst,"transcriptome.txt",append=FALSE,sep="\t",quote=FALSE)
120
```
121