COPD_multiomics / Git / Diff of /2-Transcriptome/README.md

Models:

AlyssaS/

COPD_multiomics

Downloads: 1

Diff of /2-Transcriptome/README.md [000000] .. [16eabd]

Switch to unified view

 b/2-Transcriptome/README.md
+## 1. Software and database required
+**Software:**
+- Cutadapt (v2.5): https://cutadapt.readthedocs.io/en/stable/
+- Hisat2 (v2.1.0): https://daehwankimlab.github.io/hisat2/
+- RSEM (v1.3.3): https://deweylab.github.io/RSEM/
+- DESeq2 (v1.20): https://bioconductor.org/packages/release/bioc/html/DESeq2.html
+**Database:**
+- RSEM reference database（GRch38）https://www.gencodegenes.org/human/
+## 2. Data preparation
+**Create directories and copy raw sequencing files：**
+```shell
+mkdir 00_rawdata 01_fastp 02_cutadapt 03_rsem
+mv *.fastq 00_rawdata
+cd 00_rawdata/
+ls *_1.fastq.gz > ../filelist
+cd ..
+```
+## 3. Quality filtering
+**Adapter sequence detection：**
+```shell
+for i in `cat filelist`
+do
+    j=${i/_1./_2.}
+    fastp \
+    --detect_adapter_for_pe \
+    -i 00_rawdata/$i \
+    -I 00_rawdata/$j \
+    -o 01_fastp/$i \
+    -O 01_fastp/$j \
+    -z 4 -q 20 -u 40 -n 6
+done
+```
+**Remove adapter：**
+```shell
+for i in `cat filelist`
+do
+    j=${i/_1./_2.}
+    cutadapt \
+    --discard-trimmed \
+    -O 18 \
+    -j 20 \
+    -b AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
+    -B AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
+    -o 02_cutadapt/$i \
+    -p 02_cutadapt/$j \
+_rawdata/$i \
+_rawdata/$j
+done
+```
+## 4. Sequence alignment and gene expression calculation
+**Install RSEM:**
+```shell
+wget https://github.com/deweylab/RSEM/archive/v1.3.3.tar.gz
+tar -zxvf v1.3.3.tar.gz
+cd rsem
+make install prefix=/software/biosoft/rsem/
+echo 'PATH=$PATH:/software/biosoft/rsem/bin/' >> ~/.bashrc
+source ~/.bashrc
+```
+**Build index：**
+```shell
+rsem-prepare-reference \
+-p 20 \
+--gtf \
+./gencode.v39.annotation.gtf \
+--hisat2-hca \
+./GRCh38.primary_assembly.genome.fa \
+./grch38
+```
+**Align RNA-seq against human genome and calculate gene expression:**
+```shell
+for i in `cat filelist`
+do
+    j=${i/_1./_2.}
+    k=${i%%_*}
+    rsem-calculate-expression \
+    --paired-end \
+    -p 20 \
+    --hisat2-hca \
+_cutadapt/$i \
+_cutadapt/$j \
+    /software/biosoft/rsem/database/grch38/grch38 \
+_rsem/$k
+done
+```
+The count and FPKM data of all genes in each sample should be present in [SampleID].genes.results
+Concatenate the results of each sample to generate the file count.txt
+## 5. Variance stabilizing transformation of count data
+```R
+data<-read.table("count.txt",row.names=1,sep="\t",header=T)
+group<-read.table("group.txt",sep="\t",header=T) ## optional metadata
+dds <- DESeqDataSetFromMatrix(countData = data, colData = group, design= ~ Group)
+vst<-assay(varianceStabilizingTransformation(dds))
+write.table(vst,"transcriptome.txt",append=FALSE,sep="\t",quote=FALSE)
+```