General bulk RNA-seq analysis

Preprocess

Detailed source code is located here bulkRNApre.R with a vignette explaining each step. The flow chart of preprocess pipeline of bulk RNA-seq shows in the bottom.

Software preparation

By default, the path to these installed software should be added to $PATH in order to use them directly. Or we can install them in a virtual environment like conda. Specifically, we can use the full path to the software in commands if necessary.

0. Dictionary design

After all the preprocess, the dictionary structure will look like this:

0_fastq  1_qc  2_trim  3_qc  4_map  5_infer  6_count  code  raw

The raw sub-dir contains the raw fastq files need (download from public databases, see downloadData.R for detail)

total 13G
drwxr-xr-x  2 lvyl lic 4.0K Mar 31 00:36 ./
drwxr-xr-x 12 lvyl lic 4.0K Mar 31 16:51 ../
-rw-r--r--  1 lvyl lic 925M Mar 30 23:17 SRR11459602.sra_1.fastq.gz
-rw-r--r--  1 lvyl lic 955M Mar 30 23:17 SRR11459602.sra_2.fastq.gz
-rw-r--r--  1 lvyl lic 927M Mar 30 23:24 SRR11459603.sra_1.fastq.gz
-rw-r--r--  1 lvyl lic 947M Mar 30 23:24 SRR11459603.sra_2.fastq.gz
-rw-r--r--  1 lvyl lic 899M Mar 30 23:32 SRR11459604.sra_1.fastq.gz
-rw-r--r--  1 lvyl lic 954M Mar 30 23:32 SRR11459604.sra_2.fastq.gz
-rw-r--r--  1 lvyl lic 894M Mar 30 23:39 SRR11459605.sra_1.fastq.gz
-rw-r--r--  1 lvyl lic 894M Mar 30 23:39 SRR11459605.sra_2.fastq.gz
-rw-r--r--  1 lvyl lic 891M Mar 30 23:55 SRR11459606.sra_1.fastq.gz
-rw-r--r--  1 lvyl lic 907M Mar 30 23:55 SRR11459606.sra_2.fastq.gz
-rw-r--r--  1 lvyl lic 832M Mar 31 00:07 SRR11459607.sra_1.fastq.gz
-rw-r--r--  1 lvyl lic 840M Mar 31 00:07 SRR11459607.sra_2.fastq.gz
-rw-r--r--  1 lvyl lic 853M Mar 31 00:20 SRR11459608.sra_1.fastq.gz
-rw-r--r--  1 lvyl lic 868M Mar 31 00:20 SRR11459608.sra_2.fastq.gz

There are some special designing in the pipeline.

We use symbol links to rename raw files. There are two major advantages: 1. clarify the original file of renamed files. 2. save disk space, deleting raw files will release the used space. Hard links could not do these.

total 8.0K
drwxr-xr-x  2 lvyl lic 4.0K Mar 31 01:04 ./
drwxr-xr-x 12 lvyl lic 4.0K Mar 31 16:51 ../
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 F-ES1_1.fq.gz -> ../raw/SRR11459602.sra_1.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 F-ES1_2.fq.gz -> ../raw/SRR11459602.sra_2.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 F-ES2_1.fq.gz -> ../raw/SRR11459603.sra_1.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 F-ES2_2.fq.gz -> ../raw/SRR11459603.sra_2.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 F-H1_1.fq.gz -> ../raw/SRR11459604.sra_1.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 F-H1_2.fq.gz -> ../raw/SRR11459604.sra_2.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 XF-ES1_1.fq.gz -> ../raw/SRR11459605.sra_1.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 XF-ES1_2.fq.gz -> ../raw/SRR11459605.sra_2.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 XF-ES2_1.fq.gz -> ../raw/SRR11459606.sra_1.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 XF-ES2_2.fq.gz -> ../raw/SRR11459606.sra_2.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 XF-H1_1.fq.gz -> ../raw/SRR11459607.sra_1.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 XF-H1_2.fq.gz -> ../raw/SRR11459607.sra_2.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 XF-iPS_1.fq.gz -> ../raw/SRR11459608.sra_1.fastq.gz
lrwxrwxrwx  1 lvyl lic   33 Mar 31 00:59 XF-iPS_2.fq.gz -> ../raw/SRR11459608.sra_2.fastq.gz

We split the multiple commands into batched scripts in some time consuming steps. For example, if we have 100 mapping tasks, we could split these tasks to 10 batched scripts with 10 commands one by one in each script. In this way, we can run these commands ‘parallelly’. If each script uses 20 cores, we will use total 10 * 20 = 200 cores at the same time in a local server. However, we can submit these tasks to a computing cluster with the 10 scripts running on 10 different nodes. A special function is designed to fulfill this demand.

code/
├── 1_qc.sh
├── 2_trim
│   ├── batch0.sh
│   ├── batch1.sh
│   ├── batch2.sh
│   ├── batch3.sh
│   ├── batch4.sh
│   ├── batch5.sh
│   ├── batch6.sh
│   └── submit.sh
├── 3_qc.sh
├── 4_map
│   ├── batch0.sh
│   ├── batch1.sh
│   ├── batch2.sh
│   ├── batch3.sh
│   ├── batch4.sh
│   ├── batch5.sh
│   ├── batch6.sh
│   └── submit.sh
├── 5_infer.sh
└── 6_count.sh

1. Quality Control

The quality control is performed using FastQC. Then multiple QC reports can be integrated to one report using multiQC.

2. Trimming adapters

After removing the Illumina universal adapter TruSeq3-PE-2.fa using Trimmomatic, we can get clean reads with no adapter left.

3. Mapping reads to reference genome

The mapping is performed using STAR.

4. Counting reads of each feature

Before counting, we need to clearify whether sequencing are strand-specific. This inferance is performed using RSeQC. The result will be noisy if we set wrong parameters to count strand-specific sequencing reads.

After counting, the file counts.txt will be used to perform downstream analysis.

Downstream analysis

1. Loading count data

Complete codes of this part can be find here.

First, some packages are loaded in convenient.

library(tidyverse)
library(magrittr)

All the raw count files are located in specified dir. Here we only have one.

dataPath <- "data"
(usedData <- list.files(dataPath, "txt", full = T))

##  [1] "data/counts.txt"

Next reading all the count files while removing their fist 7 columns of gene annotation data.

dataMtx <- map(usedData, ~ read_delim(.x, delim = "\t", comment = "#")) %>%
  map(~ .x[-1:-7]) %>%
  purrr::reduce(cbind) %>%
  as.matrix()

##  Parsed with column specification:
##  cols(
##    Geneid = col_character(),
##    Chr = col_character(),
##    Start = col_character(),
##    End = col_character(),
##    Strand = col_character(),
##    Length = col_double(),
##    gene_name = col_character(),
##    `4_map/F-ES1Aligned.out.bam` = col_double(),
##    `4_map/F-ES2Aligned.out.bam` = col_double(),
##    `4_map/F-H1Aligned.out.bam` = col_double(),
##    `4_map/XF-ES1Aligned.out.bam` = col_double(),
##    `4_map/XF-ES2Aligned.out.bam` = col_double(),
##    `4_map/XF-H1Aligned.out.bam` = col_double(),
##    `4_map/XF-iPSAligned.out.bam` = col_double()
##  )

head(dataMtx)

##       4_map/F-ES1Aligned.out.bam 4_map/F-ES2Aligned.out.bam
##  [1,]                          0                          0
##  [2,]                          0                          0
##  [3,]                          0                          0
##  [4,]                          1                          0
##  [5,]                          0                          0
##  [6,]                        252                        224
##       4_map/F-H1Aligned.out.bam 4_map/XF-ES1Aligned.out.bam
##  [1,]                         1                           0
##  [2,]                         0                           0
##  [3,]                         0                           0
##  [4,]                         0                           2
##  [5,]                         0                           0
##  [6,]                       176                        1187
##       4_map/XF-ES2Aligned.out.bam 4_map/XF-H1Aligned.out.bam
##  [1,]                           1                          0
##  [2,]                           0                          0
##  [3,]                           0                          0
##  [4,]                           2                          2
##  [5,]                           0                          0
##  [6,]                        1230                        932
##       4_map/XF-iPSAligned.out.bam
##  [1,]                           4
##  [2,]                           0
##  [3,]                           0
##  [4,]                           0
##  [5,]                           0
##  [6,]                         613

These gene annotation data are also useful, so we can seperate and save them to load when in need.

anno <- readRDS("../../data/hg19anno.rds")
head(anno)

##             Geneid                       Chr
##  1 ENSG00000243485               1;1;1;1;1;1
##  2 ENSG00000237613                 1;1;1;1;1
##  3 ENSG00000186092                         1
##  4 ENSG00000238009 1;1;1;1;1;1;1;1;1;1;1;1;1
##  5 ENSG00000239945                       1;1
##  6 ENSG00000237683                       1;1
##                                                                                      Start
##  1                                                     29554;30267;30366;30564;30976;30976
##  2                                                           34554;35245;35277;35721;35721
##  3                                                                                   69091
##  4 89295;92091;92230;110953;112700;112700;112700;120721;120775;129055;129055;129081;133374
##  5                                                                             89551;90287
##  6                                                                           134901;137621
##                                                                                        End
##  1                                                     30039;30667;30503;30667;31109;31097
##  2                                                           35174;35481;35481;36073;36081
##  3                                                                                   70008
##  4 91629;92240;92240;111357;112804;112804;112804;120932;120932;129173;129217;129223;133566
##  5                                                                             90050;91105
##  6                                                                           135802;139379
##                       Strand Length    gene_name
##  1               +;+;+;+;+;+   1021   MIR1302-10
##  2                 -;-;-;-;-   1219      FAM138A
##  3                         +    918        OR4F5
##  4 -;-;-;-;-;-;-;-;-;-;-;-;-   3569 RP11-34P13.7
##  5                       -;-   1319 RP11-34P13.8
##  6                       -;-   2661   AL627309.1

We can give dataMtx meaningful row names as gene names and column names as sample name.

rownames(dataMtx) <- scater::uniquifyFeatureNames(anno$Geneid, anno$gene_name)
colnames(dataMtx) %<>% str_replace_all(".*map/|Aligned.*", "")
head(dataMtx)

##               F-ES1 F-ES2 F-H1 XF-ES1 XF-ES2 XF-H1 XF-iPS
##  MIR1302-10       0     0    1      0      1     0      4
##  FAM138A          0     0    0      0      0     0      0
##  OR4F5            0     0    0      0      0     0      0
##  RP11-34P13.7     1     0    0      2      2     2      0
##  RP11-34P13.8     0     0    0      0      0     0      0
##  AL627309.1     252   224  176   1187   1230   932    613

Then the count matrix should be saved for later use.

saveRDS(dataMtx, "dataMtx.rds")

2. Differentially expressed gene analysis

Using DESeq2

For samples with duplicates, we can use DESeq2 to perform DEG analysis.

Complete codes of this part can be find here.

library(glue)
library(DESeq2)

We can select part of samples to analyze. Here we keep all.

colnames(dataMtx)

##  [1] "F-ES1"  "F-ES2"  "F-H1"   "XF-ES1" "XF-ES2" "XF-H1"  "XF-iPS"

usedMtx <- dataMtx[, 1:7]

A SummarizedExperiment object should be created using count data. We can set condition factors to tell DESeq2 how these samples are grouped. Then we can create a DESeqDataSet using these data.

# dir.create("DESeq2")

se <- SummarizedExperiment(assays = list(counts = as.matrix(usedMtx)))
se$condition <- factor(
  rep(c("F", "XF"), c(3, 4)),
  levels = c("F", "XF")
)

dds <- DESeqDataSet(se, design = ~ condition)

##  converting counts to integer mode

dds

##  class: DESeqDataSet 
##  dim: 32738 7 
##  metadata(1): version
##  assays(1): counts
##  rownames(32738): MIR1302-10 FAM138A ... AC002321.2 AC002321.1
##  rowData names(0):
##  colnames(7): F-ES1 F-ES2 ... XF-H1 XF-iPS
##  colData names(1): condition

Lowly expressed genes among samples will be filtered.

rs <- rowMeans(usedMtx)
geneKeep <- rs > quantile(rs, 0.4)
sum(geneKeep)

##  [1] 19562

dds <- dds[geneKeep, ]

The variance stabilizing transformated matrix will be used for visualization later.

vsd <- vst(dds, blind = T)
saveRDS(vsd, "DESeq2/vsd.rds")

DEG analysis will be performed between groups of samples.

dds <- DESeq(dds)

##  estimating size factors

##  estimating dispersions

##  gene-wise dispersion estimates

##  mean-dispersion relationship

##  final dispersion estimates

##  fitting model and testing

resultsNames(dds)

##  [1] "Intercept"         "condition_XF_vs_F"

The results of DEG can be exported as Excel readable .csv files.

DEGres <- list()
DEGres$XF_vs_F <- lfcShrink(dds, coef = "condition_XF_vs_F", type = "apeglm")

iwalk(DEGres, ~ write.csv(.x, glue("DESeq2/{.y}.DEG.csv")))

saveRDS(dds, "DESeq2/dds.rds")

Using DESeq

For samples with no replicates, we can use DESeq to get a ‘p value’ for each gene between one sample and another. It is not very useful for well designed experiments. However, we can still use it for visulization.

Complete codes of this part can be find here.

library(DESeq)

dataMtx <- readRDS("dataMtx.rds")

colnames(dataMtx)
usedMtx <- dataMtx[c(
  c("wanted sample names"),
  NULL
)]

After loading data and selecting wanted samples, we can create a CountDataSet object with count data matrix and condition factor. Note that we should set the parameters method = "blind" and sharingMode = "fit-only" here if we have no replicates.

dir.create("DESeq")

condition <- factor(
  c("a", "b", "c"),
  levels = c("a", "b", "c")
)

cds <- newCountDataSet(usedMtx, condition) %>%
  estimateSizeFactors() %>%
  estimateDispersions(method = "blind", sharingMode = "fit-only")

The variance stabilizing transformated matrix will be used for visualization later.

vsd <- varianceStabilizingTransformation(cds)
saveRDS(vsd, "DESeq/vsd.rds")

Perform DEG test using nbinomTest if there is no other choice.

# DEGres <- list()
# DEGres$a_vs_b <- nbinomTest(cds, "a", "b")
# DEGres$a_vs_c <- nbinomTest(cds, "a", "c")

# iwalk(DEGres, ~ write_csv(.x, glue("DESeq/{.y}.DEG.csv")))

3. Plot samples and DEGs

We can perform PCA and visualize the results.

Detailed codes for this plot can be find here.

For DEGs, there are some ways to visualize the difference between group of samples.

volcano plot

Detailed codes for this plot can be find here.

MA plot

Detailed codes for this plot can be find here.

Heatmap is a good way to describe the comprehensive expression pattern of used samples. We can plot heatmaps with a large amount of genes with some interested ones labeled on the side.

Or we can just plot heatmaps with selected interesting genes.

Detailed codes for this two plots can be find here.

4. Performing GO analysis

To understand the biology function of differentially expressed genes, GO analysis will be performed using clusterProfiler.

library(data.table)
library(org.Hs.eg.db)
library(clusterProfiler)

DEGs will be loaded as data.table with NAs of adjusted p value are set to 1.

diffData <- fread("DESeq2/XF_vs_F.DEG.csv")
colnames(diffData)[1] <- "gene"

diffData[is.na(padj), padj := 1][]

##                           gene   baseMean log2FoldChange     lfcSE
##      1:             AL627309.1 650.219270    2.124178784 0.2881312
##      2:          RP11-34P13.14   3.886484    0.036843965 0.2206781
##      3:             AP006222.2   4.870546    0.025704071 0.2175207
##      4:          RP4-669L17.10  26.311563    0.143448138 0.2707604
##      5:          RP11-206L10.3  22.037879    0.870527760 0.8516016
##     ---                                                           
##  19558:             AC011841.1  32.805143    0.093406180 0.2395464
##  19559:             AL354822.1  18.898812    0.010123712 0.2168090
##  19560:  PNRC2_ENSG00000215700 108.159850   -0.008642293 0.2152445
##  19561: SRSF10_ENSG00000215699 268.299806   -0.028366678 0.1671313
##  19562:             CU459201.1   9.041196    0.007793443 0.2105759
##               pvalue         padj
##      1: 7.097370e-15 2.965104e-11
##      2: 3.821245e-02 1.000000e+00
##      3: 2.837381e-01 1.000000e+00
##      4: 6.858933e-02 3.549276e-01
##      5: 6.466336e-03 1.004265e-01
##     ---                          
##  19558: 8.653553e-02 3.924275e-01
##  19559:           NA 1.000000e+00
##  19560: 7.245608e-01 9.135938e-01
##  19561: 7.895098e-01 9.382771e-01
##  19562: 8.778994e-01 9.677162e-01

The genes with fold change > 2 or < -2 with significant difference are defined as up- or down-regulated, respectively.

diffData[, type := "ns"]
diffData[log2FoldChange > 1 & padj < 0.05, type := "up"][log2FoldChange < -1 & padj < 0.05, type := "down"][]

##                           gene   baseMean log2FoldChange     lfcSE
##      1:             AL627309.1 650.219270    2.124178784 0.2881312
##      2:          RP11-34P13.14   3.886484    0.036843965 0.2206781
##      3:             AP006222.2   4.870546    0.025704071 0.2175207
##      4:          RP4-669L17.10  26.311563    0.143448138 0.2707604
##      5:          RP11-206L10.3  22.037879    0.870527760 0.8516016
##     ---                                                           
##  19558:             AC011841.1  32.805143    0.093406180 0.2395464
##  19559:             AL354822.1  18.898812    0.010123712 0.2168090
##  19560:  PNRC2_ENSG00000215700 108.159850   -0.008642293 0.2152445
##  19561: SRSF10_ENSG00000215699 268.299806   -0.028366678 0.1671313
##  19562:             CU459201.1   9.041196    0.007793443 0.2105759
##               pvalue         padj type
##      1: 7.097370e-15 2.965104e-11   up
##      2: 3.821245e-02 1.000000e+00   ns
##      3: 2.837381e-01 1.000000e+00   ns
##      4: 6.858933e-02 3.549276e-01   ns
##      5: 6.466336e-03 1.004265e-01   ns
##     ---                               
##  19558: 8.653553e-02 3.924275e-01   ns
##  19559:           NA 1.000000e+00   ns
##  19560: 7.245608e-01 9.135938e-01   ns
##  19561: 7.895098e-01 9.382771e-01   ns
##  19562: 8.778994e-01 9.677162e-01   ns

These gene sets are converted to Entrez ID and perform GO enrichment test.

geneList <- list()
geneList$XFup <- diffData[type == "up", gene]
geneList$Fup <- diffData[type == "down", gene]

egoList <- map(geneList, ~ {
  enrichGO(
    gene = na.omit(select(org.Hs.eg.db, keys = .x, columns = "ENTREZID", keytype = "SYMBOL")$ENTREZID),
    OrgDb = "org.Hs.eg.db", ont = "BP", pvalueCutoff = 1, qvalueCutoff = 1, readable = T)
})

iwalk(egoList, ~ write.csv(.x@result, str_c(.y, ".GO.csv")))

The top enriched terms of each gene set can be visualized using histogram.

Detailed codes for this plot can be find here.

Other content will be added soon.

sessionInfo()

##  R version 3.6.3 (2020-02-29)
##  Platform: x86_64-pc-linux-gnu (64-bit)
##  Running under: Ubuntu 20.04.2 LTS
##  
##  Matrix products: default
##  BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
##  LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##  
##  locale:
##   [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##   [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##   [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
##  [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
##  
##  attached base packages:
##  [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
##  [8] methods   base     
##  
##  other attached packages:
##   [1] clusterProfiler_3.14.3      org.Hs.eg.db_3.10.0        
##   [3] AnnotationDbi_1.48.0        data.table_1.12.8          
##   [5] DESeq2_1.26.0               SummarizedExperiment_1.16.0
##   [7] DelayedArray_0.12.0         BiocParallel_1.20.0        
##   [9] matrixStats_0.56.0          Biobase_2.46.0             
##  [11] GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
##  [13] IRanges_2.20.0              S4Vectors_0.24.0           
##  [15] BiocGenerics_0.32.0         glue_1.4.2                 
##  [17] magrittr_1.5                forcats_0.4.0              
##  [19] stringr_1.4.0               dplyr_1.0.2                
##  [21] purrr_0.3.3                 readr_1.3.1                
##  [23] tidyr_1.0.0                 tibble_3.0.3               
##  [25] ggplot2_3.3.2               tidyverse_1.3.0            
##  
##  loaded via a namespace (and not attached):
##    [1] readxl_1.3.1               backports_1.1.5           
##    [3] Hmisc_4.2-0                fastmatch_1.1-0           
##    [5] plyr_1.8.6                 igraph_1.2.5              
##    [7] splines_3.6.3              scater_1.14.0             
##    [9] urltools_1.7.3             digest_0.6.27             
##   [11] htmltools_0.5.1.1          GOSemSim_2.12.0           
##   [13] viridis_0.5.1              GO.db_3.10.0              
##   [15] fansi_0.4.0                checkmate_1.9.4           
##   [17] memoise_1.1.0              cluster_2.1.0             
##   [19] graphlayouts_0.6.0         annotate_1.64.0           
##   [21] modelr_0.1.5               prettyunits_1.0.2         
##   [23] enrichplot_1.6.1           colorspace_1.4-1          
##   [25] ggrepel_0.8.1              blob_1.2.0                
##   [27] rvest_0.3.5                haven_2.2.0               
##   [29] xfun_0.22                  crayon_1.3.4              
##   [31] RCurl_1.95-4.12            jsonlite_1.6              
##   [33] genefilter_1.68.0          survival_3.1-8            
##   [35] polyclip_1.10-0            gtable_0.3.0              
##   [37] zlibbioc_1.32.0            XVector_0.26.0            
##   [39] BiocSingular_1.2.0         SingleCellExperiment_1.8.0
##   [41] scales_1.0.0               DOSE_3.12.0               
##   [43] DBI_1.0.0                  Rcpp_1.0.5                
##   [45] progress_1.2.2             viridisLite_0.3.0         
##   [47] xtable_1.8-4               htmlTable_1.13.2          
##   [49] gridGraphics_0.5-0         europepmc_0.3             
##   [51] foreign_0.8-75             bit_1.1-14                
##   [53] rsvd_1.0.2                 Formula_1.2-3             
##   [55] htmlwidgets_1.5.1          httr_1.4.1                
##   [57] fgsea_1.12.0               RColorBrewer_1.1-2        
##   [59] acepack_1.4.1              ellipsis_0.3.0            
##   [61] farver_2.0.1               pkgconfig_2.0.3           
##   [63] XML_3.98-1.20              nnet_7.3-13               
##   [65] sass_0.3.1                 dbplyr_1.4.2              
##   [67] locfit_1.5-9.1             ggplotify_0.0.5           
##   [69] tidyselect_1.1.0           rlang_0.4.7               
##   [71] reshape2_1.4.3             munsell_0.5.0             
##   [73] cellranger_1.1.0           tools_3.6.3               
##   [75] cli_2.0.0                  generics_0.0.2            
##   [77] RSQLite_2.1.2              ggridges_0.5.1            
##   [79] broom_0.7.1                evaluate_0.14             
##   [81] yaml_2.2.0                 knitr_1.25                
##   [83] bit64_0.9-7                fs_1.3.1                  
##   [85] tidygraph_1.1.2            ggraph_2.0.2              
##   [87] DO.db_2.9                  xml2_1.2.2                
##   [89] compiler_3.6.3             rstudioapi_0.10           
##   [91] beeswarm_0.2.3             reprex_0.3.0              
##   [93] tweenr_1.0.1               geneplotter_1.64.0        
##   [95] bslib_0.2.4                stringi_1.5.3             
##   [97] lattice_0.20-41            Matrix_1.2-18             
##   [99] vctrs_0.3.4                pillar_1.4.6              
##  [101] lifecycle_0.2.0            BiocManager_1.30.10       
##  [103] triebeard_0.3.0            jquerylib_0.1.3           
##  [105] BiocNeighbors_1.4.1        cowplot_1.0.0             
##  [107] bitops_1.0-6               irlba_2.3.3               
##  [109] qvalue_2.18.0              R6_2.4.0                  
##  [111] latticeExtra_0.6-28        gridExtra_2.3             
##  [113] vipor_0.4.5                MASS_7.3-51.5             
##  [115] assertthat_0.2.1           withr_2.1.2               
##  [117] GenomeInfoDbData_1.2.2     hms_0.5.2                 
##  [119] grid_3.6.3                 rpart_4.1-15              
##  [121] rvcheck_0.1.8              rmarkdown_2.7             
##  [123] DelayedMatrixStats_1.8.0   ggforce_0.3.1             
##  [125] lubridate_1.7.4            base64enc_0.1-3           
##  [127] ggbeeswarm_0.6.0

General bulk RNA-seq analysis

Yulin Lyu lvyulin@pku.edu.cn

2021-04-05

Preprocess

Software preparation

0. Dictionary design

1. Quality Control

2. Trimming adapters

3. Mapping reads to reference genome

4. Counting reads of each feature

Downstream analysis

1. Loading count data

2. Differentially expressed gene analysis

Using DESeq2

Using DESeq

3. Plot samples and DEGs

4. Performing GO analysis