b/docs/mapping_small_rna.md

# Mapping small RNA-seq

## Prepare genome annotation

For mapping small RNA-seq reads, exSEEK adopts a sequential mapping strategy, which assigns reads to gene annotations one type at a time, in the order defined by the user.
By default, exSEEK assigns reads in the following order:

spike-in, rRNA, lncRNA, miRNA, mRNA, piRNA, snoRNA, snRNA, srpRNA, tRNA, tucpRNA, Y_RNA, genome, circRNA

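The sequential strategy can be sketched as a chain in which the reads left unaligned at one step become the input of the next step. The snippet below is only an illustration under stated assumptions, not exSEEK's actual implementation: the function name `plan_sequential_mapping`, the single `index_dir` layout, and all file names are hypothetical, and it prints the bowtie2 commands instead of running them. `--un` is the bowtie2 option that writes unpaired reads that failed to align to a file.

```bash
#!/usr/bin/env bash
# Illustrative dry run of sequential mapping (not exSEEK's actual code).
# Hypothetical names: plan_sequential_mapping, index_dir, and all file names.
plan_sequential_mapping() {
    local index_dir=$1 reads=$2 out_dir=$3
    shift 3
    local rna_type unmapped
    for rna_type in "$@"; do
        unmapped="${out_dir}/${rna_type}.unmapped.fastq"
        # --un writes the reads that failed to align at this step
        echo "bowtie2 -x ${index_dir}/${rna_type} -U ${reads} --un ${unmapped} -S ${out_dir}/${rna_type}.sam"
        # Only those unaligned reads are offered to the next RNA type
        reads=$unmapped
    done
}

# Print the plan for the first three RNA types of the default order
plan_sequential_mapping index sample.fastq output spikein rRNA lncRNA
```

Because each step only sees the reads rejected by the previous steps, a read that could align to several RNA types is counted for the type that comes first in the order.
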
The genome annotation files were derived from the following sources:

| Type | Number of genes | Source |
| :--- | :--- | :--- |
| miRNA | 1917 | miRBase hairpin \(Version 22\) |
| piRNA | 23431 | piRNABank |
| lncRNA | 15778 | GENCODE V27 and MiTranscriptome |
| rRNA | 37 | NCBI RefSeq 109 |
| mRNA | 19836 | GENCODE V27 |
| snoRNA | 943 | GENCODE V27 |
| snRNA | 1900 | GENCODE V27 |
| srpRNA | 680 | GENCODE V27 |
| tRNA | 649 | GENCODE V27 |
| tucpRNA | 3734 | GENCODE V27 |
| Y\_RNA | 756 | GENCODE V27 |
| circRNA | 140527 | circBase |
| repeats | - | UCSC Genome Browser \(rmsk\) |
| promoter | - | ChromHMM tracks from 9 cell lines from UCSC Genome Browser |
| enhancer | - | ChromHMM tracks from 9 cell lines from UCSC Genome Browser |

Spike-in is a special type of genome annotation that the user must provide if spike-in sequences are used.

The FASTA files and bowtie2 index prefixes for each RNA type are:

| Type | FASTA file | bowtie2 index prefix |
| :--- | :--- | :--- |
| spike-in | `${genome_dir}/fasta/spikein_small.fa` | `${genome_dir}/index/bowtie2/spikein` |
| rRNA | `${genome_dir}/fasta/rRNA.fa` | `${genome_dir}/index/bowtie2/rRNA` |
| miRNA | `${genome_dir}/fasta/miRNA.fa` | `${genome_dir}/rsem_index/bowtie2/miRNA` |
| piRNA | `${genome_dir}/fasta/piRNA.fa` | `${genome_dir}/rsem_index/bowtie2/piRNA` |
| lncRNA | `${genome_dir}/fasta/lncRNA.fa` | `${genome_dir}/rsem_index/bowtie2/lncRNA` |
| mRNA | `${genome_dir}/fasta/mRNA.fa` | `${genome_dir}/rsem_index/bowtie2/mRNA` |
| snoRNA | `${genome_dir}/fasta/snoRNA.fa` | `${genome_dir}/rsem_index/bowtie2/snoRNA` |
| snRNA | `${genome_dir}/fasta/snRNA.fa` | `${genome_dir}/rsem_index/bowtie2/snRNA` |
| srpRNA | `${genome_dir}/fasta/srpRNA.fa` | `${genome_dir}/rsem_index/bowtie2/srpRNA` |
| tRNA | `${genome_dir}/fasta/tRNA.fa` | `${genome_dir}/rsem_index/bowtie2/tRNA` |
| tucpRNA | `${genome_dir}/fasta/tucpRNA.fa` | `${genome_dir}/rsem_index/bowtie2/tucpRNA` |
| Y_RNA | `${genome_dir}/fasta/Y_RNA.fa` | `${genome_dir}/rsem_index/bowtie2/Y_RNA` |
| circRNA | `${genome_dir}/fasta/circRNA.fa` | `${genome_dir}/rsem_index/bowtie2/circRNA` |

**Note**: `${genome_dir}` is the root directory of the genome annotation files.

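Before mapping, it can help to confirm that an index actually exists for each prefix. The following is a minimal sketch, assuming the indexes were produced by `bowtie2-build`, which writes files such as `<prefix>.1.bt2` (or `<prefix>.1.bt2l` for large indexes). The helper name `has_bowtie2_index` and the default value of `genome_dir` are hypothetical, and only three of the prefixes are checked for brevity.

```bash
#!/usr/bin/env bash
# Succeeds if a bowtie2 index exists for the given prefix.
# bowtie2-build writes <prefix>.1.bt2 (or <prefix>.1.bt2l for large indexes).
has_bowtie2_index() {
    local prefix=$1
    [ -e "${prefix}.1.bt2" ] || [ -e "${prefix}.1.bt2l" ]
}

# Hypothetical default; point genome_dir at your own annotation root
genome_dir=${genome_dir:-genome/hg38}
for prefix in index/bowtie2/spikein index/bowtie2/rRNA rsem_index/bowtie2/miRNA; do
    if has_bowtie2_index "${genome_dir}/${prefix}"; then
        echo "found   ${prefix}"
    else
        echo "missing ${prefix}"
    fi
done
```
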
### Build bowtie2 index for spike-in sequences

If your samples contain spike-in sequences, first prepare a FASTA file of the spike-in sequences and copy it to `${genome_dir}/fasta/spikein_small.fa`.
Then create a FASTA index file (`${genome_dir}/fasta/spikein_small.fa.fai`) with the following command:

```bash
samtools faidx "${genome_dir}/fasta/spikein_small.fa"
```

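The first two columns of the `.fai` file are the sequence name and the sequence length, which is what the later steps rely on. As an illustration of what those two columns contain, the sketch below recomputes them with plain `awk` for a toy FASTA. The helper `fasta_lengths` and the two sequences are made up for this example; it is not a replacement for `samtools faidx`.

```bash
#!/usr/bin/env bash
# Print "name<TAB>length" for each record of a FASTA stream read from stdin.
# fasta_lengths is a hypothetical helper, not part of samtools or exSEEK.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # name = header text up to the first whitespace
            len = 0
            next
        }
        { len += length($0) }      # sum the bases over all sequence lines
        END { if (name != "") print name "\t" len }
    '
}

# Toy spike-in FASTA with two made-up sequences
printf '>spike1 synthetic\nACGTACGT\nACG\n>spike2\nTTTT\n' | fasta_lengths
# prints:
# spike1	11
# spike2	4
```
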
Run the following commands to generate the chromosome sizes file, the transcript table, and the bowtie2 index for the spike-in sequences:

```bash
# Chromosome sizes: sequence name and length taken from the FASTA index
cut -f1,2 "${genome_dir}/fasta/spikein_small.fa.fai" > "${genome_dir}/chrom_sizes/spikein_small"
# Transcript table: a header line plus one record per spike-in sequence
{
    echo -e 'chrom\tstart\tend\tname\tscore\tstrand\tgene_id\ttranscript_id\tgene_name\ttranscript_name\tgene_type\ttranscript_type\tsource'
    awk 'BEGIN{OFS="\t";FS="\t"}{print $1,0,$2,$1,0,"+",$1,$1,$1,$1,"spikein","spikein","spikein"}' "${genome_dir}/fasta/spikein_small.fa.fai"
} > "${genome_dir}/transcript_table/spikein_small.txt"
# Build the bowtie2 index from the spike-in FASTA file
bowtie2-build "${genome_dir}/fasta/spikein_small.fa" "${genome_dir}/index/bowtie2/spikein_small"
```
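
To see what a transcript table record looks like, the same `awk` program can be run on a single made-up `.fai` line. This is only an illustration: the wrapper name `fai_to_transcript_rows` and the values in the toy line are hypothetical.

```bash
#!/usr/bin/env bash
# Convert .fai lines on stdin into transcript table records (same awk program as above).
# fai_to_transcript_rows is a hypothetical wrapper used only for this illustration.
fai_to_transcript_rows() {
    awk 'BEGIN{OFS="\t";FS="\t"}{print $1,0,$2,$1,0,"+",$1,$1,$1,$1,"spikein","spikein","spikein"}'
}

# Made-up .fai line: name, length, offset, bases per line, bytes per line
printf 'spike1\t22\t8\t22\t23\n' | fai_to_transcript_rows
```

Each spike-in sequence becomes one record with 13 tab-separated fields: it spans the whole sequence (start 0, end equal to the sequence length), lies on the `+` strand, and uses the sequence name for every ID and name column.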