Diff of /docs/mapping_small_rna.md [000000] .. [4c33d4]

Switch to side-by-side view

--- a
+++ b/docs/mapping_small_rna.md
@@ -0,0 +1,70 @@
+# Mapping small RNA-seq
+
+## Prepare genome annotation
+
+For mapping of small RNA-seq reads, exSEEK adopts sequential mapping strategy, which assign reads to gene annotations sequentially according the the ordered defined by the user.
+By default, exSEEK assign reads in the following order:
+
+spike-in, rRNA, lncRNA, miRNA, mRNA, piRNA, snoRNA, snRNA, srpRNA, tRNA, tucpRNA, Y_RNA, genome, circRNA
+
+We derived the genome annotation file from various sources: 
+
+| Type | Number of genes | Source |
+| :--- | :--- | :--- |
+| miRNA | 1917 | miRBase hairpin \(Version 22\) |
+| piRNA | 23431 | piRNABank |
+| lncRNA | 15778 | GENCODE V27 and mitranscriptome |
+| rRNA | 37 | NCBI refSeq 109 |
+| mRNA | 19836 | GENCODE V27 |
+| snoRNA | 943 | GENCODE V27 |
+| snRNA | 1900 | GENCODE V27 |
+| srpRNA | 680 | GENCODE V27 |
+| tRNA | 649 | GENCODE V27 |
+| tucpRNA | 3734 | GENCODE V27 |
+| Y\_RNA | 756 | GENCODE V27 |
+| circRNA | 140527 | circBase |
+| repeats | - | UCSC Genome Browser \(rmsk\) |
+| promoter | - | ChromHMM tracks from 9 cell lines from UCSC Genome Browser |
+| enhancer | - | ChromHMM tracks from 9 cell lines from UCSC Genome Browser |
+
+spike-in is a special type of genome annotation that should be provided by the user if spike-in sequences are used. 
+
+The paths of the bowtie2 index files:
+
+| Type | FASTA file | bowtie2 index file |
+| :--- | :--- | :--- |
+| spike-in | `${genome_dir}/fasta/spikein_small.fa` | `${genome_dir}/index/bowtie2/spikein` |
+| rRNA | `${genome_dir}/fasta/rRNA.fa` | `${genome_dir}/index/bowtie2/rRNA` |
+| miRNA | `${genome_dir}/fasta/miRNA.fa` | `${genome_dir}/rsem_index/bowtie2/miRNA` |
+| piRNA | `${genome_dir}/fasta/piRNA.fa` | `${genome_dir}/rsem_index/bowtie2/piRNA` |
+| lncRNA | `${genome_dir}/fasta/lncRNA.fa` | `${genome_dir}/rsem_index/bowtie2/lncRNA` |
+| mRNA | `${genome_dir}/fasta/mRNA.fa` | `${genome_dir}/rsem_index/bowtie2/mRNA` |
+| snoRNA | `${genome_dir}/fasta/snoRNA.fa` | `${genome_dir}/rsem_index/bowtie2/snoRNA` |
+| snRNA | `${genome_dir}/fasta/snRNA.fa` | `${genome_dir}/rsem_index/bowtie2/snRNA` |
+| srpRNA | `${genome_dir}/fasta/srpRNA.fa` | `${genome_dir}/rsem_index/bowtie2/srpRNA` |
+| tRNA | `${genome_dir}/fasta/tRNA.fa` | `${genome_dir}/rsem_index/bowtie2/tRNA` |
+| tucpRNA | `${genome_dir}/fasta/tucpRNA.fa` | `${genome_dir}/rsem_index/bowtie2/tucpRNA` |
+| Y_RNA | `${genome_dir}/fasta/Y_RNA.fa` | `${genome_dir}/rsem_index/bowtie2/Y_RNA` |
+| circRNA | `${genome_dir}/fasta/circRNA.fa` | `${genome_dir}/rsem_index/bowtie2/circRNA` |
+
+**Note**: `${genome_dir}` is the root directory of genome annotation files.
+
+### Build bowtie2 index for spike-in sequences
+
+If your samples contain spike-in sequences, you should first prepare a FASTA file of your spike-in sequences and copy it to `${genome_dir}/fasta/spikein_small.fa`. 
+Then create an index file (`${genome_dir}/fasta/spikein_small.fai`) by the following command:
+
+```bash
+samtools faidx ${genome_dir}/fasta/spikein_small.fa
+```
+
+Run the following commands to build bowtie2 index files for spike-in sequences:
+
+```bash
+cut -f1,2 ${genome_dir}/fasta/spikein_small.fa.fai > ${genome_dir}/chrom_sizes/spikein_small
+{
+    echo -e 'chrom\tstart\tend\tname\tscore\tstrand\tgene_id\ttranscript_id\tgene_name\ttranscript_name\tgene_type\ttranscript_type\tsource'
+    awk 'BEGIN{OFS="\t";FS="\t"}{print $1,0,$2,$1,0,"+",$1,$1,$1,$1,"spikein","spikein","spikein"}' ${genome_dir}/fasta/spikein_small.fa.fai
+} > ${genome_dir}/transcript_table/spikein_small.txt
+bowtie2-build ${genome_dir}/fasta/spikein_small.fa ${genome_dir}/index/bowtie2/spikein_small
+```