For mapping of small RNA-seq reads, exSEEK adopts sequential mapping strategy, which assign reads to gene annotations sequentially according the the ordered defined by the user.
By default, exSEEK assign reads in the following order:
spike-in, rRNA, lncRNA, miRNA, mRNA, piRNA, snoRNA, snRNA, srpRNA, tRNA, tucpRNA, Y_RNA, genome, circRNA
We derived the genome annotation file from various sources:
Type | Number of genes | Source |
---|---|---|
miRNA | 1917 | miRBase hairpin (Version 22) |
piRNA | 23431 | piRNABank |
lncRNA | 15778 | GENCODE V27 and mitranscriptome |
rRNA | 37 | NCBI refSeq 109 |
mRNA | 19836 | GENCODE V27 |
snoRNA | 943 | GENCODE V27 |
snRNA | 1900 | GENCODE V27 |
srpRNA | 680 | GENCODE V27 |
tRNA | 649 | GENCODE V27 |
tucpRNA | 3734 | GENCODE V27 |
Y_RNA | 756 | GENCODE V27 |
circRNA | 140527 | circBase |
repeats | - | UCSC Genome Browser (rmsk) |
promoter | - | ChromHMM tracks from 9 cell lines from UCSC Genome Browser |
enhancer | - | ChromHMM tracks from 9 cell lines from UCSC Genome Browser |
spike-in is a special type of genome annotation that should be provided by the user if spike-in sequences are used.
The paths of the bowtie2 index files:
Type | FASTA file | bowtie2 index file |
---|---|---|
spike-in | ${genome_dir}/fasta/spikein_small.fa |
${genome_dir}/index/bowtie2/spikein |
rRNA | ${genome_dir}/fasta/rRNA.fa |
${genome_dir}/index/bowtie2/rRNA |
miRNA | ${genome_dir}/fasta/miRNA.fa |
${genome_dir}/rsem_index/bowtie2/miRNA |
piRNA | ${genome_dir}/fasta/piRNA.fa |
${genome_dir}/rsem_index/bowtie2/piRNA |
lncRNA | ${genome_dir}/fasta/lncRNA.fa |
${genome_dir}/rsem_index/bowtie2/lncRNA |
mRNA | ${genome_dir}/fasta/mRNA.fa |
${genome_dir}/rsem_index/bowtie2/mRNA |
snoRNA | ${genome_dir}/fasta/snoRNA.fa |
${genome_dir}/rsem_index/bowtie2/snoRNA |
snRNA | ${genome_dir}/fasta/snRNA.fa |
${genome_dir}/rsem_index/bowtie2/snRNA |
srpRNA | ${genome_dir}/fasta/srpRNA.fa |
${genome_dir}/rsem_index/bowtie2/srpRNA |
tRNA | ${genome_dir}/fasta/tRNA.fa |
${genome_dir}/rsem_index/bowtie2/tRNA |
tucpRNA | ${genome_dir}/fasta/tucpRNA.fa |
${genome_dir}/rsem_index/bowtie2/tucpRNA |
Y_RNA | ${genome_dir}/fasta/Y_RNA.fa |
${genome_dir}/rsem_index/bowtie2/Y_RNA |
circRNA | ${genome_dir}/fasta/circRNA.fa |
${genome_dir}/rsem_index/bowtie2/circRNA |
Note: ${genome_dir}
is the root directory of genome annotation files.
If your samples contain spike-in sequences, you should first prepare a FASTA file of your spike-in sequences and copy it to ${genome_dir}/fasta/spikein_small.fa
.
Then create an index file (${genome_dir}/fasta/spikein_small.fai
) by the following command:
samtools faidx ${genome_dir}/fasta/spikein_small.fa
Run the following commands to build bowtie2 index files for spike-in sequences:
cut -f1,2 ${genome_dir}/fasta/spikein_small.fa.fai > ${genome_dir}/chrom_sizes/spikein_small
{
echo -e 'chrom\tstart\tend\tname\tscore\tstrand\tgene_id\ttranscript_id\tgene_name\ttranscript_name\tgene_type\ttranscript_type\tsource'
awk 'BEGIN{OFS="\t";FS="\t"}{print $1,0,$2,$1,0,"+",$1,$1,$1,$1,"spikein","spikein","spikein"}' ${genome_dir}/fasta/spikein_small.fa.fai
} > ${genome_dir}/transcript_table/spikein_small.txt
bowtie2-build ${genome_dir}/fasta/spikein_small.fa ${genome_dir}/index/bowtie2/spikein_small