Targeted Bisulfite Sequencing for Biomarker Discovery
The code to produce the following graphs is contained in the QC Jupyter notebook. The goal of the QC analysis is to
assess alignment performance by looking at general alignment statistics, the number of reads mapping to regions
targeted by the probes, and the observed duplication rate.
We can start assessing alignment quality by looking at the output logs generated by the alignment tool,
BSBolt. The alignment log gives information on the total number of read pairs
observed, the bisulfite strand where reads mapped, and the number of unmapped reads and bisulfite-ambiguous alignments.
Because each read pair contributes two reads, the mappability is calculated as
$mappability = 2 \times \text{total read pairs} - \text{unmapped reads}$.
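
As a minimal sketch of the calculation above, the snippet below evaluates the expression for a pair of illustrative numbers. The function names and the example values are hypothetical and are not taken from the QC notebook. The second function, which divides by the total number of reads to give a fraction, is an assumption added for convenience when comparing samples; the text above only defines the count.

```python
def mapped_reads(total_read_pairs: int, unmapped_reads: int) -> int:
    """Evaluate 2 * total read pairs - unmapped reads (a count of reads)."""
    total_reads = 2 * total_read_pairs  # each pair contributes two reads
    if not 0 <= unmapped_reads <= total_reads:
        raise ValueError("unmapped_reads must be between 0 and 2 * total_read_pairs")
    return total_reads - unmapped_reads


def mapped_fraction(total_read_pairs: int, unmapped_reads: int) -> float:
    """Assumption: normalize the count by the total number of reads."""
    if total_read_pairs <= 0:
        raise ValueError("total_read_pairs must be positive")
    return mapped_reads(total_read_pairs, unmapped_reads) / (2 * total_read_pairs)


# Illustrative values only, not real alignment statistics.
example_count = mapped_reads(total_read_pairs=1000, unmapped_reads=150)
example_fraction = mapped_fraction(total_read_pairs=1000, unmapped_reads=150)
```

With these illustrative inputs, 2000 reads were sequenced and 150 did not map, so the count is 1850 and the fraction is 0.925.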
Using bedtools multicov, we investigate the average number of reads that map to the targeted regions. Coverage is
plotted both for all mapped reads and for mapped reads with duplicates removed.
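
The averaging step can be sketched as follows. bedtools multicov writes each input BED line followed by one tab-separated read count per BAM file, so the mean count per target is a column average. The function name, the default of three BED columns, and the example lines are assumptions for illustration; in particular, treating the two count columns as "all mapped reads" and "duplicates removed" is a hypothetical layout, not necessarily how the QC notebook organizes its files.

```python
from statistics import mean


def mean_reads_per_region(multicov_lines, n_bed_columns=3):
    """Return the mean read count per region, one value per count column.

    n_bed_columns is the number of columns in the original BED file;
    every column after that is assumed to be a per-BAM read count.
    """
    counts_by_bam = None
    for line in multicov_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        counts = [int(value) for value in fields[n_bed_columns:]]
        if counts_by_bam is None:
            counts_by_bam = [[] for _ in counts]
        if len(counts) != len(counts_by_bam):
            raise ValueError(f"inconsistent number of count columns: {line!r}")
        for column, count in zip(counts_by_bam, counts):
            column.append(count)
    if not counts_by_bam:
        return []
    return [mean(column) for column in counts_by_bam]


# Hypothetical multicov output: chrom, start, end, then two count columns.
example_lines = [
    "chr1\t100\t200\t40\t30",
    "chr1\t500\t650\t60\t50",
    "chr2\t10\t90\t20\t10",
]
example_means = mean_reads_per_region(example_lines)
```

Reading the lines from an open file handle works the same way, since the function only iterates over its input.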
Using samtools flagstat, we can investigate the alignment file by looking at the SAM flags set for each read. SAM flags
are bitwise combinations of different alignment attributes. Coverage information taken from bedtools multicov has been
combined with the flagstat output to provide additional details.
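
To illustrate what "bitwise combination" means, the sketch below decodes a SAM flag integer into the attributes it encodes. The bit values follow the SAM format specification; the attribute labels and the function name are chosen here for readability and are not part of samtools or the QC notebook.

```python
# Bit values from the SAM specification; the labels are illustrative names.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_sam_flag(flag: int) -> list:
    """List the attributes whose bits are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_duplicate(flag: int) -> bool:
    """True when the duplicate bit (0x400) is set."""
    return bool(flag & 0x400)


# 99 = 1 + 2 + 32 + 64
example_attributes = decode_sam_flag(99)
```

A flag of 99 decodes to a paired, properly paired, first-in-pair read whose mate is on the reverse strand. Adding 1024 (flag 1123) marks the same read as a duplicate, which is the bit that duplicate-removal statistics rely on.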