a/README.md b/README.md
# MethodsTBS

Targeted Bisulfite Sequencing for Biomarker Discovery

## Processing Scripts

- Scripts for processing targeted bisulfite sequencing data, written for use with an SGE cluster environment
  - Adapter Trimming
  - Indexing
  - Read alignment
  - Methylation Calling

## Analysis Notebooks

- Jupyter notebooks describing
  - Quality Control Pipeline
  - Fitting Epigenetic Clock Using Targeted Bisulfite Sequencing Data

## Targeted Bisulfite Sequencing QC

The code to produce the following graphs is contained in the QC Jupyter notebook. The goal of the QC analysis is to
assess alignment performance by looking at general alignment statistics, the number of reads mapping to regions
targeted by the probes, and the observed duplication rate.

### Alignment QC

We can start assessing alignment quality by looking at the output logs generated by the alignment tool
[BSBolt](https://bsbolt.readthedocs.io/en/latest/). The alignment log gives information on the total number of read pairs
observed, the bisulfite strand where reads mapped, and the number of unmapped reads / bisulfite ambiguous alignments. The
mappability is calculated as $mappability = 2 \times \text{total read pairs} - \text{unmapped reads}$.

![png](imgs/bsb_alignment_log_stats.png)
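
The mappability arithmetic can be sketched in a few lines of Python. This is only an illustration and is not taken from the QC notebook: the function names and the example counts are hypothetical. As written, the formula gives the number of mapped reads; dividing by the total number of reads to obtain a fraction is an added assumption that may or may not match what the plot shows.

```python
def mapped_read_count(total_read_pairs: int, unmapped_reads: int) -> int:
    """Mapped reads as stated in the text: 2 * total read pairs - unmapped reads."""
    if total_read_pairs <= 0:
        raise ValueError("total_read_pairs must be positive")
    total_reads = 2 * total_read_pairs  # each pair contributes two reads
    if not 0 <= unmapped_reads <= total_reads:
        raise ValueError("unmapped_reads must lie between 0 and the total read count")
    return total_reads - unmapped_reads


def mapped_fraction(total_read_pairs: int, unmapped_reads: int) -> float:
    """Assumption (not in the README): normalize the mapped count by all reads."""
    mapped = mapped_read_count(total_read_pairs, unmapped_reads)
    return mapped / (2 * total_read_pairs)


# Hypothetical counts for illustration, not values from a real alignment log.
example_mapped = mapped_read_count(total_read_pairs=1000, unmapped_reads=150)
example_fraction = mapped_fraction(total_read_pairs=1000, unmapped_reads=150)
print(example_mapped, example_fraction)
```

With 1000 read pairs and 150 unmapped reads, the count is 2 * 1000 - 150 = 1850 mapped reads, or 0.925 of all reads under the normalization assumed here.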

Using *bedtools multicov*, we investigate the average number of reads that map to the targeted regions. Coverage is
plotted for all mapped reads and for mapped reads with duplicates removed.

![png](imgs/bedtools_multicov.png)
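
As a rough sketch of how per-region counts could be averaged, the snippet below parses tab-separated *bedtools multicov* output, which repeats the BED columns of each region and appends one read-count column per alignment file. The function name, the two-file layout (all reads, then deduplicated reads), and the example rows are assumptions made for illustration; the QC notebook may organize this differently.

```python
import csv
import io
from typing import List


def mean_region_counts(multicov_text: str, n_bams: int) -> List[float]:
    """Average the trailing count columns of bedtools multicov output.

    The last `n_bams` columns of each row are read counts, one per
    alignment file, in the order the files were passed to bedtools.
    """
    per_bam: List[List[int]] = [[] for _ in range(n_bams)]
    for row in csv.reader(io.StringIO(multicov_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        for i, value in enumerate(row[-n_bams:]):
            per_bam[i].append(int(value))
    return [sum(counts) / len(counts) for counts in per_bam]


# Hypothetical output: chrom, start, end, count (all reads), count (deduplicated).
example = (
    "chr1\t100\t200\t50\t30\n"
    "chr1\t500\t600\t70\t40\n"
    "chr2\t100\t300\t30\t20\n"
)
mean_all, mean_dedup = mean_region_counts(example, n_bams=2)
print(mean_all, mean_dedup)
```

Reading the counts from the end of each row keeps the sketch independent of how many BED columns the target file has.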

Using *samtools flagstat* we can investigate the alignment file by looking at the SAM flags set for each read. SAM flags are bitwise combinations of different alignment attributes. Coverage information taken from *bedtools multicov* has been combined
with the *flagstat* output to provide additional details.

![png](imgs/sam_flagstat.png)
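
To make the bitwise encoding concrete, here is a minimal Python sketch that decodes a SAM flag into the attributes it combines. The bit values follow the SAM format specification; the attribute names and the `decode_sam_flag` helper are hypothetical labels chosen for this example and are not part of *samtools* or the QC notebook.

```python
from typing import List

# Bit values from the SAM specification; the names are illustrative labels.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_sam_flag(flag: int) -> List[str]:
    """Return the attribute names whose bits are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# 99 = 0x1 + 0x2 + 0x20 + 0x40
print(decode_sam_flag(99))
# 1107 = 0x1 + 0x2 + 0x10 + 0x40 + 0x400, i.e. a read marked as a duplicate
print(decode_sam_flag(1107))
```

Counting how many reads carry a given bit (for example `0x400` for duplicates) is the kind of tally that *samtools flagstat* reports.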