|
a/README.md |
|
b/README.md |
1 |
# MethodsTBS |
1 |
# MethodsTBS |
2 |
|
2 |
|
3 |
Targeted Bisulfite Sequencing for Biomarker Discovery |
3 |
Targeted Bisulfite Sequencing for Biomarker Discovery |
4 |
|
4 |
|
5 |
## Processing Scripts |
5 |
## Processing Scripts |
6 |
|
6 |
|
7 |
- Scripts for processing targeted bisulfite sequencing data, written for use with an SGE cluster environment |
7 |
- Scripts for processing targeted bisulfite sequencing data, written for use with an SGE cluster environment
|
8 |
- Adapter Trimming |
8 |
- Adapter Trimming
|
9 |
- Indexing |
9 |
- Indexing
|
10 |
- Read alignment |
10 |
- Read alignment
|
11 |
- Methylation Calling |
11 |
- Methylation Calling |
12 |
|
12 |
|
13 |
## Analysis Notebooks |
13 |
## Analysis Notebooks |
14 |
|
14 |
|
15 |
- Jupyter notebook describing |
15 |
- Jupyter notebook describing
|
16 |
- Quality Control Pipeline |
16 |
- Quality Control Pipeline
|
17 |
- Fitting Epigenetic Clock Using Targeted Bisulfite Sequencing Data |
17 |
- Fitting Epigenetic Clock Using Targeted Bisulfite Sequencing Data |
18 |
|
18 |
|
19 |
## Targeted Bisulfite Sequencing QC |
19 |
## Targeted Bisulfite Sequencing QC |
20 |
|
20 |
|
21 |
The code to produce the following graphs is contained in the QC Jupyter notebook. The goal of the QC analysis is to |
21 |
The code to produce the following graphs is contained in the QC Jupyter notebook. The goal of the QC analysis is to
|
22 |
assess alignment performance by looking at general alignment statistics, the number of reads mapping to regions |
22 |
assess alignment performance by looking at general alignment statistics, the number of reads mapping to regions
|
23 |
targeted by the probes, and the observed duplication rate. |
23 |
targeted by the probes, and the observed duplication rate. |
24 |
|
24 |
|
25 |
### Alignment QC |
25 |
### Alignment QC |
26 |
|
26 |
|
27 |
We can start assessing alignment quality by looking at the output logs generated by the alignment tool |
27 |
We can start assessing alignment quality by looking at the output logs generated by the alignment tool
|
28 |
[BSBolt](https://bsbolt.readthedocs.io/en/latest/). The alignment log gives information on the total number of read pairs |
28 |
[BSBolt](https://bsbolt.readthedocs.io/en/latest/). The alignment log gives information on the total number of read pairs
|
29 |
observed, the bisulfite strand where reads mapped, and the number of unmapped reads / bisulfite ambiguous alignments. The |
29 |
observed, the bisulfite strand where reads mapped, and the number of unmapped reads / bisulfite ambiguous alignments. The
|
30 |
mappability is calculated as $\text{mappability} = 2 \times \text{total read pairs} - \text{unmapped reads}$. |
30 |
mappability is calculated as $\text{mappability} = 2 \times \text{total read pairs} - \text{unmapped reads}$. |
31 |
|
31 |
|
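The mappability calculation can be sketched as below, assuming each read pair contributes two reads. As written, the formula yields the number of mapped reads; dividing by the total read count to get a fraction is an assumption added here, not something the README states. The function names and example numbers are hypothetical and are not taken from the QC notebook.

```python
def mapped_reads(total_read_pairs: int, unmapped_reads: int) -> int:
    """Apply the README formula: 2 * total read pairs - unmapped reads."""
    total_reads = 2 * total_read_pairs  # two reads per pair
    if not 0 <= unmapped_reads <= total_reads:
        raise ValueError("unmapped_reads must be between 0 and the total read count")
    return total_reads - unmapped_reads


def mappability_fraction(total_read_pairs: int, unmapped_reads: int) -> float:
    """Assumed normalization: mapped reads divided by total reads."""
    if total_read_pairs <= 0:
        raise ValueError("total_read_pairs must be positive")
    return mapped_reads(total_read_pairs, unmapped_reads) / (2 * total_read_pairs)


# Hypothetical values: 1000 read pairs, 150 unmapped reads.
print(mapped_reads(1000, 150))          # 1850
print(mappability_fraction(1000, 150))  # 0.925
```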
32 |
 |
|
|
33 |
|
|
|
34 |
Using *bedtools multicov* we will investigate the average number of reads that map to the targeted regions. The coverage |
32 |
Using *bedtools multicov* we will investigate the average number of reads that map to the targeted regions. The coverage
|
35 |
shown is plotted for all mapped reads and mapped reads with duplicates removed. |
33 |
shown is plotted for all mapped reads and mapped reads with duplicates removed. |
36 |
|
34 |
|
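*bedtools multicov* writes each BED interval followed by one read-count column per BAM file. The sketch below averages those counts across target regions. It assumes a 3-column BED file and two BAM files supplied in the order "all mapped reads", then "duplicates removed"; the function name and the example rows are hypothetical and do not come from the QC notebook.

```python
from statistics import mean


def mean_target_counts(multicov_text: str, n_bed_columns: int = 3) -> list:
    """Return the mean read count per BAM column across all target regions."""
    per_region = []
    for line in multicov_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Columns after the BED fields hold one count per BAM file.
        per_region.append([int(value) for value in fields[n_bed_columns:]])
    # Transpose so each tuple holds the counts for one BAM file.
    return [mean(column) for column in zip(*per_region)]


# Hypothetical multicov output: chrom, start, end, all reads, deduplicated reads.
example = "chr1\t100\t200\t40\t30\nchr1\t500\t600\t60\t50\n"
print(mean_target_counts(example))
```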
37 |
 |
|
|
38 |
|
|
|
39 |
Using *samtools flagstat* we can investigate the alignment file by looking at the SAM flags set for each read. SAM flags are bitwise combinations of different alignment attributes. Coverage information taken from *bedtools multicov* has been combined |
35 |
Using *samtools flagstat* we can investigate the alignment file by looking at the SAM flags set for each read. SAM flags are bitwise combinations of different alignment attributes. Coverage information taken from *bedtools multicov* has been combined
|
40 |
with the *flagstat* output to provide additional details. |
36 |
with the *flagstat* output to provide additional details.
|
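To illustrate what "bitwise combination" means for SAM flags, the sketch below decodes an integer flag into the attributes it encodes. The bit values follow the SAM format specification; the function name and the attribute labels are hypothetical and are not part of *samtools* or the QC notebook.

```python
# Bit values defined by the SAM format specification; labels are illustrative.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_sam_flag(flag: int) -> list:
    """List the attribute labels whose bits are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


print(decode_sam_flag(99))
```

A flag of 99 (1 + 2 + 32 + 64) decodes to a paired read in a proper pair, with the mate on the reverse strand, that is first in its pair. Adding 1024 to any flag marks the read as a duplicate, which is the attribute behind the duplication rate discussed in this section.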
41 |
|
|
|
42 |
 |
|
|