|
a |
|
b/vignettes/BloodCancerMultiOmics2017-dataOverview.Rmd |
---
title: "BloodCancerMultiOmics2017 - data overview"
author: "Małgorzata Oleś"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{BloodCancerMultiOmics2017 - data overview}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Prerequisites

```{r loadlib, message=FALSE}
library("BloodCancerMultiOmics2017")
# additional
library("Biobase")
library("SummarizedExperiment")
library("DESeq2")
library("reshape2")
library("ggplot2")
library("dplyr")
library("BiocStyle")
```


# Introduction

Primary tumor samples from blood cancer patients underwent functional and molecular characterization. `r Biocpkg("BloodCancerMultiOmics2017")` includes the resulting preprocessed data. A quick overview of the available data is provided below. For details on the experimental settings, please refer to:

S Dietrich\*, M Oleś\*, J Lu\* et al. *Drug-perturbation-based stratification of blood cancer*
<br>
*J. Clin. Invest.* (2018); 128(1):427–445. doi:10.1172/JCI93801.

\* equal contribution


# Data overview

Load all of the available data.
```{r}
data("conctab", "drpar", "lpdAll", "patmeta", "day23rep", "drugs",
     "methData", "validateExp", "dds", "exprTreat", "mutCOM",
     "cytokineViab")
```

The data sets are objects of different classes (`data.frame`, `ExpressionSet`, `NChannelSet`, `RangedSummarizedExperiment`, `DESeqDataSet`), and include data for either all studied patient samples or only a subset of these. The overview below briefly describes and summarizes the available data. Please note that the presence of a given patient sample ID within a data set doesn't necessarily mean that data is available for this sample (the slot could be filled with NAs).
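
The caveat about NA-filled slots can be checked directly. The chunk below is a minimal sketch on an invented toy matrix (the object names `toyAssay`, `toyAllNA` and `toyWithData`, the sample IDs and the values are all hypothetical and not part of the package): it flags samples whose ID is present as a column but whose values are all missing.

```{r}
# Toy stand-in for an assay matrix: features in rows, patient samples in columns.
# Sample IDs and values are invented for illustration only.
toyAssay <- matrix(c(0.9, 0.8, NA, NA, 0.7, 0.6),
                   nrow = 2,
                   dimnames = list(c("feature1", "feature2"),
                                   c("S1", "S2", "S3")))

# TRUE for samples that are listed but carry no measurement at all
toyAllNA <- colSums(!is.na(toyAssay)) == 0
toyAllNA

# IDs of the samples that actually carry data
toyWithData <- names(toyAllNA)[!toyAllNA]
toyWithData
```

The same idea should carry over to a matrix extracted from one of the data objects, although which accessor is appropriate depends on the class of the object.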

Patient samples per data set.
```{r numberOfSamples}
samplesPerData = list(
  drpar = colnames(drpar),
  lpdAll = colnames(lpdAll),
  day23rep = colnames(day23rep),
  methData = colnames(methData),
  patmeta = rownames(patmeta),
  validateExp = unique(validateExp$patientID),
  dds = colData(dds)$PatID,
  exprTreat = unique(pData(exprTreat)$PatientID),
  mutCOM = rownames(mutCOM),
  cytokineViab = unique(cytokineViab$Patient)
)
```

List of all samples present in data sets.
```{r}
(samples = sort(unique(unlist(samplesPerData))))
```

Total number of samples.
```{r}
length(samples)
```

A plot summarizing the presence of a given patient sample within each data set.
```{r sampleOverlap, fig.height=4, fig.width=8, echo=FALSE}
plotTab = melt(samplesPerData, value.name="PatientID")
plotTab$L1 = factor(plotTab$L1, levels=c("patmeta",
                                         "mutCOM",
                                         "lpdAll",
                                         "methData",
                                         "exprTreat",
                                         "dds",
                                         "cytokineViab",
                                         "day23rep",
                                         "validateExp",
                                         "drpar"))

# order of the samples in the plot
tmp = do.call(cbind, lapply(samplesPerData[c("drpar",
                                             "validateExp",
                                             "day23rep",
                                             "dds",
                                             "exprTreat",
                                             "methData",
                                             "cytokineViab")],
                            function(x) {
                              samples %in% x
                            }))

rownames(tmp) = samples
ord = order(tmp[,1], tmp[,2], tmp[,3], tmp[,4], tmp[,5], tmp[,6], tmp[,7],
            decreasing=TRUE)
ordSamples = rownames(tmp)[ord]
plotTab$PatientID = factor(plotTab$PatientID, levels=ordSamples)

ggplot(plotTab, aes(x=PatientID, y=L1)) + geom_tile(fill="lightseagreen") +
  scale_y_discrete(expand=c(0,0)) +
  ylab("Data objects") +
  xlab("Patient samples") +
  geom_vline(xintercept=seq(10, length(samples),10), color="grey") +
  geom_hline(yintercept=seq(0.5, length(levels(plotTab$L1)), 1),
             color="dimgrey") +
  theme(panel.grid=element_blank(),
        text=element_text(size=18),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.background=element_rect(color="gainsboro"))
```
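
The sample order on the x-axis of the plot comes from a logical membership matrix that is sorted by several columns at once. The following sketch reproduces that idea on three invented sets (`toySets`, `toyIDs`, `toyMember` and `toyOrd` are hypothetical names used only here), so the effect of the multi-key `order()` call is easier to follow.

```{r}
# Three invented data sets, each listing the sample IDs it contains
toySets <- list(setA = c("P1", "P3"),
                setB = c("P2", "P3"),
                setC = c("P3", "P4"))
toyIDs <- sort(unique(unlist(toySets)))

# Logical membership matrix: one row per sample, one column per data set
toyMember <- sapply(toySets, function(x) toyIDs %in% x)
rownames(toyMember) <- toyIDs
toyMember

# Samples present in setA come first, ties are broken by setB, then by setC
toyOrd <- order(toyMember[, "setA"], toyMember[, "setB"], toyMember[, "setC"],
                decreasing = TRUE)
toyIDs[toyOrd]
```

Sample `P3`, which is present in all three sets, ends up first, followed by the remaining `setA` member and then the samples found only in the later sets.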

The classification below stratifies data sets according to different types of experiments performed and included. Please refer to the manual for more detailed information on the content of these data objects.


## Patient metadata

Patient metadata is provided in the `patmeta` object.
```{r}
# Number of patients per disease
sort(table(patmeta$Diagnosis), decreasing=TRUE)

# Number of samples from pretreated patients
table(!patmeta$IC50beforeTreatment)

# IGHV status of CLL patients
table(patmeta[patmeta$Diagnosis=="CLL", "IGHV"])
```


## High-throughput drug screen data

The viability measurements from the high-throughput drug screen are included in the `drpar` object. The metadata about the drugs and drug concentrations used can be found in the `drugs` and `conctab` objects, respectively.

The `drpar` object includes multiple channels, each of which holds the cell viability data for a single drug concentration step. Channels `viaraw.1_5` and `viaraw.4_5` contain the mean viability score across the concentration steps indicated at the end of the channel name.

```{r}
channelNames(drpar)

# show viability data for the first 5 patients and 7 drugs in their lowest conc.
assayData(drpar)[["viaraw.1"]][1:7,1:5]
```
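
To illustrate what a channel averaged over several concentration steps looks like, the sketch below builds five invented drugs-by-samples matrices and takes the element-wise mean of steps 4 and 5. All names (`toySteps`, `toyMean45`) and values are hypothetical, and whether the package derived `viaraw.4_5` in exactly this way is an assumption.

```{r}
# One invented viability matrix per concentration step (drugs x samples);
# step i is filled with the constant i / 10 so the result is easy to check
toySteps <- lapply(1:5, function(i) {
  matrix(i / 10, nrow = 2, ncol = 3,
         dimnames = list(c("drugA", "drugB"), c("S1", "S2", "S3")))
})
names(toySteps) <- paste0("step", 1:5)

# Element-wise mean over steps 4 and 5
toyMean45 <- Reduce(`+`, toySteps[4:5]) / 2
toyMean45
```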

Drug metadata.
```{r}
# number of drugs
nrow(drugs)

# type of information included in the object
colnames(drugs)
```

Drug concentration steps (c1 - lowest, c5 - highest).
```{r}
head(conctab)
```

The reproducibility of the screening platform was assessed by screening `r unname(ncol(day23rep))` patient samples in two replicates. The viability measurements are available for two time points: 48 h and 72 h after adding the drug. The screen was performed for `r length(unique(fData(day23rep)$DrugID))` drugs in 1-2 different drug concentrations (`r table(table(fData(day23rep)$DrugID))["1"]` in 1 and `r table(table(fData(day23rep)$DrugID))["2"]` in 2 drug concentrations). This data is provided in `day23rep`.
```{r}
channelNames(day23rep)

# show viability data for the 48 h time point for all patients marked as
# replicate 1 and the first 3 drugs at all their concentrations
drugs2Show = unique(fData(day23rep)$DrugID)[1:3]
assayData(day23rep)[["day2rep1"]][fData(day23rep)$DrugID %in% drugs2Show,]
```
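
The inline counts in the paragraph above rely on calling `table()` twice: the inner call counts the rows per drug, and the outer call counts how many drugs share each row count. A small sketch with invented drug IDs (`toyDrugID`, `toyPerDrug` and `toyCounts` are hypothetical names) shows the idiom in isolation.

```{r}
# Invented feature annotation: one entry per drug/concentration combination
toyDrugID <- c("D_001", "D_001", "D_002", "D_003", "D_003", "D_004", "D_005")

# Inner table: number of concentrations per drug
toyPerDrug <- table(toyDrugID)
toyPerDrug

# Outer table: number of drugs screened at 1 and at 2 concentrations
toyCounts <- table(toyPerDrug)
toyCounts

# Extract single counts by name, as done in the inline code
as.vector(toyCounts["1"])
as.vector(toyCounts["2"])
```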

The follow-up drug screen, which confirmed the targets and the signaling pathway dependence of the patient samples, was performed for `r length(unique(validateExp$patientID))` samples and the following drugs: `r paste(unique(validateExp$Drug), collapse=", ")`.

| Drug name   | Target |
|-------------|--------|
| Cobimetinib | MEK    |
| Trametinib  | MEK    |
| SCH772984   | ERK1/2 |
| Ganetespib  | Hsp90  |
| Onalespib   | Hsp90  |

The data is included in the `validateExp` object.
```{r}
head(validateExp)
```

Moreover, we performed a small drug screen to check the influence of different cytokines/chemokines on the viability of the samples. These data are included in the `cytokineViab` object.

```{r}
head(cytokineViab)
```


## Gene mutation data

The `mutCOM` object contains information on the presence of gene mutations in the studied patient samples.
```{r}
# there is only one channel with the binary type of data for each gene
channelNames(mutCOM)

# the feature data includes detailed information about mutations in
# the TP53 and BRAF genes, as well as the clone size of the
# del17p13, KRAS, UMODL1, CREBBP, PRPF8 and trisomy12 mutations
colnames(fData(mutCOM))
```


## Gene expression data

RNA-Seq data preprocessed with `r Biocpkg("DESeq2")` is provided in the `dds` object.

```{r}
# show count data for the first 5 patients and 7 genes
assay(dds)[1:7,1:5]

# show the above with patient sample ids
assay(dds)[1:7,1:5] %>% `colnames<-` (colData(dds)$PatID[1:5])

# number of genes and patient samples
nrow(dds); ncol(dds)
```
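
The second command in the chunk above pipes a matrix into the replacement function `` `colnames<-` ``, which returns a relabelled copy instead of modifying an object in place. The sketch below shows the same call without the pipe on an invented count matrix (`toyCountsMat`, `toyPatID` and `toyRelabelled` are hypothetical names; the IDs are made up).

```{r}
# Invented count matrix with library IDs as column names
toyCountsMat <- matrix(1:6, nrow = 2,
                       dimnames = list(c("gene1", "gene2"),
                                       c("lib1", "lib2", "lib3")))
toyPatID <- c("P0001", "P0002", "P0003")

# Calling the replacement function directly returns a relabelled copy
toyRelabelled <- `colnames<-`(toyCountsMat, toyPatID)
colnames(toyRelabelled)

# The original matrix keeps its column names
colnames(toyCountsMat)
```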

Additionally, `r length(unique(pData(exprTreat)$PatientID))` patient samples underwent gene expression profiling using Illumina microarrays before and 12 h after treatment with `r tmp=unique(pData(exprTreat)$DrugID); length(tmp[!is.na(tmp)])` drugs. These data are included in the `exprTreat` data object.
```{r}
# patient samples included in the data set
(p = unique(pData(exprTreat)$PatientID))

# type of metadata included for each gene
colnames(fData(exprTreat))

# show expression level for the first patient and the first 3 probes
Biobase::exprs(exprTreat)[1:3, pData(exprTreat)$PatientID==p[1]]
```


## DNA methylation data

The `methData` object contains DNA methylation data for `r ncol(methData)` patient samples and the 5000 most variable CpG sites.

```{r}
# show the methylation for the first 7 CpGs and the first 5 patient samples
assay(methData)[1:7,1:5]

# type of metadata included for CpGs
colnames(rowData(methData))

# number of patient samples screened with the given platform type
table(colData(methData)$platform)
```
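
Selecting the most variable sites can be pictured as ranking the rows of a methylation matrix by their variance and keeping the top ones. The sketch below does this for an invented 4 x 3 matrix and keeps the top 2 rows. The names `toyBeta`, `toyVar` and `toyTop` are hypothetical, and the exact selection procedure used to build `methData` is an assumption here.

```{r}
# Invented methylation values: CpG sites in rows, samples in columns
toyBeta <- rbind(cpg1 = c(0.10, 0.90, 0.50),
                 cpg2 = c(0.50, 0.50, 0.50),
                 cpg3 = c(0.20, 0.40, 0.30),
                 cpg4 = c(0.00, 1.00, 0.00))

# Per-site variance across samples
toyVar <- apply(toyBeta, 1, var)

# Keep the 2 sites with the highest variance
toyTop <- names(sort(toyVar, decreasing = TRUE))[1:2]
toyBeta[toyTop, ]
```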


## Other

Object `lpdAll` is a convenient assembly of data contained in the other data objects mentioned earlier in this vignette. For details, please refer to the manual.

```{r}
# number of rows in the dataset for each type of data
table(fData(lpdAll)$type)

# show viability data for the drugs ibrutinib, idelalisib and dasatinib
# (in the mean of the two lowest concentration steps) and
# the first 5 patient samples
Biobase::exprs(lpdAll)[which(
  with(fData(lpdAll),
       name %in% c("ibrutinib", "idelalisib", "dasatinib") &
         subtype=="4:5")), 1:5]
```
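
The row selection above combines `with()` and `which()` on the feature annotation. The sketch below repeats the pattern on an invented annotation table (`toyFeat`, `toyVals` and `toyKeep` are hypothetical names, and the `type` and `subtype` values are made up). Wrapping the condition in `which()` drops rows where the comparison evaluates to `NA`, for example features without a `subtype`.

```{r}
# Invented feature annotation and matching data matrix
toyFeat <- data.frame(name = c("ibrutinib", "ibrutinib", "dasatinib", "TP53"),
                      type = c("viab", "viab", "viab", "gen"),
                      subtype = c("1:5", "4:5", "4:5", NA),
                      stringsAsFactors = FALSE)
toyVals <- matrix(seq_len(8), nrow = 4,
                  dimnames = list(paste0("row", 1:4), c("S1", "S2")))

# Rows for the selected drugs in the "4:5" subtype
toyKeep <- which(with(toyFeat,
                      name %in% c("ibrutinib", "dasatinib") &
                        subtype == "4:5"))
toyVals[toyKeep, ]
```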


# Original data

The raw data from the whole exome sequencing, RNA-seq and DNA methylation arrays is stored in the European Genome-Phenome Archive (EGA) under accession number EGAS0000100174.

The preprocessed DNA methylation data, which includes the complete list of CpG sites (not only the 5000 with the highest variance), can be accessed through the Bioconductor ExperimentHub platform.

```{r eval=FALSE}
library("ExperimentHub")

eh = ExperimentHub()
obj = query(eh, "CLLmethylation")
meth = obj[["EH1071"]] # extract the methylation data
```


# Session info

```{r}
sessionInfo()
```