# ImmunogenomicLandscape-BloodCancers

## Description:

Scripts for reproducing the results presented in Dufva, Pölönen et al., Immunogenomic landscape of hematological malignancies.

## Note:

If you use the data, analyses, or results, please cite:

- Dufva, Pölönen et al. Immunogenomic landscape of hematological malignancies. https://doi.org/10.1016/j.ccell.2020.06.002
- Data DOI: 10.7303/syn21991014

If you use Hemap data, please also cite:

- Petri Pölönen, Juha Mehtonen, Jake Lin, Thomas Liuksiala, Sergei Häyrynen, Susanna Teppo, Artturi Mäkinen, Ashwini Kumar, Disha Malani, Virva Pohjolainen, Kimmo Porkka, Caroline A. Heckman, Patrick May, Ville Hautamäki, Kirsi J. Granberg, Olli Lohi, Matti Nykter and Merja Heinäniemi. Hemap: An interactive online resource for characterizing molecular phenotypes across hematologic malignancies. Cancer Res, April 2, 2019. DOI: 10.1158/0008-5472.CAN-18-2970

If you use other publicly available data sets analyzed here (TCGA AML/DLBCL, Chapuy et al., Tyner et al., etc.), please also cite the original studies.

If you are only interested in the data sets analyzed here, see **datasets_synapseID.txt** for the Synapse accession codes.
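If you prefer scripted downloads, an accession-code listing can be turned into `synapse get` commands. The sketch below is hypothetical: it assumes the listing is tab-separated with the Synapse ID in the first column and the filename in the second, which may not match the actual layout of datasets_synapseID.txt or synapseID_Filename.txt, so inspect the file first. The helper names `read_synapse_ids` and `download_commands` are illustrative and not part of this repository.

```python
"""Sketch: turn a Synapse ID listing into `synapse get` commands.

Assumption (hypothetical): one "<synapseID><TAB><filename>" pair per line.
Check the actual listing file and adapt the parsing if its layout differs.
"""
import io
import re
from typing import Dict, Iterable, List, TextIO

# Synapse accession codes are "syn" followed by digits, e.g. syn21991014.
SYN_ID = re.compile(r"^syn\d+$")


def read_synapse_ids(handle: TextIO) -> Dict[str, str]:
    """Map filename -> Synapse ID; rows without a valid ID (e.g. a header) are skipped."""
    mapping: Dict[str, str] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        syn_id, filename = fields[0].strip(), fields[1].strip()
        if SYN_ID.match(syn_id):
            mapping[filename] = syn_id
    return mapping


def download_commands(mapping: Dict[str, str], wanted: Iterable[str]) -> List[str]:
    """One `synapse get` command per requested filename; unknown names raise KeyError."""
    return [f"synapse get {mapping[name]}" for name in wanted]


if __name__ == "__main__":
    # Made-up IDs and filenames, for illustration only.
    demo = io.StringIO("ID\tfilename\nsyn111\tfileA.Rdata\nsyn222\tfileB.txt\n")
    for command in download_commands(read_synapse_ids(demo), ["fileB.txt"]):
        print(command)  # synapse get syn222
```

With a real file, open it with `open("datasets_synapseID.txt")` and pass the handle to `read_synapse_ids`; the resulting commands still need a logged-in Synapse client.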
### To reproduce parts of the results:

- Get Synapse credentials (register at https://www.synapse.org)
- Access the Synapse project syn21991014
- Download the project data using one of the following options:
    - Individual input files (recommended): see the scripts for the filenames and download them from https://www.synapse.org/#!Synapse:syn21991014/files/
    - Programmatic access with the Synapse client (see synapseID_Filename.txt for the accession codes; replace synapseID with the code of the file you need):
        ```
        pip install synapseclient
        synapse get synapseID
        ```
    - Bulk download of the whole project (~70 GB):
        ```
        pip install synapseclient
        synapse get syn21991014 -r
        ```

- Clone the git repository:
    ```
    git clone https://github.com/systemsgenomics/ImmunogenomicLandscape-BloodCancers.git
    ```

- Install the required R packages (the versions used are listed under SessionInfo (R) below).

- Run the analysis:
    - **Set the working directory** to the folder containing the data, e.g. `setwd("path/data")`.
    - **Modify the GIT_HOME variable** in the R script so that it points to the folder where the repository was cloned. GIT_HOME typically points to **common_scripts**, which contains the statistical and visualization tools used in the analysis.
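Editing GIT_HOME by hand in many scripts is error-prone. Below is a minimal Python sketch that writes a patched copy of a script and runs it with `Rscript` from the data folder, the equivalent of `setwd("path/data")`. It assumes, hypothetically, that each script assigns GIT_HOME on its own line as a quoted string (`GIT_HOME="..."` or `GIT_HOME <- "..."`) and that `Rscript` is on your PATH; `set_git_home`, `run_script`, and the `_local.R` suffix are illustrative names, not part of the repository.

```python
"""Sketch: set GIT_HOME in a copy of an analysis script and run it from the data folder.

Assumptions (hypothetical): the script assigns GIT_HOME as a quoted string on
its own line, e.g. GIT_HOME="/old/path/" or GIT_HOME <- "/old/path/", and
Rscript is available on PATH.
"""
import re
import subprocess
from pathlib import Path

# R assignment to GIT_HOME with "=" or "<-", followed by a single- or double-quoted string.
GIT_HOME_LINE = re.compile(r"""^(\s*GIT_HOME\s*(?:=|<-)\s*)(["']).*?\2""", re.MULTILINE)


def set_git_home(source: str, git_home: str) -> str:
    """Return the R source with each GIT_HOME assignment pointing to git_home."""
    # A function replacement avoids re treating backslashes in git_home as escapes.
    new_source, count = GIT_HOME_LINE.subn(lambda m: f'{m.group(1)}"{git_home}"', source)
    if count == 0:
        raise ValueError("no GIT_HOME assignment found")
    return new_source


def run_script(script: Path, data_dir: Path, git_home: str) -> None:
    """Write <name>_local.R next to the script, then run it with Rscript inside data_dir."""
    patched = script.with_name(script.stem + "_local.R")
    patched.write_text(set_git_home(script.read_text(), git_home))
    # cwd=data_dir plays the role of setwd("path/data"); the script path is made absolute
    # so that it still resolves after the working directory changes.
    subprocess.run(["Rscript", str(patched.resolve())], cwd=data_dir, check=True)
```

Check what each script expects GIT_HOME to contain before calling `run_script`; a script that calls `setwd()` itself will override the working directory chosen here.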
## Scripts to reproduce the analysis, plots and tables

The processed and intermediate files used by these scripts can be downloaded from the Synapse project to reproduce the analysis. These inputs were generated with the scripts in the `preprocessing` folder.

#### Figure 1, Figure S1:
```
Fig1C_AML_cytscore_flow_RNAseq_comparison.R (Fig1C)
Fig1_plots.R (Fig1F-H)
Statistical_analysis_Cytolytic_Score_development.R (FigS1B, G-I)
```

#### Figure 2, Figure S2 and Table S2:
```
Fig2_microenvironment_analysis.R (TableS2, Fig2A-B, FigS2C)
FigS2AB_microenvironment_analysis_GSEA_GSVA.R (FigS2A-B)
FigS2DE_microenvironment_validation_CLL_AML.R (FigS2D-E)
```

#### Figure 3, Figure S3 and Table S3:
```
Fig3A_DLBCL_cytscore_oncoprint.R (Fig3A)
Fig3BC_DLBCL_cytscore_boxplots.R (Fig3B-C)
Fig3D_TCGA_AML_cytscore_oncoprint.R (Fig3D)
Fig3_FigS3_MDSsignature.R (Fig3E, FigS3D)
Fig3_plots_scRNA.R (Fig3F-H and K, FigS3K and O)
Statistical_analysis_DE_analysis_MDSlike.R (TableS3 tab)
Statistical_analysis_scRNA_MDSlike_analysis.R (TableS3 tabs)
Statistical_analysis_Szabo_TCell_analysis.R (Fig3I, FigS3M)
Statistical_analysis_Yang_NKCell_analysis.R (Fig3J, FigS3N)
FigS3L.R (TableS3 tab, FigS3L)
```

#### Figure 4, Figure S4 and Table S4:
```
Statistical_analysis_HLAII_Score_development.R (FigS4A-B)
Fig4D_TCGA_AML_complexheatmap_CIITA.R (Fig4D)
Statistical_analysis_FIMM_AML_RRBS.R (Fig4G and H, FigS4L)
FigS4C_AML_HLAIIscore_flow_RNAseq_comparison.R (FigS4C)
FigS4D_TCGA_AML_global_hypermethylation.R (FigS4D)
FigS4EF_CIITA_methylation_validation_ERRBS.R (FigS4E-F)
FigS4K_CCLE_CIITA_methylation.R (FigS4K)
FigS4J_CIITA_methylation_validation_GSE49031.R (FigS4J)
```

#### Figure 5, Figure S5 and Table S5:
```
Fig5A_ligands_heatmap.R (Fig5A)
Fig5_DE_analysis_costim.R (Fig5B, FigS5B, TableS5 tabs)
Fig5C_TCGA_AML_ligand_correlation_volcanoplot.R (Fig5C)
Fig5D_S5G_DLBCL_GSE98588_ligand_boxplots.R (Fig5D, FigS5G)
Fig5E_PDL1_IHC_boxplot.R (Fig5E)
Fig5F_VISTA_IHC_boxplot.R (Fig5F)
Fig5G_CD70_CRISPR_T cell_stimulation.R (Fig5G)
FigS5C.R (FigS5C)
FigS5DE_TCGA_ligand_methylation_AML_DLBCL_comparison.R (FigS5D-E)
```

#### Figure 6, Figure S6 and Table S6:
```
Fig6B_S6F_CGA_tSNEplot_hemap.R (Fig6B, FigS6F)
Fig6C_S6B_CGA_Hemap.R (Fig6C, FigS6B)
Fig6D_CCLE_CGA_heatmap.R (Fig6D)
Fig6F_CoMMpass_CGA_boxplot.R (Fig6F)
Fig6G_CoMMpass_CGA_oncoprint.R (Fig6G)
Fig6H_DLBCL_GSE98588_CGA_oncoprint.R (Fig6H)
Fig6H_FigS6H_CGA_heatmap_GSE98588.R (Fig6H, FigS6H, FigS6I)
FigS6G_CGA_GSEA_hemapMM.R (FigS6G)
FigS6A_GTEx_CGA_heatmap.R (FigS6A)
FigS6C_Hemap_CGA_dotplots.R (FigS6C)
FigS6D_TCGA_antigen_methylation.R (FigS6D)
FigS6E_CoMMpass_CGA_heatmap.R (FigS6E)
Statistical_analysis_CGA_discovery_Hemap.R
```

#### Figure 7, Figure S7 and Table S7:
```
Fig7_Univariate_Coxph_survival.R (TableS7)
Fig7_univariate_survival_forestplot.R (Fig7A, Fig7B-D, FigS7A-B)
Fig7_multivariable_regression_eNet_survival.R (Fig7E-F, FigS7C-G)
```

#### Statistical association analysis, Tables S1-S6:
```
Statistical_analysis_correlations_BeatAML.R (TableS3-5)
Statistical_analysis_correlations_CoMMpass.R (TableS4-6)
Statistical_analysis_correlations_GSE98588_DLBCL.R (TableS3-6)
Statistical_analysis_correlations_Reddy.R (TableS3-5)
Statistical_analysis_correlations_TCGA_AML.R (TableS3-5)
Statistical_analysis_correlations_TCGA_DLBCL.R
```
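The listings above map each script to the panels it produces. For the reverse question (which script makes panel Fig3C?), a small hypothetical Python helper can expand the annotations into single panel labels. It relies only on the conventions visible in these listings: letter ranges such as `Fig3B-C`, bare letters that reuse the previous prefix (`Fig3F-H and K`), numeric table ranges such as `TableS3-5`, and the words `tab`/`tabs`. Scripts without an annotation are skipped, and any token it cannot parse is kept verbatim. `expand` and `build_index` are illustrative names.

```python
import re
from typing import Dict, List, Set

# "<script>.R (<panel annotation>)"; scripts without an annotation are skipped.
LINE = re.compile(r"^(?P<script>\S.*?\.R)\s*\((?P<panels>[^)]*)\)\s*$")
# A full label such as Fig3B, FigS4A-B or TableS2 (prefix, optional letter range).
LABEL = re.compile(r"^((?:Fig|Table)S?\d+)([A-Z](?:-[A-Z])?)?$")
# A numeric range such as TableS3-5.
NUMBERS = re.compile(r"^((?:Fig|Table)S?)(\d+)-(\d+)$")
# A bare letter or letter range ("K", "G-I") that reuses the previous prefix.
LETTERS = re.compile(r"^[A-Z](?:-[A-Z])?$")


def expand(panels: str) -> Set[str]:
    """Expand an annotation like 'Fig3F-H and K, FigS3K' into single panel labels."""
    text = re.sub(r"\btabs?\b", "", panels).replace("Figure", "Fig")
    text = re.sub(r"(Fig|Table)\s*(S?)\s*(\d)", r"\1\2\3", text)  # "Table S5" -> "TableS5"
    labels: Set[str] = set()
    prefix = None
    for token in (t.strip() for t in re.split(r",|\band\b", text)):
        if not token:
            continue
        full, nums = LABEL.match(token), NUMBERS.match(token)
        if full:
            prefix, letters = full.group(1), full.group(2)
        elif nums:
            start, end = int(nums.group(2)), int(nums.group(3))
            labels.update(f"{nums.group(1)}{n}" for n in range(start, end + 1))
            continue
        elif prefix and LETTERS.match(token):
            letters = token
        else:
            labels.add(token)  # unrecognised annotation: keep it verbatim
            continue
        if not letters:
            labels.add(prefix)
            continue
        first, _, last = letters.partition("-")
        labels.update(prefix + chr(c) for c in range(ord(first), ord(last or first) + 1))
    return labels


def build_index(listing: str) -> Dict[str, List[str]]:
    """Map each panel label to the scripts whose annotation covers it, in listing order."""
    index: Dict[str, List[str]] = {}
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if match:
            for label in sorted(expand(match.group("panels"))):
                index.setdefault(label, []).append(match.group("script"))
    return index
```

For example, paste the listing lines into a string `listing` and look up `build_index(listing)["Fig3C"]`.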
## Scripts related to data preprocessing (in the `preprocessing` folder)

These scripts are provided for reference only. Running them requires downloading and processing the raw, processed, or input data first; see the publication for the data accession codes.

#### Data preprocessing
```
Preprocessing_CIBERSORT_MCPcounter.R
Preprocessing_normalize_hguarray_GSE98588.R
```

#### Generating the feature matrix for each cohort
```
Preprocessing_Hemap_featurematrix_generation.R (TableS1)
Preprocessing_REDDY_DLBCL_featurematrix_generation.R
Preprocessing_TCGA_AML_featurematrix_generation.R
Preprocessing_TCGA_DLBCL_featurematrix_generation.R
Preprocessing_coMMpass_featurematrix_generation.R
Preprocessing_GSE98588_DLBCL_featurematrix_generation.R
Preprocessing_PanALL.R
```

#### Subtype analysis for each cohort
```
Preprocessing_MM_subtyping.R
Preprocessing_DLBCL_subtyping.R
Preprocessing_ALL_subtyping.R
Preprocessing_AML_subtyping.R
```

#### Methylation data processing
```
Preprocessing_TCGA_AML_add_meth_probes.R
Preprocessing_TCGA_meth_data_genelist.R
Preprocessing_FIMM_AML_RRBS.R
Preprocessing_AML_RRBS_meth_de_analysis.R
```

#### scRNA data preprocessing for statistical analysis
```
Preprocessing_scRNA_CLL_GSE111014.R
Preprocessing_scRNA_FIMM_AML.R
Preprocessing_scRNA_Galen_AML.R
Preprocessing_scRNA_HCA.R
Preprocessing_scRNA_PB_Citeseq.R
Preprocessing_scRNA_Szabo_Tcells_dataprocessing.R
Preprocessing_scRNA_Yang_NK_dataprocessing.R
Preprocessing_scRNA_integrate_Tcells_NKcells.R
```

## SessionInfo (R)

```
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils
 [8] datasets  methods   base

other attached packages:
 [1] tidyr_1.0.0
 [2] dplyr_0.8.3
 [3] cowplot_1.0.0
 [4] tibble_2.1.3
 [5] stringr_1.4.0
 [6] survminer_0.4.6
 [7] survcomp_1.34.0
 [8] prodlim_2018.04.18
 [9] Seurat_3.1.1.9021
[10] RnBeads_2.2.0
[11] plyr_1.8.4
[12] methylumi_2.30.0
[13] minfi_1.30.0
[14] bumphunter_1.26.0
[15] locfit_1.5-9.1
[16] iterators_1.0.12
[17] foreach_1.4.7
[18] Biostrings_2.52.0
[19] XVector_0.24.0
[20] SummarizedExperiment_1.14.1
[21] DelayedArray_0.10.0
[22] BiocParallel_1.18.1
[23] FDb.InfiniumMethylation.hg19_2.2.0
[24] org.Hs.eg.db_3.8.2
[25] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[26] GenomicFeatures_1.36.4
[27] reshape2_1.4.3
[28] scales_1.0.0
[29] illuminaio_0.26.0
[30] matrixStats_0.54.0
[31] gridExtra_2.3
[32] gplots_3.0.1.1
[33] fields_10.0
[34] maps_3.3.0
[35] spam_2.4-0
[36] dotCall64_1.0-0
[37] ff_2.2-14
[38] bit_1.1-14
[39] cluster_2.1.0
[40] MASS_7.3-51.4
[41] GenomicRanges_1.36.0
[42] GenomeInfoDb_1.20.0
[43] RColorBrewer_1.1-2
[44] openxlsx_4.1.4
[45] multipanelfigure_2.0.2
[46] mclust_5.4.5
[47] Matrix_1.2-17
[48] Hmisc_4.2-0
[49] Formula_1.2-3
[50] survival_2.44-1.1
[51] lattice_0.20-38
[52] GSVA_1.32.0
[53] ggridges_0.5.1
[54] ggrastr_0.1.7
[55] ggpubr_0.2.3
[56] future_1.14.0
[57] forestplot_1.9
[58] checkmate_1.9.4
[59] magrittr_1.5
[60] EnhancedVolcano_1.3.5
[61] ggrepel_0.8.1
[62] ggplot2_3.2.1
[63] edgeR_3.26.8
[64] limma_3.40.6
[65] data.table_1.12.4
[66] ComplexHeatmap_2.0.0
[67] circlize_0.4.7
[68] caTools_1.17.1.2
[69] AnnotationDbi_1.46.1
[70] IRanges_2.18.2
[71] S4Vectors_0.22.0
[72] Biobase_2.44.0
[73] BiocGenerics_0.30.0
[74] methylSig_0.1
[75] CePa_0.7.0
[76] viridis_0.5.1
```
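To compare a local R library with the versions listed above, the `other attached packages` entries can be parsed into a name-to-version map. This is a hypothetical Python sketch: `installed` is assumed to be a dict you build yourself, for example from the `Package` and `Version` columns of R's `installed.packages()` exported to a file. Versions are compared as exact strings, so `1.5-9.1` and `1.5.9.1` count as different.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches entries such as "[15] locfit_1.5-9.1": index, package name, "_", version.
# Locale lines ("LC_CTYPE=C") and base packages ("grid") do not match.
ENTRY = re.compile(r"\[\s*\d+\]\s+([A-Za-z][A-Za-z0-9.]*)_(\d[0-9.\-]*)")


def parse_attached(session_info: str) -> Dict[str, str]:
    """Return {package: version} for every 'name_version' entry in sessionInfo text."""
    return dict(ENTRY.findall(session_info))


def version_mismatches(expected: Dict[str, str],
                       installed: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """List (package, expected version, installed version or None) where they differ."""
    return [(name, want, installed.get(name))
            for name, want in sorted(expected.items())
            if installed.get(name) != want]
```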