Processed data can be downloaded from our Google Drive.
The data directory structure for TCGA-GBM + TCGA-LGG validation is listed below.
- all_datasets.csv: Contains survival time, censor status, IDH mutation status, and CNV data for 769 TCGA IDs.
- grade_data.csv: Contains age, gender, histologic grade and subtype data for 769 TCGA IDs.
- mRNA_Expression_z-Scores_RNA_Seq_RSEM.txt: Contains mRNAseq data for the TCGA-GBM project (obtained from the top differentially expressed genes from cBioPortal).
- mRNA_Expression_Zscores_RSEM.txt: Contains mRNAseq data for the TCGA-LGG project (obtained from the top differentially expressed genes from cBioPortal).
- pnas_splits.csv: Splits from Mobadersany et al. used for 15-fold cross-validation.
- all_st: 1505 histology ROIs of size 1024 X 1024 (stain normalized) for the 769 TCGA IDs, used for training the Histology CNN.
- all_st_cpc_img/pt_bi/: Graph features for the 1505 histology ROIs used for training Histology GCN
- all_st_patches_512: 13545 patches of size 512 X 512 (9 overlapping patches with stride = 256 extracted per image in all_st), used for testing the Histology CNN and for training and testing Pathomic Fusion. Rather than random crops, all_st_patches_512 can be interpreted as a fixed set of crops per image.
- all_st_patches_512_cpc: Graph features for the 13545 histology patches, used for training the Histology GCN. Since we did not need a patch-based strategy for training the GCN, these .pt files are duplicates of the files in all_st_cpc_img, so that the graph and image inputs are aligned before they are loaded in the PyTorch Dataset Loader.
- splits: Pickle files containing the data splits for 15-fold cross-validation. Patients with missing data were excluded depending on the task (grade vs. survival) and the model being trained (CNN, GCN, SNN, Pathomic Fusion). In the pickle filename, the string "all_st" vs. "all_st_patches_512" indicates whether the genomic data was aligned with the 1024 X 1024 images in all_st / all_st_cpc or with the 512 X 512 images in all_st_patches_512 / all_st_patches_512_cpc. The ending string with pattern "INT_INT_INT_STR" indicates: 0/1 for whether to ignore patients with missing molecular subtype, 0/1 for whether to ignore patients with missing histology subtype, 0/1 for whether to use extracted VGG19 embeddings from all_st_patches_512 for Pathomic Fusion, and "rnaseq" if RNAseq is used. Additional details can be found in make_splits.py.
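The fixed-crop layout described for all_st_patches_512 can be sketched as follows. This is a minimal illustration rather than the extraction script used to build the dataset: the function names are hypothetical, and the mapping of the two trailing integers in each patch filename to crop offsets is an assumption inferred from names such as `..._1_0_256.png`.

```python
from typing import List, Tuple


def patch_offsets(image_size: int = 1024,
                  patch_size: int = 512,
                  stride: int = 256) -> List[Tuple[int, int]]:
    """Return the top-left offsets of every fixed crop along both axes."""
    steps = range(0, image_size - patch_size + 1, stride)
    return [(first, second) for first in steps for second in steps]


def patch_filename(roi_stem: str, first: int, second: int, ext: str = ".png") -> str:
    # Assumption: the two trailing integers in a patch filename are the crop offsets.
    # Which one is the row and which is the column is not stated in this README.
    return f"{roi_stem}_{first}_{second}{ext}"


offsets = patch_offsets()
print(len(offsets))          # number of crops per 1024 X 1024 ROI
print(1505 * len(offsets))   # total number of patches across all ROIs
print(patch_filename("TCGA-02-0001-01Z-00-DX1.83fce43e-42ac-4dcd-b156-2908e75f2e47_1",
                     *offsets[1]))
```

With a 1024 X 1024 image, a 512 X 512 window and a stride of 256, each axis admits the offsets 0, 256 and 512, which gives the 9 crops per image and the 13545 patches in total.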
./
└── data
    └── TCGA_GBMLGG
        ├── all_datasets.csv
        ├── grade_data.csv
        ├── mRNA_Expression_z-Scores_RNA_Seq_RSEM.txt
        ├── mRNA_Expression_Zscores_RSEM.txt
        ├── pnas_splits.csv
        ├── gbmlgg
        ├── all_st
            ├── TCGA-02-0001-01Z-00-DX1.83fce43e-42ac-4dcd-b156-2908e75f2e47_1.png
            ├── TCGA-02-0001-01Z-00-DX2.b521a862-280c-4251-ab54-5636f20605d0_1.png
            ├── ...
        ├── all_st_cpc
            └── pt
                ├── TCGA-02-0001-01Z-00-DX1.83fce43e-42ac-4dcd-b156-2908e75f2e47_1.pt
                ├── TCGA-02-0001-01Z-00-DX2.b521a862-280c-4251-ab54-5636f20605d0_1.pt
                ├── ...
        ├── all_st_patches_512
            ├── TCGA-02-0001-01Z-00-DX1.83fce43e-42ac-4dcd-b156-2908e75f2e47_1_0_0.png
            ├── TCGA-02-0001-01Z-00-DX1.83fce43e-42ac-4dcd-b156-2908e75f2e47_1_0_256.png
            ├── ...
        ├── all_st_patches_512_cpc
            └── pt
                ├── TCGA-02-0001-01Z-00-DX1.83fce43e-42ac-4dcd-b156-2908e75f2e47_1_0_0.pt
                ├── TCGA-02-0001-01Z-00-DX1.83fce43e-42ac-4dcd-b156-2908e75f2e47_1_0_256.pt
                ├── ...
        └── splits
            ├── gbmlgg15cv_all_st_0_0_0.pkl
            ├── gbmlgg15cv_all_st_0_1_0.pkl
            ├── ...
    ├── Other (Paired) Datasets :)
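As a rough guide to reading the split filenames in the splits directory, the sketch below decodes a name such as `gbmlgg15cv_all_st_0_1_0.pkl`. It is not part of the repository: the regular expression, the function name and the dictionary keys are hypothetical, and the order of the three flags (missing molecular subtype, missing histology subtype, VGG19 embeddings) is an assumption taken from the description of the "INT_INT_INT_STR" pattern. make_splits.py remains the authoritative reference.

```python
import re
from typing import Dict, Union

# Assumed layout: <prefix>_<roi_dir>_<flag>_<flag>_<flag>[_rnaseq].pkl
SPLIT_NAME = re.compile(
    r"^(?P<prefix>.+?)_(?P<roi_dir>all_st_patches_512|all_st)"
    r"_(?P<moltype>[01])_(?P<histype>[01])_(?P<vgg>[01])"
    r"(?P<rnaseq>_rnaseq)?\.pkl$"
)


def parse_split_name(filename: str) -> Dict[str, Union[str, bool]]:
    """Decode a split pickle filename into its (assumed) components."""
    match = SPLIT_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized split filename: {filename}")
    return {
        "prefix": match["prefix"],
        "roi_dir": match["roi_dir"],
        "ignore_missing_moltype": match["moltype"] == "1",
        "ignore_missing_histype": match["histype"] == "1",
        "use_vgg_features": match["vgg"] == "1",
        "use_rnaseq": match["rnaseq"] is not None,
    }


print(parse_split_name("gbmlgg15cv_all_st_0_1_0.pkl"))
```

The longer directory name is listed first in the alternation so that `all_st_patches_512` is not mistaken for `all_st` followed by extra fields.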
All pretrained models and predictions can be downloaded from our Google Drive and are organized as follows.
./
└── checkpoints
    ├── surv_15
        ├── path
            ├── path_1.pt
            ├── path_1_pred_train.pkl
            ├── path_1_pred_test.pkl
            ├── ...
        ├── ...
    └── grad_15
        ├── path
            ├── ...
        ├── ...
where "surv_15" and "grad_15" refer to the 15-fold cross-validation experiments on Pathomic Fusion for survival outcome prediction and grade classification, respectively.
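To check that a download is complete, the expected files for one fold can be derived from the layout above. The sketch below is only illustrative: `fold_artifacts` is a hypothetical helper, and it assumes that each model subdirectory shares its name with the file prefix (as with `path/path_1.pt`) and that the remaining folds follow the same naming as fold 1.

```python
from pathlib import Path
from typing import Dict


def fold_artifacts(root: str, exp_name: str, model_name: str, fold: int) -> Dict[str, Path]:
    """Build the expected checkpoint and prediction paths for a single fold."""
    base = Path(root) / exp_name / model_name
    return {
        "weights": base / f"{model_name}_{fold}.pt",
        "pred_train": base / f"{model_name}_{fold}_pred_train.pkl",
        "pred_test": base / f"{model_name}_{fold}_pred_test.pkl",
    }


# Report which files for fold 1 of the survival histology model are present locally.
for name, path in fold_artifacts("checkpoints", "surv_15", "path", 1).items():
    status = "found" if path.exists() else "missing"
    print(f"{name}: {path.as_posix()} ({status})")
```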
Commands for training and testing each model:
python train_cv.py --exp_name surv_15_rnaseq --task surv --mode path --model_name path --niter 0 --niter_decay 50 --batch_size 8 --lr 0.0005 --reg_type none --lambda_reg 0 --gpu_ids 0
python test_cv.py --exp_name surv_15_rnaseq --task surv --mode path --model_name path --niter 0 --niter_decay 50 --batch_size 8 --lr 0.0005 --reg_type none --lambda_reg 0 --gpu_ids 0 --use_vgg_features 1
python train_cv.py --exp_name grad_15 --task grad --mode path --model_name path --niter 0 --niter_decay 50 --batch_size 8 --lr 0.0005 --reg_type none --lambda_reg 0 --act LSM --label_dim 3 --gpu_ids 0
python test_cv.py --exp_name grad_15 --task grad --mode path --model_name path --niter 0 --niter_decay 50 --batch_size 8 --lr 0.0005 --reg_type none --lambda_reg 0 --act LSM --label_dim 3 --gpu_ids 0 --use_vgg_features 1
python train_cv.py --exp_name surv_15_rnaseq --task surv --mode graph --model_name graph --niter 0 --niter_decay 50 --lr 0.002 --init_type max --reg_type none --lambda_reg 0 --use_vgg_features 1 --gpu_ids 0
python train_cv.py --exp_name grad_15 --task grad --mode graph --model_name graph --niter 0 --niter_decay 50 --lr 0.002 --init_type max --reg_type none --lambda_reg 0 --use_vgg_features 1 --act LSM --label_dim 3 --gpu_ids 0
python train_cv.py --exp_name surv_15_rnaseq --task surv --mode omic --model_name omic --niter 0 --niter_decay 50 --batch_size 64 --reg_type all --init_type max --lr 0.002 --weight_decay 5e-4 --gpu_ids 0 --use_rnaseq 1 --input_size_omic 320 --verbose 1
python train_cv.py --exp_name grad_15 --task grad --mode omic --model_name omic --niter 0 --niter_decay 50 --batch_size 64 --reg_type all --init_type max --lr 0.002 --weight_decay 5e-4 --act LSM --label_dim 3 --gpu_ids 0
python train_cv.py --exp_name surv_15_rnaseq --task surv --mode pathomic --model_name pathomic_fusion --niter 10 --niter_decay 20 --lr 0.0001 --beta1 0.5 --fusion_type pofusion --mmhid 64 --use_bilinear 1 --use_vgg_features 1 --gpu_ids 0 --omic_gate 0 --use_rnaseq 1 --input_size_omic 320
python train_cv.py --exp_name grad_15 --task grad --mode pathomic --model_name pathomic_fusion --niter 10 --niter_decay 20 --lr 0.0001 --beta1 0.5 --fusion_type pofusion --mmhid 64 --use_bilinear 1 --use_vgg_features 1 --gpu_ids 0 --path_gate 0 --omic_scale 2 --act LSM --label_dim 3
python train_cv.py --exp_name surv_15_rnaseq --task surv --mode graphomic --model_name graphomic_fusion --niter 10 --niter_decay 20 --lr 0.0001 --beta1 0.5 --fusion_type pofusion --mmhid 64 --use_bilinear 1 --use_vgg_features 1 --gpu_ids 0 --omic_gate 0 --grph_scale 2 --use_rnaseq 1 --input_size_omic 320
python train_cv.py --exp_name grad_15 --task grad --mode graphomic --model_name graphomic_fusion --niter 10 --niter_decay 20 --lr 0.0001 --beta1 0.5 --fusion_type pofusion --mmhid 64 --use_bilinear 1 --use_vgg_features 1 --gpu_ids 0 --grph_gate 0 --omic_scale 2 --act LSM --label_dim 3
python train_cv.py --exp_name surv_15_rnaseq --task surv --mode pathgraphomic --model_name pathgraphomic_fusion --niter 10 --niter_decay 20 --lr 0.0001 --beta1 0.5 --fusion_type pofusion_A --mmhid 64 --use_bilinear 1 --use_vgg_features 1 --gpu_ids 0 --omic_gate 0 --grph_scale 2 --use_rnaseq 1 --input_size_omic 320
python train_cv.py --exp_name grad_15 --task grad --mode pathgraphomic --model_name pathgraphomic_fusion --niter 10 --niter_decay 20 --lr 0.0001 --beta1 0.5 --fusion_type pofusion_B --mmhid 64 --use_bilinear 1 --use_vgg_features 1 --gpu_ids 0 --path_gate 0 --act LSM --label_dim 3
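Since the commands above differ only in a handful of options, it can help to keep the options for one experiment in a single place and generate the command lines from them. The sketch below does this for the Histology CNN survival model. `build_command` is a hypothetical helper and is not part of this repository; the flag names and values are copied from the commands listed above, and the commands are only printed, not executed.

```python
from typing import Dict, List


def build_command(script: str, options: Dict[str, object]) -> List[str]:
    """Turn a dict of option names and values into an argument list."""
    argv = ["python", script]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


# Options for the Histology CNN survival model, as listed above.
path_surv = {
    "exp_name": "surv_15_rnaseq", "task": "surv", "mode": "path",
    "model_name": "path", "niter": 0, "niter_decay": 50, "batch_size": 8,
    "lr": 0.0005, "reg_type": "none", "lambda_reg": 0, "gpu_ids": 0,
}

train_cmd = build_command("train_cv.py", path_surv)
# In the listing above, the test command repeats the training options
# and appends --use_vgg_features 1.
test_cmd = build_command("test_cv.py", {**path_surv, "use_vgg_features": 1})

print(" ".join(train_cmd))
print(" ".join(test_cmd))
```

Either list can be passed to `subprocess.run` from the repository root if the runs are to be launched from Python.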
Raw histology regions of interest for the TCGA-GBM and TCGA-LGG projects can be downloaded from Mobadersany et al. For stain normalization, we used the Python implementation of Sparse Stain Normalization from Vahadane et al. provided in StainTools.
This project is licensed under the GNU GPLv3 License; see the LICENSE.md file for details.
If you find our work useful in your research, please consider citing our paper:
@article{chen2020pathomic,
title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},
journal={IEEE Transactions on Medical Imaging},
year={2020},
publisher={IEEE}
}
© Mahmood Lab - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.