# Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Diagnosis and Prognosis

<details>
<summary>
  <b>Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis</b>, IEEE Transactions on Medical Imaging, 2020.
  <a href="https://ieeexplore.ieee.org/document/9186053" target="blank">[HTML]</a>
  <a href="https://arxiv.org/abs/1912.08937" target="blank">[arXiv]</a>
  <a href="https://www.youtube.com/watch?v=TrjGEUVX5YE" target="blank">[Talk]</a>
  <br><em>Richard J Chen, Ming Y Lu, Jingwen Wang, Drew FK Williamson, Scott J Rodig, Neal I Lindeman, Faisal Mahmood</em></br>
</summary>

```bibtex
@article{chen2020pathomic,
  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},
  journal={IEEE Transactions on Medical Imaging},
  year={2020},
  publisher={IEEE}
}
```
</details>

**Summary:** We propose a simple and scalable method for integrating histology images and -omic data using attention gating and tensor fusion. Histopathology images can be processed using CNNs or GCNs for parameter efficiency, or a combination of the two. The setup is adaptable for integrating multiple -omic modalities with histopathology and can be used for improved diagnostic, prognostic, and therapeutic response determinations.

<img src="https://github.com/mahmoodlab/PathomicFusion/blob/master/main_fig.jpg?raw=true" width="1024"/>

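For intuition, the following is a minimal, hypothetical PyTorch sketch of the two core ideas: attention gating of each unimodal embedding, followed by Kronecker-product (tensor) fusion of the gated embeddings. Layer names, dimensions, and the bimodal setup are illustrative assumptions and do not mirror `fusion.py` exactly.

```python
import torch
import torch.nn as nn


class BimodalTensorFusion(nn.Module):
    """Illustrative gating + Kronecker-product fusion of two modality embeddings.

    A simplified sketch: dimensions and layer names are hypothetical and do not
    mirror fusion.py exactly.
    """

    def __init__(self, dim1=32, dim2=32, hidden=64):
        super().__init__()
        # Each modality is gated by an attention score computed from both modalities.
        self.gate1 = nn.Sequential(nn.Linear(dim1 + dim2, dim1), nn.Sigmoid())
        self.gate2 = nn.Sequential(nn.Linear(dim1 + dim2, dim2), nn.Sigmoid())
        # Post-fusion encoder over the flattened outer product (with appended 1s).
        self.encoder = nn.Sequential(nn.Linear((dim1 + 1) * (dim2 + 1), hidden), nn.ReLU())

    def forward(self, h1, h2):
        joint = torch.cat([h1, h2], dim=1)
        h1 = h1 * self.gate1(joint)  # attention-gated embedding, modality 1
        h2 = h2 * self.gate2(joint)  # attention-gated embedding, modality 2
        # Append a constant 1 so unimodal terms survive the outer product.
        ones = torch.ones(h1.size(0), 1, device=h1.device)
        h1 = torch.cat([h1, ones], dim=1)  # (B, dim1 + 1)
        h2 = torch.cat([h2, ones], dim=1)  # (B, dim2 + 1)
        # Per-sample outer (Kronecker) product, flattened to (B, (dim1 + 1) * (dim2 + 1)).
        fused = torch.bmm(h1.unsqueeze(2), h2.unsqueeze(1)).flatten(start_dim=1)
        return self.encoder(fused)


# Illustrative usage: fuse a 32-d histology embedding with a 32-d genomic embedding.
# fusion = BimodalTensorFusion()
# z = fusion(torch.randn(8, 32), torch.randn(8, 32))  # z has shape (8, 64)
```
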
## Community / Follow-Up Work :)
<table>
<tr>
<td>GitHub Repositories / Projects</td>
<td>
<a href="https://github.com/Liruiqing-ustc/HFBSurv" target="_blank">★</a>
<a href="https://github.com/mahmoodlab/PORPOISE" target="_blank">★</a>
<a href="https://github.com/TencentAILabHealthcare/MLA-GNN" target="_blank">★</a>
<a href="https://github.com/zcwang0702/HGPN" target="_blank">★</a>
<a href="https://github.com/isfj/GPDBN" target="_blank">★</a>
</td>
</tr>
</table>

## Updates
* 05/26/2021: Updated the Google Drive with all models and processed data for TCGA-GBMLGG and TCGA-KIRC, available at the [following link](https://drive.google.com/drive/u/1/folders/1swiMrz84V3iuzk8x99vGIBd5FCVncOlf). The data made available for TCGA-GBMLGG are the **same ROIs** used by [Mobadersany et al.](https://github.com/PathologyDataScience/SCNN)

## Setup

### Prerequisites
- Linux (Tested on Ubuntu 18.04)
- NVIDIA GPU (Tested on NVIDIA GeForce RTX 2080 Ti GPUs on local workstations, and NVIDIA V100 GPUs on Google Cloud)
- CUDA + cuDNN (Tested on CUDA 10.1 and cuDNN 7.5. CPU mode and CUDA without cuDNN may work with minimal modification, but are untested.)
- torch>=1.1.0
- torch_geometric==1.3.0

## Code Base Structure
The code base structure is explained below:
- **train_cv.py**: Cross-validation script for training unimodal and multimodal networks. This script saves evaluation metrics and predictions on the train + test split for each epoch, for every split, in **checkpoints**.
- **test_cv.py**: Script for testing unimodal and multimodal networks on only the test splits.
- **train_test.py**: Contains the definitions for "train" and "test".
- **networks.py**: Contains PyTorch model definitions for all unimodal and multimodal networks.
- **fusion.py**: Contains PyTorch model definitions for fusion.
- **data_loaders.py**: Contains the PyTorch DatasetLoader definition for loading multimodal data.
- **options.py**: Contains all the options for the argparser.
- **make_splits.py**: Script for generating a pickle file that saves and aligns the paths to multimodal data for cross-validation.
- **run_cox_baselines.py**: Script for running Cox baselines.
- **utils.py**: Contains definitions for collating, survival loss functions, data preprocessing, evaluation, figure plotting, etc. (a minimal sketch of a Cox survival loss follows this list).

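As context for the survival loss functions mentioned above, here is a minimal sketch of a negative Cox partial log-likelihood in PyTorch. It is an illustrative assumption, not a copy of the loss in **utils.py**, and details such as tie handling and normalization may differ.

```python
import torch


def neg_cox_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood (illustrative sketch, untied events).

    risk:  (N,) predicted log-risk scores from the network
    time:  (N,) observed survival / censoring times
    event: (N,) 1.0 if the event was observed, 0.0 if censored
    """
    # Sort by descending time so the risk set of sample i is samples [0..i].
    order = torch.argsort(time, descending=True)
    risk, event = risk[order], event[order]
    # log-cumsum-exp of the risk scores gives the log of each sample's risk-set denominator.
    log_denominator = torch.logcumsumexp(risk, dim=0)
    # Only uncensored samples contribute to the partial likelihood.
    uncensored_ll = (risk - log_denominator) * event
    return -uncensored_ll.sum() / event.sum().clamp(min=1)


# Illustrative usage:
# loss = neg_cox_partial_log_likelihood(model_out.squeeze(1), survtime, censor_event)
```
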
The directory structure for your multimodal dataset should look similar to the following:
```bash
./
├── data
      └── PROJECT
            ├── INPUT A (e.g. Image)
                ├── image_001.png
                ├── image_002.png
                ├── ...
            ├── INPUT B (e.g. Graph)
                ├── image_001.pkl
                ├── image_002.pkl
                ├── ...
            └── INPUT C (e.g. Genomic)
                └── genomic_data.csv
└── checkpoints
        └── PROJECT
            ├── TASK X (e.g. Survival Analysis)
                ├── path
                    ├── ...
                ├── ...
            └── TASK Y (e.g. Grade Classification)
                ├── path
                    ├── ...
                ├── ...
```
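
Given a layout like the one above, the split-generation step can be thought of as collecting the cases that exist in every modality and partitioning them into folds. The sketch below is a hypothetical illustration of that idea; the actual pickle schema produced by **make_splits.py** may differ.

```python
import os
import pickle

from sklearn.model_selection import KFold


def make_example_splits(root, out_path="splits.pkl", k=15, seed=1):
    """Hypothetical sketch: align per-case paths across modalities and save K folds."""
    img_dir = os.path.join(root, "INPUT A")
    graph_dir = os.path.join(root, "INPUT B")
    # Keep a case only if every image also has a matching graph pickle.
    case_ids = sorted(
        os.path.splitext(f)[0] for f in os.listdir(img_dir)
        if os.path.exists(os.path.join(graph_dir, os.path.splitext(f)[0] + ".pkl")))
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    splits = {}
    for i, (train_idx, test_idx) in enumerate(kf.split(case_ids), start=1):
        splits[i] = {"train": [case_ids[j] for j in train_idx],
                     "test": [case_ids[j] for j in test_idx]}
    with open(out_path, "wb") as f:
        pickle.dump(splits, f)
    return splits
```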

Depending on which modalities you are interested in combining, you must: (1) write your own function for aligning multimodal data in **make_splits.py**, (2) create your DatasetLoader in **data_loaders.py**, and (3) modify **options.py** for your data and task. Models are saved to the **checkpoints** directory, with the models for each task saved in their own directory. At the moment, the only supervised learning tasks implemented are survival outcome prediction and grade classification. A hypothetical example Dataset is sketched below.

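For step (2), a custom dataset might look like the hypothetical sketch below, which pairs an image, a graph pickle, and a genomic feature vector per case. The file naming, CSV layout, and return signature are assumptions; adapt them to your data and to how batches are unpacked in **train_test.py**.

```python
import os
import pickle

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class MultimodalDataset(Dataset):
    """Hypothetical example pairing image, graph, and genomic features per case."""

    def __init__(self, root, case_ids, labels):
        self.root = root              # e.g. ./data/PROJECT/
        self.case_ids = case_ids      # e.g. ["image_001", "image_002", ...]
        self.labels = labels          # dict: case id -> label / survival target
        # Assumes genomic_data.csv is indexed by case id.
        self.omics = pd.read_csv(os.path.join(root, "INPUT C", "genomic_data.csv"),
                                 index_col=0)
        self.transform = transforms.Compose([transforms.ToTensor()])

    def __len__(self):
        return len(self.case_ids)

    def __getitem__(self, idx):
        cid = self.case_ids[idx]
        image = self.transform(
            Image.open(os.path.join(self.root, "INPUT A", f"{cid}.png")).convert("RGB"))
        with open(os.path.join(self.root, "INPUT B", f"{cid}.pkl"), "rb") as f:
            graph = pickle.load(f)    # e.g. a torch_geometric Data object
        omic = torch.tensor(self.omics.loc[cid].values, dtype=torch.float32)
        return image, graph, omic, self.labels[cid]
```
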
## Training and Evaluation
Here are example commands for training unimodal + multimodal networks.

### Survival Model for Input A
The example below trains a survival model for mode A and saves the model checkpoints + predictions at the end of each split. In this example, we would create a folder called "CNN_A" in "./checkpoints/example/" for all the models in cross-validation. It assumes that "A" is defined as a mode in **data_loaders.py** for handling modality-specific data-preprocessing steps (random crop + flip + jittering for images), and that there is a network defined for input A in **networks.py**. "surv" is already defined as a task for training networks for survival analysis in **options.py, networks.py, train_test.py, train_cv.py**.

103
```
103
```
104
python train_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
104
python train_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
105
```
105
```
To obtain predictions on only the test splits in your cross-validation, replace "train_cv" with "test_cv":
```bash
python test_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
```
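
For reference, the image augmentations mentioned above (random crop, flip, and color jittering) are typically expressed with standard torchvision transforms. The crop size and jitter strengths below are placeholder assumptions, not the exact values used in **data_loaders.py**.

```python
from torchvision import transforms

# Illustrative training-time augmentation for the image modality; parameters are placeholders.
train_transforms = transforms.Compose([
    transforms.RandomCrop(512),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.05, hue=0.01),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```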

### Grade Classification Model for Input A + B
The example below trains a grade classification model that fuses modes A and B. Similar to the previous example, we would create a folder called "Fusion_AB" in "./checkpoints/example/" for all the models in cross-validation. It assumes that "AB" is defined as a mode in **data_loaders.py** for handling inputs A and B at the same time. "grad" is already defined as a task for training networks for grade classification in **options.py, networks.py, train_test.py, train_cv.py**.
```bash
python train_cv.py --exp_name grad --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task grad --mode AB --model_name Fusion_AB --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
```

## Reproducibility
To reproduce the results in our paper, and for exact data preprocessing, implementation, and experimental details, please follow the instructions in [./data/TCGA_GBMLGG/](https://github.com/mahmoodlab/PathomicFusion/tree/master/data/TCGA_GBMLGG). Processed data and trained models can be downloaded [here](https://drive.google.com/drive/folders/1swiMrz84V3iuzk8x99vGIBd5FCVncOlf?usp=sharing).

## Issues
- Please open new threads or report issues directly (for urgent blockers) to richardchen@g.harvard.edu.
- Immediate response to minor issues may not be available.

## Licenses, Usages, and Acknowledgements
- This project is licensed under the GNU GPLv3 License - see the [LICENSE.md](LICENSE.md) file for details. A provisional patent on this work has been filed by the Brigham and Women's Hospital.
- This code is inspired by [SALMON](https://github.com/huangzhii/SALMON) and [SCNN](https://github.com/CancerDataScience/SCNN). The code base structure was inspired by [pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).
- Subsidized computing resources for this project were provided by NVIDIA and Google Cloud.
- If you find our work useful in your research, please consider citing our paper:

```bibtex
@article{chen2020pathomic,
  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},
  journal={IEEE Transactions on Medical Imaging},
  year={2020},
  publisher={IEEE}
}
```

© [Mahmood Lab](http://www.mahmoodlab.org) - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.