# Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Diagnosis and Prognosis

<details>
<summary>
<b>Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis</b>, IEEE Transactions on Medical Imaging, 2020.
<a href="https://ieeexplore.ieee.org/document/9186053" target="_blank">[HTML]</a>
<a href="https://arxiv.org/abs/1912.08937" target="_blank">[arXiv]</a>
<a href="https://www.youtube.com/watch?v=TrjGEUVX5YE" target="_blank">[Talk]</a>
<br><em>Richard J Chen, Ming Y Lu, Jingwen Wang, Drew FK Williamson, Scott J Rodig, Neal I Lindeman, Faisal Mahmood</em><br>
</summary>

```bibtex
@article{chen2020pathomic,
  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},
  journal={IEEE Transactions on Medical Imaging},
  year={2020},
  publisher={IEEE}
}
```
</details>

**Summary:** We propose a simple and scalable method for integrating histology images and -omic data using attention gating and tensor fusion. Histopathology images can be processed with CNNs, with GCNs for parameter efficiency, or with a combination of the two. The setup is adaptable for integrating multiple -omic modalities with histopathology and can be used for improved diagnostic, prognostic, and therapeutic response determinations. An illustrative sketch of the gating + tensor fusion idea is given below the figure.

<img src="https://github.com/mahmoodlab/PathomicFusion/blob/master/main_fig.jpg?raw=true" width="1024"/>

## Community / Follow-Up Work :)
<table>
<tr>
<td>GitHub Repositories / Projects</td>
<td>
<a href="https://github.com/Liruiqing-ustc/HFBSurv" target="_blank">★</a>
<a href="https://github.com/mahmoodlab/PORPOISE" target="_blank">★</a>
<a href="https://github.com/TencentAILabHealthcare/MLA-GNN" target="_blank">★</a>
<a href="https://github.com/zcwang0702/HGPN" target="_blank">★</a>
<a href="https://github.com/isfj/GPDBN" target="_blank">★</a>
</td>
</tr>
</table>

## Updates
* 05/26/2021: Updated the Google Drive with all models and processed data for TCGA-GBMLGG and TCGA-KIRC, available at the [following link](https://drive.google.com/drive/u/1/folders/1swiMrz84V3iuzk8x99vGIBd5FCVncOlf). The data made available for TCGA-GBMLGG are the **same ROIs** used by [Mobadersany et al.](https://github.com/PathologyDataScience/SCNN)

## Setup

### Prerequisites
- Linux (Tested on Ubuntu 18.04)
- NVIDIA GPU (Tested on NVIDIA GeForce RTX 2080 Tis on local workstations, and NVIDIA V100s using Google Cloud)
- CUDA + cuDNN (Tested on CUDA 10.1 and cuDNN 7.5. CPU mode and CUDA without cuDNN may work with minimal modification, but are untested.)
- torch>=1.1.0
- torch_geometric==1.3.0
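
A hedged example of one way to set up the environment is shown below; the package versions and conda/pip workflow are assumptions, not a tested recipe. Pick the torch/CUDA wheel combination matching your system, and consult the torch_geometric installation docs for the exact companion packages your torch + CUDA versions require.

```bash
# Illustrative setup only; adjust versions to your CUDA / PyTorch combination.
conda create -n pathomicfusion python=3.7
conda activate pathomicfusion
pip install "torch>=1.1.0" torchvision
# torch_geometric 1.3.0 also needs torch-scatter / torch-sparse / torch-cluster
# built against your torch + CUDA versions (see its installation docs).
pip install torch-scatter torch-sparse torch-cluster
pip install torch-geometric==1.3.0
```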

## Code Base Structure
The code base structure is explained below:
- **train_cv.py**: Cross-validation script for training unimodal and multimodal networks. This script will save evaluation metrics and predictions on the train + test split for each epoch on every split in **checkpoints**.
- **test_cv.py**: Script for testing unimodal and multimodal networks on only the test split.
- **train_test.py**: Contains the definitions for "train" and "test".
- **networks.py**: Contains PyTorch model definitions for all unimodal and multimodal networks.
- **fusion.py**: Contains PyTorch model definitions for fusion.
- **data_loaders.py**: Contains the PyTorch DatasetLoader definition for loading multimodal data.
- **options.py**: Contains all the options for the argparser.
- **make_splits.py**: Script for generating a pickle file that saves + aligns the paths for multimodal data for cross-validation.
- **run_cox_baselines.py**: Script for running Cox baselines.
- **utils.py**: Contains definitions for collating, survival loss functions, data preprocessing, evaluation, figure plotting, etc. (a sketch of a typical Cox survival loss is given after this list).
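
For orientation, here is a sketch of a typical Cox negative partial log-likelihood used to train survival networks. It is written independently of this repository, and the exact loss in **utils.py** may differ (for example in how censorship is encoded or how the batch term is normalized).

```python
import torch


def cox_loss(risk, survtime, censor):
    """Negative Cox partial log-likelihood for one batch (illustrative sketch).

    risk:     (N,) predicted risk scores, higher = worse prognosis
    survtime: (N,) observed survival or censoring times
    censor:   (N,) event indicator, 1 = event observed, 0 = censored
    """
    censor = censor.float()
    # risk_set[i, j] = 1 if patient j is still at risk at patient i's event time.
    risk_set = (survtime.view(-1, 1) <= survtime.view(1, -1)).float()
    log_denominator = torch.log((risk_set * torch.exp(risk).view(1, -1)).sum(dim=1))
    # Only uncensored patients contribute terms to the partial likelihood.
    partial_ll = (risk - log_denominator) * censor
    return -partial_ll.sum() / censor.sum().clamp(min=1)
```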

The directory structure for your multimodal dataset should look similar to the following:
```bash
./
├── data
│   └── PROJECT
│       ├── INPUT A (e.g. Image)
│       │   ├── image_001.png
│       │   ├── image_002.png
│       │   └── ...
│       ├── INPUT B (e.g. Graph)
│       │   ├── image_001.pkl
│       │   ├── image_002.pkl
│       │   └── ...
│       └── INPUT C (e.g. Genomic)
│           └── genomic_data.csv
└── checkpoints
    └── PROJECT
        ├── TASK X (e.g. Survival Analysis)
        │   ├── path
        │   └── ...
        └── TASK Y (e.g. Grade Classification)
            ├── path
            └── ...
```

Depending on which modalities you are interested in combining, you must: (1) write your own function for aligning multimodal data in **make_splits.py**, (2) create your DatasetLoader in **data_loaders.py**, and (3) modify **options.py** for your data and task. Models will be saved to the **checkpoints** directory, with each model for each task saved in its own directory. At the moment, the only supervised learning tasks implemented are survival outcome prediction and grade classification. A hedged sketch of a paired-modality DatasetLoader is shown below.
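
The following is a hypothetical, minimal DatasetLoader sketch for pairing an image modality with a genomic feature vector. The file layout, column names (`survival_months`, `censor`, `grade`), crop size, and the `sample_ids` argument (aligned IDs produced by your **make_splits.py** function) are illustrative assumptions; the repository's actual loaders in **data_loaders.py** differ.

```python
import os

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class PairedImageOmicDataset(Dataset):
    """Sketch: yields (image, omic_vector, label) triplets for one CV split."""

    def __init__(self, image_dir, omic_csv, sample_ids, train=True):
        self.image_dir = image_dir
        self.sample_ids = sample_ids                    # aligned IDs for this split
        self.omic = pd.read_csv(omic_csv, index_col=0)  # rows indexed by sample ID
        # Modality-specific preprocessing: random crop + flip + jitter at train time.
        self.transform = transforms.Compose(
            [transforms.RandomCrop(512), transforms.RandomHorizontalFlip(),
             transforms.ColorJitter(brightness=0.1, contrast=0.1), transforms.ToTensor()]
            if train else
            [transforms.CenterCrop(512), transforms.ToTensor()])

    def __len__(self):
        return len(self.sample_ids)

    def __getitem__(self, idx):
        sid = self.sample_ids[idx]
        image = Image.open(os.path.join(self.image_dir, f"{sid}.png")).convert("RGB")
        row = self.omic.loc[sid]
        omic = torch.tensor(row.drop(["survival_months", "censor", "grade"]).values,
                            dtype=torch.float32)
        label = torch.tensor([row["survival_months"], row["censor"], row["grade"]],
                             dtype=torch.float32)
        return self.transform(image), omic, label
```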

## Training and Evaluation
Here are example commands for training unimodal + multimodal networks.

### Survival Model for Input A
An example is shown below for training a survival model for mode A and saving the model checkpoints + predictions at the end of each split. In this example, we would create a folder called "CNN_A" in "./checkpoints/example/" for all the models in cross-validation. It assumes that "A" is defined as a mode in **data_loaders.py** for handling modality-specific data preprocessing steps (random crop + flip + jittering for images), and that there is a network defined for input A in **networks.py**. "surv" is already defined as a task for training networks for survival analysis in **options.py, networks.py, train_test.py, train_cv.py**.

```
python train_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
```
To obtain test predictions on only the test splits in your cross-validation, you can replace "train_cv" with "test_cv":
```
python test_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
```

### Grade Classification Model for Input A + B
An example is shown below for training a grade classification model that fuses modes A and B. Similar to the previous example, we would create a folder called "Fusion_AB" in "./checkpoints/example/" for all the models in cross-validation. It assumes that "AB" is defined as a mode in **data_loaders.py** for handling inputs A and B at the same time. "grad" is already defined as a task for training networks for grade classification in **options.py, networks.py, train_test.py, train_cv.py**.
```
python train_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task grad --mode AB --model_name Fusion_AB --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
```
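
For orientation, here is a hedged sketch of how a fused embedding (for instance from the fusion sketch earlier in this README) could feed either task; the assumed sizes are illustrative, and the actual heads and losses are defined in **networks.py**, **utils.py**, and **train_test.py**.

```python
import torch
import torch.nn as nn

dim_fused, n_grades = 64, 3  # assumed sizes for illustration

surv_head = nn.Linear(dim_fused, 1)          # scalar risk score, trained with a Cox loss ("surv")
grade_head = nn.Linear(dim_fused, n_grades)  # class logits, trained with cross-entropy ("grad")

h_fused = torch.randn(8, dim_fused)          # e.g., output of the fusion module
risk = surv_head(h_fused).squeeze(1)         # pair with survival time + censor status
grade_logits = grade_head(h_fused)           # pair with nn.CrossEntropyLoss and grade labels
```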

## Reproducibility
To reproduce the results in our paper, and for the exact data preprocessing, implementation, and experimental details, please follow the instructions in [./data/TCGA_GBMLGG/](https://github.com/mahmoodlab/PathomicFusion/tree/master/data/TCGA_GBMLGG). Processed data and trained models can be downloaded [here](https://drive.google.com/drive/folders/1swiMrz84V3iuzk8x99vGIBd5FCVncOlf?usp=sharing).

## Issues
- Please open new threads or report issues directly (for urgent blockers) to richardchen@g.harvard.edu.
- Immediate response to minor issues may not be available.

## Licenses, Usages, and Acknowledgements
- This project is licensed under the GNU GPLv3 License - see the [LICENSE.md](LICENSE.md) file for details. A provisional patent on this work has been filed by the Brigham and Women's Hospital.
- This code is inspired by [SALMON](https://github.com/huangzhii/SALMON) and [SCNN](https://github.com/CancerDataScience/SCNN). The code base structure was inspired by [pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).
- Subsidized computing resources for this project were provided by Nvidia and Google Cloud.
- If you find our work useful in your research, please consider citing our paper:

```bibtex
@article{chen2020pathomic,
  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},
  journal={IEEE Transactions on Medical Imaging},
  year={2020},
  publisher={IEEE}
}
```

© [Mahmood Lab](http://www.mahmoodlab.org) - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.