# multipit

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

This repository provides a set of Python tools to perform multimodal learning with tabular data. It contains the code used in our study:

[Captier, N., Lerousseau, M., Orlhac, F. et al. Integration of clinical, pathological, radiological, and transcriptomic data improves prediction for first-line immunotherapy outcome in metastatic non-small cell lung cancer. Nat Commun 16, 614 (2025).](https://doi.org/10.1038/s41467-025-55847-5)

## Installation
### Dependencies
- lifelines (>= 0.27.4)
- matplotlib (>= 3.5.1)
- numpy (>= 1.21.5)
- pandas (= 1.5.3)
- pyyaml (>= 6.0)
- scikit-learn (>= 1.2.0)
- scikit-survival (>= 0.21.0)
- seaborn (= 0.13.0)
- shap (>= 0.41.0)
- xgboost (>= 1.7.5)
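
A quick way to compare an existing environment against the list above is sketched below, using only the Python standard library. It is an illustration rather than part of multipit: the names `REQUIREMENTS`, `as_tuple` and `check` are hypothetical, the `=` entries are assumed to mean exact pins, and the version comparison is deliberately simplified (pre-release suffixes are ignored).

```python
from importlib import metadata
from itertools import takewhile

# (operator, version) pairs copied from the dependency list above.
# Assumption: "=" in the list is read as an exact pin ("==").
REQUIREMENTS = {
    "lifelines": (">=", "0.27.4"),
    "matplotlib": (">=", "3.5.1"),
    "numpy": (">=", "1.21.5"),
    "pandas": ("==", "1.5.3"),
    "pyyaml": (">=", "6.0"),
    "scikit-learn": (">=", "1.2.0"),
    "scikit-survival": (">=", "0.21.0"),
    "seaborn": ("==", "0.13.0"),
    "shap": (">=", "0.41.0"),
    "xgboost": (">=", "1.7.5"),
}


def as_tuple(version, width=3):
    """Turn '1.21.5' into (1, 21, 5), zero-padded to `width`; suffixes like 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:width]:
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts + [0] * (width - len(parts)))


def check(requirements, get_version=metadata.version):
    """Return {package: status}, where status is 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, (op, wanted) in requirements.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        have, need = as_tuple(installed), as_tuple(wanted)
        ok = have == need if op == "==" else have >= need
        report[name] = "ok" if ok else f"found {installed}, expected {op} {wanted}"
    return report


if __name__ == "__main__":
    for package, status in check(REQUIREMENTS).items():
        print(f"{package}: {status}")
```

Running the file prints one status line per package, so missing or mismatched dependencies can be spotted before launching the scripts.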
### Install from source
Clone the repository:
```
git clone https://github.com/sysbio-curie/multipit
```

## Key features
* **Early and late fusion implementations**: four estimators compatible with scikit-learn and scikit-survival to fuse several tabular modalities into a single multimodal model.
  * [`multipit.multi_model.EarlyFusionClassifier`](multipit/multi_model/earlyfusion.py) and [`multipit.multi_model.LateFusionClassifier`](multipit/multi_model/latefusion.py) for binary classification.
  * [`multipit.multi_model.EarlyFusionSurvival`](multipit/multi_model/earlyfusion.py) and [`multipit.multi_model.LateFusionSurvival`](multipit/multi_model/latefusion.py) for survival prediction.

* **Scripts to reproduce the experiments of our study**: scripts to perform late fusion and early fusion of clinical, radiomic, pathomic and transcriptomic features with a repeated cross-validation scheme, and scripts to compute and collect the SHAP values associated with each unimodal predictive model (see the [scripts](scripts) folder).

* **Plotting functions and notebooks to reproduce the figures of our study**: several functions to plot and compare the performance of different multimodal combinations and to display feature importance with SHAP values.
  * [plot_results.ipynb](notebooks/plot_results.ipynb)
  * [benchmark.ipynb](notebooks/benchmark.ipynb)
  * [plot_shap.ipynb](notebooks/plot_shap.ipynb)
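
For readers new to the terminology, the sketch below contrasts early and late fusion under a repeated cross-validation scheme, using plain scikit-learn on synthetic data. It does not use the multipit estimators or their API: the modality names, the logistic regression models, and the choice of averaging predicted probabilities for late fusion are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold

# Synthetic stand-ins for two tabular modalities describing the same samples.
X, y = make_classification(n_samples=200, n_features=12, n_informative=6, random_state=0)
modalities = {"clinical": X[:, :4], "omics": X[:, 4:]}  # hypothetical modality names


def early_fusion_proba(train, test):
    """Early fusion: concatenate all modalities, then fit a single model."""
    X_all = np.hstack(list(modalities.values()))
    model = LogisticRegression(max_iter=1000).fit(X_all[train], y[train])
    return model.predict_proba(X_all[test])[:, 1]


def late_fusion_proba(train, test):
    """Late fusion: fit one model per modality, then average their predicted probabilities."""
    probas = []
    for X_mod in modalities.values():
        model = LogisticRegression(max_iter=1000).fit(X_mod[train], y[train])
        probas.append(model.predict_proba(X_mod[test])[:, 1])
    return np.mean(probas, axis=0)


# Repeated stratified cross-validation: 5 folds repeated 3 times gives 15 train/test splits.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = {"early": [], "late": []}
for train, test in cv.split(X, y):
    scores["early"].append(roc_auc_score(y[test], early_fusion_proba(train, test)))
    scores["late"].append(roc_auc_score(y[test], late_fusion_proba(train, test)))

for name, values in scores.items():
    print(f"{name} fusion: mean AUC = {np.mean(values):.3f} over {len(values)} splits")
```

Averaging is only one possible aggregation rule for late fusion; refer to the linked `latefusion.py` and `earlyfusion.py` files for what the multipit estimators actually implement.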

## Deep-multipit

We also provide another GitHub repository, [deep-multipit](https://github.com/sysbio-curie/deep-multipit), with a PyTorch implementation of an end-to-end integration strategy with attention weights, inspired by [Vanguri *et al.*, 2022](https://www.nature.com/articles/s43018-022-00416-8).

## Run scripts

Modify the settings in the `.yaml` configuration files (in the `config/` subfolder), then run the corresponding script from your terminal, for example:

```
python latefusion.py -c config/config_latefusion.yaml -s path/to/results/folder
```

```
python collect_shap_survival.py -c config/config_latefusion_survival.yaml -s path/to/results/folder
```

**Warning:** On Windows, paths must be written with `\` or `\\` separators (instead of `/`).
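
One way to avoid hand-writing separators on the command line is to build the paths with `pathlib` and launch the script from Python. The sketch below is an illustration, not part of the repository: the helper name `run_script` and its `dry_run` parameter are hypothetical, and only the `-c` and `-s` flags come from the commands above.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script, config, results_dir, dry_run=True):
    """Build the command for one of the scripts and optionally execute it.

    Path objects render with the separator of the current operating system,
    so the same call works on Windows, Linux and macOS.
    """
    cmd = [
        sys.executable,
        str(Path(script)),
        "-c", str(Path(config)),
        "-s", str(Path(results_dir)),
    ]
    if not dry_run:
        # Raises CalledProcessError if the script exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    command = run_script(
        "latefusion.py",
        Path("config") / "config_latefusion.yaml",
        Path("path") / "to" / "results" / "folder",
    )
    print(command)
```

With `dry_run=True` (the default here) the command is only assembled and printed. Paths written inside the `.yaml` files themselves are not affected by this helper and still need the separators described in the warning.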
59
60
**Note:** In order to modify more deeply the loading of the data or the predictive pipelines, please update the `PredictionTask` class in the file [_init_scripts.py](scripts/_init_scripts.py). 
61
62
## Examples
63
In the [examples](examples) folder we provide a brief example on how to slightly modify the scripts and codes from our original experiments to perform multimodal learning for the prediction of Overall Survival from clinical and RNA-seq data extracted from TCGA (i.e., stage III and IV TCGA-LUAD and TCGA-LUSC samples).   

We simply updated the `PredictionTask` class in a new file, [_init_scripts_tcga.py](examples/tcga_lung/_init_scripts_tcga.py), to load TCGA data and build predictive pipelines.

**Note:** Clinical and transcriptomic data extracted for 201 stage III/IV TCGA patients (i.e., LUAD or LUSC) are available in the [data](data) folder.

## Citing multipit

If you use multipit in a scientific publication, we would appreciate a citation of the [following paper](https://doi.org/10.1038/s41467-025-55847-5):

```
Captier, N., Lerousseau, M., Orlhac, F. et al. Integration of clinical, pathological, radiological, and transcriptomic data improves prediction for first-line immunotherapy outcome in metastatic non-small cell lung cancer. Nat Commun 16, 614 (2025). https://doi.org/10.1038/s41467-025-55847-5
```

## Acknowledgements

This repository was created as part of the PhD project of [Nicolas Captier](https://ncaptier.github.io/) in the [Computational Systems Biology of Cancer group](https://institut-curie.org/team/barillot) and the [Laboratory of Translational Imaging in Oncology (LITO)](https://www.lito-web.fr/en/) of Institut Curie.