--- a
+++ b/README.md
@@ -0,0 +1,79 @@
+# multipit
+
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+This repository provides a set of Python tools to perform multimodal learning with tabular data. It contains the code used in our study: 
+
+[Captier, N., Lerousseau, M., Orlhac, F. et al. Integration of clinical, pathological, radiological, and transcriptomic data improves prediction for first-line immunotherapy outcome in metastatic non-small cell lung cancer. Nat Commun 16, 614 (2025).](https://doi.org/10.1038/s41467-025-55847-5)
+
+## Installation
+### Dependencies
+- lifelines (>= 0.27.4)
+- matplotlib (>= 3.5.1)
+- numpy (>= 1.21.5)
+- pandas (= 1.5.3)
+- pyyaml (>= 6.0)
+- scikit-learn (>= 1.2.0)
+- scikit-survival (>= 0.21.0)
+- seaborn (= 0.13.0)
+- shap (>= 0.41.0)
+- xgboost (>= 1.7.5)
+### Install from source
+Clone the repository:
+```
+git clone https://github.com/sysbio-curie/multipit
+```
+
+## Key features
+* **Early and late fusion implementations**: four estimators compatible with scikit-learn and scikit-survival that fuse several tabular modalities into a single multimodal model.
+  * [`multipit.multi_model.EarlyFusionClassifier`](multipit/multi_model/earlyfusion.py) and [`multipit.multi_model.LateFusionClassifier`](multipit/multi_model/latefusion.py) for binary classification.
+  * [`multipit.multi_model.EarlyFusionSurvival`](multipit/multi_model/earlyfusion.py) and [`multipit.multi_model.LateFusionSurvival`](multipit/multi_model/latefusion.py) for survival prediction.
+   
+
+* **Scripts to reproduce the experiments of our study**: scripts to perform late fusion and early fusion of clinical, radiomic, pathomic and transcriptomic features with a repeated cross-validation scheme, and scripts to compute and collect the SHAP values associated with each unimodal predictive model (see the [scripts](scripts) folder).
+   
+
+* **Plotting functions and notebooks to reproduce the figures of our study**: several functions to plot and compare the performance of different multimodal combinations and to display feature importance with SHAP values.
+  * [plot_results.ipynb](notebooks/plot_results.ipynb) 
+  * [benchmark.ipynb](notebooks/benchmark.ipynb)
+  * [plot_shap.ipynb](notebooks/plot_shap.ipynb)
+
+## Deep-multipit
+
+We also provide another GitHub repository, [deep-multipit](https://github.com/sysbio-curie/deep-multipit), with a PyTorch implementation of an end-to-end integration strategy with attention weights, inspired by [Vanguri *et al.*, 2022](https://www.nature.com/articles/s43018-022-00416-8).
+
+## Run scripts
+
+Edit the `.yaml` configuration files (in the `config/` subfolder), then run the following commands in your terminal:
+
+```
+python latefusion.py -c config/config_latefusion.yaml -s path/to/results/folder
+```
+
+```
+python collect_shap_survival.py -c config/config_latefusion_survival.yaml -s path/to/results/folder
+```
+
+**Warning:** On Windows, paths must be written with '\\' or '\\\' separators (instead of '/').
+
+**Note:** To customize the data loading or the predictive pipelines more extensively, update the `PredictionTask` class in the file [_init_scripts.py](scripts/_init_scripts.py).
+
+## Examples
+In the [examples](examples) folder, we provide a brief example showing how to adapt the scripts and code from our original experiments to perform multimodal learning for the prediction of overall survival from clinical and RNA-seq data extracted from TCGA (i.e., stage III and IV TCGA-LUAD and TCGA-LUSC samples).
+
+We simply updated the `PredictionTask` class in a new file, [_init_scripts_tcga.py](examples/tcga_lung/_init_scripts_tcga.py), to load TCGA data and build predictive pipelines.
+
+**Note:** Clinical and transcriptomic data for 201 stage III/IV TCGA patients (i.e., LUAD or LUSC) are available in the [data](data) folder.
+
+## Citing multipit
+
+If you use multipit in a scientific publication, we would appreciate a citation of the [following paper](https://doi.org/10.1038/s41467-025-55847-5):
+
+```
+Captier, N., Lerousseau, M., Orlhac, F. et al. Integration of clinical, pathological, radiological, and transcriptomic data improves prediction for first-line immunotherapy outcome in metastatic non-small cell lung cancer. Nat Commun 16, 614 (2025). https://doi.org/10.1038/s41467-025-55847-5
+```
+
+## Acknowledgements
+
+This repository was created as part of the PhD project of [Nicolas Captier](https://ncaptier.github.io/) in the [Computational Systems Biology of Cancer group](https://institut-curie.org/team/barillot) and the [Laboratory of Translational Imaging in Oncology (LITO)](https://www.lito-web.fr/en/) of Institut Curie.