## RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance

**Authors:** [Chantal Pellegrini][cp], [Ege Özsoy][eo], [Benjamin Busam][bb], [Nassir Navab][nn], [Matthias Keicher][mk]

[cp]:https://www.cs.cit.tum.de/camp/members/chantal-pellegrini/
[eo]:https://www.cs.cit.tum.de/camp/members/ege-oezsoy/
[mk]:https://www.cs.cit.tum.de/camp/members/matthias-keicher/
[nn]:https://www.cs.cit.tum.de/camp/members/cv-nassir-navab/nassir-navab/
[bb]:https://www.cs.cit.tum.de/camp/members/benjamin-busam-1/

[Project Page](https://chantalmp.github.io/RaDialog/) [Paper (OpenReview)](https://openreview.net/pdf?id=trUvr1gSNI) [arXiv](https://arxiv.org/abs/2311.18681) [Instruct Dataset (PhysioNet)](https://physionet.org/content/radialog-instruct-dataset/1.1.0/) [Model (Hugging Face)](https://huggingface.co/ChantalPellegrini/RaDialog-interactive-radiology-report-generation)

**✨ News ✨**
- 27 March 2025: RaDialog has been accepted at MIDL 2025. The new paper includes additional experiments and details about [RaDialog_v2](https://github.com/ChantalMP/RaDialog_LLaVA) and can be found on [OpenReview](https://openreview.net/pdf?id=trUvr1gSNI).
- 12 July 2024: We published a **new version of our Instruct Dataset**, including additional tasks, on [PhysioNet](https://physionet.org/content/radialog-instruct-dataset/1.1.0/).
- 29 May 2024: The **new and improved version of RaDialog** is now publicly available on [Hugging Face](https://huggingface.co/ChantalPellegrini/RaDialog-interactive-radiology-report-generation) and [GitHub](https://github.com/ChantalMP/RaDialog_LLaVA). The new version is much **better at conversational assistance**, **easier to use**, and allows a simple **inference setup with Hugging Face**!
- 26 March 2024: The RaDialog Instruct Dataset is now available on [PhysioNet](https://physionet.org/content/radialog-instruct-dataset/1.0.0/)!

---

<img align="right" src="figs/example.png" alt="teaser" width="50%" style="margin-left: 20px">

Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems.

## Installation

### Environment Setup:

#### 1) RaDialog Environment
- Clone this repository and move to the RaDialog directory with `cd RaDialog`
- Create the RaDialog environment with `conda create --name radialog python=3.7`
- Activate the environment with `conda activate radialog`
- Install the requirements with `pip install -r requirements.txt`
- Install hi-ml-multimodal with `pip install hi-ml-multimodal==0.2.0`
- Reinstall the correct versions of torch and transformers with `pip install torch==1.13.0 transformers==4.28.1`
- Install Java and set JAVA_HOME and PATH in local_config.py (we used jre1.8.0); the conda/pip commands above are collected in the sketch below
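
For convenience, here is the same environment setup as one shell snippet. It is a minimal sketch that assumes conda is already installed and that you run it from the repository root; the Java and `local_config.py` step above still has to be done by hand:

```bash
# Create and activate the RaDialog environment (Python 3.7)
conda create --name radialog python=3.7 -y
conda activate radialog

# Install the project requirements
pip install -r requirements.txt
pip install hi-ml-multimodal==0.2.0

# Reinstall the pinned torch/transformers versions last so they are not overridden
pip install torch==1.13.0 transformers==4.28.1
```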

#### 2) CheXbert Environment
- Create the CheXbert environment with `conda create --name chexbert python=3.7`
- Activate the environment with `conda activate chexbert`
- Move to the chexbert directory with `cd chexbert`
- Install the requirements with `pip install -r requirements.txt`
- Set the absolute path to the chexbert environment and folder in `RaDialog/local_config.py`

### Prepare the Data and Models:

#### 1) Download pretrained models
- Download the pretrained models from [here](https://github.com/ChantalMP/RaDialog/releases/tag/weights)
- Place chexbert.pth in RaDialog/chexbert/src/checkpoint/
- Unzip vicuna-7b-img-instruct.zip and vicuna-7b-img-report.zip and place the resulting folders in RaDialog/checkpoints/
- Unzip chexpert_train and place the folder in RaDialog/findings_classifier/checkpoints/
- Unzip embs and place the folder in RaDialog/pretraining/
- Unzip checkpoint_4.pth and place it in outputs/stage1_pt_instruct_blip_origlr_img448/ (a sketch of these steps follows this list)
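
A minimal sketch of the placement steps above, assuming the release files were downloaded into the repository root and keep the archive names listed above (adjust names and paths if your download differs):

```bash
# Run from the RaDialog repository root; target folders mirror the list above
mkdir -p chexbert/src/checkpoint checkpoints findings_classifier/checkpoints pretraining outputs/stage1_pt_instruct_blip_origlr_img448

# CheXbert weights
mv chexbert.pth chexbert/src/checkpoint/

# LoRA checkpoints for the report and instruct models
unzip vicuna-7b-img-instruct.zip -d checkpoints/
unzip vicuna-7b-img-report.zip -d checkpoints/

# chexpert_train, embs and checkpoint_4.pth are unpacked into
# findings_classifier/checkpoints/, pretraining/ and
# outputs/stage1_pt_instruct_blip_origlr_img448/ in the same way
```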

#### 2) Download MIMIC-CXR
- Download the MIMIC-CXR-JPG dataset from [here](https://www.physionet.org/content/mimic-cxr-jpg/2.0.0/)
- The dataset should be saved in .../physionet.org/files/mimic-cxr-jpg
- Go to physionet.org/files/mimic-cxr-jpg/files/ and unzip mimic-cxr-2.0.0-split.csv.gz
- From [here](https://physionet.org/content/mimic-cxr/2.0.0/), download mimic-cxr-reports.zip
- Unzip it and place the folder in the same directory as the MIMIC-CXR-JPG dataset (e.g. physionet.org/files/)
- In local_config.py, set the path to the MIMIC-CXR dataset (e.g. .../physionet.org/files/)
- In model/lavis/defaults_report.yaml, set the path to the MIMIC-CXR-JPG dataset (e.g. .../physionet.org/files/mimic-cxr-jpg/2.0.0); a short sketch of the unpacking steps follows this list
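
The unpacking steps can be sketched as follows; the root path is a placeholder and the sub-paths simply mirror the wording above, so adjust them to your local download tree:

```bash
# Placeholder for wherever your PhysioNet downloads live
DATA_ROOT=/path/to/physionet.org/files

# Decompress the split file inside the MIMIC-CXR-JPG folder
gunzip "$DATA_ROOT"/mimic-cxr-jpg/files/mimic-cxr-2.0.0-split.csv.gz

# Unpack the free-text reports next to the MIMIC-CXR-JPG dataset
cd "$DATA_ROOT" && unzip mimic-cxr-reports.zip
```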

#### 3) Create sectioned report data
- Go to the mimic-cxr folder in the code with `cd mimic-cxr`
- Run `python create_section_files.py` to prepare the report data
- Go back to the RaDialog directory with `cd ..`

#### 4) Prepare the instruct dataset

- As MIMIC-CXR requires a credentialed PhysioNet account, we cannot publish our instruct dataset directly.
- The instruct dataset has since been published on [PhysioNet](https://physionet.org/content/radialog-instruct-dataset/1.1.0/) (see News above); alternatively, you can create it yourself by following the steps below, or simply use our pre-trained model.
- The MIMIC-NLE data has to be generated first, as it also contains protected data. Follow the instructions [here](https://github.com/maximek3/MIMIC-NLE/tree/main) to generate the MIMIC-NLE data and set the path to the MIMIC-NLE data in `local_config.py`.
- For the correction task, you can write to us and we will share the incorrect predictions we used.
- To generate data without the Correction or Reasoning (MIMIC-NLE) tasks, comment out line 335 or 336 in "create_data.py" accordingly.

Data for RaDialog-RG:
- run `python -m data.create_data --mode "RG"` to generate the report generation dataset in the required format (no instruct data)

Data for RaDialog-INS:
- run `python -m data.create_data --mode "INS"` to generate the instruct dataset

### Run Demo:
- run `python demo.py --cfg-path pretraining/configs/blip2_pretrain_stage1_emb.yaml` to start the demo
- connect to the demo with a browser at `http://127.0.0.1:7860` and start chatting with RaDialog

### Evaluate RaDialog on MIMIC-CXR test set:
- RaDialog-RG: run `python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-report/checkpoint-11200`
- RaDialog-INS: run `python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800`
- RaDialog-INS (correction): run `python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800 --do_corr`
- RaDialog-INS (findings QA): run `python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800 --do_cp_all_qa` (or `--do_cp_bin_qa`)

### Train RaDialog:
#### 1) CheXbert classifier Training
- run `python -m findings_classifier.chexpert_train --train --run_name "train_chexbert"`
- in chexpert_train.py, set ckpt_path (line 152) to the path of the model you just trained
- then run `python -m findings_classifier.chexpert_train --run_name "save_preds"` to save the predictions of the trained model

#### 2) Alignment Module Pretraining
- run `python -m pretraining.train --cfg-path pretraining/configs/blip2_pretrain_stage1.yaml`; we used the 4th-epoch checkpoint
- run `python -m pretraining.train --cfg-path pretraining/configs/blip2_pretrain_stage1_emb.yaml` to save the embeddings of the trained model

#### 3) LLM Training
Train RaDialog-RG:
- run `python finetune.py --use_embs True --base_model 'vicuna_v7' --output_dir 'checkpoints/lora-vicuna-7b-report' --wandb_run_name lora-vicuna-7b-report --prompt_template_name vicuna_v11 --data_path "data/data_files/mimic_cxr_reports_stratified.json" --cutoff_len 600 --num_epochs 10`
- we used checkpoint-11200

Train RaDialog-INS:
- run `python finetune.py --use_embs True --base_model 'vicuna_v7' --output_dir 'checkpoints/lora-vicuna-7b-instruct' --wandb_run_name lora-vicuna-7b-instruct --prompt_template_name vicuna_v11 --data_path "data/data_files/mimic_cxr_instruct_stratified.json" --cutoff_len 800 --num_epochs 10`
- we used checkpoint-4800

To use a model from a checkpoint, perform the following steps (a short sketch follows the list):
- make a copy of "pytorch_model.bin" and rename it to "adapter_model.bin"
- copy adapter_config.json into the checkpoint folder (it is generated after the last epoch, or you can copy it from the checkpoints we provide)
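
A sketch of these two steps, using the RaDialog-INS checkpoint from above as a placeholder and assuming adapter_config.json was written to the training output directory:

```bash
# Placeholder checkpoint directory; use the checkpoint you actually trained
CKPT=checkpoints/lora-vicuna-7b-instruct/checkpoint-4800

# The LoRA weights must also be available under the name adapter_model.bin
cp "$CKPT"/pytorch_model.bin "$CKPT"/adapter_model.bin

# Copy the adapter config into the checkpoint folder
cp checkpoints/lora-vicuna-7b-instruct/adapter_config.json "$CKPT"/
```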

## Reference
When using our model or dataset, please cite:

```
@article{pellegrini2023radialog,
  title={RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance},
  author={Pellegrini, Chantal and {\"O}zsoy, Ege and Busam, Benjamin and Navab, Nassir and Keicher, Matthias},
  journal={arXiv preprint arXiv:2311.18681},
  year={2023}
}
```