## RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance
**Authors:** [Chantal Pellegrini][cp], [Ege Özsoy][eo], [Benjamin Busam][bb], [Nassir Navab][nn], [Matthias Keicher][mk]

[cp]:https://www.cs.cit.tum.de/camp/members/chantal-pellegrini/
[eo]:https://www.cs.cit.tum.de/camp/members/ege-oezsoy/
[mk]:https://www.cs.cit.tum.de/camp/members/matthias-keicher/
[nn]:https://www.cs.cit.tum.de/camp/members/cv-nassir-navab/nassir-navab/
[bb]:https://www.cs.cit.tum.de/camp/members/benjamin-busam-1/

[![](https://img.shields.io/badge/Project_Page-green)](https://chantalmp.github.io/RaDialog/) [![](https://img.shields.io/badge/OpenReview-Paper-blue)](https://openreview.net/pdf?id=trUvr1gSNI) [![](https://img.shields.io/badge/Arxiv-2311.18681-blue)](https://arxiv.org/abs/2311.18681) [![](https://img.shields.io/badge/PhysioNet-Dataset-lightgrey)](https://physionet.org/content/radialog-instruct-dataset/1.1.0/) [![](https://img.shields.io/badge/Huggingface-yellow)](https://huggingface.co/ChantalPellegrini/RaDialog-interactive-radiology-report-generation)

**✨ News ✨**
- 27 March 2025: RaDialog has been accepted to MIDL 2025! The new paper includes additional experiments and details about [RaDialog_v2](https://github.com/ChantalMP/RaDialog_LLaVA) and can be found on [OpenReview](https://openreview.net/pdf?id=trUvr1gSNI).
- 12 July 2024: We published a **new version of our Instruct Dataset**, including additional tasks, on [PhysioNet](https://physionet.org/content/radialog-instruct-dataset/1.1.0/).
- 29 May 2024: The **new and improved version of RaDialog** is now publicly available on [Huggingface](https://huggingface.co/ChantalPellegrini/RaDialog-interactive-radiology-report-generation) and [Github](https://github.com/ChantalMP/RaDialog_LLaVA). This new version is much **better in conversational assistance**, **easier to use** and allows a simple **inference setup with huggingface**!
- 26 March 2024: RaDialog Instruct Dataset now available on [PhysioNet](https://physionet.org/content/radialog-instruct-dataset/1.0.0/)!

---

<img align="right" src="figs/example.png" alt="teaser" width="50%" style="margin-left: 20px">

Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems.

## Installation

### Environment Setup:

#### 1) RaDialog Environment
- Clone this repository and move to the RaDialog directory with `cd RaDialog`
- Install the RaDialog environment with `conda create --name radialog python=3.7`
- Activate the environment with `conda activate radialog`
- Install the requirements with `pip install -r requirements.txt`
- Install hi-ml-multimodal with `pip install hi-ml-multimodal==0.2.0`
- Reinstall the correct versions of torch and transformers with `pip install torch==1.13.0 transformers==4.28.1`
- Install Java and set JAVA_HOME and PATH in local_config.py (we used jre1.8.0)

#### 2) CheXbert Environment
- Install the CheXbert environment with `conda create --name chexbert python=3.7`
- Activate the environment with `conda activate chexbert`
- Move to the chexbert directory with `cd chexbert`
- Install the requirements with `pip install -r requirements.txt`
- Set the absolute path to the chexbert env and folder in `RaDialog/local_config.py`
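
The Java setup above and several later steps (MIMIC-CXR, MIMIC-NLE) also point to `local_config.py`. The variable names below are illustrative assumptions rather than the repository's actual ones (check the `local_config.py` shipped with the repo); the file essentially just collects absolute paths, for example:

```python
# local_config.py -- illustrative sketch only; variable names are assumptions,
# use the ones defined in the repository's local_config.py.
import os

# Java installation used for the NLG metrics (we used jre1.8.0)
JAVA_HOME = "/usr/lib/jvm/jre1.8.0"
JAVA_PATH = os.path.join(JAVA_HOME, "bin")

# absolute paths to the CheXbert conda environment and the chexbert folder
CHEXBERT_ENV_PATH = "/home/<user>/miniconda3/envs/chexbert"
CHEXBERT_PATH = "/home/<user>/RaDialog/chexbert"

# root of the PhysioNet download (contains mimic-cxr-jpg and mimic-cxr-reports)
MIMIC_CXR_PATH = "/data/physionet.org/files/"

# path to the generated MIMIC-NLE data (only needed to build the instruct dataset)
MIMIC_NLE_PATH = "/data/mimic-nle/"
```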
### Prepare the Data and Models:

#### 1) Download pretrained models
- Download the pretrained models from [here](https://github.com/ChantalMP/RaDialog/releases/tag/weights)
- place chexbert.pth in RaDialog/chexbert/src/checkpoint/
- unzip vicuna-7b-img-instruct.zip and vicuna-7b-img-report.zip and place folders into RaDialog/checkpoints/
- unzip chexpert_train and place folder into RaDialog/findings_classifier/checkpoints/
- unzip embs and place folder into RaDialog/pretraining/
- unzip checkpoint_4.pth and place it into outputs/stage1_pt_instruct_blip_origlr_img448/
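
To double-check that everything ended up in the right place, you can run a small sanity check from the RaDialog root (not part of the repository, the paths simply mirror the list above):

```python
# check_weights.py -- illustrative sanity check that the downloaded weights are in place
from pathlib import Path

expected = [
    "chexbert/src/checkpoint/chexbert.pth",
    "checkpoints/vicuna-7b-img-instruct",
    "checkpoints/vicuna-7b-img-report",
    "findings_classifier/checkpoints/chexpert_train",
    "pretraining/embs",
    "outputs/stage1_pt_instruct_blip_origlr_img448/checkpoint_4.pth",
]
for rel in expected:
    status = "OK     " if Path(rel).exists() else "MISSING"
    print(f"{status} {rel}")
```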
#### 2) Download MIMIC-CXR
- Download the MIMIC-CXR-JPG dataset from [here](https://www.physionet.org/content/mimic-cxr-jpg/2.0.0/)
- The dataset should be saved in .../physionet.org/files/mimic-cxr-jpg
- Go to physionet.org/files/mimic-cxr-jpg/files/ and unzip mimic-cxr-2.0.0-split.csv.gz
- from [here](https://physionet.org/content/mimic-cxr/2.0.0/), download mimic-cxr-reports.zip
- unzip it and place the folder in the same directory as the MIMIC-CXR-JPG dataset (e.g. physionet.org/files/)
- in local_config.py set the path to the MIMIC-CXR dataset (e.g. .../physionet.org/files/)
- in model/lavis/defaults_report.yaml set the path to the MIMIC-CXR-JPG dataset (e.g. .../physionet.org/files/mimic-cxr-jpg/2.0.0 )
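
If you prefer to do the unpacking in Python rather than on the command line, a minimal sketch is given below; the PhysioNet root, the exact location of the split file, and the extraction target are assumptions, adjust them to your download:

```python
# unpack_mimic.py -- illustrative sketch for unpacking the two archives mentioned above
import gzip
import shutil
import zipfile
from pathlib import Path

root = Path("/data/physionet.org/files")  # assumption: your PhysioNet download root

# decompress the split file (adjust the subfolder if your copy nests it differently)
split_gz = root / "mimic-cxr-jpg" / "2.0.0" / "mimic-cxr-2.0.0-split.csv.gz"
with gzip.open(split_gz, "rb") as f_in, open(split_gz.with_suffix(""), "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)

# extract the reports so that they end up next to the mimic-cxr-jpg folder
with zipfile.ZipFile(root / "mimic-cxr-reports.zip") as zf:
    zf.extractall(root / "mimic-cxr-reports")
```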
#### 3) Create sectioned report data
- go to the mimic-cxr folder in the code with `cd mimic-cxr`
- run `python create_section_files.py` to prepare the report data
- go back to the RaDialog directory with `cd ..`

#### 4) Prepare the instruct dataset

- As MIMIC-CXR needs a credentialed PhysioNet account to be accessed, we cannot publish our instruct dataset directly.
- Our instruct dataset is now available on [PhysioNet](https://physionet.org/content/radialog-instruct-dataset/1.1.0/) (see News above). Alternatively, you can create the instruct dataset yourself by following the steps below or just use our pre-trained model.
- The MIMIC-NLE data has to be generated first, as it also contains protected data. Follow the instructions [here](https://github.com/maximek3/MIMIC-NLE/tree/main) to generate the MIMIC-NLE data and set the path to the MIMIC-NLE data in `local_config.py`.
- For the correction task, you can write to us and we will share the incorrect predictions we used.
- To generate data without Correction or Reasoning (MIMIC-NLE), please comment out line 335 or 336 in `create_data.py` accordingly.

Data for RaDialog-RG:
- run `python -m data.create_data --mode "RG"` to generate the report generation dataset in the required format (no instruct data)

Data for RaDialog-INS:
- run `python -m data.create_data --mode "INS"` to generate the instruct dataset

### Run Demo:
- run `python demo.py --cfg-path pretraining/configs/blip2_pretrain_stage1_emb.yaml` to start the demo
- connect to the demo with a browser at `http://127.0.0.1:7860` and start chatting with RaDialog

### Evaluate RaDialog on MIMIC-CXR test set:
- RaDialog-RG: run `python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-report/checkpoint-11200`
- RaDialog-INS: run `python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800`
- RaDialog-INS (correction): run `python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800 --do_corr`
- RaDialog-INS (findings QA): run `python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800 --do_cp_all_qa` (or `--do_cp_bin_qa`)

### Train RaDialog:
#### 1) CheXbert Classifier Training
- run `python -m findings_classifier.chexpert_train --train --run_name "train_chexbert"`
- in chexpert_train.py set ckpt_path (line 152) to the path of the model you just trained
- then run `python -m findings_classifier.chexpert_train --run_name "save_preds"` to save the predictions of the trained model

#### 2) Alignment Module Pretraining
- run `python -m pretraining.train --cfg-path pretraining/configs/blip2_pretrain_stage1.yaml` (we used the 4th-epoch checkpoint)
- run `python -m pretraining.train --cfg-path pretraining/configs/blip2_pretrain_stage1_emb.yaml` to save the embeddings of the trained model

#### 3) LLM Training
Train RaDialog-RG:
- run `python finetune.py --use_embs True --base_model 'vicuna_v7' --output_dir 'checkpoints/lora-vicuna-7b-report' --wandb_run_name lora-vicuna-7b-report --prompt_template_name vicuna_v11 --data_path "data/data_files/mimic_cxr_reports_stratified.json" --cutoff_len 600 --num_epochs 10`
- we used checkpoint-11200

Train RaDialog-INS:
- run `python finetune.py --use_embs True --base_model 'vicuna_v7' --output_dir 'checkpoints/lora-vicuna-7b-instruct' --wandb_run_name lora-vicuna-7b-instruct --prompt_template_name vicuna_v11 --data_path "data/data_files/mimic_cxr_instruct_stratified.json" --cutoff_len 800 --num_epochs 10`
- we used checkpoint-4800

To use a model from a checkpoint, you'll need to perform the following steps:
- make a copy of `pytorch_model.bin` and rename it to `adapter_model.bin`
- copy adapter_config.json to the checkpoint folder (it will be generated after the last epoch or you can copy it from the checkpoints we provide) 
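
These two files are what PEFT looks for when loading a LoRA adapter. A minimal sketch of preparing and loading such a checkpoint is shown below; the base-model path and the direct PEFT usage are assumptions for illustration, the repository's own scripts handle the actual loading:

```python
# prepare_checkpoint.py -- illustrative sketch showing why the copied/renamed files are needed
import shutil
from pathlib import Path

from peft import PeftModel
from transformers import AutoModelForCausalLM

ckpt_dir = Path("checkpoints/lora-vicuna-7b-instruct/checkpoint-4800")

# PEFT expects the adapter weights under the name "adapter_model.bin"
shutil.copy(ckpt_dir / "pytorch_model.bin", ckpt_dir / "adapter_model.bin")
# "adapter_config.json" must sit in the same folder (copy it from the final-epoch
# output or from the released checkpoints, as described above)

base = AutoModelForCausalLM.from_pretrained("path/to/vicuna-7b-base")  # placeholder path
model = PeftModel.from_pretrained(base, str(ckpt_dir))
```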
## Reference
When using our model or dataset, please cite:

```
@article{pellegrini2023radialog,
  title={RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance},
  author={Pellegrini, Chantal and {\"O}zsoy, Ege and Busam, Benjamin and Navab, Nassir and Keicher, Matthias},
  journal={arXiv preprint arXiv:2311.18681},
  year={2023}
}
```