# Cost-effective Instruction Learning for Pathology Vision and Language Analysis (CLOVER)

The advent of vision-language models fosters interactive conversations between AI-enabled models and humans. Yet applying these models in the clinic must contend with daunting challenges around large-scale training data and financial and computational resources. Here we propose a cost-effective instruction learning framework for conversational pathology, named CLOVER. CLOVER trains only a lightweight module and uses instruction tuning while freezing the parameters of the large language model. Instead of using costly GPT-4, we propose well-designed prompts on GPT-3.5 for building generation-based instructions, emphasizing the utility of pathological knowledge derived from Internet sources. To augment the use of instructions, we construct a high-quality set of template-based instructions in the context of digital pathology. On two benchmark datasets, our findings reveal the strength of hybrid-form instructions for visual question answering in pathology. Extensive results show the cost-effectiveness of CLOVER in answering both open-ended and closed-ended questions, where CLOVER outperforms strong baselines that possess 37 times more training parameters and use instruction data generated from GPT-4. Through instruction tuning, CLOVER exhibits robust few-shot learning on an external clinical dataset. These findings demonstrate that the cost-effective modeling of CLOVER could accelerate the adoption of rapid conversational applications in the landscape of digital pathology.

## Release
- The checkpoints and instruction dataset will be released soon.

## Workflow of CLOVER
<p align="center">
    <img src="imgs/image.png" width="90%"> <br>

  *CLOVER employs the training framework of BLIP-2 to achieve fast domain tuning with lightweight parameters. The entire training process of CLOVER includes two major stages: (i) alignment of vision and language and (ii) supervised fine-tuning with instructions. The alignment compels the model to acquire valuable representations bridging vision and language. Instruction fine-tuning is vital here for activating LLMs to excel in vision-language question answering. Stage 1 requires image-text pairs as input, for which we use the large-scale Quilt-1M dataset. Stage 2 demands domain-specific instruction data. Given the significant lack of such instruction data in the literature, we propose a low-cost solution for instruction data generation carefully designed for analyzing pathological data.*
</p>

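
The key to CLOVER's cost-effectiveness is that only the lightweight Q-Former module is trained, while the image encoder and the LLM stay frozen. Below is a minimal sketch of that parameter-freezing idea; it assumes the upstream LAVIS BLIP-2 implementation, whose attribute names (`visual_encoder`, `Qformer`, `t5_model`) may differ from the code in this repository.

```python
import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the upstream BLIP-2 + FlanT5-XL model through LAVIS (not a CLOVER checkpoint).
model, _, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xl", is_eval=False, device=device
)

# Keep the heavy components frozen ...
for p in model.visual_encoder.parameters():
    p.requires_grad = False  # frozen ViT image encoder
for p in model.t5_model.parameters():
    p.requires_grad = False  # frozen LLM

# ... so that only the lightweight Q-Former and its projection remain trainable.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameter tensors, e.g. {trainable[:3]}")
```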
## Contents
- [Cost-effective Instruction Learning for Pathology Vision and Language Analysis (CLOVER)](#cost-effective-instruction-learning-for-pathology-vision-and-language-analysis-clover)
  - [Release](#release)
  - [Workflow of CLOVER](#workflow-of-clover)
  - [Contents](#contents)
    - [Data Download](#data-download)
    - [Installation](#installation)
    - [Training](#training)
    - [Inference](#inference)
  - [Case Study](#case-study)
  - [Related Projects](#related-projects)
### Data Download
- Stage 1: The Quilt-1M dataset can be downloaded from [Google](https://docs.google.com/forms/d/e/1FAIpQLSdSe06DIbPn71jA2rCxe_5tUPfyHhSH1Z7ZTJBxWM26cnpZFg/viewform) or [Zenodo](https://zenodo.org/records/8239942).
- Stage 2: The CLOVER instructions will be released. You can also generate the data yourself with our prompts in [PY FILE](./generate_instructions.py); a minimal sketch of the idea is shown below.

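
As a rough illustration of the generation-based route, the sketch below asks GPT-3.5 to turn a pathology image caption into question-answer pairs. It is only a sketch: the real prompts live in `generate_instructions.py`, and the prompt text, example caption, and output handling used here are placeholders.

```python
# Minimal sketch of generation-based instruction construction with GPT-3.5.
# Assumes the `openai` package (v1+) and an OPENAI_API_KEY environment variable;
# the prompt and caption below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

caption = "High-power view showing nests of atypical squamous cells with keratin pearls."  # made-up example

prompt = (
    "You are an experienced pathologist. Based only on the following image caption, "
    "write two question-answer pairs that a trainee might ask about the image.\n"
    f"Caption: {caption}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,
)
print(response.choices[0].message.content)  # raw Q&A text, to be post-processed into instructions
```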
### Installation

1. Create the conda environment
```bash
conda create -n clover python=3.9
conda activate clover
```

2. Build from source
```bash
git clone https://github.com/JLINEkai/CLOVER.git
cd CLOVER
pip install -r requirements.txt
```
### Training
- Stage 1 (Alignment):
```bash
python train_blip2qformer.py
```
- Stage 2 (Instruction fine-tuning):

You can choose the large language model (LLM) in [FILE](./lavis/projects/blip2/train/pretrain_stage2.yaml). We provide FlanT5-XL and Vicuna-7B.
```bash
python -m torch.distributed.run --nproc_per_node=1 train.py
```
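Before launching, it can be useful to confirm which LLM the stage-2 config selects. A small sketch, assuming OmegaConf (a LAVIS dependency) and that the config keeps the upstream LAVIS layout with a top-level `model` section:

```python
# Inspect the stage-2 training config (field names assume the upstream LAVIS layout;
# adjust if the CLOVER config differs).
from omegaconf import OmegaConf

cfg = OmegaConf.load("lavis/projects/blip2/train/pretrain_stage2.yaml")
print(OmegaConf.to_yaml(cfg.model))  # e.g. arch / model_type indicating FlanT5-XL or Vicuna-7B
```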
### Inference

```bash
python -m torch.distributed.run --nproc_per_node=1 evaluate.py --cfg-path lavis/projects/blip2/eval/vqav2_zeroshot_flant5xl_eval.yaml
```
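For a quick qualitative check on a single image, outside the evaluation script above, the standard LAVIS BLIP-2 interface should work along these lines. The checkpoint loaded here is the upstream FlanT5-XL model rather than a released CLOVER checkpoint, and the image path and question are placeholders.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Upstream BLIP-2 + FlanT5-XL via LAVIS; swap in a CLOVER checkpoint once released.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xl", is_eval=True, device=device
)

raw_image = Image.open("path/to/pathology_patch.png").convert("RGB")  # placeholder path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

question = "Question: What type of tissue is shown in this image? Answer:"
print(model.generate({"image": image, "prompt": question}))
```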
## Case Study

<p align="center">
    <img src="imgs/case1.png" width="90%"> <br>

  *Qualitative comparisons of visual question answering on QUILT-VQA. (Image source: QUILT-VQA)*
</p>

<p align="center">
    <img src="imgs/case2.png" width="90%"> <br>

  *Qualitative comparisons of visual question answering on LLaVA-Med-17K. (Image source: [link](https://www.ncbi.nlm.nih.gov/pubmed/26147524))*
</p>

If you have any questions, please send an email to chenkaitao@pjlab.org.cn.
## Related Projects
- Our model is based on [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://github.com/salesforce/LAVIS/tree/main).