# Medical Search Query Relevance Judgment

## Question description

Relevance between queries (i.e., search terms) measures how well the topics expressed by two queries match, that is, whether and to what extent the meaning of Query-B drifts away from that of Query-A. The topic of a query is its focus. Determining the relevance between two query terms is an important task, often used when optimizing search quality for long-tail queries, and this dataset was built for that scenario.

<div align=center>



</div>

## Dataset introduction

[Download](https://tianchi.aliyun.com/competition/entrance/532001/information)

The relevance between Query-A and Query-B is divided into three levels (0-2), where 0 is the least relevant and 2 is the most relevant:

2 points: A and B are semantically equivalent; they express exactly the same meaning.

1 point: B is a semantic subset of A; the scope of B is narrower than that of A.

0 points: B is a semantic superset of A (the scope of B is broader than that of A), or A and B are semantically unrelated.

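Each record pairs two queries with one of the labels above. A minimal sketch for peeking at the data, assuming the `query1`/`query2`/`label` field names of the public CBLUE release of KUAKE-QQR (adjust if your copy differs):

```python
import json

# Peek at the first few training records. ASSUMPTION: each file is a JSON
# array of objects with query1/query2/label fields, as in the public
# CBLUE release of KUAKE-QQR.
with open('data/KUAKE-QQR_train.json', encoding='utf-8') as f:
    records = json.load(f)

for record in records[:3]:
    print(record['query1'], '|', record['query2'], '->', record['label'])
```
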
## Structure
```
·
├── data
│   ├── example_pred.json
│   ├── KUAKE-QQR_dev.json
│   ├── KUAKE-QQR_test.json
│   └── KUAKE-QQR_train.json
├── tencent-ailab-embedding-zh-d100-v0.2.0-s
│   ├── tencent-ailab-embedding-zh-d100-v0.2.0-s.txt
│   └── readme.txt
├── chinese-bert-wwm-ext
│   ├── added_tokens.json
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.json
│   └── vocab.txt
├── pic
│   └── 1.png
├── scripts
│   ├── inference.sh
│   ├── eval.sh
│   └── train.sh
├── train.py
├── eval.py
├── models.py
├── inference.py
└── README.md
```

## Environment

```shell
pip install gensim
pip install numpy
pip install tqdm
conda install pytorch
pip install transformers
```

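A quick way to confirm the environment is usable is a one-shot import check (nothing repo-specific, just the packages installed above):

```python
# Sanity-check the environment: every import below must succeed.
import gensim
import numpy
import torch
import tqdm
import transformers

print('torch', torch.__version__, '| transformers', transformers.__version__)
```
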
## Prepare
Download the word-embedding corpus from Tencent AI Lab
```shell
wget https://ai.tencent.com/ailab/nlp/zh/data/tencent-ailab-embedding-zh-d100-v0.2.0-s.tar.gz # v0.2.0, 100-dimension, small vocabulary
```
Decompress the corpus
```shell
tar -zxvf tencent-ailab-embedding-zh-d100-v0.2.0-s.tar.gz
```
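
The extracted `.txt` file is in word2vec text format, so it can be sanity-checked with gensim (a minimal sketch; `limit` only keeps the check fast):

```python
from gensim.models import KeyedVectors

# Load the Tencent embeddings (word2vec text format); limit=50000 reads only
# the first 50k words so the check stays quick.
w2v = KeyedVectors.load_word2vec_format(
    'tencent-ailab-embedding-zh-d100-v0.2.0-s/tencent-ailab-embedding-zh-d100-v0.2.0-s.txt',
    binary=False,
    limit=50000,
)
print(w2v.vector_size)  # expected: 100
```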

Download the BERT model and configuration files

```shell
mkdir chinese-bert-wwm-ext
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/added_tokens.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/config.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/pytorch_model.bin
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/special_tokens_map.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/tokenizer.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/tokenizer_config.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/vocab.txt
```
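
Once the files are in place, verifying that the checkpoint loads from the local directory takes a few lines (a minimal sketch using the standard `transformers` API):

```python
import torch
from transformers import BertModel, BertTokenizer

# Load from the local chinese-bert-wwm-ext directory downloaded above,
# not from the Hugging Face hub.
tokenizer = BertTokenizer.from_pretrained('chinese-bert-wwm-ext')
model = BertModel.from_pretrained('chinese-bert-wwm-ext')

inputs = tokenizer('感冒吃什么药', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```
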
## Train

```shell
python train.py --model_name {model_name} --datadir {datadir} --epochs 30 --lr 1e-4 --max_length 32 --batch_size 8 --savepath ./results --gpu 0 --w2v_path {w2v_path}
```
Or run the script

```shell
sh scripts/train.sh
```

## Eval

```shell
python eval.py --model_name {model_name} --w2v_path {w2v_path} --model_path {model_path}
```
Or run the script

```shell
sh scripts/eval.sh
```

## Inference
```shell
python inference.py --model_name {model_name} --batch_size 8 --max_length 32 --savepath ./results --datadir {datadir} --model_path {model_path} --gpu 0 --w2v_path {w2v_path}
```
Or run the script

```shell
sh scripts/inference.sh
```
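
The expected output format can be checked against the sample predictions shipped with the data (a minimal sketch that only prints the head of the file, assuming nothing about its schema):

```python
# Print the beginning of the bundled sample-prediction file to see the
# format that inference.py is expected to reproduce.
with open('data/example_pred.json', encoding='utf-8') as f:
    print(f.read()[:500])
```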

## Results

<div align=center>

| Model | Params(M) | Train Acc(%) | Val Acc(%) | Test Acc(%) |
| :----: | :----: | :----: | :----: | :----: |
| SemNN | 200.04 | 64.02 | 65.56 | 61.41 |
| SemLSTM | 200.24 | 66.81 | 67.00 | 69.74 |
| SemAttention | 200.48 | 76.14 | 74.50 | 75.57 |
| Bert | 102.27 | 95.85 | 82.88 | 82.65 |

</div>