# Medical Search Query Relevance Judgment

## Problem description

Query relevance measures how well two queries (i.e., search terms) match in the topics they express: whether, and to what extent, the meaning of Query-A agrees with that of Query-B. The topic of a query is its focus. Judging the relevance between two query terms is an important task, commonly used to optimize search quality for long-tail queries; this dataset was built for that scenario.

<div align=center>

![examples](https://github.com/auquenton/KUAKE_QQR_pytorch/blob/main/pic/1.png?raw=true)

</div>

## Dataset introduction

[Download](https://tianchi.aliyun.com/competition/entrance/532001/information)

The relevance between Query-A and Query-B is rated on a three-level scale (0-2), where 0 is the least relevant and 2 the most relevant.

2 points: A and B are semantically equivalent; they express exactly the same meaning.

1 point: B is a semantic subset of A; the scope of B is narrower than that of A.

0 points: B is a semantic superset of A (the scope of B is broader than that of A), or B is semantically unrelated to A.

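Each example in the dataset pairs two queries with one of these labels. A minimal sketch of reading such records with the standard library — the field names `id`, `query1`, `query2`, and `label` follow the common KUAKE-QQR release, but treat them as an assumption if your copy differs:

```python
import json

# Hypothetical records in the KUAKE-QQR style: an id, two queries,
# and a string relevance label in {"0", "1", "2"}.
sample = '''[
  {"id": "s1", "query1": "小孩子打呼噜什么原因", "query2": "小孩子打呼噜", "label": "1"},
  {"id": "s2", "query1": "双眼皮埋线", "query2": "双眼皮埋线法", "label": "2"},
  {"id": "s3", "query1": "感冒吃什么药", "query2": "骨折了怎么办", "label": "0"}
]'''

records = json.loads(sample)

# Map string labels to integers for training.
label2id = {"0": 0, "1": 1, "2": 2}
pairs = [(r["query1"], r["query2"], label2id[r["label"]]) for r in records]
print(pairs[0])
```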
## Structure
```
·
├── data
│   ├── example_pred.json
│   ├── KUAKE-QQR_dev.json
│   ├── KUAKE-QQR_test.json
│   └── KUAKE-QQR_train.json
├── tencent-ailab-embedding-zh-d100-v0.2.0-s
│   ├── tencent-ailab-embedding-zh-d100-v0.2.0-s.txt
│   └── readme.txt
├── chinese-bert-wwm-ext
│   ├── added_tokens.json
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.json
│   └── vocab.txt
├── pic
│   └── 1.png
├── scripts
│   ├── inference.sh
│   ├── eval.sh
│   └── train.sh
├── train.py
├── eval.py
├── models.py
├── inference.py
└── README.md
```

## Environment

```shell
pip install gensim
pip install numpy
pip install tqdm
pip install torch
pip install transformers
```

## Prepare
Download the corpus from Tencent AI Lab:
```shell
wget https://ai.tencent.com/ailab/nlp/zh/data/tencent-ailab-embedding-zh-d100-v0.2.0-s.tar.gz # v0.2.0, 100-dimension, small vocabulary
```
Decompress the corpus:
```shell
tar -zxvf tencent-ailab-embedding-zh-d100-v0.2.0-s.tar.gz
```
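The decompressed `.txt` file is in plain word2vec text format: a header line giving the vocabulary size and vector dimension, then one token per line followed by its vector components. The project installs gensim for this, but a minimal loader can be sketched with numpy alone (the function name and `limit` parameter are illustrative, not part of the repo):

```python
import numpy as np

def load_w2v_text(path, limit=None):
    """Load a word2vec-format text embedding file into a dict of numpy arrays."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        header = f.readline().split()
        vocab_size, dim = int(header[0]), int(header[1])  # vocab_size is informational
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            parts = line.rstrip("\n").split(" ")
            # The token is the first field; the rest are the vector components.
            vectors[parts[0]] = np.asarray(parts[1:1 + dim], dtype=np.float32)
    return vectors, dim
```

With gensim installed, `gensim.models.KeyedVectors.load_word2vec_format(path, binary=False)` does the same job, and is presumably what the `--w2v_path` argument below feeds.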

Download the BERT model and configuration files:

```shell
mkdir chinese-bert-wwm-ext
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/added_tokens.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/config.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/pytorch_model.bin
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/special_tokens_map.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/tokenizer.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/tokenizer_config.json
wget -P chinese-bert-wwm-ext https://huggingface.co/hfl/chinese-bert-wwm-ext/resolve/main/vocab.txt
```

## Train

```shell
python train.py --model_name {model_name} --datadir {datadir} --epochs 30 --lr 1e-4 --max_length 32 --batch_size 8 --savepath ./results --gpu 0 --w2v_path {w2v_path}
```

Or run the script:

```shell
sh scripts/train.sh
```

## Eval

```shell
python eval.py --model_name {model_name} --w2v_path {w2v_path} --model_path {model_path}
```

Or run the script:

```shell
sh scripts/eval.sh
```

## Inference

```shell
python inference.py --model_name {model_name} --batch_size 8 --max_length 32 --savepath ./results --datadir {datadir} --model_path {model_path} --gpu 0 --w2v_path {w2v_path}
```

Or run the script:

```shell
sh scripts/inference.sh
```
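Inference writes predictions under `--savepath`. Judging from `data/example_pred.json`, the submission presumably mirrors the test records with the `label` field filled in; a minimal sketch of producing such a file — the exact schema is an assumption, so check `example_pred.json` in your copy:

```python
import json

def write_predictions(test_records, predicted_ids, out_path):
    """Attach predicted labels to test records and dump them as a submission file.

    Assumes each record has id/query1/query2 fields, matching the sketch above.
    """
    id2label = {0: "0", 1: "1", 2: "2"}
    out = []
    for rec, pred in zip(test_records, predicted_ids):
        out.append({"id": rec["id"], "query1": rec["query1"],
                    "query2": rec["query2"], "label": id2label[pred]})
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(out, f, ensure_ascii=False, indent=2)
```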

## Results

<div align=center>

| Model | Params (M) | Train Acc (%) | Val Acc (%) | Test Acc (%) |
| :----: | :----: | :----: | :----: | :----: |
| SemNN | 200.04 | 64.02 | 65.56 | 61.41 |
| SemLSTM | 200.24 | 66.81 | 67.00 | 69.74 |
| SemAttention | 200.48 | 76.14 | 74.50 | 75.57 |
| Bert | 102.27 | 95.85 | 82.88 | 82.65 |

</div>
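Accuracy here is plain label agreement between predictions and gold annotations. A quick way to score a prediction file against a labeled file, matched by record id (file paths and the id/label field names are assumptions consistent with the sketches above):

```python
import json

def accuracy(gold_path, pred_path):
    """Fraction of gold records whose predicted label matches, matched by id."""
    with open(gold_path, encoding="utf-8") as f:
        gold = {r["id"]: r["label"] for r in json.load(f)}
    with open(pred_path, encoding="utf-8") as f:
        pred = {r["id"]: r["label"] for r in json.load(f)}
    hits = sum(1 for i, lab in gold.items() if pred.get(i) == lab)
    return hits / len(gold)
```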