# BioBERT for RE
To train an RE model with BioBERT-v1.1 (base), run the command below.
<br>
Before running this, make sure you have generated the pre-processed dataset with the `generate_data.py` script, using the command described in the parent directory.

## Additional Requirements
- scikit-learn (`sklearn`): used for RE evaluation (`pip install scikit-learn`)
- pandas: used for RE evaluation (`pip install pandas`)

## Training
```
export SAVE_DIR=./output
export DATA_DIR=./dataset

export MAX_LENGTH=128
export BATCH_SIZE=8
export NUM_EPOCHS=3
export SAVE_STEPS=1000
export SEED=1
export LEARNING_RATE=5e-5

python run_re.py \
    --task_name ehr-re \
    --config_name bert-base-cased \
    --data_dir ${DATA_DIR} \
    --model_name_or_path dmis-lab/biobert-base-cased-v1.1 \
    --max_seq_length ${MAX_LENGTH} \
    --num_train_epochs ${NUM_EPOCHS} \
    --per_device_train_batch_size ${BATCH_SIZE} \
    --save_steps ${SAVE_STEPS} \
    --seed ${SEED} \
    --do_train \
    --do_eval \
    --do_predict \
    --learning_rate ${LEARNING_RATE} \
    --output_dir ${SAVE_DIR} \
    --overwrite_output_dir
```
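
As a rough guide to how `BATCH_SIZE`, `NUM_EPOCHS`, and `SAVE_STEPS` interact, the sketch below estimates the number of optimizer steps and the steps at which checkpoints would be written. It assumes that `run_re.py` forwards these flags to the Hugging Face `Trainer`, and that training runs on a single device without gradient accumulation. The helper `training_schedule` and the dataset size of 20,000 examples are hypothetical and not taken from this repository.

```python
import math


def training_schedule(num_examples, batch_size=8, num_epochs=3, save_steps=1000):
    """Estimate steps per epoch, total steps, and checkpoint steps.

    Assumes one device and no gradient accumulation (an assumption,
    not something read from run_re.py).
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    # A checkpoint is expected at every multiple of save_steps up to total_steps.
    checkpoints = list(range(save_steps, total_steps + 1, save_steps))
    return steps_per_epoch, total_steps, checkpoints


# Hypothetical dataset size; replace with the number of training examples in DATA_DIR.
steps_per_epoch, total_steps, checkpoints = training_schedule(num_examples=20_000)
print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"checkpoints at:  {checkpoints}")
```

If the run is shorter than `SAVE_STEPS`, the list of checkpoint steps is empty, so lowering `SAVE_STEPS` may be worthwhile on small datasets.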

## Results
#### With gold standard entities
| relation | precision | recall | f1-score |
|:---:|:---:|:---:|:---:|
| Strength -> Drug | 0.9854 | 0.9691 | 0.9772 |
| Dosage -> Drug | 0.9798 | 0.9725 | 0.9762 |
| Duration -> Drug | 0.9229 | 0.8991 | 0.9108 |
| Frequency -> Drug | 0.9782 | 0.9348 | 0.9560 |
| Form -> Drug | 0.9887 | 0.9829 | 0.9858 |
| Route -> Drug | 0.9668 | 0.9605 | 0.9636 |
| Reason -> Drug | 0.7623 | 0.8801 | 0.8169 |
| ADE -> Drug | 0.8601 | 0.8049 | 0.8316 |
| micro avg | 0.9395 | 0.9455 | 0.9425 |
| macro avg | 0.9303 | 0.9341 | 0.9296 |
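
The f1-score column for each relation is the harmonic mean of its precision and recall. The sketch below recomputes it for three rows of the table above; the function name `f1_score` and the `rows` dictionary are illustrative, not part of this repository's evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1-score) copied from the table above
rows = {
    "Strength -> Drug": (0.9854, 0.9691, 0.9772),
    "Reason -> Drug": (0.7623, 0.8801, 0.8169),
    "ADE -> Drug": (0.8601, 0.8049, 0.8316),
}

for relation, (p, r, reported) in rows.items():
    print(f"{relation}: computed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

The computed value can differ from the reported one in the last digit, because the precision and recall in the table are already rounded to four decimals.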

#### With entities predicted using BioBERT NER model (End-to-end Results)
| relation | precision | recall | f1-score |
|:---:|:---:|:---:|:---:|
| Strength -> Drug | 0.9672 | 0.9526 | 0.9599 |
| Dosage -> Drug | 0.8995 | 0.9232 | 0.9112 |
| Duration -> Drug | 0.7545 | 0.7934 | 0.7735 |
| Frequency -> Drug | 0.9450 | 0.8607 | 0.9009 |
| Form -> Drug | 0.9443 | 0.9300 | 0.9371 |
| Route -> Drug | 0.9213 | 0.9148 | 0.9181 |
| Reason -> Drug | 0.5531 | 0.6370 | 0.5921 |
| ADE -> Drug | 0.5419 | 0.4584 | 0.4967 |
| micro avg | 0.8600 | 0.8593 | 0.8596 |
| macro avg | 0.8406 | 0.8345 | 0.8340 |
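
To see where switching from gold entities to predicted entities costs the most, the per-relation f1-scores of the two tables above can be subtracted. This is a minimal sketch: the dictionaries copy the f1-score columns of the gold-entity and end-to-end tables, and the variable names are illustrative.

```python
# f1-scores with gold standard entities (first results table)
gold = {
    "Strength -> Drug": 0.9772,
    "Dosage -> Drug": 0.9762,
    "Duration -> Drug": 0.9108,
    "Frequency -> Drug": 0.9560,
    "Form -> Drug": 0.9858,
    "Route -> Drug": 0.9636,
    "Reason -> Drug": 0.8169,
    "ADE -> Drug": 0.8316,
}

# f1-scores with entities predicted by the BioBERT NER model (end-to-end table)
end_to_end = {
    "Strength -> Drug": 0.9599,
    "Dosage -> Drug": 0.9112,
    "Duration -> Drug": 0.7735,
    "Frequency -> Drug": 0.9009,
    "Form -> Drug": 0.9371,
    "Route -> Drug": 0.9181,
    "Reason -> Drug": 0.5921,
    "ADE -> Drug": 0.4967,
}

# Absolute drop in f1-score per relation, largest first
drops = {relation: gold[relation] - end_to_end[relation] for relation in gold}
for relation, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{relation}: {drop:.4f}")
```

On these numbers the largest drops are for ADE -> Drug and Reason -> Drug, and the smallest is for Strength -> Drug.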

#### With entities predicted using BiLSTM+CRF NER model
| relation | precision | recall | f1-score |
|:---:|:---:|:---:|:---:|
| Strength -> Drug | 0.7008 | 0.8475 | 0.7672 |
| Dosage -> Drug | 0.6418 | 0.8497 | 0.7313 |
| Duration -> Drug | 0.6244 | 0.6244 | 0.6244 |
| Frequency -> Drug | 0.6446 | 0.7643 | 0.6993 |
| Form -> Drug | 0.7006 | 0.8727 | 0.7772 |
| Route -> Drug | 0.6502 | 0.8082 | 0.7206 |
| Reason -> Drug | 0.4455 | 0.3821 | 0.4114 |
| ADE -> Drug | 0.1143 | 0.4829 | 0.1849 |
| micro avg | 0.5900 | 0.7491 | 0.6601 |
| macro avg | 0.5713 | 0.6918 | 0.6149 |