# BioBERT for RE
To train an RE (relation extraction) model with BioBERT-v1.1 (base), run the command below.
<br>
Before running it, make sure you have generated the pre-processed dataset with `generate_data.py`, using the command given in the parent directory.

## Additional Requirements
- scikit-learn (imported as `sklearn`): used for RE evaluation (`pip install scikit-learn`)
- pandas: used for RE evaluation (`pip install pandas`)

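The pip package name and the import name differ for scikit-learn, which is easy to trip over. As a minimal sketch (not part of this repository; the `REQUIRED` mapping and `missing_packages` helper are illustrative names), the following checks whether both additional requirements can be imported:

```python
from importlib.util import find_spec

# Maps the pip package name to the module name used in `import` statements.
# This mapping is an illustration, not a file shipped with the repository.
REQUIRED = {"scikit-learn": "sklearn", "pandas": "pandas"}


def missing_packages(required):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All additional requirements are installed.")
```
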
## Training
```
export SAVE_DIR=./output
export DATA_DIR=./dataset

export MAX_LENGTH=128
export BATCH_SIZE=8
export NUM_EPOCHS=3
export SAVE_STEPS=1000
export SEED=1
export LEARNING_RATE=5e-5

python run_re.py \
    --task_name ehr-re \
    --config_name bert-base-cased \
    --data_dir ${DATA_DIR} \
    --model_name_or_path dmis-lab/biobert-base-cased-v1.1 \
    --max_seq_length ${MAX_LENGTH} \
    --num_train_epochs ${NUM_EPOCHS} \
    --per_device_train_batch_size ${BATCH_SIZE} \
    --save_steps ${SAVE_STEPS} \
    --seed ${SEED} \
    --do_train \
    --do_eval \
    --do_predict \
    --learning_rate ${LEARNING_RATE} \
    --output_dir ${SAVE_DIR} \
    --overwrite_output_dir
```

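To get a feel for how `BATCH_SIZE`, `NUM_EPOCHS`, and `SAVE_STEPS` interact, the rough sketch below estimates the number of optimizer steps and saved checkpoints. It assumes `run_re.py` follows the Hugging Face `Trainer` convention in which `--save_steps` counts optimizer steps, and that training runs on a single device without gradient accumulation. The `training_schedule` helper and the dataset size of 20,000 examples are hypothetical and used only for illustration.

```python
import math


def training_schedule(num_examples, batch_size=8, num_epochs=3, save_steps=1000):
    """Estimate steps per epoch, total steps, and checkpoint count.

    Assumes one device and no gradient accumulation, so one batch
    corresponds to one optimizer step.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    # A checkpoint is assumed to be written every `save_steps` optimizer steps.
    num_checkpoints = total_steps // save_steps
    return steps_per_epoch, total_steps, num_checkpoints


if __name__ == "__main__":
    # 20_000 is a made-up dataset size, not the size of the actual dataset.
    per_epoch, total, checkpoints = training_schedule(20_000)
    print(f"steps/epoch={per_epoch}, total steps={total}, checkpoints={checkpoints}")
```

If the estimated checkpoint count is zero, lowering `SAVE_STEPS` may be worth considering so that intermediate checkpoints are written.
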
## Results
#### With gold standard entities
| Relation | precision | recall | f1-score |
|:---:|:---:|:---:|:---:|
| Strength -> Drug | 0.9854 | 0.9691 | 0.9772 |
| Dosage -> Drug | 0.9798 | 0.9725 | 0.9762 |
| Duration -> Drug | 0.9229 | 0.8991 | 0.9108 |
| Frequency -> Drug | 0.9782 | 0.9348 | 0.9560 |
| Form -> Drug | 0.9887 | 0.9829 | 0.9858 |
| Route -> Drug | 0.9668 | 0.9605 | 0.9636 |
| Reason -> Drug | 0.7623 | 0.8801 | 0.8169 |
| ADE -> Drug | 0.8601 | 0.8049 | 0.8316 |
| micro avg | 0.9395 | 0.9455 | 0.9425 |
| macro avg | 0.9303 | 0.9341 | 0.9296 |

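For readers less familiar with the metrics reported in these tables, the sketch below shows the standard definitions of per-relation precision, recall, and F1, along with micro-averaging, which pools the counts across relation types before computing the scores. This README does not show the exact evaluation procedure, so this is an illustration of the textbook formulas only; the `prf` and `micro_average` helpers and the counts in the example are made up.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def micro_average(counts):
    """Pool (tp, fp, fn) counts over all relation types, then score once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical counts per relation type: (tp, fp, fn).
    toy_counts = {
        "Strength -> Drug": (8, 2, 2),
        "ADE -> Drug": (1, 1, 3),
    }
    for relation, (tp, fp, fn) in toy_counts.items():
        p, r, f = prf(tp, fp, fn)
        print(f"{relation}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
    p, r, f = micro_average(toy_counts)
    print(f"micro avg: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Because micro-averaging pools counts, frequent relation types dominate the micro avg row, while rare types such as ADE have more influence on an average taken per relation type.
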
#### With entities predicted using BioBERT NER model (End-to-end Results)
| Relation | precision | recall | f1-score |
|:---:|:---:|:---:|:---:|
| Strength -> Drug | 0.9672 | 0.9526 | 0.9599 |
| Dosage -> Drug | 0.8995 | 0.9232 | 0.9112 |
| Duration -> Drug | 0.7545 | 0.7934 | 0.7735 |
| Frequency -> Drug | 0.9450 | 0.8607 | 0.9009 |
| Form -> Drug | 0.9443 | 0.9300 | 0.9371 |
| Route -> Drug | 0.9213 | 0.9148 | 0.9181 |
| Reason -> Drug | 0.5531 | 0.6370 | 0.5921 |
| ADE -> Drug | 0.5419 | 0.4584 | 0.4967 |
| micro avg | 0.8600 | 0.8593 | 0.8596 |
| macro avg | 0.8406 | 0.8345 | 0.8340 |

#### With entities predicted using BiLSTM+CRF NER model
| Relation | precision | recall | f1-score |
|:---:|:---:|:---:|:---:|
| Strength -> Drug | 0.7008 | 0.8475 | 0.7672 |
| Dosage -> Drug | 0.6418 | 0.8497 | 0.7313 |
| Duration -> Drug | 0.6244 | 0.6244 | 0.6244 |
| Frequency -> Drug | 0.6446 | 0.7643 | 0.6993 |
| Form -> Drug | 0.7006 | 0.8727 | 0.7772 |
| Route -> Drug | 0.6502 | 0.8082 | 0.7206 |
| Reason -> Drug | 0.4455 | 0.3821 | 0.4114 |
| ADE -> Drug | 0.1143 | 0.4829 | 0.1849 |
| micro avg | 0.5900 | 0.7491 | 0.6601 |
| macro avg | 0.5713 | 0.6918 | 0.6149 |