# BioBERT for NER

To train an NER model with BioBERT-v1.1 (base), run the command below.
<br>
Before running it, make sure you have generated the pre-processed dataset by running `generate_data.py` with the command given in the parent directory.

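A minimal pre-flight check can confirm that this step has been done. It is a sketch that assumes `run_ner.py` follows the Hugging Face legacy token-classification example, which reads `train.txt`, `dev.txt` and `test.txt` from `--data_dir`, alongside the `labels.txt` passed via `--labels`. The function name `check_dataset` is hypothetical, and the file names should be adjusted if `generate_data.py` writes different ones.

```bash
# Hypothetical pre-flight check: verify that the pre-processed dataset exists
# before launching run_ner.py. The split file names are an assumption based on
# the Hugging Face legacy token-classification example.
check_dataset() {
  local data_dir="${1:-./dataset}"
  local missing=0
  local f
  for f in train.txt dev.txt test.txt labels.txt; do
    # -s: true only if the file exists and is non-empty
    if [ ! -s "${data_dir}/${f}" ]; then
      echo "missing or empty: ${data_dir}/${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

check_dataset "${DATA_DIR:-./dataset}" || echo "Run generate_data.py first (see the parent directory)." >&2
```

The function returns a non-zero status when any file is missing, so it can also guard the training command, e.g. `check_dataset "${DATA_DIR}" && python run_ner.py ...`.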
## Additional Requirements

- seqeval: used for NER evaluation (`pip install seqeval`)

## Training

```bash
# Output directory for checkpoints and evaluation/prediction files
export SAVE_DIR=./output
# Pre-processed dataset generated by generate_data.py
export DATA_DIR=./dataset

# Training hyperparameters
export MAX_LENGTH=128
export BATCH_SIZE=16
export NUM_EPOCHS=5
export SAVE_STEPS=1000
export SEED=0

python run_ner.py \
  --data_dir ${DATA_DIR}/ \
  --labels ${DATA_DIR}/labels.txt \
  --model_name_or_path dmis-lab/biobert-base-cased-v1.1 \
  --output_dir ${SAVE_DIR}/ \
  --max_seq_length ${MAX_LENGTH} \
  --num_train_epochs ${NUM_EPOCHS} \
  --per_device_train_batch_size ${BATCH_SIZE} \
  --save_steps ${SAVE_STEPS} \
  --seed ${SEED} \
  --do_train \
  --do_eval \
  --do_predict \
  --overwrite_output_dir
```
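After training finishes, the metrics end up in `${SAVE_DIR}`. The sketch below prints them, assuming `run_ner.py` behaves like the Hugging Face legacy token-classification script, which writes `eval_results.txt` (with `--do_eval`) and `test_results.txt` (with `--do_predict`) as `key = value` lines. `print_results` is a hypothetical helper, not part of this repository.

```bash
# Hypothetical helper: print the metric files that run_ner.py is assumed to
# write into the output directory (legacy Hugging Face file names).
print_results() {
  local save_dir="${1:-./output}"
  local f line
  for f in eval_results.txt test_results.txt; do
    if [ -f "${save_dir}/${f}" ]; then
      echo "== ${f} =="
      # Each line is expected to look like "key = value"; the extra test keeps
      # a final line that has no trailing newline.
      while IFS= read -r line || [ -n "${line}" ]; do
        echo "  ${line}"
      done < "${save_dir}/${f}"
    else
      echo "== ${f} not found =="
    fi
  done
}

print_results "${SAVE_DIR:-./output}"
```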

## Results

| Entity type | Precision | Recall | F1-score |
|:---:|:---:|:---:|:---:|
| ADE | 0.6351 | 0.5680 | 0.5997 |
| Dosage | 0.9254 | 0.9485 | 0.9368 |
| Drug | 0.9580 | 0.9542 | 0.9561 |
| Duration | 0.8119 | 0.9021 | 0.8546 |
| Form | 0.9546 | 0.9456 | 0.9501 |
| Frequency | 0.9707 | 0.9668 | 0.9688 |
| Reason | 0.7203 | 0.7348 | 0.7275 |
| Route | 0.9530 | 0.9525 | 0.9527 |
| Strength | 0.9807 | 0.9846 | 0.9827 |
| micro avg | 0.9327 | 0.9330 | 0.9328 |
| macro avg | 0.9253 | 0.9225 | 0.9230 |
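
As a consistency check on the averaged rows: F1 is the harmonic mean of precision $P$ and recall $R$, so the micro-averaged F1 follows directly from the micro-averaged precision and recall in the table. The plain (unweighted) mean of the nine per-type F1 scores, by contrast, comes to about 0.8810 rather than the reported 0.9230, which suggests the "macro avg" row is weighted by the number of entities of each type (older seqeval releases are believed to have computed their "macro avg" row this way).

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P + R} \\
F_1^{\text{micro}} &= \frac{2 \times 0.9327 \times 0.9330}{0.9327 + 0.9330} \approx 0.9328 \\
\frac{1}{9} \sum_{c=1}^{9} F_1^{(c)} &= \frac{7.929}{9} \approx 0.8810
\end{aligned}
```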