BioBERT for RE

To train an RE (relation extraction) model with BioBERT-v1.1 (base), run the command in the Training section below.

Before running it, make sure you have generated the pre-processed dataset with generate_data.py, using the command given in the parent directory.

Additional Requirements

  • scikit-learn (sklearn): Used for RE evaluation (pip install scikit-learn)
  • pandas: Used for RE evaluation (pip install pandas)
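
As a rough illustration of how these two packages can fit together for RE evaluation, the sketch below builds a per-relation precision/recall/F1 table with scikit-learn's classification_report and loads it into a pandas DataFrame. This is only a minimal sketch, not the evaluation code used by run_re.py: the label lists, the "No relation" class, and the variable names are assumptions made up for the example.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Hypothetical data: one gold and one predicted label per candidate entity pair.
# The "No relation" class is an assumption for illustration only.
y_true = ["Strength -> Drug", "Strength -> Drug", "ADE -> Drug", "No relation", "ADE -> Drug"]
y_pred = ["Strength -> Drug", "No relation", "ADE -> Drug", "ADE -> Drug", "ADE -> Drug"]

# Score only the real relation types; leaving out the negative class makes
# scikit-learn report a "micro avg" row instead of plain accuracy.
relation_labels = ["Strength -> Drug", "ADE -> Drug"]

report = classification_report(
    y_true,
    y_pred,
    labels=relation_labels,
    output_dict=True,   # return a nested dict instead of a formatted string
    zero_division=0,    # score undefined ratios as 0 instead of warning
)

# Rows are relation types plus the averages; columns are the metrics.
report_df = pd.DataFrame(report).transpose().round(4)
print(report_df[["precision", "recall", "f1-score"]])
```

Restricting labels to the relation types means that pairs with no relation still count as errors when they are mislabeled, but do not get a row of their own.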

Training

export SAVE_DIR=./output
export DATA_DIR=./dataset

export MAX_LENGTH=128
export BATCH_SIZE=8
export NUM_EPOCHS=3
export SAVE_STEPS=1000
export SEED=1
export LEARNING_RATE=5e-5

python run_re.py \
    --task_name ehr-re \
    --config_name bert-base-cased \
    --data_dir ${DATA_DIR} \
    --model_name_or_path dmis-lab/biobert-base-cased-v1.1 \
    --max_seq_length ${MAX_LENGTH} \
    --num_train_epochs ${NUM_EPOCHS} \
    --per_device_train_batch_size ${BATCH_SIZE} \
    --save_steps ${SAVE_STEPS} \
    --seed ${SEED} \
    --do_train \
    --do_eval \
    --do_predict \
    --learning_rate ${LEARNING_RATE} \
    --output_dir ${SAVE_DIR} \
    --overwrite_output_dir

Results

With gold standard entities

Relation            precision  recall  f1-score
Strength -> Drug    0.9854     0.9691  0.9772
Dosage -> Drug      0.9798     0.9725  0.9762
Duration -> Drug    0.9229     0.8991  0.9108
Frequency -> Drug   0.9782     0.9348  0.9560
Form -> Drug        0.9887     0.9829  0.9858
Route -> Drug       0.9668     0.9605  0.9636
Reason -> Drug      0.7623     0.8801  0.8169
ADE -> Drug         0.8601     0.8049  0.8316
micro avg           0.9395     0.9455  0.9425
macro avg           0.9303     0.9341  0.9296

With entities predicted using BioBERT NER model (End-to-end Results)

Relation            precision  recall  f1-score
Strength -> Drug    0.9672     0.9526  0.9599
Dosage -> Drug      0.8995     0.9232  0.9112
Duration -> Drug    0.7545     0.7934  0.7735
Frequency -> Drug   0.9450     0.8607  0.9009
Form -> Drug        0.9443     0.9300  0.9371
Route -> Drug       0.9213     0.9148  0.9181
Reason -> Drug      0.5531     0.6370  0.5921
ADE -> Drug         0.5419     0.4584  0.4967
micro avg           0.8600     0.8593  0.8596
macro avg           0.8406     0.8345  0.8340

With entities predicted using BiLSTM+CRF NER model

Relation            precision  recall  f1-score
Strength -> Drug    0.7008     0.8475  0.7672
Dosage -> Drug      0.6418     0.8497  0.7313
Duration -> Drug    0.6244     0.6244  0.6244
Frequency -> Drug   0.6446     0.7643  0.6993
Form -> Drug        0.7006     0.8727  0.7772
Route -> Drug       0.6502     0.8082  0.7206
Reason -> Drug      0.4455     0.3821  0.4114
ADE -> Drug         0.1143     0.4829  0.1849
micro avg           0.5900     0.7491  0.6601
macro avg           0.5713     0.6918  0.6149
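
The micro avg and macro avg rows aggregate the per-relation scores in different ways, which is why a weak low-frequency class such as ADE -> Drug can pull the two apart. The sketch below shows the usual textbook definitions: micro-averaging pools the raw counts before computing the metrics, while macro-averaging takes the unweighted mean of the per-class metrics. The counts are made up for illustration and are not taken from the tables above, and the averages reported there may have been computed with different settings.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def micro_average(per_class: dict[str, Counts]) -> tuple[float, float, float]:
    """Pool the counts over all classes, then compute the metrics once."""
    tp = sum(c.tp for c in per_class.values())
    fp = sum(c.fp for c in per_class.values())
    fn = sum(c.fn for c in per_class.values())
    return prf(tp, fp, fn)


def macro_average(per_class: dict[str, Counts]) -> tuple[float, float, float]:
    """Compute the metrics per class, then take their unweighted mean."""
    scores = [prf(c.tp, c.fp, c.fn) for c in per_class.values()]
    n = len(scores)
    precision = sum(s[0] for s in scores) / n
    recall = sum(s[1] for s in scores) / n
    f1 = sum(s[2] for s in scores) / n
    return precision, recall, f1


# Made-up counts: one frequent, easy class and one rare, hard class.
toy_counts = {
    "Strength -> Drug": Counts(tp=90, fp=10, fn=10),
    "ADE -> Drug": Counts(tp=5, fp=5, fn=15),
}

micro = micro_average(toy_counts)
macro = macro_average(toy_counts)
print("micro avg: P=%.4f R=%.4f F1=%.4f" % micro)
print("macro avg: P=%.4f R=%.4f F1=%.4f" % macro)
```

In this toy setup the frequent class dominates the micro average, whereas the macro average weights both classes equally and so drops further when the rare class is predicted poorly.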