--- a
+++ b/README.md
@@ -0,0 +1,63 @@
+# MoleculeNet SMILES BERT Mixup
+
+This repository contains an implementation of the mixup strategy for text classification.
+The implementation is primarily based on the paper [Augmenting Data with Mixup for Sentence Classification: An Empirical Study](https://arxiv.org/abs/1905.08941), with some differences.
+
+Three variants of mixup are considered for text classification:
+1. Embedding mixup: Texts are mixed immediately after word embedding
+2. Hidden/Encoder mixup: Mixup is done prior to the last fully connected layer
+3. Sentence mixup: Mixup is done before softmax
+
+## Run Supervised Training with Late Mixup Augmentation
+
+```python
+# Run in a Jupyter/IPython notebook cell: lines starting with `!` are shell escapes.
+from tqdm import tqdm
+
+SAMPLES_PER_CLASS = [50, 100, 150, 200, 250]
+N_AUGMENT = [0, 2, 4, 8, 16]
+DATASETS = ['bace', 'bbbp']
+METHODS = ['embed', 'encoder', 'sent']
+OUTPUT_FILE = 'eval_result_mixup_augment_v1.csv'
+N_TRIALS = 20
+EPOCHS = 20
+
+for method in METHODS:
+    for dataset in DATASETS:
+        for sample in SAMPLES_PER_CLASS:
+            for n_augment in N_AUGMENT:
+                for i in tqdm(range(N_TRIALS)):
+                    !python bert_mixup/late_mixup/train_bert.py --dataset-name={dataset} --epoch={EPOCHS} \
+                    --batch-size=16 --model-name-or-path=shahrukhx01/muv2x-simcse-smole-bert \
+                    --samples-per-class={sample} --eval-after={EPOCHS} --method={method} \
+                    --out-file={OUTPUT_FILE} --n-augment={n_augment}
+!cat {OUTPUT_FILE}
+```
+
+## Run Supervised Training with Early Mixup Augmentation
+
+```python
+# Run in a Jupyter/IPython notebook cell: lines starting with `!` are shell escapes.
+from tqdm import tqdm
+
+SAMPLES_PER_CLASS = [50, 100, 150, 200, 250]
+N_AUGMENT = [2, 4, 8, 16, 32]
+DATASETS = ['bace', 'bbbp']
+OUTPUT_FILE = '/nethome/skhan/moleculenet-smiles-bert-mixup/eval_result_early_mixup.csv'
+N_TRIALS = 20
+EPOCHS = 100
+
+
+for dataset in DATASETS:
+    for sample in SAMPLES_PER_CLASS:
+        for n_augment in N_AUGMENT:
+            for i in tqdm(range(N_TRIALS)):
+                !python bert_mixup/early_mixup/main.py --dataset-name={dataset} --epoch={EPOCHS} \
+                --batch-size=16 --model-name-or-path=shahrukhx01/muv2x-simcse-smole-bert \
+                --samples-per-class={sample} --eval-after={EPOCHS} \
+                --out-file={OUTPUT_FILE} --n-augment={n_augment}
+!cat {OUTPUT_FILE}
+```
+
+## Acknowledgement
+The code in this repository is mainly adapted from the repo "[xashru/mixup-text](https://github.com/xashru/mixup-text)".