--- a
+++ b/docs/tutorial.configs.rst
@@ -0,0 +1,172 @@
+.. _config:
+
+Training Models on Task Datasets (Commands and Configurations)
+#################################################################
+
+LAVIS provides scripts, stored at ``lavis/run_scripts/``, to pre-train and finetune supported models on standard language-vision tasks.
+To replicate the experiments, simply run the corresponding bash script. For example, to train the BLIP model on the image-text retrieval task with the MSCOCO dataset, run
+
+.. code-block:: bash
+
+   bash run_scripts/blip/train/train_retrieval_coco.sh
+
+Inside the script, we can see
+
+.. code-block:: bash
+
+   python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/blip/train/retrieval_coco_ft.yaml
+
+where we start PyTorch distributed training on 8 GPUs (adjust ``--nproc_per_node`` according to your own hardware setup). The ``--cfg-path`` argument specifies a `runtime configuration file` that defines the task, model, dataset and training recipe.
+
+Available options and their descriptions are listed below.
+
+.. LAVIS executes training and evaluation based on arguments specified in the configuration files. The default model and dataset configurations are defined in ``lavis/configs``. The task-specific configurations are defined in ``lavis/projects``. Task-specific configurations have higher priority over the default configurations.
+
+.. The following tables provide explanations for the arguments in the configuration files.
+
+.. list-table::
+   :widths: 30 40
+   :header-rows: 1
+
+   * - Model Configurations
+     - Functionalities
+   * - arch
+     - | name of the model from the model zoo
+       | default: task-dependent
+   * - model_type
+     - | the type of the model (e.g., base)
+       | default: task-dependent
+   * - load_pretrained
+     - | load pretrained weights
+       | default: True (for finetuning); False (for pretraining)
+   * - load_finetuned
+     - | load task-specific finetuned weights
+       | default: False (for finetuning); True (for evaluation)
+   * - pretrained
+     - | URL or local path of the pretrained model weights, defined in the default model configuration file
+       | default: task-dependent
+   * - finetuned
+     - | URL or local path of the finetuned model weights, defined in the default model configuration file
+       | default: task-dependent
+
+.. list-table::
+   :widths: 30 50
+   :header-rows: 1
+
+   * - Dataset Configurations
+     - Functionalities
+   * - vis_processor
+     - | pre-processing of visual input
+       | default: task-dependent
+   * - text_processor
+     - | pre-processing of text input
+       | default: task-dependent
+   * - build_info
+     - | dataset information including the storage location, defined in the default dataset configuration file
+       | default: task-dependent
+
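+To make the options above concrete, the following is a minimal, illustrative sketch of the ``model`` and ``datasets`` sections of a runtime configuration file. It is not a verbatim copy of ``retrieval_coco_ft.yaml``; the architecture, dataset and processor names below are examples, and the authoritative defaults live in ``lavis/configs`` with task-specific settings in ``lavis/projects``.
+
+.. code-block:: yaml
+
+   model:
+     arch: blip_retrieval        # name of the model from the model zoo (illustrative)
+     model_type: base            # the type of the model
+     load_pretrained: True       # load pretrained weights (typical when finetuning)
+     load_finetuned: False       # load task-specific finetuned weights (typical when evaluating)
+     # pretrained / finetuned URLs come from the default model configuration unless overridden here
+
+   datasets:
+     coco_retrieval:             # dataset name (illustrative); build_info comes from the default dataset configuration
+       vis_processor:
+         train:
+           name: "blip_image_train"
+         eval:
+           name: "blip_image_eval"
+       text_processor:
+         train:
+           name: "blip_caption"
+         eval:
+           name: "blip_caption"
+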
+.. list-table::
+   :widths: 30 50
+   :header-rows: 1
+
+   * - Runtime Configurations
+     - Functionalities
+   * - task
+     - | name of the task
+       | default: task-dependent
+   * - lr_sched
+     - | learning rate scheduler
+       | default: linear_warmup_cosine_lr
+   * - init_lr
+     - | initial learning rate (after warmup)
+       | default: task-dependent
+   * - min_lr
+     - | final learning rate after decay
+       | default: task-dependent
+   * - warmup_lr
+     - | starting learning rate for warmup
+       | default: init_lr (no warmup)
+   * - lr_decay_rate
+     - | learning rate decay per epoch for step_lr_schedule
+       | default: 0.9
+   * - warmup_steps
+     - | number of steps for learning rate warmup
+       | default: 0
+   * - max_epoch
+     - | total number of training epochs
+       | default: task-dependent
+   * - weight_decay
+     - | weight decay coefficient for the optimizer
+       | default: 0.05
+   * - batch_size_train
+     - | batch size during training
+       | default: task-dependent
+   * - batch_size_eval
+     - | batch size during evaluation
+       | default: task-dependent
+   * - seed
+     - | pseudo-random number generator seed
+       | default: 42
+   * - output_dir
+     - | directory to store logs, results and checkpoints
+       | default: task-dependent
+   * - resume_ckpt_path
+     - | path of the checkpoint to resume training from
+       | default: None
+   * - evaluate
+     - | only perform evaluation without training
+       | default: False
+   * - train_splits
+     - | dataset splits used for training
+       | default: ["train"]
+   * - valid_splits
+     - | dataset splits used for validation
+       | default: ["val"]
+   * - test_splits
+     - | dataset splits used for testing
+       | default: ["test"]
+   * - device
+     - | use cpu or gpu (cuda)
+       | default: cuda
+   * - world_size
+     - | number of processes participating in the job
+       | default: 1
+   * - dist_url
+     - | URL specifying how to initialize the process group
+       | default: "env://"
+   * - distributed
+     - | use distributed training
+       | default: True
+   * - amp
+     - | use automatic mixed precision training
+       | default: False
+
+.. list-table::
+   :widths: 40 50
+   :header-rows: 1
+
+   * - Text Generation Configurations
+     - Functionalities
+   * - max_len
+     - | maximum number of text tokens to generate
+       | default: 20 (for image captioning)
+   * - min_len
+     - | minimum number of text tokens to generate
+       | default: 5 (for image captioning)
+   * - num_beams
+     - | number of beams used for beam search
+       | default: 3
+
+.. list-table::
+   :widths: 40 50
+   :header-rows: 1
+
+   * - Multimodal Retrieval Configurations
+     - Functionalities
+   * - negative_all_rank
+     - | collect negatives from all processes for the image-text matching loss
+       | default: True (for coco)
+   * - k_test
+     - | number of retrieval candidates ranked by contrastive similarity
+       | default: 256 (for coco)
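+
+Putting the runtime options together, the ``run`` section of a configuration file looks roughly like the sketch below. The values shown are illustrative rather than the exact recipe for any particular task; consult the corresponding file under ``lavis/projects`` for the settings actually used.
+
+.. code-block:: yaml
+
+   run:
+     task: retrieval                      # name of the task (illustrative)
+
+     # optimization
+     lr_sched: "linear_warmup_cosine_lr"
+     init_lr: 1e-5
+     min_lr: 0
+     warmup_lr: 1e-6
+     warmup_steps: 1000
+     weight_decay: 0.05
+     max_epoch: 6
+     batch_size_train: 32
+     batch_size_eval: 64
+     seed: 42
+     output_dir: "output/BLIP/retrieval_coco"   # illustrative output path
+
+     # splits and evaluation
+     evaluate: False
+     train_splits: ["train"]
+     valid_splits: ["val"]
+     test_splits: ["test"]
+
+     # retrieval-specific options (see the table above); captioning tasks would
+     # instead set generation options such as max_len, min_len and num_beams
+     k_test: 256
+     negative_all_rank: True
+
+     # distributed training
+     device: "cuda"
+     world_size: 1
+     dist_url: "env://"
+     distributed: True
+     amp: False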