--- a
+++ b/docs/tutorial.configs.rst
@@ -0,0 +1,172 @@
+.. _config:
+
+Training Models on Task Datasets (Commands and Configurations) 
+#################################################################
+
+LAVIS provides scripts for pre-training and finetuning supported models on standard language-vision tasks, stored at ``lavis/run_scripts/``.
+To replicate an experiment, simply run the corresponding bash script. For example, to train the BLIP model on the image-text retrieval task with the MSCOCO dataset, we can run
+
+.. code-block:: bash
+
+    bash run_scripts/blip/train/train_retrieval_coco.sh
+
+Inside the script, we can see
+
+.. code-block:: bash
+
+    python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/blip/train/retrieval_coco_ft.yaml
+
+which launches PyTorch distributed training on 8 GPUs (adjust ``--nproc_per_node`` to match your hardware setup). The ``--cfg-path`` argument specifies a `runtime configuration file` that defines
+the task, model, dataset and training recipe.
+
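+For orientation, a runtime configuration file is a YAML file with three top-level sections: ``model``, ``datasets`` and ``run``. The sketch below only illustrates this overall layout; the key names are explained in the tables that follow, while the concrete values are placeholders rather than the exact contents of ``retrieval_coco_ft.yaml``.
+
+.. code-block:: yaml
+
+    model:
+      arch: blip_retrieval                  # model name from the model zoo
+      model_type: coco                      # model variant
+
+    datasets:
+      coco_retrieval:                       # registered dataset name
+        vis_processor:                      # visual pre-processing
+          train:
+            name: "blip_image_train"
+        text_processor:                     # text pre-processing
+          train:
+            name: "blip_caption"
+
+    run:
+      task: retrieval                       # task name
+      lr_sched: "linear_warmup_cosine_lr"
+      init_lr: 1e-5
+      max_epoch: 6
+      batch_size_train: 32
+      output_dir: "output/retrieval_coco"   # placeholder path
+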
+LAVIS executes training and evaluation based on the arguments specified in the configuration file. The default model and dataset configurations are defined in ``lavis/configs``, while the task-specific configurations are defined in ``lavis/projects``; task-specific configurations take priority over the defaults.
+
+The following tables explain the available arguments and their default values.
+
+.. list-table::
+   :widths: 30 40
+   :header-rows: 1
+
+   * - Model Configurations
+     - Functionalities
+   * - arch
+     - | name of the model from the model zoo
+       | default: task-dependent
+   * - model_type
+     - | the type of the model (e.g., base)
+       | default: task-dependent
+   * - load_pretrained
+     - | load pretrained weights
+       | default: True (for finetuning); False (for pretraining)
+   * - load_finetuned
+     - | load task-specific finetuned weights
+       | default: False (for finetuning); True (for evaluation)
+   * - pretrained 
+     - | URL or local path of the pretrained model weights, defined in the default model configuration file
+       | default: task-dependent 
+   * - finetuned
+     - | URL or local path of the finetuned model weights, defined in the default model configuration file
+       | default: task-dependent
+
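+Putting these together: a finetuning config typically sets ``load_pretrained: True`` and ``load_finetuned: False``, while an evaluation config does the opposite. A minimal sketch of a ``model`` section is shown below; the URL is a placeholder, and in practice ``pretrained`` / ``finetuned`` are usually already set in the default model config under ``lavis/configs``.
+
+.. code-block:: yaml
+
+    model:
+      arch: blip_retrieval
+      model_type: coco
+      load_pretrained: True       # finetuning: start from pre-trained weights
+      load_finetuned: False       # evaluation configs set this to True instead
+      # optional override; otherwise taken from the default model config
+      pretrained: "https://example.com/blip_base.pth"   # placeholder URL or local path
+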
+.. list-table::
+   :widths: 30 50
+   :header-rows: 1
+
+   * - Dataset Configurations
+     - Functionalities
+   * - vis_processor
+     - | pre-processing of visual input
+       | default: task-dependent
+   * - text_processor
+     - | pre-processing of text input
+       | default: task-dependent
+   * - build_info
+     - | dataset information including the storage location, defined in the default dataset configuration file
+       | default: task-dependent
+
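+As an example, a dataset entry nests ``vis_processor`` and ``text_processor`` under the dataset name, with separate settings for the train and eval splits. The processor names and image size below are illustrative; ``build_info`` (storage locations, annotation files) normally comes from the default dataset config in ``lavis/configs`` and rarely needs to be repeated in the project file.
+
+.. code-block:: yaml
+
+    datasets:
+      coco_retrieval:
+        vis_processor:
+          train:
+            name: "blip_image_train"   # augmented image pre-processing for training
+            image_size: 384
+          eval:
+            name: "blip_image_eval"    # deterministic pre-processing for evaluation
+            image_size: 384
+        text_processor:
+          train:
+            name: "blip_caption"
+          eval:
+            name: "blip_caption"
+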
+.. list-table::
+   :widths: 30 50
+   :header-rows: 1
+
+   * - Runtime Configurations
+     - Functionalities
+   * - task
+     - | name of the task
+       | default: task-dependent
+   * - lr_sched
+     - | learning rate scheduler
+       | default: linear_warmup_cosine_lr
+   * - init_lr
+     - | initial learning rate (after warmup)
+       | default: task-dependent
+   * - min_lr
+     - | final learning rate after decay
+       | default: task-dependent
+   * - warmup_lr
+     - | starting learning rate for warmup
+       | default: init_lr (no warmup)
+   * - lr_decay_rate
+     - | learning rate decay per epoch for step_lr_schedule
+       | default: 0.9
+   * - warmup_steps
+     - | number of steps for learning rate warmup
+       | default: 0
+   * - max_epoch
+     - | total number of training epochs
+       | default: task-dependent
+   * - weight_decay
+     - | weight decay coefficient for the optimizer
+       | default: 0.05
+   * - batch_size_train
+     - | batch size during training
+       | default: task-dependent
+   * - batch_size_eval
+     - | batch size during evaluation
+       | default: task-dependent
+   * - seed
+     - | pseudo random number generator seed
+       | default: 42
+   * - output_dir
+     - | directory to store logs, results and checkpoints
+       | default: task-dependent
+   * - resume_ckpt_path
+     - | path of the checkpoint to resume training from
+       | default: None
+   * - evaluate
+     - | only perform evaluation without training
+       | default: False
+   * - train_splits
+     - | dataset splits used for training
+       | default: ["train"]
+   * - valid_splits
+     - | dataset splits used for validation
+       | default: ["val"]
+   * - test_splits
+     - | dataset splits used for testing
+       | default: ["test"]
+   * - device
+     - | device to run on, cpu or cuda (GPU)
+       | default: cuda
+   * - world_size
+     - | number of processes participating in the job
+       | default: 1
+   * - dist_url
+     - | URL specifying how to initialize the process group
+       | default: "env://"
+   * - distributed
+     - | use distributed training
+       | default: True
+   * - amp
+     - | use automatic mixed precision training
+       | default: False
+
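+A typical ``run`` section combines these options into a training recipe, for example (a sketch with illustrative values, not the exact recipe of any shipped config):
+
+.. code-block:: yaml
+
+    run:
+      task: retrieval
+      lr_sched: "linear_warmup_cosine_lr"
+      init_lr: 1e-5
+      min_lr: 0
+      warmup_steps: 1000
+      weight_decay: 0.05
+      max_epoch: 6
+      batch_size_train: 32
+      batch_size_eval: 64
+      seed: 42
+      output_dir: "output/retrieval_coco"   # placeholder
+      evaluate: False                       # set to True to run evaluation only
+      train_splits: ["train"]
+      valid_splits: ["val"]
+      test_splits: ["test"]
+      device: "cuda"
+      world_size: 1
+      dist_url: "env://"
+      distributed: True
+      amp: False
+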
+.. list-table::
+   :widths: 40 50
+   :header-rows: 1
+
+   * - Text Generation Configurations
+     - Functionalities
+   * - max_len
+     - | maximum number of text tokens to generate
+       | default: 20 (for image captioning)
+   * - min_len
+     - | minimum number of text tokens to generate
+       | default: 5 (for image captioning)
+   * - num_beams
+     - | number of beams to perform beam search
+       | default: 3
+
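+These options also live in the ``run`` section. For instance, an image-captioning config might add the following (illustrative values matching the defaults above):
+
+.. code-block:: yaml
+
+    run:
+      task: captioning
+      max_len: 20     # longest generated caption, in tokens
+      min_len: 5      # shortest generated caption, in tokens
+      num_beams: 3    # beam width for beam search
+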
+.. list-table::
+   :widths: 40 50
+   :header-rows: 1
+
+   * - Multimodal Retrieval Configurations
+     - Functionalities
+   * - negative_all_rank
+     - | collect negatives from all processes for the image-text matching loss
+       | default: True (for coco)
+   * - k_test
+     - | number of retrieval candidates ranked from contrastive similarity
+       | default: 256 (for coco)
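+
+Likewise, the retrieval-specific options are placed in the ``run`` section of a retrieval config, e.g. (illustrative values matching the MSCOCO defaults above):
+
+.. code-block:: yaml
+
+    run:
+      task: retrieval
+      negative_all_rank: True   # gather negatives across all processes for the image-text matching loss
+      k_test: 256               # number of candidates re-ranked from contrastive similarity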