.. _config:


Training Models on Task Datasets (Commands and Configurations)
#################################################################

LAVIS provides scripts to pre-train and finetune supported models on standard language-vision tasks, stored at ``lavis/run_scripts/``.
To replicate an experiment, simply run the corresponding bash script. For example, to train the BLIP model on the image-text retrieval task with the MSCOCO dataset, we can run

.. code-block:: bash

    bash run_scripts/blip/train/train_retrieval_coco.sh

Inside the script, we can see

.. code-block:: bash

    python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/blip/train/retrieval_coco_ft.yaml

where we start a PyTorch distributed training job on 8 GPUs (you may change ``--nproc_per_node`` according to your own hardware setup). The ``--cfg-path`` argument specifies a `runtime configuration file` that defines
the task, model, dataset and training recipe.

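
For orientation, such a configuration file groups its options into ``model``, ``datasets`` and ``run`` sections. The following abridged sketch (names and values are illustrative, loosely following the COCO retrieval finetuning setup) shows the overall shape; the individual options are documented in the tables below.

.. code-block:: yaml

    model:
      arch: blip_retrieval              # model name from the model zoo (illustrative)
      model_type: base                  # model variant

    datasets:
      coco_retrieval:                   # dataset name (illustrative)
        vis_processor:
          train:
            name: blip_image_train      # visual pre-processing for the train split
        text_processor:
          train:
            name: blip_caption          # text pre-processing for the train split

    run:
      task: retrieval                   # task name
      lr_sched: linear_warmup_cosine_lr
      max_epoch: 6
      output_dir: output/BLIP/Retrieval_coco
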
LAVIS executes training and evaluation based on the arguments specified in the configuration file. The default model and dataset configurations are defined in ``lavis/configs``, while task-specific configurations are defined in ``lavis/projects``; the latter take priority over the defaults. The tables below describe the available options.

.. list-table::
   :widths: 30 40
   :header-rows: 1

   * - Model Configurations
     - Functionalities
   * - arch
     - | name of the model from the model zoo
       | default: task-dependent
   * - model_type
     - | the type of the model (e.g., base)
       | default: task-dependent
   * - load_pretrained
     - | whether to load pretrained weights
       | default: True (for finetuning) | False (for pretraining)
   * - load_finetuned
     - | whether to load task-specific finetuned weights
       | default: False (for finetuning) | True (for evaluation)
   * - pretrained
     - | URL or local path of the pretrained model weights, defined in the default model configuration file
       | default: task-dependent
   * - finetuned
     - | URL or local path of the finetuned model weights, defined in the default model configuration file
       | default: task-dependent

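
For example, the ``model`` section of a retrieval finetuning config might combine these options as follows (the ``arch`` value is illustrative; the ``pretrained`` and ``finetuned`` paths usually come from the default model configuration file and rarely need to be set here):

.. code-block:: yaml

    model:
      arch: blip_retrieval     # registered model name (illustrative)
      model_type: base         # model variant, e.g., base or large
      load_pretrained: True    # start from pretrained weights when finetuning
      load_finetuned: False    # set to True at evaluation time to load finetuned weights
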
.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Dataset Configurations
     - Functionalities
   * - vis_processor
     - | pre-processing of visual input
       | default: task-dependent
   * - text_processor
     - | pre-processing of text input
       | default: task-dependent
   * - build_info
     - | dataset information including the storage location, defined in the default dataset configuration file
       | default: task-dependent

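
In a project config, the ``datasets`` section mostly selects the processors per split, while ``build_info`` (e.g., storage locations) normally stays in the default dataset configuration file. A sketch with illustrative processor names:

.. code-block:: yaml

    datasets:
      coco_retrieval:                  # dataset name (illustrative)
        vis_processor:
          train:
            name: blip_image_train     # augmented image pre-processing for training
          eval:
            name: blip_image_eval      # deterministic pre-processing for evaluation
        text_processor:
          train:
            name: blip_caption
          eval:
            name: blip_caption
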
.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Runtime Configurations
     - Functionalities
   * - task
     - | name of the task
       | default: task-dependent
   * - lr_sched
     - | learning rate scheduler
       | default: linear_warmup_cosine_lr
   * - init_lr
     - | initial learning rate (after warmup)
       | default: task-dependent
   * - min_lr
     - | final learning rate after decay
       | default: task-dependent
   * - warmup_lr
     - | starting learning rate for warmup
       | default: init_lr (no warmup)
   * - lr_decay_rate
     - | learning rate decay per epoch for the step LR schedule
       | default: 0.9
   * - warmup_steps
     - | number of steps for learning rate warmup
       | default: 0
   * - max_epoch
     - | total number of training epochs
       | default: task-dependent
   * - weight_decay
     - | weight decay coefficient for the optimizer
       | default: 0.05
   * - batch_size_train
     - | batch size during training
       | default: task-dependent
   * - batch_size_eval
     - | batch size during evaluation
       | default: task-dependent
   * - seed
     - | pseudo-random number generator seed
       | default: 42
   * - output_dir
     - | directory to store logs, results and checkpoints
       | default: task-dependent
   * - resume_ckpt_path
     - | path of the checkpoint to resume training from
       | default: None
   * - evaluate
     - | only perform evaluation, without training
       | default: False
   * - train_splits
     - | dataset splits used for training
       | default: ["train"]
   * - valid_splits
     - | dataset splits used for validation
       | default: ["val"]
   * - test_splits
     - | dataset splits used for testing
       | default: ["test"]
   * - device
     - | device to run on: cpu or cuda (GPU)
       | default: cuda
   * - world_size
     - | number of processes participating in the job
       | default: 1
   * - dist_url
     - | URL specifying how to initialize the process group
       | default: "env://"
   * - distributed
     - | whether to use distributed training
       | default: True
   * - amp
     - | whether to use automatic mixed precision training
       | default: False

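
A representative ``run`` section for a finetuning job might combine these options as follows (the numbers are illustrative, not a tuned recipe):

.. code-block:: yaml

    run:
      task: retrieval
      lr_sched: linear_warmup_cosine_lr
      init_lr: 1e-5
      min_lr: 0
      warmup_lr: 1e-6
      warmup_steps: 1000
      weight_decay: 0.05
      max_epoch: 6
      batch_size_train: 32
      batch_size_eval: 64
      seed: 42
      output_dir: output/BLIP/Retrieval_coco
      evaluate: False                  # set to True to skip training and only evaluate
      train_splits: ["train"]
      valid_splits: ["val"]
      test_splits: ["test"]
      device: cuda
      world_size: 1
      dist_url: "env://"
      distributed: True
      amp: False                       # enable for automatic mixed precision training
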
.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Text Generation Configurations
     - Functionalities
   * - max_len
     - | maximum number of text tokens to generate
       | default: 20 (for image captioning)
   * - min_len
     - | minimum number of text tokens to generate
       | default: 5 (for image captioning)
   * - num_beams
     - | number of beams for beam search
       | default: 3

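
These options live in the ``run`` section alongside the runtime options above. For an image captioning config, the relevant lines might read:

.. code-block:: yaml

    run:
      task: captioning     # task name (illustrative)
      max_len: 20          # longest generated sequence, in tokens
      min_len: 5           # shortest generated sequence, in tokens
      num_beams: 3         # beam width for beam search
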
.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Multimodal Retrieval Configurations
     - Functionalities
   * - negative_all_rank
     - | collect negatives from all processes for the image-text matching loss
       | default: True (for COCO)
   * - k_test
     - | number of retrieval candidates selected by contrastive similarity for re-ranking
       | default: 256 (for COCO)
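
Likewise, for a COCO retrieval config these options would sit in the ``run`` section, e.g.:

.. code-block:: yaml

    run:
      task: retrieval
      k_test: 256                # candidates ranked by contrastive similarity, then re-ranked
      negative_all_rank: True    # gather negatives from all processes for the matching loss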