Benchmark
############

We provide scripts for training and evaluating models on task datasets. The benchmark results below are included for reference. In the retrieval tables, TR, IR, and VR denote text, image, and video retrieval, and R1, R5, and R10 denote recall at rank 1, 5, and 10.

ALBEF
*******
.. list-table::
   :widths: 30 80 20

   * - **Pretraining**
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/pretrain.sh>`__
   * -
     - Visual Genome (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_vg.py>`__)
     -
   * -
     - SBU (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_sbu.py>`__)
     -
   * -
     - CC3M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc3m.py>`__)
     -
   * -
     - CC12M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc12m.py>`__)
     -

.. list-table::
   :widths: 30 40 20 20 20 30 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval**
     - **R1**
     - **R5**
     - **R10**
     - **Training**
     - **Evaluation**
   * - TR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 77.6
     - 94.1
     - 97.2
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_coco_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_coco_retrieval.sh>`__
   * - IR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 61.0
     - 84.5
     - 90.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_coco_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_coco_retrieval.sh>`__
   * - TR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 77.6
     - 94.1
     - 97.2
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_flickr30k_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_flickr30k_retrieval.sh>`__
   * - IR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 61.0
     - 84.5
     - 90.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_flickr30k_retrieval_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_flickr30k_retrieval.sh>`__

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **VQA**
     - **test-dev**
     - **test-std/test**
     - **Training**
     - **Evaluation**
   * - VQAv2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 76.35
     - 76.54
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_vqa_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/test_albef_vqa.sh>`__
   * - OKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - NA
     - 54.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_okvqa_albef.sh>`__
     - NA
   * - AOKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 54.5
     - NA
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_aokvqa_albef.sh>`__
     - NA

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **Multimodal Classification**
     - **val**
     - **test**
     - **Training**
     - **Evaluation**
   * - SNLI-VE (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 80.60
     - 81.04
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_ve_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_albef_ve.sh>`__
   * - NLVR2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 82.47
     - 82.91
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_nlvr_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/eval_albef_nlvr.sh>`__

BLIP
*******
.. list-table::
   :widths: 30 80 20

   * - **Pretraining (14M)**
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/pretrain.sh>`__
   * -
     - Visual Genome (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_vg.py>`__)
     -
   * -
     - SBU (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_sbu.py>`__)
     -
   * -
     - CC3M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc3m.py>`__)
     -
   * -
     - CC12M (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/DownloadConceptualCaptions/download_data_cc12m.py>`__)
     -

.. list-table::
   :widths: 30 40 20 20 20 30 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval**
     - **R1**
     - **R5**
     - **R10**
     - **Training**
     - **Evaluation**
   * - TR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 82.0
     - 95.8
     - 98.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_coco.sh>`__
   * - IR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 64.5
     - 86.0
     - 91.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_coco.sh>`__
   * - TR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 96.9
     - 99.9
     - 100.0
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_flickr.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_flickr.sh>`__
   * - IR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 87.5
     - 97.6
     - 98.9
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_flickr.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_ret_flickr.sh>`__

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **VQA**
     - **test-dev**
     - **test-std/test**
     - **Training**
     - **Evaluation**
   * - VQAv2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 78.23
     - 78.29
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/train/train_vqa_albef.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/albef/eval/test_albef_vqa.sh>`__
   * - OKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - NA
     - 55.4
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_okvqa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_okvqa.sh>`__
   * - AOKVQA (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 56.2
     - 50.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_aokvqa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_aokvqa.sh>`__

.. list-table::
   :widths: 20 20 20 20 20 20
   :header-rows: 1

   * - **Image Captioning**
     - **BLEU@4**
     - **CIDEr**
     - **SPICE**
     - **Training**
     - **Evaluation**
   * - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 39.9
     - 133.5
     - 23.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_caption_coco.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_coco_cap.sh>`__
   * - NoCaps (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_nocaps.py>`__)
     - 31.9
     - 109.1
     - 14.7
     - NA
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_nocaps.sh>`__

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **Multimodal Classification**
     - **val**
     - **test**
     - **Training**
     - **Evaluation**
   * - NLVR2 (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 82.48
     - 83.25
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_nlvr.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/eval/eval_nlvr.sh>`__

CLIP
*******
.. list-table::
   :widths: 30 40 20 20 20 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval (Zero-shot)**
     - **R1**
     - **R5**
     - **R10**
     - **Evaluation**
   * - TR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 57.2
     - 80.5
     - 87.8
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_coco.sh>`__
   * - IR
     - COCO (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_coco.py>`__)
     - 36.5
     - 60.8
     - 71.0
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_coco.sh>`__
   * - TR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 86.5
     - 98.0
     - 99.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_flickr.sh>`__
   * - IR
     - Flickr30k (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_flickr.py>`__)
     - 67.0
     - 88.9
     - 93.3
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_ret_flickr.sh>`__

.. list-table::
   :widths: 20 20 20
   :header-rows: 1

   * - **Multimodal Classification**
     - **val**
     - **Evaluation**
   * - ImageNet
     - 76.5
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/clip/eval/eval_clip_zs_imnet.sh>`__

ALPRO
*******
.. list-table::
   :widths: 30 40 20 20 20 20 30
   :header-rows: 1

   * - **Tasks**
     - **Retrieval**
     - **R1**
     - **R5**
     - **R10**
     - **Training**
     - **Evaluation**
   * - TR
     - MSRVTT (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_msrvtt.py>`__)
     - 33.2
     - 60.5
     - 71.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_ret.sh>`__
   * - VR
     - MSRVTT (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_msrvtt.py>`__)
     - 33.8
     - 61.4
     - 72.7
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_ret.sh>`__
   * - TR
     - DiDeMo (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_didemo.py>`__)
     - 38.8
     - 66.4
     - 76.8
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_didemo_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_didemo_ret.sh>`__
   * - VR
     - DiDeMo (`download <https://github.com/salesforce/LAVIS/blob/main/lavis/datasets/download_scripts/download_didemo.py>`__)
     - 36.6
     - 67.5
     - 77.9
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_didemo_ret.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_didemo_ret.sh>`__

.. list-table::
   :widths: 20 20 20 20
   :header-rows: 1

   * - **Video QA**
     - **test**
     - **Training**
     - **Evaluation**
   * - MSRVTT
     - 42.1
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msrvtt_qa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msrvtt_qa.sh>`__
   * - MSVD
     - 46.0
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/train/train_msvd_qa.sh>`__
     - `script <https://github.com/salesforce/LAVIS/blob/main/run_scripts/alpro/eval/eval_msvd_qa.sh>`__
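
The R1, R5, and R10 columns in the retrieval tables on this page report recall at rank K: the percentage of queries whose top-K ranked candidates contain at least one correct match. As a rough illustration only, and not the evaluation code that the linked scripts run, the metric could be sketched in Python as follows. The function name ``recall_at_k`` and the toy rankings are hypothetical and introduced here purely for the example.

```python
from typing import Sequence, Set


def recall_at_k(
    ranked_ids: Sequence[Sequence[int]],
    relevant_ids: Sequence[Set[int]],
    k: int,
) -> float:
    """Return the percentage of queries with a relevant item in the top k.

    ranked_ids[i] is the candidate ranking for query i (best first), and
    relevant_ids[i] is the set of correct candidates for that query.
    """
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("one ranking and one relevant set are needed per query")
    if not ranked_ids:
        raise ValueError("at least one query is required")
    hits = sum(
        1
        for ranking, relevant in zip(ranked_ids, relevant_ids)
        if any(item in relevant for item in ranking[:k])
    )
    return 100.0 * hits / len(ranked_ids)


# Toy data (hypothetical): three queries, each ranking candidates 0..4.
rankings = [[3, 1, 0, 2, 4], [0, 2, 1, 4, 3], [4, 3, 2, 1, 0]]
relevant = [{1}, {0}, {0}]

for k in (1, 2, 5):
    print(f"R{k}: {recall_at_k(rankings, relevant, k):.1f}")
```

With this toy input, one of three queries is matched at rank 1, two at rank 2, and all three within rank 5, so the values rise with K in the same way the R1, R5, and R10 columns do in each table row.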