# Preparing UCF101-24

## Introduction

<!-- [DATASET] -->

```BibTeX
@article{Soomro2012UCF101AD,
  title={UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild},
  author={K. Soomro and A. Zamir and M. Shah},
  journal={ArXiv},
  year={2012},
  volume={abs/1212.0402}
}
```

For basic dataset information, you can refer to the dataset [website](http://www.thumos.info/download.html).
Before we start, please make sure that the current working directory is `$MMACTION2/tools/data/ucf101_24/`.

## Download and Extract

You can download the RGB frames, optical flow and ground truth annotations from [Google Drive](https://drive.google.com/drive/folders/1BvGywlAGrACEqRyfYbz3wzlVV3cDFkct).
The data are provided by [MOC](https://github.com/MCG-NJU/MOC-Detector/blob/master/readme/Dataset.md), which adapted them from [act-detector](https://github.com/vkalogeiton/caffe/tree/act-detector) and [corrected-UCF101-Annots](https://github.com/gurkirt/corrected-UCF101-Annots).

:::{note}
The annotations of this UCF101-24 release come from [corrected-UCF101-Annots](https://github.com/gurkirt/corrected-UCF101-Annots), a corrected version of the original annotations.
:::

After downloading the `UCF101_v2.tar.gz` file and putting it in `$MMACTION2/tools/data/ucf101_24/`, you can run the following command to extract it.

```shell
tar -zxvf UCF101_v2.tar.gz
```

## Check Directory Structure

After uncompressing, you will get the `rgb-images` directory, `brox-images` directory and `UCF101v2-GT.pkl` for UCF101-24.

In the context of the whole project (for UCF101-24 only), the folder structure will look like:

```
mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── ucf101_24
│   │   ├── brox-images
│   │   │   ├── Basketball
│   │   │   │   ├── v_Basketball_g01_c01
│   │   │   │   │   ├── 00001.jpg
│   │   │   │   │   ├── 00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── 00140.jpg
│   │   │   │   │   ├── 00141.jpg
│   │   │   ├── ...
│   │   │   ├── WalkingWithDog
│   │   │   │   ├── v_WalkingWithDog_g01_c01
│   │   │   │   ├── ...
│   │   │   │   ├── v_WalkingWithDog_g25_c04
│   │   ├── rgb-images
│   │   │   ├── Basketball
│   │   │   │   ├── v_Basketball_g01_c01
│   │   │   │   │   ├── 00001.jpg
│   │   │   │   │   ├── 00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── 00140.jpg
│   │   │   │   │   ├── 00141.jpg
│   │   │   ├── ...
│   │   │   ├── WalkingWithDog
│   │   │   │   ├── v_WalkingWithDog_g01_c01
│   │   │   │   ├── ...
│   │   │   │   ├── v_WalkingWithDog_g25_c04
│   │   ├── UCF101v2-GT.pkl
```
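
The layout above can be checked programmatically. The following is a minimal sketch, not part of MMAction2: the helper names `missing_entries` and `count_frames` are hypothetical, and the only facts taken from this README are the three expected entries and the `.jpg` frame naming.

```python
from pathlib import Path

# The three entries this README says the archive produces.
EXPECTED_ENTRIES = ("rgb-images", "brox-images", "UCF101v2-GT.pkl")


def missing_entries(root):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


def count_frames(root, video, modality="rgb-images"):
    """Count the .jpg frames of one video.

    `video` is a relative name such as 'Basketball/v_Basketball_g01_c01'.
    """
    return sum(1 for _ in (Path(root) / modality / video).glob("*.jpg"))
```

For example, `missing_entries("data/ucf101_24")` returning an empty list suggests the archive was extracted into the expected place, and `count_frames` can be compared against the per-video frame counts stored in the annotation file.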

:::{note}
The `UCF101v2-GT.pkl` file serves as a cache. It contains the following 6 items:
:::

1. `labels` (list): List of the 24 labels.
2. `gttubes` (dict): Dictionary that contains the ground truth tubes for each video.
   A **gttube** is a dictionary that maps each label index to a list of tubes.
   A **tube** is a numpy array with `nframes` rows and 5 columns, where each row has the format `<frame index> <x1> <y1> <x2> <y2>`.
3. `nframes` (dict): Dictionary that contains the number of frames for each video, like `'HorseRiding/v_HorseRiding_g05_c02': 151`.
4. `train_videos` (list): A list with `nsplits=1` elements, each one containing the list of training videos.
5. `test_videos` (list): A list with `nsplits=1` elements, each one containing the list of testing videos.
6. `resolution` (dict): Dictionary that maps each video to a tuple `(h, w)` giving its resolution, like `'FloorGymnastics/v_FloorGymnastics_g09_c03': (240, 320)`.
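
The six items listed above can be explored with a few lines of Python. This is a minimal sketch rather than MMAction2 code: `load_gt`, `summarize_gt` and `boxes_at_frame` are hypothetical helper names, and the `encoding="iso-8859-1"` argument is an assumption that the file may have been pickled under Python 2. Loading the real file also requires numpy to be installed, since the tubes are numpy arrays.

```python
import pickle
from pathlib import Path


def load_gt(pkl_path):
    """Load the ground-truth cache and return it as a dict."""
    with Path(pkl_path).open("rb") as f:
        # Assumption: a byte-preserving encoding keeps a Python 2 pickle
        # loadable under Python 3; it is harmless for Python 3 pickles.
        return pickle.load(f, encoding="iso-8859-1")


def summarize_gt(gt, split=0):
    """Return basic counts for one split of the cache."""
    num_tubes = sum(
        len(tubes)
        for gttube in gt["gttubes"].values()  # one gttube per video
        for tubes in gttube.values()          # one tube list per label index
    )
    return {
        "num_labels": len(gt["labels"]),
        "num_train": len(gt["train_videos"][split]),
        "num_test": len(gt["test_videos"][split]),
        "num_tubes": num_tubes,
    }


def boxes_at_frame(gt, video, frame_index):
    """Collect (label, x1, y1, x2, y2) for every tube row at a given frame."""
    found = []
    for label_index, tubes in gt["gttubes"][video].items():
        for tube in tubes:
            for row in tube:
                # Column 0 holds the frame index, columns 1-4 the box corners.
                if int(row[0]) == frame_index:
                    box = tuple(float(v) for v in row[1:5])
                    found.append((gt["labels"][label_index], *box))
    return found
```

A typical call would be `gt = load_gt("UCF101v2-GT.pkl")` followed by `summarize_gt(gt)`, which gives a quick sanity check that the file holds 24 labels and a single split.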