# Demo

## Outline

- [Modify configs through script arguments](#modify-configs-through-script-arguments): Tricks to directly modify configs through script arguments.
- [Video demo](#video-demo): A demo script to predict the recognition result using a single video.
- [SpatioTemporal Action Detection Video Demo](#spatiotemporal-action-detection-video-demo): A demo script to predict the SpatioTemporal Action Detection result using a single video.
- [Video GradCAM Demo](#video-gradcam-demo): A demo script to visualize GradCAM results using a single video.
- [Webcam demo](#webcam-demo): A demo script to implement real-time action recognition from a web camera.
- [Long Video demo](#long-video-demo): A demo script to predict different labels using a single long video.
- [SpatioTemporal Action Detection Webcam Demo](#spatiotemporal-action-detection-webcam-demo): A demo script to implement real-time spatio-temporal action detection from a web camera.
- [Skeleton-based Action Recognition Demo](#skeleton-based-action-recognition-demo): A demo script to predict the skeleton-based action recognition result using a single video.
- [Video Structuralize Demo](#video-structuralize-demo): A demo script to predict skeleton-based and RGB-based action recognition as well as spatio-temporal action detection results using a single video.

## Modify configs through script arguments

When running demos using our provided scripts, you may specify `--cfg-options` to modify the config in place. A minimal sketch of how these options are merged into the config is shown after this list.

- Update config keys of dict.

  The config options can be specified following the order of the dict keys in the original config.
  For example, `--cfg-options model.backbone.norm_eval=False` changes all BN modules in the model backbone to `train` mode.

- Update keys inside a list of configs.

  Some config dicts are composed as a list in your config. For example, the training pipeline `data.train.pipeline` is normally a list
  e.g. `[dict(type='SampleFrames'), ...]`. If you want to change `'SampleFrames'` to `'DenseSampleFrames'` in the pipeline,
  you may specify `--cfg-options data.train.pipeline.0.type=DenseSampleFrames`.

- Update values of list/tuples.

  Some values to be updated are lists or tuples. For example, the config file normally sets `workflow=[('train', 1)]`. If you want to
  change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark \" is necessary to
  support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
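
The sketch below is an illustration of how such dotted `key=value` pairs are merged into a loaded config (assuming a reasonably recent `mmcv`); it is not the demo scripts' exact code.

```python
from mmcv import Config

cfg = Config(dict(
    model=dict(backbone=dict(norm_eval=True)),
    data=dict(train=dict(pipeline=[dict(type='SampleFrames'),
                                   dict(type='RawFrameDecode')]))))

# Equivalent of:
# --cfg-options model.backbone.norm_eval=False data.train.pipeline.0.type=DenseSampleFrames
cfg.merge_from_dict({
    'model.backbone.norm_eval': False,
    'data.train.pipeline.0.type': 'DenseSampleFrames',
})

print(cfg.model.backbone.norm_eval)        # False
print(cfg.data.train.pipeline[0]['type'])  # DenseSampleFrames
```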
## Video demo

We provide a demo script to predict the recognition result using a single video. To get prediction results in the range `[0, 1]`, make sure to set `model['test_cfg'] = dict(average_clips='prob')` in the config file.
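
As a side note, the sketch below (an illustration, not the MMAction2 source) shows what `average_clips='prob'` means: per-clip class scores are passed through a softmax before being averaged, so the final scores fall in `[0, 1]`.

```python
import numpy as np

def average_clips(clip_scores, mode='prob'):
    """Average per-clip class scores; with mode='prob', softmax each clip first."""
    clip_scores = np.asarray(clip_scores, dtype=np.float64)  # (num_clips, num_classes)
    if mode == 'prob':
        exp = np.exp(clip_scores - clip_scores.max(axis=1, keepdims=True))
        clip_scores = exp / exp.sum(axis=1, keepdims=True)   # softmax per clip
    return clip_scores.mean(axis=0)

print(average_clips([[2.0, 0.5, -1.0], [1.5, 1.0, 0.0]]))    # every entry lies in [0, 1]
```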

```shell
python demo/demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} ${LABEL_FILE} [--use-frames] \
    [--device ${DEVICE_TYPE}] [--fps ${FPS}] [--font-scale ${FONT_SCALE}] [--font-color ${FONT_COLOR}] \
    [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm ${RESIZE_ALGORITHM}] [--out-filename ${OUT_FILE}]
```

Optional arguments:

- `--use-frames`: If specified, the demo will take rawframes as input. Otherwise, it will take a video as input.
- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0` or `cpu`. If not specified, it will be set to `cuda:0`.
- `FPS`: FPS value of the output video when using rawframes as input. If not specified, it will be set to 30.
- `FONT_SCALE`: Font scale of the label added in the video. If not specified, it will be set to 0.5.
- `FONT_COLOR`: Font color of the label added in the video. If not specified, it will be `white`.
- `TARGET_RESOLUTION`: Resolution `(desired_width, desired_height)` for resizing the frames before output when using a video as input. If not specified, it will be `None` and the frames are resized by keeping the existing aspect ratio.
- `RESIZE_ALGORITHM`: Resize algorithm used for resizing. If not specified, it will be set to `bicubic`.
- `OUT_FILE`: Path to the output file, which can be in a video or gif format. If not specified, it will be set to `None` and no output file will be generated.

Examples:

Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`, or use the checkpoint URLs from `configs/` to load the corresponding checkpoints directly; they will be automatically saved to `$HOME/.cache/torch/checkpoints`.

1. Recognize a video file as input by using a TSN model on cuda by default.

    ```shell
    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
    python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
        checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
        demo/demo.mp4 tools/data/kinetics/label_map_k400.txt
    ```

2. Recognize a video file as input by using a TSN model on cuda by default, loading the checkpoint from a URL.

    ```shell
    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
    python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
        https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
        demo/demo.mp4 tools/data/kinetics/label_map_k400.txt
    ```

3. Recognize a list of rawframes as input by using a TSN model on cpu.

    ```shell
    python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \
        checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
        PATH_TO_FRAMES/ LABEL_FILE --use-frames --device cpu
    ```

4. Recognize a video file as input by using a TSN model and then generate an mp4 file.

    ```shell
    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
    python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
        checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
        demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --out-filename demo/demo_out.mp4
    ```

5. Recognize a list of rawframes as input by using a TSN model and then generate a gif file.

    ```shell
    python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \
        checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
        PATH_TO_FRAMES/ LABEL_FILE --use-frames --out-filename demo/demo_out.gif
    ```

6. Recognize a video file as input by using a TSN model, then generate an mp4 file with a given resolution and resize algorithm (see the sketch after these examples for how `-1` in `--target-resolution` is resolved).

    ```shell
    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
    python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
        checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
        demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --target-resolution 340 256 --resize-algorithm bilinear \
        --out-filename demo/demo_out.mp4
    ```

    ```shell
    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
    # If either dimension is set to -1, the frames are resized by keeping the existing aspect ratio
    # For --target-resolution 170 -1, original resolution (340, 256) -> target resolution (170, 128)
    python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
        checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
        demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --target-resolution 170 -1 --resize-algorithm bilinear \
        --out-filename demo/demo_out.mp4
    ```

7. Recognize a video file as input by using a TSN model, then generate an mp4 file with the label in red and a font scale of 1.

    ```shell
    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
    python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
        checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
        demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --font-scale 1 --font-color red \
        --out-filename demo/demo_out.mp4
    ```

8. Recognize a list of rawframes as input by using a TSN model and then generate a gif file with 24 fps.

    ```shell
    python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \
        checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
        PATH_TO_FRAMES/ LABEL_FILE --use-frames --fps 24 --out-filename demo/demo_out.gif
    ```
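
The sketch below is an assumed reading of the aspect-ratio rule above (not the script's actual code), showing how a `-1` dimension in `--target-resolution` can be resolved:

```python
def resolve_target(orig_w, orig_h, target_w, target_h):
    """Resolve -1 in the target resolution by keeping the original aspect ratio."""
    if target_w == -1 and target_h == -1:
        return orig_w, orig_h
    if target_w == -1:
        target_w = round(orig_w * target_h / orig_h)
    elif target_h == -1:
        target_h = round(orig_h * target_w / orig_w)
    return target_w, target_h

print(resolve_target(340, 256, 170, -1))  # (170, 128), matching the example above
```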

## SpatioTemporal Action Detection Video Demo

We provide a demo script to predict the SpatioTemporal Action Detection result using a single video.

```shell
python demo/demo_spatiotemporal_det.py --video ${VIDEO_FILE} \
    [--config ${SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE}] \
    [--checkpoint ${SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT}] \
    [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \
    [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \
    [--det-score-thr ${HUMAN_DETECTION_SCORE_THRESHOLD}] \
    [--action-score-thr ${ACTION_DETECTION_SCORE_THRESHOLD}] \
    [--label-map ${LABEL_MAP}] \
    [--device ${DEVICE}] \
    [--out-filename ${OUTPUT_FILENAME}] \
    [--predict-stepsize ${PREDICT_STEPSIZE}] \
    [--output-stepsize ${OUTPUT_STEPSIZE}] \
    [--output-fps ${OUTPUT_FPS}]
```

Optional arguments:

- `SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE`: The spatiotemporal action detection config file path.
- `SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT`: The spatiotemporal action detection checkpoint URL.
- `HUMAN_DETECTION_CONFIG_FILE`: The human detection config file path.
- `HUMAN_DETECTION_CHECKPOINT`: The human detection checkpoint URL.
- `HUMAN_DETECTION_SCORE_THRESHOLD`: The score threshold for human detection. Default: 0.9.
- `ACTION_DETECTION_SCORE_THRESHOLD`: The score threshold for action detection. Default: 0.5.
- `LABEL_MAP`: The label map used. Default: `tools/data/ava/label_map.txt`.
- `DEVICE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0` or `cpu`. Default: `cuda:0`.
- `OUTPUT_FILENAME`: Path to the output file, which is in a video format. Default: `demo/stdet_demo.mp4`.
- `PREDICT_STEPSIZE`: Make a prediction every N frames. Default: 8.
- `OUTPUT_STEPSIZE`: Output 1 frame for every N frames of the input video. Note that `PREDICT_STEPSIZE % OUTPUT_STEPSIZE == 0` is required (see the sketch after this list). Default: 4.
- `OUTPUT_FPS`: The FPS of the demo video output. Default: 6.
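
The sketch below is a rough illustration of how the two step sizes interact (not the demo's exact logic), showing which frames get fresh predictions and which frames are written to the output video:

```python
total_frames = 64       # length of a hypothetical input video
predict_stepsize = 8    # one action prediction is made around every 8th frame
output_stepsize = 4     # every 4th input frame is written to the output video
assert predict_stepsize % output_stepsize == 0, 'required by the demo script'

keyframes = list(range(0, total_frames, predict_stepsize))     # frames with new predictions
output_frames = list(range(0, total_frames, output_stepsize))  # frames kept in the output
print(len(keyframes), len(output_frames))                      # 8 keyframes, 16 output frames
```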

Examples:

Assume that you are located at `$MMACTION2`.

1. Use Faster RCNN as the human detector and SlowOnly-8x8-R101 as the action detector, make predictions every 8 frames, and output 1 frame for every 4 input frames to the output video. The FPS of the output video is 6.

```shell
python demo/demo_spatiotemporal_det.py --video demo/demo.mp4 \
    --config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
    --checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --det-score-thr 0.9 \
    --action-score-thr 0.5 \
    --label-map tools/data/ava/label_map.txt \
    --predict-stepsize 8 \
    --output-stepsize 4 \
    --output-fps 6
```

## Video GradCAM Demo

We provide a demo script to visualize GradCAM results using a single video.

```shell
python demo/demo_gradcam.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} [--use-frames] \
    [--device ${DEVICE_TYPE}] [--target-layer-name ${TARGET_LAYER_NAME}] [--fps ${FPS}] \
    [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm ${RESIZE_ALGORITHM}] [--out-filename ${OUT_FILE}]
```

Optional arguments:

- `--use-frames`: If specified, the demo will take rawframes as input. Otherwise, it will take a video as input.
- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0` or `cpu`. If not specified, it will be set to `cuda:0`.
- `FPS`: FPS value of the output video when using rawframes as input. If not specified, it will be set to 30.
- `OUT_FILE`: Path to the output file, which can be in a video or gif format. If not specified, it will be set to `None` and no output file will be generated.
- `TARGET_LAYER_NAME`: The layer name used to generate the GradCAM localization map (see the sketch after this list for how a `/`-separated name is typically resolved).
- `TARGET_RESOLUTION`: Resolution `(desired_width, desired_height)` for resizing the frames before output when using a video as input. If not specified, it will be `None` and the frames are resized by keeping the existing aspect ratio.
- `RESIZE_ALGORITHM`: Resize algorithm used for resizing. If not specified, it will be set to `bilinear`.
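
The sketch below is an assumption about how a `/`-separated layer name maps onto the model's module hierarchy (not the GradCAM utility's actual code); it illustrates names such as `backbone/layer4/1/relu`:

```python
import torch.nn as nn

def resolve_layer(model: nn.Module, layer_name: str) -> nn.Module:
    """Walk a '/'-separated path; digits index into sequential containers."""
    module = model
    for token in layer_name.split('/'):
        module = module[int(token)] if token.isdigit() else getattr(module, token)
    return module

toy = nn.Module()
toy.backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
print(resolve_layer(toy, 'backbone/1'))  # ReLU()
```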

Examples:

Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`, or use the checkpoint URLs from `configs/` to load the corresponding checkpoints directly; they will be automatically saved to `$HOME/.cache/torch/checkpoints`.

1. Get the GradCAM results of an I3D model, using a video file as input, and then generate a gif file with 10 fps.

    ```shell
    python demo/demo_gradcam.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
        checkpoints/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth demo/demo.mp4 \
        --target-layer-name backbone/layer4/1/relu --fps 10 \
        --out-filename demo/demo_gradcam.gif
    ```

2. Get the GradCAM results of a TSM model, using a video file as input, and then generate a gif file, loading the checkpoint from a URL.

    ```shell
    python demo/demo_gradcam.py configs/recognition/tsm/tsm_r50_video_inference_1x1x8_100e_kinetics400_rgb.py \
        https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_1x1x8_100e_kinetics400_rgb_20200702-a77f4328.pth \
        demo/demo.mp4 --target-layer-name backbone/layer4/1/relu --out-filename demo/demo_gradcam_tsm.gif
    ```

## Webcam demo

We provide a demo script to implement real-time action recognition from a web camera. To get prediction results in the range `[0, 1]`, make sure to set `model['test_cfg'] = dict(average_clips='prob')` in the config file.

```shell
python demo/webcam_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${LABEL_FILE} \
    [--device ${DEVICE_TYPE}] [--camera-id ${CAMERA_ID}] [--threshold ${THRESHOLD}] \
    [--average-size ${AVERAGE_SIZE}] [--drawing-fps ${DRAWING_FPS}] [--inference-fps ${INFERENCE_FPS}]
```

Optional arguments:

- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0` or `cpu`. If not specified, it will be set to `cuda:0`.
- `CAMERA_ID`: ID of the camera device. If not specified, it will be set to 0.
- `THRESHOLD`: Threshold of the prediction score for action recognition. Only labels with scores higher than the threshold will be shown. If not specified, it will be set to 0.
- `AVERAGE_SIZE`: Number of latest clips to be averaged for prediction (see the sketch after this list). If not specified, it will be set to 1.
- `DRAWING_FPS`: Upper bound of the FPS value for drawing the output. If not specified, it will be set to 20.
- `INFERENCE_FPS`: Upper bound of the FPS value for model inference. If not specified, it will be set to 4.
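
The sketch below is an assumed illustration of the `--average-size` behaviour (not the webcam script itself): keeping a short history of clip scores smooths the displayed prediction.

```python
from collections import deque

import numpy as np

average_size = 5
recent_scores = deque(maxlen=average_size)  # scores of the latest clips

def push_and_average(clip_score):
    """Add the newest clip score and return the mean over the kept history."""
    recent_scores.append(np.asarray(clip_score, dtype=np.float64))
    return np.mean(recent_scores, axis=0)

print(push_and_average([0.9, 0.1]))
print(push_and_average([0.5, 0.5]))  # averaged over the last (up to) 5 clips
```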

:::{note}
If your hardware is good enough, increasing the values of `DRAWING_FPS` and `INFERENCE_FPS` will give you a better experience.
:::

Examples:

Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`, or use the checkpoint URLs from `configs/` to load the corresponding checkpoints directly; they will be automatically saved to `$HOME/.cache/torch/checkpoints`.

1. Recognize actions from a web camera as input by using a TSN model on cpu, averaging the scores of the latest 5 clips
    and outputting result labels with scores higher than 0.2.

    ```shell
    python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth tools/data/kinetics/label_map_k400.txt --average-size 5 \
      --threshold 0.2 --device cpu
    ```

2. Recognize actions from a web camera as input by using a TSN model on cpu, averaging the scores of the latest 5 clips
    and outputting result labels with scores higher than 0.2, loading the checkpoint from a URL.

    ```shell
    python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      tools/data/kinetics/label_map_k400.txt --average-size 5 --threshold 0.2 --device cpu
    ```

3. Recognize actions from a web camera as input by using an I3D model on gpu by default, averaging the scores of the latest 5 clips
    and outputting result labels with scores higher than 0.2.

    ```shell
    python demo/webcam_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
      checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth tools/data/kinetics/label_map_k400.txt \
      --average-size 5 --threshold 0.2
    ```

:::{note}
Considering that users' hardware differs in efficiency, some modifications may be needed to suit each case. Users can change:

1). The `SampleFrames` step (especially the values of `clip_len` and `num_clips`) in the `test_pipeline` of the config file, e.g. `--cfg-options data.test.pipeline.0.num_clips=3`.
2). The crop method in the `test_pipeline` of the config file, switching to a suitable one such as `TenCrop`, `ThreeCrop`, `CenterCrop`, etc., e.g. `--cfg-options data.test.pipeline.4.type=CenterCrop`.
3). The value of `--average-size`: the smaller it is, the faster the inference.

A hypothetical `test_pipeline` is sketched below to make the indices in these dotted keys concrete.
:::
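
The pipeline below is made up for illustration (the real pipeline and the position of each step depend on the config you use); it shows why `data.test.pipeline.0` addresses `SampleFrames` and `data.test.pipeline.4` may address a crop step:

```python
# A made-up test_pipeline for illustration only; check your own config before overriding keys.
test_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=25, test_mode=True),  # index 0
    dict(type='DecordInit'),                                                                # index 1
    dict(type='DecordDecode'),                                                              # index 2
    dict(type='Resize', scale=(-1, 256)),                                                   # index 3
    dict(type='ThreeCrop', crop_size=256),                                                  # index 4
]

# --cfg-options data.test.pipeline.0.num_clips=3 data.test.pipeline.4.type=CenterCrop
# would change index 0's num_clips to 3 and index 4's type to CenterCrop.
print(test_pipeline[0]['num_clips'], test_pipeline[4]['type'])
```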

## Long video demo

We provide a demo script to predict different labels using a single long video. To get prediction results in the range `[0, 1]`, make sure to set `test_cfg = dict(average_clips='prob')` in the config file.

```shell
python demo/long_video_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} ${LABEL_FILE} \
    ${OUT_FILE} [--input-step ${INPUT_STEP}] [--device ${DEVICE_TYPE}] [--threshold ${THRESHOLD}]
```

Optional arguments:

- `OUT_FILE`: Path to the output, either a video or a json file.
- `INPUT_STEP`: Input step for sampling frames, which can help to get a sparser input. If not specified, it will be set to 1.
- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0` or `cpu`. If not specified, it will be set to `cuda:0`.
- `THRESHOLD`: Threshold of the prediction score for action recognition. Only labels with scores higher than the threshold will be shown. If not specified, it will be set to 0.01.
- `STRIDE`: By default, the demo generates a prediction for every single frame, which might cost lots of time. To speed up, you can set the argument `STRIDE`, and then the demo will generate a prediction every `STRIDE x sample_length` frames (`sample_length` indicates the size of the temporal window from which you sample frames, which equals `clip_len x frame_interval`). For example, if the sample_length is 64 frames and you set `STRIDE` to 0.5, predictions will be generated every 32 frames (see the sketch after this list). If set to 0, predictions will be generated for each frame. The recommended value of `STRIDE` is in `(0, 1]`; it also works for `STRIDE > 1`, but the generated predictions will be too sparse. Default: 0.
- `LABEL_COLOR`: Font color of the labels in (B, G, R). Default is white, that is (256, 256, 256).
- `MSG_COLOR`: Font color of the messages in (B, G, R). Default is gray, that is (128, 128, 128).
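
The sketch below is an arithmetic illustration of `--input-step` and `STRIDE` (not the script's actual sampling code), showing how the two options thin out the predictions:

```python
clip_len, frame_interval = 32, 2
sample_length = clip_len * frame_interval   # 64-frame temporal window per prediction

input_step = 3                              # keep one frame out of every 3
stride = 0.5                                # predict every stride * sample_length frames
prediction_interval = int(stride * sample_length) if stride > 0 else 1

print(sample_length)                        # 64
print(prediction_interval)                  # 32, i.e. one prediction every 32 frames
print(f'roughly 1/{input_step} of the frames are read as input')
```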

Examples:

Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`, or use the checkpoint URLs from `configs/` to load the corresponding checkpoints directly; they will be automatically saved to `$HOME/.cache/torch/checkpoints`.

1. Predict different labels in a long video by using a TSN model on cpu, with an input step of 3 (that is, sampling one frame out of every 3)
   and outputting result labels with scores higher than 0.2.

    ```shell
    python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO \
      --input-step 3 --device cpu --threshold 0.2
    ```

2. Predict different labels in a long video by using a TSN model on cpu, with an input step of 3 (that is, sampling one frame out of every 3)
   and outputting result labels with scores higher than 0.2, loading the checkpoint from a URL.

    ```shell
    python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
    ```

3. Predict different labels in a long video from the web by using a TSN model on cpu, with an input step of 3 (that is, sampling one frame out of every 3)
   and outputting result labels with scores higher than 0.2, loading the checkpoint from a URL.

    ```shell
    python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4 \
      tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
    ```

4. Predict different labels in a long video by using an I3D model on gpu, with the default `input_step=1` and `threshold=0.01`, and print the labels in cyan.

    ```shell
    python demo/long_video_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
      checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt PATH_TO_SAVED_VIDEO \
      --label-color 255 255 0
    ```

5. Predict different labels in a long video by using an I3D model on gpu and save the results as a `json` file.

    ```shell
    python demo/long_video_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
      checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO tools/data/kinetics/label_map_k400.txt ./results.json
    ```

## SpatioTemporal Action Detection Webcam Demo

We provide a demo script to implement real-time spatio-temporal action detection from a web camera.

```shell
python demo/webcam_demo_spatiotemporal_det.py \
    [--config ${SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE}] \
    [--checkpoint ${SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT}] \
    [--action-score-thr ${ACTION_DETECTION_SCORE_THRESHOLD}] \
    [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \
    [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \
    [--det-score-thr ${HUMAN_DETECTION_SCORE_THRESHOLD}] \
    [--input-video ${INPUT_VIDEO}] \
    [--label-map ${LABEL_MAP}] \
    [--device ${DEVICE}] \
    [--output-fps ${OUTPUT_FPS}] \
    [--out-filename ${OUTPUT_FILENAME}] \
    [--show] \
    [--display-height ${DISPLAY_HEIGHT}] \
    [--display-width ${DISPLAY_WIDTH}] \
    [--predict-stepsize ${PREDICT_STEPSIZE}] \
    [--clip-vis-length ${CLIP_VIS_LENGTH}]
```

Optional arguments:

- `SPATIOTEMPORAL_ACTION_DETECTION_CONFIG_FILE`: The spatiotemporal action detection config file path.
- `SPATIOTEMPORAL_ACTION_DETECTION_CHECKPOINT`: The spatiotemporal action detection checkpoint path or URL.
- `ACTION_DETECTION_SCORE_THRESHOLD`: The score threshold for action detection. Default: 0.4.
- `HUMAN_DETECTION_CONFIG_FILE`: The human detection config file path.
- `HUMAN_DETECTION_CHECKPOINT`: The human detection checkpoint URL.
- `HUMAN_DETECTION_SCORE_THRESHOLD`: The score threshold for human detection. Default: 0.9.
- `INPUT_VIDEO`: The webcam id or video path of the source. Default: `0`.
- `LABEL_MAP`: The label map used. Default: `tools/data/ava/label_map.txt`.
- `DEVICE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0` or `cpu`. Default: `cuda:0`.
- `OUTPUT_FPS`: The FPS of the demo video output. Default: 15.
- `OUTPUT_FILENAME`: Path to the output file, which is in a video format. Default: None.
- `--show`: Whether to show predictions with `cv2.imshow`.
- `DISPLAY_HEIGHT`: The height of the display frame. Default: 0.
- `DISPLAY_WIDTH`: The width of the display frame. Default: 0. If `DISPLAY_HEIGHT <= 0 and DISPLAY_WIDTH <= 0`, the display frame and input video share the same shape.
- `PREDICT_STEPSIZE`: Make a prediction every N frames. Default: 8.
- `CLIP_VIS_LENGTH`: The number of frames to draw for each clip. In other words, for each clip, at most `CLIP_VIS_LENGTH` frames around the keyframe will be drawn. Default: 8.

Tips to get a better experience for the webcam demo:

- How to choose `--output-fps`?

  - `--output-fps` should be almost equal to the FPS of the read thread.
  - The FPS of the read thread is printed by the logger in the format `DEBUG:__main__:Read Thread: {duration} ms, {fps} fps`.

- How to choose `--predict-stepsize`?

  - It is related to the choice of the human detector and the spatio-temporal model.
  - Overall, the duration of the read thread for each task should be greater than or equal to that of model inference.
  - The durations for reading and inference are both printed by the logger.
  - A larger `--predict-stepsize` leads to a larger duration for the read thread.
  - To make full use of the computation resources, decrease the value of `--predict-stepsize`.

A rough rule of thumb for this trade-off is sketched below.
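
The sketch is a rule of thumb derived from the tips above (not taken from the script): it estimates a lower bound for `--predict-stepsize` from the camera FPS and the measured inference time.

```python
camera_fps = 30.0       # FPS of your webcam, i.e. how fast the read thread can gather frames
inference_time = 0.8    # seconds per spatio-temporal detection step, read from the demo's logs

# The read thread needs predict_stepsize / camera_fps seconds to gather one clip; keeping that
# no shorter than one inference step avoids making inference the bottleneck.
min_predict_stepsize = inference_time * camera_fps
print(f'choose --predict-stepsize >= {min_predict_stepsize:.0f}')  # 24 for these numbers
```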

Examples:

Assume that you are located at `$MMACTION2`.

1. Use Faster RCNN as the human detector and SlowOnly-8x8-R101 as the action detector, make predictions every 40 frames, and set the FPS of the output to 20. Show predictions with `cv2.imshow`.

```shell
python demo/webcam_demo_spatiotemporal_det.py \
    --input-video 0 \
    --config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
    --checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --det-score-thr 0.9 \
    --action-score-thr 0.5 \
    --label-map tools/data/ava/label_map.txt \
    --predict-stepsize 40 \
    --output-fps 20 \
    --show
```

## Skeleton-based Action Recognition Demo

We provide a demo script to predict the skeleton-based action recognition result using a single video.

```shell
python demo/demo_skeleton.py ${VIDEO_FILE} ${OUT_FILENAME} \
    [--config ${SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE}] \
    [--checkpoint ${SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT}] \
    [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \
    [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \
    [--det-score-thr ${HUMAN_DETECTION_SCORE_THRESHOLD}] \
    [--pose-config ${HUMAN_POSE_ESTIMATION_CONFIG_FILE}] \
    [--pose-checkpoint ${HUMAN_POSE_ESTIMATION_CHECKPOINT}] \
    [--label-map ${LABEL_MAP}] \
    [--device ${DEVICE}] \
    [--short-side ${SHORT_SIDE}]
```

Optional arguments:

- `SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE`: The skeleton-based action recognition config file path.
- `SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT`: The skeleton-based action recognition checkpoint path or URL.
- `HUMAN_DETECTION_CONFIG_FILE`: The human detection config file path.
- `HUMAN_DETECTION_CHECKPOINT`: The human detection checkpoint URL.
- `HUMAN_DETECTION_SCORE_THRESHOLD`: The score threshold for human detection. Default: 0.9.
- `HUMAN_POSE_ESTIMATION_CONFIG_FILE`: The human pose estimation config file path (trained on COCO-Keypoint).
- `HUMAN_POSE_ESTIMATION_CHECKPOINT`: The human pose estimation checkpoint URL (trained on COCO-Keypoint).
- `LABEL_MAP`: The label map used. Default: `tools/data/ava/label_map.txt`.
- `DEVICE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0` or `cpu`. Default: `cuda:0`.
- `SHORT_SIDE`: The short side used for frame extraction. Default: 480.

Examples:

Assume that you are located at `$MMACTION2`.

1. Use Faster RCNN as the human detector, HRNet-w32 as the pose estimator, and PoseC3D-NTURGB+D-120-Xsub-keypoint as the skeleton-based action recognizer.

```shell
python demo/demo_skeleton.py demo/ntu_sample.avi demo/skeleton_demo.mp4 \
    --config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \
    --checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --det-score-thr 0.9 \
    --pose-config demo/hrnet_w32_coco_256x192.py \
    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
    --label-map tools/data/skeleton/label_map_ntu120.txt
```

2. Use Faster RCNN as the human detector, HRNet-w32 as the pose estimator, and STGCN-NTURGB+D-60-Xsub-keypoint as the skeleton-based action recognizer.

```shell
python demo/demo_skeleton.py demo/ntu_sample.avi demo/skeleton_demo.mp4 \
    --config configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \
    --checkpoint https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint-e7bb9653.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --det-score-thr 0.9 \
    --pose-config demo/hrnet_w32_coco_256x192.py \
    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
    --label-map tools/data/skeleton/label_map_ntu120.txt
```

## Video Structuralize Demo

We provide a demo script to predict skeleton-based and RGB-based action recognition, as well as spatio-temporal action detection results, using a single video.

```shell
python demo/demo_video_structuralize.py \
    [--rgb-stdet-config ${RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CONFIG_FILE}] \
    [--rgb-stdet-checkpoint ${RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT}] \
    [--skeleton-stdet-checkpoint ${SKELETON_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT}] \
    [--det-config ${HUMAN_DETECTION_CONFIG_FILE}] \
    [--det-checkpoint ${HUMAN_DETECTION_CHECKPOINT}] \
    [--pose-config ${HUMAN_POSE_ESTIMATION_CONFIG_FILE}] \
    [--pose-checkpoint ${HUMAN_POSE_ESTIMATION_CHECKPOINT}] \
    [--skeleton-config ${SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE}] \
    [--skeleton-checkpoint ${SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT}] \
    [--rgb-config ${RGB_BASED_ACTION_RECOGNITION_CONFIG_FILE}] \
    [--rgb-checkpoint ${RGB_BASED_ACTION_RECOGNITION_CHECKPOINT}] \
    [--use-skeleton-stdet ${USE_SKELETON_BASED_SPATIO_TEMPORAL_DETECTION_METHOD}] \
    [--use-skeleton-recog ${USE_SKELETON_BASED_ACTION_RECOGNITION_METHOD}] \
    [--det-score-thr ${HUMAN_DETECTION_SCORE_THRE}] \
    [--action-score-thr ${ACTION_DETECTION_SCORE_THRE}] \
    [--video ${VIDEO_FILE}] \
    [--label-map-stdet ${LABEL_MAP_FOR_SPATIO_TEMPORAL_ACTION_DETECTION}] \
    [--device ${DEVICE}] \
    [--out-filename ${OUTPUT_FILENAME}] \
    [--predict-stepsize ${PREDICT_STEPSIZE}] \
    [--output-stepsize ${OUTPUT_STEPSIZE}] \
    [--output-fps ${OUTPUT_FPS}] \
    [--cfg-options]
```

Optional arguments:

- `RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CONFIG_FILE`: The RGB-based spatio-temporal action detection config file path.
- `RGB_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT`: The RGB-based spatio-temporal action detection checkpoint path or URL.
- `SKELETON_BASED_SPATIO_TEMPORAL_ACTION_DETECTION_CHECKPOINT`: The skeleton-based spatio-temporal action detection checkpoint path or URL.
- `HUMAN_DETECTION_CONFIG_FILE`: The human detection config file path.
- `HUMAN_DETECTION_CHECKPOINT`: The human detection checkpoint URL.
- `HUMAN_POSE_ESTIMATION_CONFIG_FILE`: The human pose estimation config file path (trained on COCO-Keypoint).
- `HUMAN_POSE_ESTIMATION_CHECKPOINT`: The human pose estimation checkpoint URL (trained on COCO-Keypoint).
- `SKELETON_BASED_ACTION_RECOGNITION_CONFIG_FILE`: The skeleton-based action recognition config file path.
- `SKELETON_BASED_ACTION_RECOGNITION_CHECKPOINT`: The skeleton-based action recognition checkpoint path or URL.
- `RGB_BASED_ACTION_RECOGNITION_CONFIG_FILE`: The RGB-based action recognition config file path.
- `RGB_BASED_ACTION_RECOGNITION_CHECKPOINT`: The RGB-based action recognition checkpoint path or URL.
- `USE_SKELETON_BASED_SPATIO_TEMPORAL_DETECTION_METHOD`: Whether to use the skeleton-based spatio-temporal action detection method.
- `USE_SKELETON_BASED_ACTION_RECOGNITION_METHOD`: Whether to use the skeleton-based action recognition method.
- `HUMAN_DETECTION_SCORE_THRE`: The score threshold for human detection. Default: 0.9.
- `ACTION_DETECTION_SCORE_THRE`: The score threshold for action detection. Default: 0.4.
- `LABEL_MAP_FOR_SPATIO_TEMPORAL_ACTION_DETECTION`: The label map used for spatio-temporal action detection. Default: `tools/data/ava/label_map.txt`.
- `LABEL_MAP`: The label map used for action recognition. Default: `tools/data/kinetics/label_map_k400.txt`.
- `DEVICE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0` or `cpu`. Default: `cuda:0`.
- `OUTPUT_FILENAME`: Path to the output file, which is in a video format. Default: `demo/test_stdet_recognition_output.mp4`.
- `PREDICT_STEPSIZE`: Make a prediction every N frames. Default: 8.
- `OUTPUT_STEPSIZE`: Output 1 frame for every N frames of the input video. Note that `PREDICT_STEPSIZE % OUTPUT_STEPSIZE == 0` is required. Default: 1.
- `OUTPUT_FPS`: The FPS of the demo video output. Default: 24.
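
The two `--use-skeleton-*` flags select which models do the work. The summary below is inferred from the four examples that follow; treat it as a reading aid rather than an exhaustive specification.

```python
# (use_skeleton_recog, use_skeleton_stdet) -> (action recognizer, spatio-temporal detector)
flag_combinations = {
    (True,  True):  ('PoseC3D (skeleton)', 'PoseC3D (skeleton)'),   # example 1
    (False, False): ('TSN (RGB)',          'SlowOnly (RGB)'),       # example 2
    (True,  False): ('PoseC3D (skeleton)', 'SlowOnly (RGB)'),       # example 3
    (False, True):  ('TSN (RGB)',          'PoseC3D (skeleton)'),   # example 4
}
for flags, models in flag_combinations.items():
    print(flags, '->', models)
```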

Examples:

Assume that you are located at `$MMACTION2`.

1. Use Faster RCNN as the human detector, HRNet-w32 as the pose estimator, and PoseC3D as both the skeleton-based action recognizer and the skeleton-based spatio-temporal action detector. Make action detection predictions every 8 frames and write every frame to the output video. The FPS of the output video is 24.

```shell
python demo/demo_video_structuralize.py \
    --skeleton-stdet-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_ava.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --pose-config demo/hrnet_w32_coco_256x192.py \
    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
    --skeleton-config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \
    --skeleton-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_k400.pth \
    --use-skeleton-stdet \
    --use-skeleton-recog \
    --label-map-stdet tools/data/ava/label_map.txt \
    --label-map tools/data/kinetics/label_map_k400.txt
```

2. Use Faster RCNN as the human detector, TSN-R50-1x1x3 as the RGB-based action recognizer, and SlowOnly-8x8-R101 as the RGB-based spatio-temporal action detector. Make action detection predictions every 8 frames and write every frame to the output video. The FPS of the output video is 24.

```shell
python demo/demo_video_structuralize.py \
    --rgb-stdet-config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
    --rgb-stdet-checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --rgb-config configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
    --rgb-checkpoint https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
    --label-map-stdet tools/data/ava/label_map.txt \
    --label-map tools/data/kinetics/label_map_k400.txt
```

3. Use Faster RCNN as the human detector, HRNet-w32 as the pose estimator, PoseC3D as the skeleton-based action recognizer, and SlowOnly-8x8-R101 as the RGB-based spatio-temporal action detector. Make action detection predictions every 8 frames and write every frame to the output video. The FPS of the output video is 24.

```shell
python demo/demo_video_structuralize.py \
    --rgb-stdet-config configs/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb.py \
    --rgb-stdet-checkpoint https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --pose-config demo/hrnet_w32_coco_256x192.py \
    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
    --skeleton-config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \
    --skeleton-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_k400.pth \
    --use-skeleton-recog \
    --label-map-stdet tools/data/ava/label_map.txt \
    --label-map tools/data/kinetics/label_map_k400.txt
```

4. Use Faster RCNN as the human detector, HRNet-w32 as the pose estimator, TSN-R50-1x1x3 as the RGB-based action recognizer, and PoseC3D as the skeleton-based spatio-temporal action detector. Make action detection predictions every 8 frames and write every frame to the output video. The FPS of the output video is 24.

```shell
python demo/demo_video_structuralize.py \
    --skeleton-stdet-checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/posec3d_ava.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --pose-config demo/hrnet_w32_coco_256x192.py \
    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
    --skeleton-config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \
    --rgb-config configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
    --rgb-checkpoint https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
    --use-skeleton-stdet \
    --label-map-stdet tools/data/ava/label_map.txt \
    --label-map tools/data/kinetics/label_map_k400.txt
```