Video data exhibits complex temporal dynamics due to factors such as camera motion, speed variation, and different activities. To effectively capture these diverse motion patterns, this paper presents a new temporal adaptive module (**TAM**) that generates video-specific temporal kernels from the video's own feature map. TAM proposes a unique two-level adaptive modeling scheme by decoupling the dynamic kernel into a location-sensitive importance map and a location-invariant aggregation weight. The importance map is learned in a local temporal window to capture short-term information, while the aggregation weight is generated from a global view with a focus on long-term structure. TAM is a modular block and can be integrated into 2D CNNs to yield a powerful video architecture (TANet) at a very small extra computational cost. Extensive experiments on the Kinetics-400 and Something-Something datasets demonstrate that TAM consistently outperforms other temporal modeling methods and achieves state-of-the-art performance at similar complexity.
```BibTeX
@article{liu2020tam,
  title={TAM: Temporal Adaptive Module for Video Recognition},
  author={Liu, Zhaoyang and Wang, Limin and Wu, Wayne and Qian, Chen and Lu, Tong},
  journal={arXiv preprint arXiv:2005.06803},
  year={2020}
}
```
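The two-level scheme described above can be pictured with a short PyTorch sketch. This is an illustrative re-implementation of the idea, not the released code: the class name `TAMSketch`, the `alpha`/`beta` reduction ratios, and the exact layer choices are assumptions, so refer to the official implementation for the real module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TAMSketch(nn.Module):
    """Illustrative two-branch temporal adaptive module (not the official code).

    Operates on features of shape (N, C, T, H, W). The local branch predicts
    a location-sensitive importance map over (C, T); the global branch
    predicts a per-video, per-channel temporal kernel of size K that is
    applied as a depthwise 1D convolution along T.
    """

    def __init__(self, channels, num_segments, kernel_size=3, alpha=2, beta=4):
        super().__init__()
        self.kernel_size = kernel_size
        # Local branch: short-term, location-sensitive importance map.
        self.local = nn.Sequential(
            nn.Conv1d(channels, channels // beta, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm1d(channels // beta),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // beta, channels, kernel_size=1, bias=False),
            nn.Sigmoid(),
        )
        # Global branch: long-term, location-invariant aggregation weights,
        # i.e. a dynamic temporal kernel generated from the whole clip.
        self.glob = nn.Sequential(
            nn.Linear(num_segments, num_segments * alpha, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(num_segments * alpha, kernel_size, bias=False),
            nn.Softmax(dim=-1),
        )

    def forward(self, x):
        n, c, t, h, w = x.shape
        # Squeeze the spatial dimensions: (N, C, T).
        context = x.mean(dim=[3, 4])

        # Location-sensitive importance map, broadcast over H and W.
        importance = self.local(context).view(n, c, t, 1, 1)
        x = x * importance

        # One temporal kernel per (video, channel) pair: (N*C, 1, K).
        kernels = self.glob(context.view(n * c, t)).view(n * c, 1, self.kernel_size)

        # Rearrange to (H*W, N*C, T) so a grouped conv1d applies each kernel
        # to its own (video, channel) temporal signal at every location.
        x = x.permute(3, 4, 0, 1, 2).reshape(h * w, n * c, t)
        x = F.conv1d(x, kernels, padding=self.kernel_size // 2, groups=n * c)
        return x.reshape(h, w, n, c, t).permute(2, 3, 4, 0, 1)


# Quick shape check.
tam = TAMSketch(channels=64, num_segments=8)
clip = torch.randn(2, 64, 8, 56, 56)   # (N, C, T, H, W)
assert tam(clip).shape == clip.shape
```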
Results on Kinetics-400:

config | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top5 acc | inference_time(video/s) | gpu_mem(M) | ckpt | log | json |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
tanet_r50_dense_1x1x8_100e_kinetics400_rgb | short-side 320 | 8 | TANet | ImageNet | 76.28 | 92.60 | 76.22 | 92.53 | x | 7124 | ckpt | log | json |
Results on Something-Something V1:

config | resolution | gpus | backbone | pretrain | top1 acc (efficient/accurate) | top5 acc (efficient/accurate) | gpu_mem(M) | ckpt | log | json |
---|---|---|---|---|---|---|---|---|---|---|
tanet_r50_1x1x8_50e_sthv1_rgb | height 100 | 8 | TANet | ImageNet | 47.34/49.58 | 75.72/77.31 | 7127 | ckpt | log | json |
tanet_r50_1x1x16_50e_sthv1_rgb | height 100 | 8 | TANet | ImageNet | 49.05/50.91 | 77.90/79.13 | 7127 | ckpt | log | json |
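The `1x1x8` / `1x1x16` tags in the config names encode the frame-sampling setting. As a rough illustration (the exact pipelines live in the config files; the keys below follow the usual MMAction2 `SampleFrames` convention and are an assumption, not a copy of the configs):

```python
# Hypothetical pipeline excerpts mirroring the config names above.
# 1x1x8  -> 8 segments, 1 frame per segment, frame interval 1
sample_frames_8 = dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8)
# 1x1x16 -> the same sampling with 16 segments
sample_frames_16 = dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16)
```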
For more details on data preparation, you can refer to the corresponding parts in Data Preparation.
You can use the following command to train a model.
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
Example: train the TANet model on the Kinetics-400 dataset with deterministic training and periodic validation.
```shell
python tools/train.py configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py \
    --work-dir work_dirs/tanet_r50_dense_1x1x8_100e_kinetics400_rgb \
    --validate --seed 0 --deterministic
```
For more details, you can refer to the Training setting part in getting_started.
You can use the following command to test a model.
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```
Example: test the TANet model on the Kinetics-400 dataset and dump the result to a JSON file.
```shell
python tools/test.py configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py \
    checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \
    --out result.json
```
For more details, you can refer to the Test a dataset part in getting_started.
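Beyond the command-line scripts, a trained checkpoint can also be loaded for quick single-video inference through MMAction2's high-level API. A minimal sketch, assuming the `init_recognizer` / `inference_recognizer` helpers from `mmaction.apis` (their exact signatures have changed across MMAction2 releases, so check the version you have installed):

```python
from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py'
checkpoint_file = 'checkpoints/SOME_CHECKPOINT.pth'  # e.g. the Kinetics-400 checkpoint from the table above

# Build the TANet recognizer and load the trained weights.
model = init_recognizer(config_file, checkpoint_file, device='cuda:0')

# Run recognition on a single video; recent MMAction2 versions only need
# the model and a video path here.
result = inference_recognizer(model, 'demo/demo.mp4')
print(result)
```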