# Tutorial 1: Learn about Configs

We use Python files as configs and incorporate modular and inheritance design into our config system, which makes it convenient to conduct various experiments.
You can find all the provided configs under `$MMAction2/configs`. If you wish to inspect a config file,
you may run `python tools/analysis/print_config.py /PATH/TO/CONFIG` to see the complete config.

<!-- TOC -->

- [Modify config through script arguments](#modify-config-through-script-arguments)
- [Config File Structure](#config-file-structure)
- [Config File Naming Convention](#config-file-naming-convention)
  - [Config System for Action Localization](#config-system-for-action-localization)
  - [Config System for Action Recognition](#config-system-for-action-recognition)
  - [Config System for Spatio-Temporal Action Detection](#config-system-for-spatio-temporal-action-detection)
- [FAQ](#faq)
  - [Use intermediate variables in configs](#use-intermediate-variables-in-configs)

<!-- TOC -->

## Modify config through script arguments

When submitting jobs using "tools/train.py" or "tools/test.py", you may specify `--cfg-options` to modify the config in place.

- Update config keys of dict.

  The config options can be specified following the order of the dict keys in the original config.
  For example, `--cfg-options model.backbone.norm_eval=False` changes all BN modules in model backbones to `train` mode.

- Update keys inside a list of configs.

  Some config dicts are composed as a list in your config. For example, the training pipeline `data.train.pipeline` is normally a list,
  e.g. `[dict(type='SampleFrames'), ...]`. If you want to change `'SampleFrames'` to `'DenseSampleFrames'` in the pipeline,
  you may specify `--cfg-options data.train.pipeline.0.type=DenseSampleFrames`.

- Update values of list/tuples.

  Sometimes the value to be updated is a list or a tuple. For example, the config file normally sets `workflow=[('train', 1)]`. If you want to
  change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark \" is necessary to
  support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.

The same overrides can also be applied programmatically, as sketched below.
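This minimal sketch shows how the `--cfg-options` keys above map onto mmcv's `Config` API, which is what the training scripts use to apply these options; the config path here is only an example:

```python
from mmcv import Config

# Load a config file (the path is illustrative).
cfg = Config.fromfile('configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py')

# The same dotted keys accepted by --cfg-options can be merged in code,
# including numeric indices into pipeline lists.
cfg.merge_from_dict({
    'model.backbone.norm_eval': False,
    'data.train.pipeline.0.type': 'DenseSampleFrames',
})
print(cfg.model.backbone.norm_eval)  # -> False
```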
## Config File Structure

There are 3 basic component types under `config/_base_`: model, schedule, default_runtime.
Many methods can be easily constructed with one of each, such as TSN, I3D, SlowOnly, etc.
The configs that are composed of components from `_base_` are called _primitive_.

For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.

For easy understanding, we recommend contributors to inherit from existing methods.
For example, if some modification is made based on TSN, users may first inherit the basic TSN structure by specifying `_base_ = ../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py`, then modify the necessary fields in the config files.

If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder under `configs/TASK`.

Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#config) for detailed documentation.

## Config File Naming Convention

We follow the style below to name config files. Contributors are advised to follow the same style.

```
{model}_[model setting]_{backbone}_[misc]_{data setting}_[gpu x batch_per_gpu]_{schedule}_{dataset}_{modality}
```

`{xxx}` is a required field and `[yyy]` is optional.

- `{model}`: model type, e.g. `tsn`, `i3d`, etc.
- `[model setting]`: specific setting for some models.
- `{backbone}`: backbone type, e.g. `r50` (ResNet-50), etc.
- `[misc]`: miscellaneous setting/plugins of the model, e.g. `dense`, `320p`, `video`, etc.
- `{data setting}`: frame sample setting in `{clip_len}x{frame_interval}x{num_clips}` format.
- `[gpu x batch_per_gpu]`: GPUs and samples per GPU.
- `{schedule}`: training schedule, e.g. `20e` means 20 epochs.
- `{dataset}`: dataset name, e.g. `kinetics400`, `mmit`, etc.
- `{modality}`: frame modality, e.g. `rgb`, `flow`, etc.
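For instance, `tsn_r50_1x1x3_100e_kinetics400_rgb.py` (the TSN config used as an example throughout this tutorial) reads as: `tsn` (model) + `r50` (backbone) + `1x1x3` (1-frame clips sampled at interval 1, 3 clips) + `100e` (100-epoch schedule) + `kinetics400` (dataset) + `rgb` (modality), with the optional fields simply omitted.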
### Config System for Action Localization

We incorporate modular design into our config system,
which is convenient to conduct various experiments.

- An Example of BMN

  To help users get a basic idea of a complete config structure and the modules in an action localization system,
  we make brief comments on the config of BMN below.
  For more detailed usage and alternatives for each parameter in each module, please refer to the [API documentation](https://mmaction2.readthedocs.io/en/latest/api.html).

  ```python
  # model settings
  model = dict( # Config of the model
      type='BMN', # Type of the localizer
      temporal_dim=100, # Total frames selected for each video
      boundary_ratio=0.5, # Ratio for determining video boundaries
      num_samples=32, # Number of samples for each proposal
      num_samples_per_bin=3, # Number of bin samples for each sample
      feat_dim=400, # Dimension of feature
      soft_nms_alpha=0.4, # Soft NMS alpha
      soft_nms_low_threshold=0.5, # Soft NMS low threshold
      soft_nms_high_threshold=0.9, # Soft NMS high threshold
      post_process_top_k=100) # Top k proposals in post process
  # model training and testing settings
  train_cfg = None # Config of training hyperparameters for BMN
  test_cfg = dict(average_clips='score') # Config of testing hyperparameters for BMN

  # dataset settings
  dataset_type = 'ActivityNetDataset' # Type of dataset for training, validation and testing
  data_root = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for training
  data_root_val = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for validation and testing
  ann_file_train = 'data/ActivityNet/anet_anno_train.json' # Path to the annotation file for training
  ann_file_val = 'data/ActivityNet/anet_anno_val.json' # Path to the annotation file for validation
  ann_file_test = 'data/ActivityNet/anet_anno_test.json' # Path to the annotation file for testing

  train_pipeline = [ # List of training pipeline steps
      dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
      dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline
      dict( # Config of Collect
          type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer
          keys=['raw_feature', 'gt_bbox'], # Keys of input
          meta_name='video_meta', # Meta name
          meta_keys=['video_name']), # Meta keys of input
      dict( # Config of ToTensor
          type='ToTensor', # Convert other types to tensor type pipeline
          keys=['raw_feature']), # Keys to be converted from image to tensor
      dict( # Config of ToDataContainer
          type='ToDataContainer', # Pipeline to convert the data to DataContainer
          fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # Required fields to be converted with keys and attributes
  ]
  val_pipeline = [ # List of validation pipeline steps
      dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
      dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline
      dict( # Config of Collect
          type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer
          keys=['raw_feature', 'gt_bbox'], # Keys of input
          meta_name='video_meta', # Meta name
          meta_keys=[
              'video_name', 'duration_second', 'duration_frame', 'annotations',
              'feature_frame'
          ]), # Meta keys of input
      dict( # Config of ToTensor
          type='ToTensor', # Convert other types to tensor type pipeline
          keys=['raw_feature']), # Keys to be converted from image to tensor
      dict( # Config of ToDataContainer
          type='ToDataContainer', # Pipeline to convert the data to DataContainer
          fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # Required fields to be converted with keys and attributes
  ]
  test_pipeline = [ # List of testing pipeline steps
      dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
      dict( # Config of Collect
          type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer
          keys=['raw_feature'], # Keys of input
          meta_name='video_meta', # Meta name
          meta_keys=[
              'video_name', 'duration_second', 'duration_frame', 'annotations',
              'feature_frame'
          ]), # Meta keys of input
      dict( # Config of ToTensor
          type='ToTensor', # Convert other types to tensor type pipeline
          keys=['raw_feature']), # Keys to be converted from image to tensor
  ]
  data = dict( # Config of data
      videos_per_gpu=8, # Batch size of each single GPU
      workers_per_gpu=8, # Workers to pre-fetch data for each single GPU
      train_dataloader=dict( # Additional config of train dataloader
          drop_last=True), # Whether to drop the last batch of data in training
      val_dataloader=dict( # Additional config of validation dataloader
          videos_per_gpu=1), # Batch size of each single GPU during evaluation
      test_dataloader=dict( # Additional config of test dataloader
          videos_per_gpu=2), # Batch size of each single GPU during testing
      test=dict( # Testing dataset config
          type=dataset_type,
          ann_file=ann_file_test,
          pipeline=test_pipeline,
          data_prefix=data_root_val),
      val=dict( # Validation dataset config
          type=dataset_type,
          ann_file=ann_file_val,
          pipeline=val_pipeline,
          data_prefix=data_root_val),
      train=dict( # Training dataset config
          type=dataset_type,
          ann_file=ann_file_train,
          pipeline=train_pipeline,
          data_prefix=data_root))

  # optimizer
  optimizer = dict(
      # Config used to build the optimizer. This supports (1) all the optimizers in PyTorch,
      # whose arguments are the same as those in PyTorch, and (2) custom optimizers,
      # which are built on `constructor`; refer to "tutorials/5_new_modules.md"
      # for implementation.
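      # (Illustrative note: any optimizer name from torch.optim could be used here;
      # e.g. type='SGD' with momentum=0.9 would build torch.optim.SGD instead of Adam.)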
      type='Adam', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
      lr=0.001, # Learning rate, see detailed usage of the parameters in the documentation of PyTorch
      weight_decay=0.0001) # Weight decay of Adam
  optimizer_config = dict( # Config used to build the optimizer hook
      grad_clip=None) # Most of the methods do not use gradient clip
  # learning policy
  lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook
      policy='step', # Policy of scheduler; CosineAnnealing, Cyclic, etc. are also supported. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
      step=7) # Steps to decay the learning rate

  total_epochs = 9 # Total epochs to train the model
  checkpoint_config = dict( # Config to set the checkpoint hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
      interval=1) # Interval to save checkpoint
  evaluation = dict( # Config of evaluation during training
      interval=1, # Interval to perform evaluation
      metrics=['AR@AN']) # Metrics to be evaluated
  log_config = dict( # Config to register logger hook
      interval=50, # Interval to print the log
      hooks=[ # Hooks to be implemented during training
          dict(type='TextLoggerHook'), # The logger used to record the training process
          # dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported
      ])

  # runtime settings
  dist_params = dict(backend='nccl') # Parameters to set up distributed training, the port can also be set
  log_level = 'INFO' # The level of logging
  work_dir = './work_dirs/bmn_400x100_2x8_9e_activitynet_feature/' # Directory to save the model checkpoints and logs for the current experiments
  load_from = None # Load model weights as a pre-trained model from a given path. This will not resume training
  resume_from = None # Resume checkpoints from a given path; training will be resumed from the epoch when the checkpoint was saved
  workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
  output_config = dict( # Config of localization output
      out=f'{work_dir}/results.json', # Path to the output file
      output_format='json') # File format of the output file
  ```

### Config System for Action Recognition

We incorporate modular design into our config system,
which is convenient to conduct various experiments.

- An Example of TSN

  To help users get a basic idea of a complete config structure and the modules in an action recognition system,
  we make brief comments on the config of TSN below.
  For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.

  ```python
  # model settings
  model = dict( # Config of the model
      type='Recognizer2D', # Type of the recognizer
      backbone=dict( # Dict for backbone
          type='ResNet', # Name of the backbone
          pretrained='torchvision://resnet50', # The url/site of the pretrained model
          depth=50, # Depth of ResNet model
          norm_eval=False), # Whether to set BN layers to eval mode when training
      cls_head=dict( # Dict for classification head
          type='TSNHead', # Name of classification head
          num_classes=400, # Number of classes to be classified.
          in_channels=2048, # The input channels of classification head.
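          # (2048 matches the output channel width of ResNet-50's final stage)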
          spatial_type='avg', # Type of pooling in spatial dimension
          consensus=dict(type='AvgConsensus', dim=1), # Config of consensus module
          dropout_ratio=0.4, # Probability in dropout layer
          init_std=0.01), # Std value for linear layer initialization
      # model training and testing settings
      train_cfg=None, # Config of training hyperparameters for TSN
      test_cfg=dict(average_clips=None)) # Config of testing hyperparameters for TSN.

  # dataset settings
  dataset_type = 'RawframeDataset' # Type of dataset for training, validation and testing
  data_root = 'data/kinetics400/rawframes_train/' # Root path to data for training
  data_root_val = 'data/kinetics400/rawframes_val/' # Root path to data for validation and testing
  ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' # Path to the annotation file for training
  ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for validation
  ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for testing
  img_norm_cfg = dict( # Config of image normalization used in data pipeline
      mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize
      std=[58.395, 57.12, 57.375], # Std values of different channels to normalize
      to_bgr=False) # Whether to convert channels from RGB to BGR

  train_pipeline = [ # List of training pipeline steps
      dict( # Config of SampleFrames
          type='SampleFrames', # Sample frames pipeline, sampling frames from video
          clip_len=1, # Frames of each sampled output clip
          frame_interval=1, # Temporal interval of adjacent sampled frames
          num_clips=3), # Number of clips to be sampled
      dict( # Config of RawFrameDecode
          type='RawFrameDecode'), # Load and decode frames pipeline, picking raw frames with given indices
      dict( # Config of Resize
          type='Resize', # Resize pipeline
          scale=(-1, 256)), # The scale to resize images
      dict( # Config of MultiScaleCrop
          type='MultiScaleCrop', # Multi scale crop pipeline, cropping images with a list of randomly selected scales
          input_size=224, # Input size of the network
          scales=(1, 0.875, 0.75, 0.66), # Scales of width and height to be selected
          random_crop=False, # Whether to randomly sample cropping bbox
          max_wh_scale_gap=1), # Maximum gap of w and h scale levels
      dict( # Config of Resize
          type='Resize', # Resize pipeline
          scale=(224, 224), # The scale to resize images
          keep_ratio=False), # Whether to keep the aspect ratio when resizing
      dict( # Config of Flip
          type='Flip', # Flip pipeline
          flip_ratio=0.5), # Probability of applying flip
      dict( # Config of Normalize
          type='Normalize', # Normalize pipeline
          **img_norm_cfg), # Config of image normalization
      dict( # Config of FormatShape
          type='FormatShape', # Format shape pipeline, formatting the final image shape to the given input_format
          input_format='NCHW'), # Final image shape format
      dict( # Config of Collect
          type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer
          keys=['imgs', 'label'], # Keys of input
          meta_keys=[]), # Meta keys of input
      dict( # Config of ToTensor
          type='ToTensor', # Convert other types to tensor type pipeline
          keys=['imgs', 'label']) # Keys to be converted from image to tensor
  ]
  val_pipeline = [ # List of validation pipeline steps
      dict( # Config of SampleFrames
          type='SampleFrames', # Sample frames pipeline, sampling frames from video
          clip_len=1, # Frames of each sampled output clip
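          # (Validation mirrors the training sampling, but test_mode=True below makes frame selection deterministic)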
frame_interval=1, # Temporal interval of adjacent sampled frames + num_clips=3, # Number of clips to be sampled + test_mode=True), # Whether to set test mode in sampling + dict( # Config of RawFrameDecode + type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices + dict( # Config of Resize + type='Resize', # Resize pipeline + scale=(-1, 256)), # The scale to resize images + dict( # Config of CenterCrop + type='CenterCrop', # Center crop pipeline, cropping the center area from images + crop_size=224), # The size to crop images + dict( # Config of Flip + type='Flip', # Flip pipeline + flip_ratio=0), # Probability of implementing flip + dict( # Config of Normalize + type='Normalize', # Normalize pipeline + **img_norm_cfg), # Config of image normalization + dict( # Config of FormatShape + type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format + input_format='NCHW'), # Final image shape format + dict( # Config of Collect + type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer + keys=['imgs', 'label'], # Keys of input + meta_keys=[]), # Meta keys of input + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['imgs']) # Keys to be converted from image to tensor + ] + test_pipeline = [ # List of testing pipeline steps + dict( # Config of SampleFrames + type='SampleFrames', # Sample frames pipeline, sampling frames from video + clip_len=1, # Frames of each sampled output clip + frame_interval=1, # Temporal interval of adjacent sampled frames + num_clips=25, # Number of clips to be sampled + test_mode=True), # Whether to set test mode in sampling + dict( # Config of RawFrameDecode + type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices + dict( # Config of Resize + type='Resize', # Resize pipeline + scale=(-1, 256)), # The scale to resize images + dict( # Config of TenCrop + type='TenCrop', # Ten crop pipeline, cropping ten area from images + crop_size=224), # The size to crop images + dict( # Config of Flip + type='Flip', # Flip pipeline + flip_ratio=0), # Probability of implementing flip + dict( # Config of Normalize + type='Normalize', # Normalize pipeline + **img_norm_cfg), # Config of image normalization + dict( # Config of FormatShape + type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format + input_format='NCHW'), # Final image shape format + dict( # Config of Collect + type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer + keys=['imgs', 'label'], # Keys of input + meta_keys=[]), # Meta keys of input + dict( # Config of ToTensor + type='ToTensor', # Convert other types to tensor type pipeline + keys=['imgs']) # Keys to be converted from image to tensor + ] + data = dict( # Config of data + videos_per_gpu=32, # Batch size of each single GPU + workers_per_gpu=2, # Workers to pre-fetch data for each single GPU + train_dataloader=dict( # Additional config of train dataloader + drop_last=True), # Whether to drop out the last batch of data in training + val_dataloader=dict( # Additional config of validation dataloader + videos_per_gpu=1), # Batch size of each single GPU during evaluation + test_dataloader=dict( # Additional config of test dataloader + videos_per_gpu=2), # Batch size of each single GPU during testing + train=dict( # Training dataset config + type=dataset_type, + 
ann_file=ann_file_train, + data_prefix=data_root, + pipeline=train_pipeline), + val=dict( # Validation dataset config + type=dataset_type, + ann_file=ann_file_val, + data_prefix=data_root_val, + pipeline=val_pipeline), + test=dict( # Testing dataset config + type=dataset_type, + ann_file=ann_file_test, + data_prefix=data_root_val, + pipeline=test_pipeline)) + # optimizer + optimizer = dict( + # Config used to build optimizer, support (1). All the optimizers in PyTorch + # whose arguments are also the same as those in PyTorch. (2). Custom optimizers + # which are built on `constructor`, referring to "tutorials/5_new_modules.md" + # for implementation. + type='SGD', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details + lr=0.01, # Learning rate, see detail usages of the parameters in the documentation of PyTorch + momentum=0.9, # Momentum, + weight_decay=0.0001) # Weight decay of SGD + optimizer_config = dict( # Config used to build the optimizer hook + grad_clip=dict(max_norm=40, norm_type=2)) # Use gradient clip + # learning policy + lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook + policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9 + step=[40, 80]) # Steps to decay the learning rate + total_epochs = 100 # Total epochs to train the model + checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation + interval=5) # Interval to save checkpoint + evaluation = dict( # Config of evaluation during training + interval=5, # Interval to perform evaluation + metrics=['top_k_accuracy', 'mean_class_accuracy'], # Metrics to be performed + metric_options=dict(top_k_accuracy=dict(topk=(1, 3))), # Set top-k accuracy to 1 and 3 during validation + save_best='top_k_accuracy') # set `top_k_accuracy` as key indicator to save best checkpoint + eval_config = dict( + metric_options=dict(top_k_accuracy=dict(topk=(1, 3)))) # Set top-k accuracy to 1 and 3 during testing. You can also use `--eval top_k_accuracy` to assign evaluation metrics + log_config = dict( # Config to register logger hook + interval=20, # Interval to print the log + hooks=[ # Hooks to be implemented during training + dict(type='TextLoggerHook'), # The logger used to record the training process + # dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported + ]) + + # runtime settings + dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set + log_level = 'INFO' # The level of logging + work_dir = './work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/' # Directory to save the model checkpoints and logs for the current experiments + load_from = None # load models as a pre-trained model from a given path. This will not resume training + resume_from = None # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint's is saved + workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once + + ``` + +### Config System for Spatio-Temporal Action Detection + +We incorporate modular design into our config system, which is convenient to conduct various experiments. 
### Config System for Spatio-Temporal Action Detection

We incorporate modular design into our config system, which is convenient to conduct various experiments.

- An Example of FastRCNN

  To help users get a basic idea of a complete config structure and the modules in a spatio-temporal action detection system,
  we make brief comments on the config of FastRCNN below.
  For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.

  ```python
  # model setting
  model = dict( # Config of the model
      type='FastRCNN', # Type of the detector
      backbone=dict( # Dict for backbone
          type='ResNet3dSlowOnly', # Name of the backbone
          depth=50, # Depth of ResNet model
          pretrained=None, # The url/site of the pretrained model
          pretrained2d=False, # Whether the pretrained model is 2D
          lateral=False, # Whether the backbone has lateral connections
          num_stages=4, # Stages of ResNet model
          conv1_kernel=(1, 7, 7), # Conv1 kernel size
          conv1_stride_t=1, # Conv1 temporal stride
          pool1_stride_t=1, # Pool1 temporal stride
          spatial_strides=(1, 2, 2, 1)), # The spatial stride for each ResNet stage
      roi_head=dict( # Dict for roi_head
          type='AVARoIHead', # Name of the roi_head
          bbox_roi_extractor=dict( # Dict for bbox_roi_extractor
              type='SingleRoIExtractor3D', # Name of the bbox_roi_extractor
              roi_layer_type='RoIAlign', # Type of the RoI op
              output_size=8, # Output feature size of the RoI op
              with_temporal_pool=True), # Whether the temporal dim is pooled
          bbox_head=dict( # Dict for bbox_head
              type='BBoxHeadAVA', # Name of the bbox_head
              in_channels=2048, # Number of channels of the input feature
              num_classes=81, # Number of action classes + 1 (background)
              multilabel=True, # Whether the dataset is multilabel
              dropout_ratio=0.5)), # The dropout ratio used
      # model training and testing settings
      train_cfg=dict( # Training config of FastRCNN
          rcnn=dict( # Dict for rcnn training config
              assigner=dict( # Dict for assigner
                  type='MaxIoUAssignerAVA', # Name of the assigner
                  pos_iou_thr=0.9, # IoU threshold for positive examples, > pos_iou_thr -> positive
                  neg_iou_thr=0.9, # IoU threshold for negative examples, < neg_iou_thr -> negative
                  min_pos_iou=0.9), # Minimum acceptable IoU for positive examples
              sampler=dict( # Dict for sampler
                  type='RandomSampler', # Name of the sampler
                  num=32, # Batch size of the sampler
                  pos_fraction=1, # Positive bbox fraction of the sampler
                  neg_pos_ub=-1, # Upper bound of the ratio of num negative to num positive
                  add_gt_as_proposals=True), # Add gt bboxes as proposals
              pos_weight=1.0, # Loss weight of positive examples
              debug=False)), # Debug mode
      test_cfg=dict( # Testing config of FastRCNN
          rcnn=dict( # Dict for rcnn testing config
              action_thr=0.002))) # The threshold of an action

  # dataset settings
  dataset_type = 'AVADataset' # Type of dataset for training, validation and testing
  data_root = 'data/ava/rawframes' # Root path to data
  anno_root = 'data/ava/annotations' # Root path to annotations

  ann_file_train = f'{anno_root}/ava_train_v2.1.csv' # Path to the annotation file for training
  ann_file_val = f'{anno_root}/ava_val_v2.1.csv' # Path to the annotation file for validation

  exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' # Path to the excluded annotation file for training
  exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' # Path to the excluded annotation file for validation

  label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' # Path to the label file

  proposal_file_train = f'{anno_root}/ava_dense_proposals_train.FAIR.recall_93.9.pkl' # Path to the human detection proposals for training examples
  proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' # Path to the human detection proposals for validation examples

  img_norm_cfg = dict( # Config of image normalization used in data pipeline
      mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize
      std=[58.395, 57.12, 57.375], # Std values of different channels to normalize
      to_bgr=False) # Whether to convert channels from RGB to BGR

  train_pipeline = [ # List of training pipeline steps
      dict( # Config of SampleFrames
          type='AVASampleFrames', # Sample frames pipeline, sampling frames from video
          clip_len=4, # Frames of each sampled output clip
          frame_interval=16), # Temporal interval of adjacent sampled frames
      dict( # Config of RawFrameDecode
          type='RawFrameDecode'), # Load and decode frames pipeline, picking raw frames with given indices
      dict( # Config of RandomRescale
          type='RandomRescale', # Randomly rescale the short edge into a given range
          scale_range=(256, 320)), # The short-edge size range of RandomRescale
      dict( # Config of RandomCrop
          type='RandomCrop', # Randomly crop a patch with the given size
          size=256), # The size of the cropped patch
      dict( # Config of Flip
          type='Flip', # Flip pipeline
          flip_ratio=0.5), # Probability of applying flip
      dict( # Config of Normalize
          type='Normalize', # Normalize pipeline
          **img_norm_cfg), # Config of image normalization
      dict( # Config of FormatShape
          type='FormatShape', # Format shape pipeline, formatting the final image shape to the given input_format
          input_format='NCTHW', # Final image shape format
          collapse=True), # Collapse the dim N if N == 1
      dict( # Config of Rename
          type='Rename', # Rename keys
          mapping=dict(imgs='img')), # The old-name-to-new-name mapping
      dict( # Config of ToTensor
          type='ToTensor', # Convert other types to tensor type pipeline
          keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), # Keys to be converted from image to tensor
      dict( # Config of ToDataContainer
          type='ToDataContainer', # Convert other types to DataContainer type pipeline
          fields=[ # Fields to convert to DataContainer
              dict( # Dict of fields
                  key=['proposals', 'gt_bboxes', 'gt_labels'], # Keys to convert to DataContainer
                  stack=False)]), # Whether to stack these tensors
      dict( # Config of Collect
          type='Collect', # Collect pipeline that decides which keys in the data should be passed to the detector
          keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], # Keys of input
          meta_keys=['scores', 'entity_ids']), # Meta keys of input
  ]

  val_pipeline = [ # List of validation pipeline steps
      dict( # Config of SampleFrames
          type='AVASampleFrames', # Sample frames pipeline, sampling frames from video
          clip_len=4, # Frames of each sampled output clip
          frame_interval=16), # Temporal interval of adjacent sampled frames
      dict( # Config of RawFrameDecode
          type='RawFrameDecode'), # Load and decode frames pipeline, picking raw frames with given indices
      dict( # Config of Resize
          type='Resize', # Resize pipeline
          scale=(-1, 256)), # The scale to resize images
      dict( # Config of Normalize
          type='Normalize', # Normalize pipeline
          **img_norm_cfg), # Config of image normalization
      dict( # Config of FormatShape
          type='FormatShape', # Format shape pipeline, formatting the final image shape to the given input_format
          input_format='NCTHW', # Final image shape format
          collapse=True), # Collapse the dim N if N == 1
      dict( # Config of Rename
          type='Rename', # Rename keys
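          # ('imgs' is renamed to 'img' because the detection-style model interface expects the key 'img')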
          mapping=dict(imgs='img')), # The old-name-to-new-name mapping
      dict( # Config of ToTensor
          type='ToTensor', # Convert other types to tensor type pipeline
          keys=['img', 'proposals']), # Keys to be converted from image to tensor
      dict( # Config of ToDataContainer
          type='ToDataContainer', # Convert other types to DataContainer type pipeline
          fields=[ # Fields to convert to DataContainer
              dict( # Dict of fields
                  key=['proposals'], # Keys to convert to DataContainer
                  stack=False)]), # Whether to stack these tensors
      dict( # Config of Collect
          type='Collect', # Collect pipeline that decides which keys in the data should be passed to the detector
          keys=['img', 'proposals'], # Keys of input
          meta_keys=['scores', 'entity_ids'], # Meta keys of input
          nested=True) # Whether to wrap the data in a nested list
  ]

  data = dict( # Config of data
      videos_per_gpu=16, # Batch size of each single GPU
      workers_per_gpu=2, # Workers to pre-fetch data for each single GPU
      val_dataloader=dict( # Additional config of validation dataloader
          videos_per_gpu=1), # Batch size of each single GPU during evaluation
      train=dict( # Training dataset config
          type=dataset_type,
          ann_file=ann_file_train,
          exclude_file=exclude_file_train,
          pipeline=train_pipeline,
          label_file=label_file,
          proposal_file=proposal_file_train,
          person_det_score_thr=0.9,
          data_prefix=data_root),
      val=dict( # Validation dataset config
          type=dataset_type,
          ann_file=ann_file_val,
          exclude_file=exclude_file_val,
          pipeline=val_pipeline,
          label_file=label_file,
          proposal_file=proposal_file_val,
          person_det_score_thr=0.9,
          data_prefix=data_root))
  data['test'] = data['val'] # Set the test dataset as the validation dataset

  # optimizer
  optimizer = dict(
      # Config used to build the optimizer. This supports (1) all the optimizers in PyTorch,
      # whose arguments are the same as those in PyTorch, and (2) custom optimizers,
      # which are built on `constructor`; refer to "tutorials/5_new_modules.md"
      # for implementation.
      type='SGD', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
      lr=0.2, # Learning rate, see detailed usage of the parameters in the documentation of PyTorch (for 8 GPUs)
      momentum=0.9, # Momentum of SGD
      weight_decay=0.00001) # Weight decay of SGD

  optimizer_config = dict( # Config used to build the optimizer hook
      grad_clip=dict(max_norm=40, norm_type=2)) # Use gradient clip

  lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook
      policy='step', # Policy of scheduler; CosineAnnealing, Cyclic, etc. are also supported. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
      step=[40, 80], # Steps to decay the learning rate
      warmup='linear', # Warmup strategy
      warmup_by_epoch=True, # Whether warmup_iters counts epochs (True) or iterations (False)
      warmup_iters=5, # Number of iterations or epochs for warmup
      warmup_ratio=0.1) # The initial learning rate is warmup_ratio * lr

  total_epochs = 20 # Total epochs to train the model
  checkpoint_config = dict( # Config to set the checkpoint hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
      interval=1) # Interval to save checkpoint
  workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
  evaluation = dict( # Config of evaluation during training
      interval=1, save_best='mAP@0.5IOU') # Interval to perform evaluation and the key indicator for saving the best checkpoint
  log_config = dict( # Config to register logger hook
      interval=20, # Interval to print the log
      hooks=[ # Hooks to be implemented during training
          dict(type='TextLoggerHook'), # The logger used to record the training process
      ])

  # runtime settings
  dist_params = dict(backend='nccl') # Parameters to set up distributed training, the port can also be set
  log_level = 'INFO' # The level of logging
  work_dir = ('./work_dirs/ava/' # Directory to save the model checkpoints and logs for the current experiments
              'slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb')
  load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' # Load model weights as a pre-trained model from a given path. This will not resume training
               'slowonly_r50_4x16x1_256e_kinetics400_rgb/'
               'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth')
  resume_from = None # Resume checkpoints from a given path; training will be resumed from the epoch when the checkpoint was saved
  ```

## FAQ

### Use intermediate variables in configs

Some intermediate variables are used in the config files, like `train_pipeline`/`val_pipeline`/`test_pipeline`,
`ann_file_train`/`ann_file_val`/`ann_file_test`, `img_norm_cfg`, etc.

For example, we first define `train_pipeline`/`val_pipeline`/`test_pipeline` and pass them into `data`.
Thus, `train_pipeline`/`val_pipeline`/`test_pipeline` are intermediate variables.

We also define `ann_file_train`/`ann_file_val`/`ann_file_test` and `data_root`/`data_root_val` to provide the data pipeline with some
basic information.

In addition, we use `img_norm_cfg` as an intermediate variable to construct data augmentation components.

```python
...
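# (model settings elided; the snippet focuses on the intermediate variables and where they are consumed)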
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)

train_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.8),
        random_crop=False,
        max_wh_scale_gap=0),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=1,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=10,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]

data = dict(
    videos_per_gpu=8,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
```
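Note that when an intermediate variable is modified in a config that inherits from another one, it must be passed into the corresponding `data` fields again; redefining the variable alone does not touch the inherited `data` dict. A minimal sketch of such a hypothetical child config (the `_base_` path and the new sampling values are illustrative):

```python
_base_ = './tsn_r50_1x1x3_100e_kinetics400_rgb.py'  # hypothetical parent config

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)

# Redefine the intermediate variable ...
train_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),  # more clips than the parent
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
# ... and pass it into `data` again; otherwise the parent's pipeline is kept.
data = dict(train=dict(pipeline=train_pipeline))
```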