
Tutorial 1: Learn about Configs

We use Python files as configs and incorporate modular and inheritance design into our config system, which is convenient for conducting various experiments.
You can find all the provided configs under $MMAction2/configs. If you wish to inspect a config file,
you may run python tools/analysis/print_config.py /PATH/TO/CONFIG to see the complete config.

Modify config through script arguments

When submitting jobs using "tools/train.py" or "tools/test.py", you may specify --cfg-options to modify the config in place.

  • Update config keys of dict.

The config options can be specified following the order of the dict keys in the original config.
For example, --cfg-options model.backbone.norm_eval=False changes all the BN modules in model backbones to train mode.

  • Update keys inside a list of configs.

Some config dicts are composed as a list in your config. For example, the training pipeline data.train.pipeline is normally a list,
e.g. [dict(type='SampleFrames'), ...]. If you want to change 'SampleFrames' to 'DenseSampleFrames' in the pipeline,
you may specify --cfg-options data.train.pipeline.0.type=DenseSampleFrames.

  • Update values of list/tuples.

Some values to be updated are lists or tuples. For example, the config file normally sets workflow=[('train', 1)]. If you want to
change this key, you may specify --cfg-options workflow="[(train,1),(val,1)]". Note that the quotation mark \" is necessary to
support list/tuple data types, and that NO white space is allowed inside the quotation marks in the specified value.
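
All three kinds of modification can also be reproduced programmatically: internally, --cfg-options is parsed into a dict and merged into the loaded config. Below is a minimal sketch of the same edits using mmcv's Config; the config path is illustrative, and indexing into lists via numeric key segments requires a recent mmcv.

```python
from mmcv import Config

# Load a config file (illustrative path).
cfg = Config.fromfile('configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py')

# Equivalent to --cfg-options model.backbone.norm_eval=False:
# dotted keys address nested dict fields.
cfg.merge_from_dict({'model.backbone.norm_eval': False})

# Equivalent to --cfg-options data.train.pipeline.0.type=DenseSampleFrames:
# a numeric segment indexes into a list of configs.
cfg.merge_from_dict({'data.train.pipeline.0.type': 'DenseSampleFrames'})

print(cfg.model.backbone.norm_eval)     # False
print(cfg.data.train.pipeline[0].type)  # DenseSampleFrames
```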

Config File Structure

There are 3 basic component types under configs/_base_: model, schedule, default_runtime.
Many methods can be easily constructed with one of each, such as TSN, I3D, SlowOnly, etc.
The configs that are composed of components from _base_ are called primitive.

For all configs under the same folder, it is recommended to have only one primitive config. All other configs should inherit from the primitive config. In this way, the maximum inheritance level is 3.

For easy understanding, we recommend that contributors inherit from existing methods.
For example, if some modification is made based on TSN, users may first inherit the basic TSN structure by specifying _base_ = ../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py, then modify the necessary fields in the config file.
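
For instance, a child config that only shortens TSN's schedule could contain nothing but the overridden fields. A minimal sketch (the file name and values are illustrative):

```python
# my_tsn_r50_1x1x3_50e_kinetics400_rgb.py (hypothetical child config)
_base_ = ['../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py']

# Only the fields that differ need to be listed; dicts are merged
# recursively, so optimizer keeps type='SGD' etc. from the parent.
optimizer = dict(lr=0.005)  # override the inherited learning rate
total_epochs = 50           # shorten the training schedule
```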

If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder under configs/TASK.

Please refer to mmcv for detailed documentation.

Config File Naming Convention

We follow the style below to name config files. Contributors are advised to follow the same style.

{model}_[model setting]_{backbone}_[misc]_{data setting}_[gpu x batch_per_gpu]_{schedule}_{dataset}_{modality}

{xxx} is a required field and [yyy] is optional.

  • {model}: model type, e.g. tsn, i3d, etc.
  • [model setting]: specific setting for some models.
  • {backbone}: backbone type, e.g. r50 (ResNet-50), etc.
  • [misc]: miscellaneous setting/plugins of model, e.g. dense, 320p, video, etc.
  • {data setting}: frame sample setting in {clip_len}x{frame_interval}x{num_clips} format.
  • [gpu x batch_per_gpu]: GPUs and samples per GPU.
  • {schedule}: training schedule, e.g. 20e means 20 epochs.
  • {dataset}: dataset name, e.g. kinetics400, mmit, etc.
  • {modality}: frame modality, e.g. rgb, flow, etc.
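
For example, tsn_r50_1x1x3_100e_kinetics400_rgb (used throughout this tutorial) decodes as: tsn model, r50 backbone, 1x1x3 data setting (clip_len=1, frame_interval=1, num_clips=3), 100e schedule (100 epochs), kinetics400 dataset, rgb modality.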

Config System for Action Localization

We incorporate modular design into our config system,
which is convenient for conducting various experiments.

  • An Example of BMN

    To help users get a basic idea of a complete config structure and the modules in an action localization system,
    we make brief comments on the config of BMN below.
    For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.

    ```python

    # model settings
    model = dict(  # Config of the model
        type='BMN',  # Type of the localizer
        temporal_dim=100,  # Total frames selected for each video
        boundary_ratio=0.5,  # Ratio for determining video boundaries
        num_samples=32,  # Number of samples for each proposal
        num_samples_per_bin=3,  # Number of bin samples for each sample
        feat_dim=400,  # Dimension of feature
        soft_nms_alpha=0.4,  # Soft NMS alpha
        soft_nms_low_threshold=0.5,  # Soft NMS low threshold
        soft_nms_high_threshold=0.9,  # Soft NMS high threshold
        post_process_top_k=100)  # Top k proposals in post-processing

    # model training and testing settings
    train_cfg = None  # Config of training hyperparameters for BMN
    test_cfg = dict(average_clips='score')  # Config of testing hyperparameters for BMN

    # dataset settings
    dataset_type = 'ActivityNetDataset'  # Type of dataset for training, validation and testing
    data_root = 'data/activitynet_feature_cuhk/csv_mean_100/'  # Root path to data for training
    data_root_val = 'data/activitynet_feature_cuhk/csv_mean_100/'  # Root path to data for validation and testing
    ann_file_train = 'data/ActivityNet/anet_anno_train.json'  # Path to the annotation file for training
    ann_file_val = 'data/ActivityNet/anet_anno_val.json'  # Path to the annotation file for validation
    ann_file_test = 'data/ActivityNet/anet_anno_test.json'  # Path to the annotation file for testing

    train_pipeline = [  # List of training pipeline steps
        dict(type='LoadLocalizationFeature'),  # Load localization feature pipeline
        dict(type='GenerateLocalizationLabels'),  # Generate localization labels pipeline
        dict(  # Config of Collect
            type='Collect',  # Collect pipeline that decides which keys in the data should be passed to the localizer
            keys=['raw_feature', 'gt_bbox'],  # Keys of input
            meta_name='video_meta',  # Meta name
            meta_keys=['video_name']),  # Meta keys of input
        dict(  # Config of ToTensor
            type='ToTensor',  # Convert other types to tensor type pipeline
            keys=['raw_feature']),  # Keys to be converted to tensor
        dict(  # Config of ToDataContainer
            type='ToDataContainer',  # Pipeline to convert the data to DataContainer
            fields=[dict(key='gt_bbox', stack=False, cpu_only=True)])  # Required fields to be converted with keys and attributes
    ]
    val_pipeline = [  # List of validation pipeline steps
        dict(type='LoadLocalizationFeature'),  # Load localization feature pipeline
        dict(type='GenerateLocalizationLabels'),  # Generate localization labels pipeline
        dict(  # Config of Collect
            type='Collect',  # Collect pipeline that decides which keys in the data should be passed to the localizer
            keys=['raw_feature', 'gt_bbox'],  # Keys of input
            meta_name='video_meta',  # Meta name
            meta_keys=[
                'video_name', 'duration_second', 'duration_frame', 'annotations',
                'feature_frame'
            ]),  # Meta keys of input
        dict(  # Config of ToTensor
            type='ToTensor',  # Convert other types to tensor type pipeline
            keys=['raw_feature']),  # Keys to be converted to tensor
        dict(  # Config of ToDataContainer
            type='ToDataContainer',  # Pipeline to convert the data to DataContainer
            fields=[dict(key='gt_bbox', stack=False, cpu_only=True)])  # Required fields to be converted with keys and attributes
    ]
    test_pipeline = [  # List of testing pipeline steps
        dict(type='LoadLocalizationFeature'),  # Load localization feature pipeline
        dict(  # Config of Collect
            type='Collect',  # Collect pipeline that decides which keys in the data should be passed to the localizer
            keys=['raw_feature'],  # Keys of input
            meta_name='video_meta',  # Meta name
            meta_keys=[
                'video_name', 'duration_second', 'duration_frame', 'annotations',
                'feature_frame'
            ]),  # Meta keys of input
        dict(  # Config of ToTensor
            type='ToTensor',  # Convert other types to tensor type pipeline
            keys=['raw_feature']),  # Keys to be converted to tensor
    ]
    data = dict(  # Config of data
        videos_per_gpu=8,  # Batch size of each single GPU
        workers_per_gpu=8,  # Workers to pre-fetch data for each single GPU
        train_dataloader=dict(  # Additional config of train dataloader
            drop_last=True),  # Whether to drop the last batch of data in training
        val_dataloader=dict(  # Additional config of validation dataloader
            videos_per_gpu=1),  # Batch size of each single GPU during evaluation
        test_dataloader=dict(  # Additional config of test dataloader
            videos_per_gpu=2),  # Batch size of each single GPU during testing
        test=dict(  # Testing dataset config
            type=dataset_type,
            ann_file=ann_file_test,
            pipeline=test_pipeline,
            data_prefix=data_root_val),
        val=dict(  # Validation dataset config
            type=dataset_type,
            ann_file=ann_file_val,
            pipeline=val_pipeline,
            data_prefix=data_root_val),
        train=dict(  # Training dataset config
            type=dataset_type,
            ann_file=ann_file_train,
            pipeline=train_pipeline,
            data_prefix=data_root))

    # optimizer
    optimizer = dict(
        # Config used to build the optimizer. Supports (1) all the optimizers in PyTorch
        # whose arguments are also the same as those in PyTorch, and (2) custom optimizers
        # which are built on the constructor; refer to "tutorials/5_new_modules.md"
        # for implementation.
        type='Adam',  # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
        lr=0.001,  # Learning rate, see detailed usage of the parameters in the documentation of PyTorch
        weight_decay=0.0001)  # Weight decay of Adam
    optimizer_config = dict(  # Config used to build the optimizer hook
        grad_clip=None)  # Most of the methods do not use gradient clipping

    # learning policy
    lr_config = dict(  # Learning rate scheduler config used to register the LrUpdater hook
        policy='step',  # Policy of scheduler, also supports CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdaters at https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
        step=7)  # Steps to decay the learning rate

    total_epochs = 9  # Total epochs to train the model
    checkpoint_config = dict(  # Config to set the checkpoint hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
        interval=1)  # Interval to save checkpoints
    evaluation = dict(  # Config of evaluation during training
        interval=1,  # Interval to perform evaluation
        metrics=['AR@AN'])  # Metrics to be used
    log_config = dict(  # Config to register the logger hook
        interval=50,  # Interval to print the log
        hooks=[  # Hooks to be implemented during training
            dict(type='TextLoggerHook'),  # The logger used to record the training process
            # dict(type='TensorboardLoggerHook'),  # The Tensorboard logger is also supported
        ])

    # runtime settings
    dist_params = dict(backend='nccl')  # Parameters to set up distributed training, the port can also be set
    log_level = 'INFO'  # The level of logging
    work_dir = './work_dirs/bmn_400x100_2x8_9e_activitynet_feature/'  # Directory to save the model checkpoints and logs for the current experiment
    load_from = None  # Load models as a pre-trained model from a given path. This will not resume training
    resume_from = None  # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint was saved
    workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
    output_config = dict(  # Config of localization output
        out=f'{work_dir}/results.json',  # Path to the output file
        output_format='json')  # File format of the output file
    ```
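
    To see how such a config is consumed, the snippet below loads it and builds the localizer. This is a hedged sketch: it assumes the build_model helper exposed by mmaction.models, and the config path is illustrative.

    ```python
    from mmcv import Config
    from mmaction.models import build_model  # builder helper in mmaction2

    # Illustrative path; adjust to your checkout.
    cfg = Config.fromfile('configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py')

    # For localization configs, train_cfg/test_cfg are top-level variables
    # and are passed alongside the model dict.
    model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
    print(type(model).__name__)  # BMN
    ```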

Config System for Action Recognition

We incorporate modular design into our config system,
which is convenient for conducting various experiments.

  • An Example of TSN

    To help users get a basic idea of a complete config structure and the modules in an action recognition system,
    we make brief comments on the config of TSN below.
    For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.

    ```python

    # model settings
    model = dict(  # Config of the model
        type='Recognizer2D',  # Type of the recognizer
        backbone=dict(  # Dict for backbone
            type='ResNet',  # Name of the backbone
            pretrained='torchvision://resnet50',  # The url/site of the pretrained model
            depth=50,  # Depth of ResNet model
            norm_eval=False),  # Whether to set BN layers to eval mode when training
        cls_head=dict(  # Dict for classification head
            type='TSNHead',  # Name of classification head
            num_classes=400,  # Number of classes to be classified
            in_channels=2048,  # The input channels of classification head
            spatial_type='avg',  # Type of pooling in spatial dimension
            consensus=dict(type='AvgConsensus', dim=1),  # Config of consensus module
            dropout_ratio=0.4,  # Probability in dropout layer
            init_std=0.01),  # Std value for linear layer initiation
        # model training and testing settings
        train_cfg=None,  # Config of training hyperparameters for TSN
        test_cfg=dict(average_clips=None))  # Config of testing hyperparameters for TSN

    # dataset settings
    dataset_type = 'RawframeDataset'  # Type of dataset for training, validation and testing
    data_root = 'data/kinetics400/rawframes_train/'  # Root path to data for training
    data_root_val = 'data/kinetics400/rawframes_val/'  # Root path to data for validation and testing
    ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'  # Path to the annotation file for training
    ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'  # Path to the annotation file for validation
    ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'  # Path to the annotation file for testing
    img_norm_cfg = dict(  # Config of image normalization used in data pipeline
        mean=[123.675, 116.28, 103.53],  # Mean values of different channels to normalize
        std=[58.395, 57.12, 57.375],  # Std values of different channels to normalize
        to_bgr=False)  # Whether to convert channels from RGB to BGR

    train_pipeline = [  # List of training pipeline steps
        dict(  # Config of SampleFrames
            type='SampleFrames',  # Sample frames pipeline, sampling frames from video
            clip_len=1,  # Frames of each sampled output clip
            frame_interval=1,  # Temporal interval of adjacent sampled frames
            num_clips=3),  # Number of clips to be sampled
        dict(  # Config of RawFrameDecode
            type='RawFrameDecode'),  # Load and decode frames pipeline, picking raw frames with given indices
        dict(  # Config of Resize
            type='Resize',  # Resize pipeline
            scale=(-1, 256)),  # The scale to resize images
        dict(  # Config of MultiScaleCrop
            type='MultiScaleCrop',  # Multi scale crop pipeline, cropping images with a list of randomly selected scales
            input_size=224,  # Input size of the network
            scales=(1, 0.875, 0.75, 0.66),  # Scales of width and height to be selected
            random_crop=False,  # Whether to randomly sample the cropping bbox
            max_wh_scale_gap=1),  # Maximum gap of w and h scale levels
        dict(  # Config of Resize
            type='Resize',  # Resize pipeline
            scale=(224, 224),  # The scale to resize images
            keep_ratio=False),  # Whether to keep the aspect ratio when resizing
        dict(  # Config of Flip
            type='Flip',  # Flip pipeline
            flip_ratio=0.5),  # Probability of implementing flip
        dict(  # Config of Normalize
            type='Normalize',  # Normalize pipeline
            **img_norm_cfg),  # Config of image normalization
        dict(  # Config of FormatShape
            type='FormatShape',  # Format shape pipeline, format final image shape to the given input_format
            input_format='NCHW'),  # Final image shape format
        dict(  # Config of Collect
            type='Collect',  # Collect pipeline that decides which keys in the data should be passed to the recognizer
            keys=['imgs', 'label'],  # Keys of input
            meta_keys=[]),  # Meta keys of input
        dict(  # Config of ToTensor
            type='ToTensor',  # Convert other types to tensor type pipeline
            keys=['imgs', 'label'])  # Keys to be converted to tensor
    ]
    val_pipeline = [  # List of validation pipeline steps
        dict(  # Config of SampleFrames
            type='SampleFrames',  # Sample frames pipeline, sampling frames from video
            clip_len=1,  # Frames of each sampled output clip
            frame_interval=1,  # Temporal interval of adjacent sampled frames
            num_clips=3,  # Number of clips to be sampled
            test_mode=True),  # Whether to set test mode in sampling
        dict(  # Config of RawFrameDecode
            type='RawFrameDecode'),  # Load and decode frames pipeline, picking raw frames with given indices
        dict(  # Config of Resize
            type='Resize',  # Resize pipeline
            scale=(-1, 256)),  # The scale to resize images
        dict(  # Config of CenterCrop
            type='CenterCrop',  # Center crop pipeline, cropping the center area from images
            crop_size=224),  # The size to crop images
        dict(  # Config of Flip
            type='Flip',  # Flip pipeline
            flip_ratio=0),  # Probability of implementing flip
        dict(  # Config of Normalize
            type='Normalize',  # Normalize pipeline
            **img_norm_cfg),  # Config of image normalization
        dict(  # Config of FormatShape
            type='FormatShape',  # Format shape pipeline, format final image shape to the given input_format
            input_format='NCHW'),  # Final image shape format
        dict(  # Config of Collect
            type='Collect',  # Collect pipeline that decides which keys in the data should be passed to the recognizer
            keys=['imgs', 'label'],  # Keys of input
            meta_keys=[]),  # Meta keys of input
        dict(  # Config of ToTensor
            type='ToTensor',  # Convert other types to tensor type pipeline
            keys=['imgs'])  # Keys to be converted to tensor
    ]
    test_pipeline = [  # List of testing pipeline steps
        dict(  # Config of SampleFrames
            type='SampleFrames',  # Sample frames pipeline, sampling frames from video
            clip_len=1,  # Frames of each sampled output clip
            frame_interval=1,  # Temporal interval of adjacent sampled frames
            num_clips=25,  # Number of clips to be sampled
            test_mode=True),  # Whether to set test mode in sampling
        dict(  # Config of RawFrameDecode
            type='RawFrameDecode'),  # Load and decode frames pipeline, picking raw frames with given indices
        dict(  # Config of Resize
            type='Resize',  # Resize pipeline
            scale=(-1, 256)),  # The scale to resize images
        dict(  # Config of TenCrop
            type='TenCrop',  # Ten crop pipeline, cropping ten areas from images
            crop_size=224),  # The size to crop images
        dict(  # Config of Flip
            type='Flip',  # Flip pipeline
            flip_ratio=0),  # Probability of implementing flip
        dict(  # Config of Normalize
            type='Normalize',  # Normalize pipeline
            **img_norm_cfg),  # Config of image normalization
        dict(  # Config of FormatShape
            type='FormatShape',  # Format shape pipeline, format final image shape to the given input_format
            input_format='NCHW'),  # Final image shape format
        dict(  # Config of Collect
            type='Collect',  # Collect pipeline that decides which keys in the data should be passed to the recognizer
            keys=['imgs', 'label'],  # Keys of input
            meta_keys=[]),  # Meta keys of input
        dict(  # Config of ToTensor
            type='ToTensor',  # Convert other types to tensor type pipeline
            keys=['imgs'])  # Keys to be converted to tensor
    ]
    data = dict(  # Config of data
        videos_per_gpu=32,  # Batch size of each single GPU
        workers_per_gpu=2,  # Workers to pre-fetch data for each single GPU
        train_dataloader=dict(  # Additional config of train dataloader
            drop_last=True),  # Whether to drop the last batch of data in training
        val_dataloader=dict(  # Additional config of validation dataloader
            videos_per_gpu=1),  # Batch size of each single GPU during evaluation
        test_dataloader=dict(  # Additional config of test dataloader
            videos_per_gpu=2),  # Batch size of each single GPU during testing
        train=dict(  # Training dataset config
            type=dataset_type,
            ann_file=ann_file_train,
            data_prefix=data_root,
            pipeline=train_pipeline),
        val=dict(  # Validation dataset config
            type=dataset_type,
            ann_file=ann_file_val,
            data_prefix=data_root_val,
            pipeline=val_pipeline),
        test=dict(  # Testing dataset config
            type=dataset_type,
            ann_file=ann_file_test,
            data_prefix=data_root_val,
            pipeline=test_pipeline))

    # optimizer
    optimizer = dict(
        # Config used to build the optimizer. Supports (1) all the optimizers in PyTorch
        # whose arguments are also the same as those in PyTorch, and (2) custom optimizers
        # which are built on the constructor; refer to "tutorials/5_new_modules.md"
        # for implementation.
        type='SGD',  # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
        lr=0.01,  # Learning rate, see detailed usage of the parameters in the documentation of PyTorch
        momentum=0.9,  # Momentum
        weight_decay=0.0001)  # Weight decay of SGD
    optimizer_config = dict(  # Config used to build the optimizer hook
        grad_clip=dict(max_norm=40, norm_type=2))  # Use gradient clipping

    # learning policy
    lr_config = dict(  # Learning rate scheduler config used to register the LrUpdater hook
        policy='step',  # Policy of scheduler, also supports CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdaters at https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
        step=[40, 80])  # Steps to decay the learning rate
    total_epochs = 100  # Total epochs to train the model
    checkpoint_config = dict(  # Config to set the checkpoint hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
        interval=5)  # Interval to save checkpoints
    evaluation = dict(  # Config of evaluation during training
        interval=5,  # Interval to perform evaluation
        metrics=['top_k_accuracy', 'mean_class_accuracy'],  # Metrics to be used
        metric_options=dict(top_k_accuracy=dict(topk=(1, 3))),  # Set top-k accuracy to 1 and 3 during validation
        save_best='top_k_accuracy')  # Set top_k_accuracy as the key indicator to save the best checkpoint
    eval_config = dict(
        metric_options=dict(top_k_accuracy=dict(topk=(1, 3))))  # Set top-k accuracy to 1 and 3 during testing. You can also use --eval top_k_accuracy to assign evaluation metrics
    log_config = dict(  # Config to register the logger hook
        interval=20,  # Interval to print the log
        hooks=[  # Hooks to be implemented during training
            dict(type='TextLoggerHook'),  # The logger used to record the training process
            # dict(type='TensorboardLoggerHook'),  # The Tensorboard logger is also supported
        ])

    # runtime settings
    dist_params = dict(backend='nccl')  # Parameters to set up distributed training, the port can also be set
    log_level = 'INFO'  # The level of logging
    work_dir = './work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/'  # Directory to save the model checkpoints and logs for the current experiment
    load_from = None  # Load models as a pre-trained model from a given path. This will not resume training
    resume_from = None  # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint was saved
    workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once

    ```
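
    The dataset section of a recognition config can be instantiated directly as well; a hedged sketch, assuming the build_dataset helper from mmaction.datasets and annotation files prepared as above:

    ```python
    from mmcv import Config
    from mmaction.datasets import build_dataset  # builder helper in mmaction2

    cfg = Config.fromfile('configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py')

    # data.train bundles the dataset type, annotation file, data prefix and
    # pipeline, so a single call yields a ready-to-use training dataset.
    dataset = build_dataset(cfg.data.train)
    print(len(dataset))  # number of training samples
    ```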

Config System for Spatio-Temporal Action Detection

We incorporate modular design into our config system, which is convenient for conducting various experiments.

  • An Example of FastRCNN

    To help users get a basic idea of a complete config structure and the modules in a spatio-temporal action detection system,
    we make brief comments on the config of FastRCNN below.
    For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.

    ```python

    # model settings
    model = dict(  # Config of the model
        type='FastRCNN',  # Type of the detector
        backbone=dict(  # Dict for backbone
            type='ResNet3dSlowOnly',  # Name of the backbone
            depth=50,  # Depth of ResNet model
            pretrained=None,  # The url/site of the pretrained model
            pretrained2d=False,  # If the pretrained model is 2D
            lateral=False,  # If the backbone is with lateral connections
            num_stages=4,  # Stages of ResNet model
            conv1_kernel=(1, 7, 7),  # Conv1 kernel size
            conv1_stride_t=1,  # Conv1 temporal stride
            pool1_stride_t=1,  # Pool1 temporal stride
            spatial_strides=(1, 2, 2, 1)),  # The spatial stride for each ResNet stage
        roi_head=dict(  # Dict for roi_head
            type='AVARoIHead',  # Name of the roi_head
            bbox_roi_extractor=dict(  # Dict for bbox_roi_extractor
                type='SingleRoIExtractor3D',  # Name of the bbox_roi_extractor
                roi_layer_type='RoIAlign',  # Type of the RoI op
                output_size=8,  # Output feature size of the RoI op
                with_temporal_pool=True),  # If temporal dim is pooled
            bbox_head=dict(  # Dict for bbox_head
                type='BBoxHeadAVA',  # Name of the bbox_head
                in_channels=2048,  # Number of channels of the input feature
                num_classes=81,  # Number of action classes + 1
                multilabel=True,  # If the dataset is multilabel
                dropout_ratio=0.5)),  # The dropout ratio used
        # model training and testing settings
        train_cfg=dict(  # Training config of FastRCNN
            rcnn=dict(  # Dict for rcnn training config
                assigner=dict(  # Dict for assigner
                    type='MaxIoUAssignerAVA',  # Name of the assigner
                    pos_iou_thr=0.9,  # IoU threshold for positive examples, > pos_iou_thr -> positive
                    neg_iou_thr=0.9,  # IoU threshold for negative examples, < neg_iou_thr -> negative
                    min_pos_iou=0.9),  # Minimum acceptable IoU for positive examples
                sampler=dict(  # Dict for sampler
                    type='RandomSampler',  # Name of the sampler
                    num=32,  # Batch size of the sampler
                    pos_fraction=1,  # Positive bbox fraction of the sampler
                    neg_pos_ub=-1,  # Upper bound of the ratio of num negative to num positive
                    add_gt_as_proposals=True),  # Add gt bboxes as proposals
                pos_weight=1.0,  # Loss weight of positive examples
                debug=False)),  # Debug mode
        test_cfg=dict(  # Testing config of FastRCNN
            rcnn=dict(  # Dict for rcnn testing config
                action_thr=0.002)))  # The threshold of an action

    # dataset settings
    dataset_type = 'AVADataset'  # Type of dataset for training, validation and testing
    data_root = 'data/ava/rawframes'  # Root path to data
    anno_root = 'data/ava/annotations'  # Root path to annotations

    ann_file_train = f'{anno_root}/ava_train_v2.1.csv'  # Path to the annotation file for training
    ann_file_val = f'{anno_root}/ava_val_v2.1.csv'  # Path to the annotation file for validation

    exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv'  # Path to the excluded annotation file for training
    exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv'  # Path to the excluded annotation file for validation

    label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt'  # Path to the label file

    proposal_file_train = f'{anno_root}/ava_dense_proposals_train.FAIR.recall_93.9.pkl'  # Path to the human detection proposals for training examples
    proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl'  # Path to the human detection proposals for validation examples

    img_norm_cfg = dict(  # Config of image normalization used in data pipeline
        mean=[123.675, 116.28, 103.53],  # Mean values of different channels to normalize
        std=[58.395, 57.12, 57.375],  # Std values of different channels to normalize
        to_bgr=False)  # Whether to convert channels from RGB to BGR

    train_pipeline = [  # List of training pipeline steps
        dict(  # Config of AVASampleFrames
            type='AVASampleFrames',  # Sample frames pipeline, sampling frames from video
            clip_len=4,  # Frames of each sampled output clip
            frame_interval=16),  # Temporal interval of adjacent sampled frames
        dict(  # Config of RawFrameDecode
            type='RawFrameDecode'),  # Load and decode frames pipeline, picking raw frames with given indices
        dict(  # Config of RandomRescale
            type='RandomRescale',  # Randomly rescale the short edge by a given range
            scale_range=(256, 320)),  # The short edge size range of RandomRescale
        dict(  # Config of RandomCrop
            type='RandomCrop',  # Randomly crop a patch with the given size
            size=256),  # The size of the cropped patch
        dict(  # Config of Flip
            type='Flip',  # Flip pipeline
            flip_ratio=0.5),  # Probability of implementing flip
        dict(  # Config of Normalize
            type='Normalize',  # Normalize pipeline
            **img_norm_cfg),  # Config of image normalization
        dict(  # Config of FormatShape
            type='FormatShape',  # Format shape pipeline, format final image shape to the given input_format
            input_format='NCTHW',  # Final image shape format
            collapse=True),  # Collapse the dim N if N == 1
        dict(  # Config of Rename
            type='Rename',  # Rename keys
            mapping=dict(imgs='img')),  # The old-name-to-new-name mapping
        dict(  # Config of ToTensor
            type='ToTensor',  # Convert other types to tensor type pipeline
            keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),  # Keys to be converted to tensor
        dict(  # Config of ToDataContainer
            type='ToDataContainer',  # Convert other types to DataContainer type pipeline
            fields=[  # Fields to convert to DataContainer
                dict(  # Dict of fields
                    key=['proposals', 'gt_bboxes', 'gt_labels'],  # Keys to convert to DataContainer
                    stack=False)]),  # Whether to stack these tensors
        dict(  # Config of Collect
            type='Collect',  # Collect pipeline that decides which keys in the data should be passed to the detector
            keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],  # Keys of input
            meta_keys=['scores', 'entity_ids']),  # Meta keys of input
    ]

    val_pipeline = [  # List of validation pipeline steps
        dict(  # Config of AVASampleFrames
            type='AVASampleFrames',  # Sample frames pipeline, sampling frames from video
            clip_len=4,  # Frames of each sampled output clip
            frame_interval=16),  # Temporal interval of adjacent sampled frames
        dict(  # Config of RawFrameDecode
            type='RawFrameDecode'),  # Load and decode frames pipeline, picking raw frames with given indices
        dict(  # Config of Resize
            type='Resize',  # Resize pipeline
            scale=(-1, 256)),  # The scale to resize images
        dict(  # Config of Normalize
            type='Normalize',  # Normalize pipeline
            **img_norm_cfg),  # Config of image normalization
        dict(  # Config of FormatShape
            type='FormatShape',  # Format shape pipeline, format final image shape to the given input_format
            input_format='NCTHW',  # Final image shape format
            collapse=True),  # Collapse the dim N if N == 1
        dict(  # Config of Rename
            type='Rename',  # Rename keys
            mapping=dict(imgs='img')),  # The old-name-to-new-name mapping
        dict(  # Config of ToTensor
            type='ToTensor',  # Convert other types to tensor type pipeline
            keys=['img', 'proposals']),  # Keys to be converted to tensor
        dict(  # Config of ToDataContainer
            type='ToDataContainer',  # Convert other types to DataContainer type pipeline
            fields=[  # Fields to convert to DataContainer
                dict(  # Dict of fields
                    key=['proposals'],  # Keys to convert to DataContainer
                    stack=False)]),  # Whether to stack these tensors
        dict(  # Config of Collect
            type='Collect',  # Collect pipeline that decides which keys in the data should be passed to the detector
            keys=['img', 'proposals'],  # Keys of input
            meta_keys=['scores', 'entity_ids'],  # Meta keys of input
            nested=True)  # Whether to wrap the data in a nested list
    ]

    data = dict(  # Config of data
        videos_per_gpu=16,  # Batch size of each single GPU
        workers_per_gpu=2,  # Workers to pre-fetch data for each single GPU
        val_dataloader=dict(  # Additional config of validation dataloader
            videos_per_gpu=1),  # Batch size of each single GPU during evaluation
        train=dict(  # Training dataset config
            type=dataset_type,
            ann_file=ann_file_train,
            exclude_file=exclude_file_train,
            pipeline=train_pipeline,
            label_file=label_file,
            proposal_file=proposal_file_train,
            person_det_score_thr=0.9,
            data_prefix=data_root),
        val=dict(  # Validation dataset config
            type=dataset_type,
            ann_file=ann_file_val,
            exclude_file=exclude_file_val,
            pipeline=val_pipeline,
            label_file=label_file,
            proposal_file=proposal_file_val,
            person_det_score_thr=0.9,
            data_prefix=data_root))
    data['test'] = data['val']  # Set the test dataset as the validation dataset

    # optimizer
    optimizer = dict(
        # Config used to build the optimizer. Supports (1) all the optimizers in PyTorch
        # whose arguments are also the same as those in PyTorch, and (2) custom optimizers
        # which are built on the constructor; refer to "tutorials/5_new_modules.md"
        # for implementation.
        type='SGD',  # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
        lr=0.2,  # Learning rate (for 8 GPUs), see detailed usage of the parameters in the documentation of PyTorch
        momentum=0.9,  # Momentum
        weight_decay=0.00001)  # Weight decay of SGD

    optimizer_config = dict(  # Config used to build the optimizer hook
        grad_clip=dict(max_norm=40, norm_type=2))  # Use gradient clipping

    # learning policy
    lr_config = dict(  # Learning rate scheduler config used to register the LrUpdater hook
        policy='step',  # Policy of scheduler, also supports CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdaters at https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
        step=[10, 15],  # Steps to decay the learning rate
        warmup='linear',  # Warmup strategy
        warmup_by_epoch=True,  # Whether warmup_iters indicates iter num or epoch num
        warmup_iters=5,  # Number of iters or epochs for warmup
        warmup_ratio=0.1)  # The initial learning rate is warmup_ratio * lr

    total_epochs = 20  # Total epochs to train the model
    checkpoint_config = dict(  # Config to set the checkpoint hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
        interval=1)  # Interval to save checkpoints
    workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
    evaluation = dict(  # Config of evaluation during training
        interval=1, save_best='mAP@0.5IOU')  # Interval to perform evaluation and the key for saving the best checkpoint
    log_config = dict(  # Config to register the logger hook
        interval=20,  # Interval to print the log
        hooks=[  # Hooks to be implemented during training
            dict(type='TextLoggerHook'),  # The logger used to record the training process
        ])

    # runtime settings
    dist_params = dict(backend='nccl')  # Parameters to set up distributed training, the port can also be set
    log_level = 'INFO'  # The level of logging
    work_dir = ('./work_dirs/ava/'  # Directory to save the model checkpoints and logs for the current experiment
                'slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb')
    load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/'  # Load models as a pre-trained model from a given path. This will not resume training
                 'slowonly_r50_4x16x1_256e_kinetics400_rgb/'
                 'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth')
    resume_from = None  # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint was saved
    ```
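
    Note that lr=0.2 above is tuned for this config's 8 GPUs x videos_per_gpu=16 setup. Following the linear scaling rule used across OpenMMLab projects, the learning rate should scale with the total batch size when the GPU count changes; a small sketch of that rule (variable names are ours):

    ```python
    # Linear scaling rule: lr is proportional to the total batch size.
    base_lr = 0.2              # learning rate in this config
    base_total_batch = 8 * 16  # 8 GPUs x videos_per_gpu=16

    def scaled_lr(num_gpus: int, videos_per_gpu: int) -> float:
        """Return a learning rate scaled to a new total batch size."""
        return base_lr * (num_gpus * videos_per_gpu) / base_total_batch

    print(scaled_lr(4, 16))  # 0.1 for a 4-GPU run with the same per-GPU batch
    ```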

FAQ

Use intermediate variables in configs

Some intermediate variables are used in the config files, like train_pipeline/val_pipeline/test_pipeline,
ann_file_train/ann_file_val/ann_file_test, img_norm_cfg, etc.

For example, we would first define train_pipeline/val_pipeline/test_pipeline and then pass them into data.
Thus, train_pipeline/val_pipeline/test_pipeline are intermediate variables.

We also define ann_file_train/ann_file_val/ann_file_test and data_root/data_root_val to provide the data pipeline with some
basic information.

In addition, we use img_norm_cfg as an intermediate variable to construct the data augmentation components.

```python
...
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)

train_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.8),
        random_crop=False,
        max_wh_scale_gap=0),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=1,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=10,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]

data = dict(
    videos_per_gpu=8,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
```
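
Note that after inheriting a config, modifying an intermediate variable alone has no effect: only the final fields (like data) are read, so the variable must be passed into those fields again. A minimal sketch, assuming a hypothetical child config of the file above:

```python
_base_ = './the_config_above.py'  # hypothetical parent config path

# Redefine the intermediate variable ...
img_norm_cfg = dict(
    mean=[128, 128, 128], std=[50, 50, 50], to_bgr=False)

# ... then rebuild each pipeline that uses it and pass the pipelines into
# `data` again; otherwise the parent's pipelines (built with the old
# img_norm_cfg) are kept.
train_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
data = dict(train=dict(pipeline=train_pipeline))
```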