{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/open-mmlab/mmaction2/blob/master/demo/mmaction2_tutorial.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VcjSRFELVbNk"
},
"source": [
"# MMAction2 Tutorial\n",
"\n",
"Welcome to MMAction2! This is the official Colab tutorial for using MMAction2. In this tutorial, you will learn how to:\n",
"- Perform inference with an MMAction2 recognizer.\n",
"- Train a new recognizer with a new dataset.\n",
"\n",
"Let's start!"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7LqHGkGEVqpm"
},
"source": [
"## Install MMAction2"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Bf8PpPXtVvmg",
"outputId": "f262f3c6-a9dd-48c7-8f7e-081fd3e12ba8"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"nvcc: NVIDIA (R) Cuda compiler driver\n",
"Copyright (c) 2005-2020 NVIDIA Corporation\n",
"Built on Wed_Jul_22_19:09:09_PDT_2020\n",
"Cuda compilation tools, release 11.0, V11.0.221\n",
"Build cuda_11.0_bu.TC445_37.28845127_0\n",
"gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0\n",
"Copyright (C) 2017 Free Software Foundation, Inc.\n",
"This is free software; see the source for copying conditions. There is NO\n",
"warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n",
"\n"
]
}
],
"source": [
"# Check nvcc version\n",
"!nvcc -V\n",
"# Check GCC version\n",
"!gcc --version"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5PAJ4ArzV5Ry",
"outputId": "b68c4528-1a83-469f-8920-040ae373fc7c"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Looking in links: https://download.pytorch.org/whl/torch_stable.html\n",
"Collecting torch==1.8.0+cu101\n",
"\u001b[?25l Downloading https://download.pytorch.org/whl/cu101/torch-1.8.0%2Bcu101-cp37-cp37m-linux_x86_64.whl (763.5MB)\n",
"\u001b[K |████████████████████████████████| 763.5MB 23kB/s \n",
"\u001b[?25hCollecting torchvision==0.9.0+cu101\n",
"\u001b[?25l Downloading https://download.pytorch.org/whl/cu101/torchvision-0.9.0%2Bcu101-cp37-cp37m-linux_x86_64.whl (17.3MB)\n",
"\u001b[K |████████████████████████████████| 17.3MB 188kB/s \n",
"\u001b[?25hCollecting torchtext==0.9.0\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/36/50/84184d6230686e230c464f0dd4ff32eada2756b4a0b9cefec68b88d1d580/torchtext-0.9.0-cp37-cp37m-manylinux1_x86_64.whl (7.1MB)\n",
"\u001b[K |████████████████████████████████| 7.1MB 8.0MB/s \n",
"\u001b[?25hRequirement already satisfied, skipping upgrade: numpy in /usr/local/lib/python3.7/dist-packages (from torch==1.8.0+cu101) (1.19.5)\n",
"Requirement already satisfied, skipping upgrade: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch==1.8.0+cu101) (3.7.4.3)\n",
"Requirement already satisfied, skipping upgrade: pillow>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from torchvision==0.9.0+cu101) (7.1.2)\n",
"Requirement already satisfied, skipping upgrade: tqdm in /usr/local/lib/python3.7/dist-packages (from torchtext==0.9.0) (4.41.1)\n",
"Requirement already satisfied, skipping upgrade: requests in /usr/local/lib/python3.7/dist-packages (from torchtext==0.9.0) (2.23.0)\n",
"Requirement already satisfied, skipping upgrade: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (2.10)\n",
"Requirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (1.24.3)\n",
"Requirement already satisfied, skipping upgrade: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (3.0.4)\n",
"Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (2021.5.30)\n",
"Installing collected packages: torch, torchvision, torchtext\n",
" Found existing installation: torch 1.9.0+cu102\n",
" Uninstalling torch-1.9.0+cu102:\n",
" Successfully uninstalled torch-1.9.0+cu102\n",
" Found existing installation: torchvision 0.10.0+cu102\n",
" Uninstalling torchvision-0.10.0+cu102:\n",
" Successfully uninstalled torchvision-0.10.0+cu102\n",
" Found existing installation: torchtext 0.10.0\n",
" Uninstalling torchtext-0.10.0:\n",
" Successfully uninstalled torchtext-0.10.0\n",
"Successfully installed torch-1.8.0+cu101 torchtext-0.9.0 torchvision-0.9.0+cu101\n",
"Looking in links: https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/index.html\n",
"Collecting mmcv-full==1.3.9\n",
"\u001b[?25l Downloading https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/mmcv_full-1.3.9-cp37-cp37m-manylinux1_x86_64.whl (31.4MB)\n",
"\u001b[K |████████████████████████████████| 31.4MB 94kB/s \n",
"\u001b[?25hRequirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from mmcv-full==1.3.9) (3.13)\n",
"Requirement already satisfied: opencv-python>=3 in /usr/local/lib/python3.7/dist-packages (from mmcv-full==1.3.9) (4.1.2.30)\n",
"Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmcv-full==1.3.9) (1.19.5)\n",
"Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from mmcv-full==1.3.9) (7.1.2)\n",
"Collecting addict\n",
" Downloading https://files.pythonhosted.org/packages/6a/00/b08f23b7d7e1e14ce01419a467b583edbb93c6cdb8654e54a9cc579cd61f/addict-2.4.0-py3-none-any.whl\n",
"Collecting yapf\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/5f/0d/8814e79eb865eab42d95023b58b650d01dec6f8ea87fc9260978b1bf2167/yapf-0.31.0-py2.py3-none-any.whl (185kB)\n",
"\u001b[K |████████████████████████████████| 194kB 8.8MB/s \n",
"\u001b[?25hInstalling collected packages: addict, yapf, mmcv-full\n",
"Successfully installed addict-2.4.0 mmcv-full-1.3.9 yapf-0.31.0\n",
"Cloning into 'mmaction2'...\n",
"remote: Enumerating objects: 12544, done.\u001b[K\n",
"remote: Counting objects: 100% (677/677), done.\u001b[K\n",
"remote: Compressing objects: 100% (330/330), done.\u001b[K\n",
"remote: Total 12544 (delta 432), reused 510 (delta 344), pack-reused 11867\u001b[K\n",
"Receiving objects: 100% (12544/12544), 42.42 MiB | 30.27 MiB/s, done.\n",
"Resolving deltas: 100% (8980/8980), done.\n",
"/content/mmaction2\n",
"Obtaining file:///content/mmaction2\n",
"Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.16.0) (3.2.2)\n",
"Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.16.0) (1.19.5)\n",
"Requirement already satisfied: opencv-contrib-python in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.16.0) (4.1.2.30)\n",
"Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.16.0) (7.1.2)\n",
"Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.16.0) (2.8.1)\n",
"Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.16.0) (0.10.0)\n",
"Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.16.0) (1.3.1)\n",
"Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.16.0) (2.4.7)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->mmaction2==0.16.0) (1.15.0)\n",
"Installing collected packages: mmaction2\n",
" Running setup.py develop for mmaction2\n",
"Successfully installed mmaction2\n",
"Collecting av\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/66/ff/bacde7314c646a2bd2f240034809a10cc3f8b096751284d0828640fff3dd/av-8.0.3-cp37-cp37m-manylinux2010_x86_64.whl (37.2MB)\n",
"\u001b[K |████████████████████████████████| 37.2MB 76kB/s \n",
"\u001b[?25hCollecting decord>=0.4.1\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/11/79/936af42edf90a7bd4e41a6cac89c913d4b47fa48a26b042d5129a9242ee3/decord-0.6.0-py3-none-manylinux2010_x86_64.whl (13.6MB)\n",
"\u001b[K |████████████████████████████████| 13.6MB 231kB/s \n",
"\u001b[?25hCollecting einops\n",
" Downloading https://files.pythonhosted.org/packages/5d/a0/9935e030634bf60ecd572c775f64ace82ceddf2f504a5fd3902438f07090/einops-0.3.0-py2.py3-none-any.whl\n",
"Requirement already satisfied: imgaug in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 4)) (0.2.9)\n",
"Requirement already satisfied: librosa in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 5)) (0.8.1)\n",
"Requirement already satisfied: lmdb in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 6)) (0.99)\n",
"Requirement already satisfied: moviepy in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 7)) (0.2.3.5)\n",
"Collecting onnx\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/3f/9b/54c950d3256e27f970a83cd0504efb183a24312702deed0179453316dbd0/onnx-1.9.0-cp37-cp37m-manylinux2010_x86_64.whl (12.2MB)\n",
"\u001b[K |████████████████████████████████| 12.2MB 36.2MB/s \n",
"\u001b[?25hCollecting onnxruntime\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/c9/35/80ab6f444a83c708817e011e9cd4708c816591cc85aff830dff525a34992/onnxruntime-1.8.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5MB)\n",
"\u001b[K |████████████████████████████████| 4.5MB 29.5MB/s \n",
"\u001b[?25hCollecting pims\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/d5/47/82e0ac31e01a271e5a06362fbf03769e9081956f6772f91d98b32899d743/PIMS-0.5.tar.gz (85kB)\n",
"\u001b[K |████████████████████████████████| 92kB 13.1MB/s \n",
"\u001b[?25hCollecting PyTurboJPEG\n",
" Downloading https://files.pythonhosted.org/packages/f9/7b/7621780391ed7a33acec8e803068d7291d940fbbad1ffc8909e94e844477/PyTurboJPEG-1.5.1.tar.gz\n",
"Collecting timm\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/90/fc/606bc5cf46acac3aa9bd179b3954433c026aaf88ea98d6b19f5d14c336da/timm-0.4.12-py3-none-any.whl (376kB)\n",
"\u001b[K |████████████████████████████████| 378kB 43.1MB/s \n",
"\u001b[?25hRequirement already satisfied: numpy>=1.14.0 in /usr/local/lib/python3.7/dist-packages (from decord>=0.4.1->-r requirements/optional.txt (line 2)) (1.19.5)\n",
"Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 4)) (7.1.2)\n",
"Requirement already satisfied: scikit-image>=0.11.0 in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 4)) (0.16.2)\n",
"Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 4)) (1.15.0)\n",
"Requirement already satisfied: imageio in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 4)) (2.4.1)\n",
"Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 4)) (3.2.2)\n",
"Requirement already satisfied: opencv-python in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 4)) (4.1.2.30)\n",
"Requirement already satisfied: Shapely in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 4)) (1.7.1)\n",
"Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 4)) (1.4.1)\n",
"Requirement already satisfied: resampy>=0.2.2 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 5)) (0.2.2)\n",
"Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 5)) (1.4.0)\n",
"Requirement already satisfied: numba>=0.43.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 5)) (0.51.2)\n",
"Requirement already satisfied: audioread>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 5)) (2.1.9)\n",
"Requirement already satisfied: soundfile>=0.10.2 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 5)) (0.10.3.post1)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 5)) (20.9)\n",
"Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 5)) (1.0.1)\n",
"Requirement already satisfied: decorator>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 5)) (4.4.2)\n",
"Requirement already satisfied: scikit-learn!=0.19.0,>=0.14.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 5)) (0.22.2.post1)\n",
"Requirement already satisfied: tqdm<5.0,>=4.11.2 in /usr/local/lib/python3.7/dist-packages (from moviepy->-r requirements/optional.txt (line 7)) (4.41.1)\n",
"Requirement already satisfied: protobuf in /usr/local/lib/python3.7/dist-packages (from onnx->-r requirements/optional.txt (line 8)) (3.17.3)\n",
"Requirement already satisfied: typing-extensions>=3.6.2.1 in /usr/local/lib/python3.7/dist-packages (from onnx->-r requirements/optional.txt (line 8)) (3.7.4.3)\n",
"Requirement already satisfied: flatbuffers in /usr/local/lib/python3.7/dist-packages (from onnxruntime->-r requirements/optional.txt (line 9)) (1.12)\n",
"Collecting slicerator>=0.9.8\n",
" Downloading https://files.pythonhosted.org/packages/75/ae/fe46f5371105508a209fe6162e7e7b11db531a79d2eabcd24566b8b1f534/slicerator-1.0.0-py3-none-any.whl\n",
"Requirement already satisfied: torchvision in /usr/local/lib/python3.7/dist-packages (from timm->-r requirements/optional.txt (line 12)) (0.9.0+cu101)\n",
"Requirement already satisfied: torch>=1.4 in /usr/local/lib/python3.7/dist-packages (from timm->-r requirements/optional.txt (line 12)) (1.8.0+cu101)\n",
"Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 4)) (2.5.1)\n",
"Requirement already satisfied: PyWavelets>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 4)) (1.1.1)\n",
"Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 4)) (0.10.0)\n",
"Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 4)) (2.8.1)\n",
"Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 4)) (2.4.7)\n",
"Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 4)) (1.3.1)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from pooch>=1.0->librosa->-r requirements/optional.txt (line 5)) (2.23.0)\n",
"Requirement already satisfied: appdirs in /usr/local/lib/python3.7/dist-packages (from pooch>=1.0->librosa->-r requirements/optional.txt (line 5)) (1.4.4)\n",
"Requirement already satisfied: llvmlite<0.35,>=0.34.0.dev0 in /usr/local/lib/python3.7/dist-packages (from numba>=0.43.0->librosa->-r requirements/optional.txt (line 5)) (0.34.0)\n",
"Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba>=0.43.0->librosa->-r requirements/optional.txt (line 5)) (57.0.0)\n",
"Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.7/dist-packages (from soundfile>=0.10.2->librosa->-r requirements/optional.txt (line 5)) (1.14.5)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->pooch>=1.0->librosa->-r requirements/optional.txt (line 5)) (2.10)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->pooch>=1.0->librosa->-r requirements/optional.txt (line 5)) (2021.5.30)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->pooch>=1.0->librosa->-r requirements/optional.txt (line 5)) (3.0.4)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->pooch>=1.0->librosa->-r requirements/optional.txt (line 5)) (1.24.3)\n",
"Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi>=1.0->soundfile>=0.10.2->librosa->-r requirements/optional.txt (line 5)) (2.20)\n",
"Building wheels for collected packages: pims, PyTurboJPEG\n",
" Building wheel for pims (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for pims: filename=PIMS-0.5-cp37-none-any.whl size=84328 sha256=436632b7a982144fd933f01d12e38a419eb8a636f2d6dd4bd4a43680734979e2\n",
" Stored in directory: /root/.cache/pip/wheels/0e/0a/14/4c33a4cc1b9158e57329a38e8e3e03901ed24060eb322d5462\n",
" Building wheel for PyTurboJPEG (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for PyTurboJPEG: filename=PyTurboJPEG-1.5.1-cp37-none-any.whl size=7979 sha256=755337aaa622b48be036eca6d743e99bf4528fc6c64e810da11a71236a78bcca\n",
" Stored in directory: /root/.cache/pip/wheels/19/cb/78/5725c881ee618936d956bf0ecd4272cb0f701cb898f44575ca\n",
"Successfully built pims PyTurboJPEG\n",
"Installing collected packages: av, decord, einops, onnx, onnxruntime, slicerator, pims, PyTurboJPEG, timm\n",
"Successfully installed PyTurboJPEG-1.5.1 av-8.0.3 decord-0.6.0 einops-0.3.0 onnx-1.9.0 onnxruntime-1.8.1 pims-0.5 slicerator-1.0.0 timm-0.4.12\n"
]
}
],
"source": [
"# Install dependencies (use cu101 because Colab has CUDA 10.1)\n",
"!pip install -U torch==1.8.0+cu101 torchvision==0.9.0+cu101 torchtext==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html\n",
"\n",
"# Install mmcv-full so that we can use CUDA operators\n",
"!pip install mmcv-full==1.3.9 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/index.html\n",
"\n",
"# Install mmaction2\n",
"!rm -rf mmaction2\n",
"!git clone https://github.com/open-mmlab/mmaction2.git\n",
"%cd mmaction2\n",
"\n",
"!pip install -e .\n",
"\n",
"# Install some optional requirements\n",
"!pip install -r requirements/optional.txt"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "No_zZAFpWC-a",
"outputId": "7e95038a-6f79-410b-adf6-0148bf8cc2fc"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.8.0+cu101 True\n",
"0.16.0\n",
"10.1\n",
"GCC 7.3\n"
]
}
],
"source": [
"# Check PyTorch installation\n",
"import torch, torchvision\n",
"print(torch.__version__, torch.cuda.is_available())\n",
"\n",
"# Check MMAction2 installation\n",
"import mmaction\n",
"print(mmaction.__version__)\n",
"\n",
"# Check MMCV installation\n",
"from mmcv.ops import get_compiling_cuda_version, get_compiler_version\n",
"print(get_compiling_cuda_version())\n",
"print(get_compiler_version())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pXf7oV5DWdab"
},
"source": [
"## Perform inference with an MMAction2 recognizer\n",
"MMAction2 provides high-level APIs for inference and training."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "64CW6d_AaT-Q",
"outputId": "d08bfb9b-ab1e-451b-d3b2-89023a59766b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2021-07-11 12:44:00-- https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth\n",
"Resolving download.openmmlab.com (download.openmmlab.com)... 47.88.36.78\n",
"Connecting to download.openmmlab.com (download.openmmlab.com)|47.88.36.78|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 97579339 (93M) [application/octet-stream]\n",
"Saving to: ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’\n",
"\n",
"checkpoints/tsn_r50 100%[===================>] 93.06M 11.4MB/s in 8.1s \n",
"\n",
"2021-07-11 12:44:09 (11.4 MB/s) - ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’ saved [97579339/97579339]\n",
"\n"
]
}
],
"source": [
"!mkdir -p checkpoints\n",
"!wget -c https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \\\n",
" -O checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HNZB7NoSabzj",
"outputId": "b2f9bd71-1490-44d3-81c6-5037d804f0b1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Use load_from_local loader\n"
]
}
],
"source": [
"from mmaction.apis import inference_recognizer, init_recognizer\n",
"\n",
"# Choose the config file for the recognizer\n",
"config = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'\n",
"# Set the checkpoint file to load\n",
"checkpoint = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n",
"# Initialize the recognizer\n",
"model = init_recognizer(config, checkpoint, device='cuda:0')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "rEMsBnpHapAn"
},
"outputs": [],
"source": [
"# Use the recognizer to do inference\n",
"video = 'demo/demo.mp4'\n",
"label = 'tools/data/kinetics/label_map_k400.txt'\n",
"results = inference_recognizer(model, video)\n",
"\n",
"# Read the label map, closing the file once it has been read\n",
"with open(label) as f:\n",
"    labels = [x.strip() for x in f]\n",
"results = [(labels[k[0]], k[1]) for k in results]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NIyJXqfWathq",
"outputId": "ca24528b-f99d-414a-fa50-456f6068b463"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arm wrestling: 29.616438\n",
"rock scissors paper: 10.754841\n",
"shaking hands: 9.908401\n",
"clapping: 9.189913\n",
"massaging feet: 8.305307\n"
]
}
],
"source": [
"# Show the predicted labels with their scores\n",
"for result in results:\n",
"    print(f'{result[0]}: {result[1]}')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QuZG8kZ2fJ5d"
},
"source": [
"## Train a recognizer on customized dataset\n",
"\n",
"To train a new recognizer, there are usually three things to do:\n",
"1. Support a new dataset\n",
"2. Modify the config\n",
"3. Train a new recognizer"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "neEFyxChfgiJ"
},
"source": [
"### Support a new dataset\n",
"\n",
"In this tutorial, we give an example of converting the data into the format of an existing dataset. Other methods and more advanced usage can be found in the [doc](/docs/tutorials/new_dataset.md).\n",
"\n",
"First, let's download a tiny dataset derived from [Kinetics-400](https://deepmind.com/research/open-source/open-source-datasets/kinetics/). It uses 30 videos with their labels as the training set and 10 videos with their labels as the test set."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gjsUj9JzgUlJ",
"outputId": "61c4704d-db81-4ca5-ed16-e2454dbdfe8e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rm: cannot remove 'kinetics400_tiny.zip*': No such file or directory\n",
"--2021-07-11 12:44:29-- https://download.openmmlab.com/mmaction/kinetics400_tiny.zip\n",
"Resolving download.openmmlab.com (download.openmmlab.com)... 47.88.36.78\n",
"Connecting to download.openmmlab.com (download.openmmlab.com)|47.88.36.78|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 18308682 (17M) [application/zip]\n",
"Saving to: ‘kinetics400_tiny.zip’\n",
"\n",
"kinetics400_tiny.zi 100%[===================>] 17.46M 10.7MB/s in 1.6s \n",
"\n",
"2021-07-11 12:44:31 (10.7 MB/s) - ‘kinetics400_tiny.zip’ saved [18308682/18308682]\n",
"\n"
]
}
],
"source": [
"# Download and decompress the data\n",
"!rm kinetics400_tiny.zip*\n",
"!rm -rf kinetics400_tiny\n",
"!wget https://download.openmmlab.com/mmaction/kinetics400_tiny.zip\n",
"!unzip kinetics400_tiny.zip > /dev/null"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "AbZ-o7V6hNw4",
"outputId": "b091909c-def2-49b5-88c2-01b00802b162"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reading package lists...\n",
"Building dependency tree...\n",
"Reading state information...\n",
"The following NEW packages will be installed:\n",
" tree\n",
"0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.\n",
"Need to get 40.7 kB of archives.\n",
"After this operation, 105 kB of additional disk space will be used.\n",
"Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]\n",
"Fetched 40.7 kB in 0s (88.7 kB/s)\n",
"Selecting previously unselected package tree.\n",
"(Reading database ... 160815 files and directories currently installed.)\n",
"Preparing to unpack .../tree_1.7.0-5_amd64.deb ...\n",
"Unpacking tree (1.7.0-5) ...\n",
"Setting up tree (1.7.0-5) ...\n",
"Processing triggers for man-db (2.8.3-2ubuntu0.1) ...\n",
"kinetics400_tiny\n",
"├── kinetics_tiny_train_video.txt\n",
"├── kinetics_tiny_val_video.txt\n",
"├── train\n",
"│ ├── 27_CSXByd3s.mp4\n",
"│ ├── 34XczvTaRiI.mp4\n",
"│ ├── A-wiliK50Zw.mp4\n",
"│ ├── D32_1gwq35E.mp4\n",
"│ ├── D92m0HsHjcQ.mp4\n",
"│ ├── DbX8mPslRXg.mp4\n",
"│ ├── FMlSTTpN3VY.mp4\n",
"│ ├── h10B9SVE-nk.mp4\n",
"│ ├── h2YqqUhnR34.mp4\n",
"│ ├── iRuyZSKhHRg.mp4\n",
"│ ├── IyfILH9lBRo.mp4\n",
"│ ├── kFC3KY2bOP8.mp4\n",
"│ ├── LvcFDgCAXQs.mp4\n",
"│ ├── O46YA8tI530.mp4\n",
"│ ├── oMrZaozOvdQ.mp4\n",
"│ ├── oXy-e_P_cAI.mp4\n",
"│ ├── P5M-hAts7MQ.mp4\n",
"│ ├── phDqGd0NKoo.mp4\n",
"│ ├── PnOe3GZRVX8.mp4\n",
"│ ├── R8HXQkdgKWA.mp4\n",
"│ ├── RqnKtCEoEcA.mp4\n",
"│ ├── soEcZZsBmDs.mp4\n",
"│ ├── TkkZPZHbAKA.mp4\n",
"│ ├── T_TMNGzVrDk.mp4\n",
"│ ├── WaS0qwP46Us.mp4\n",
"│ ├── Wh_YPQdH1Zg.mp4\n",
"│ ├── WWP5HZJsg-o.mp4\n",
"│ ├── xGY2dP0YUjA.mp4\n",
"│ ├── yLC9CtWU5ws.mp4\n",
"│ └── ZQV4U2KQ370.mp4\n",
"└── val\n",
" ├── 0pVGiAU6XEA.mp4\n",
" ├── AQrbRSnRt8M.mp4\n",
" ├── b6Q_b7vgc7Q.mp4\n",
" ├── ddvJ6-faICE.mp4\n",
" ├── IcLztCtvhb8.mp4\n",
" ├── ik4BW3-SCts.mp4\n",
" ├── jqRrH30V0k4.mp4\n",
" ├── SU_x2LQqSLs.mp4\n",
" ├── u4Rm6srmIS8.mp4\n",
" └── y5Iu7XkTqV0.mp4\n",
"\n",
"2 directories, 42 files\n"
]
}
],
"source": [
"# Check the directory structure of the tiny data\n",
"\n",
"# Install tree first\n",
"!apt-get -q install tree\n",
"!tree kinetics400_tiny"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fTdi6dI0hY3g",
"outputId": "ffda0997-8d77-431a-d66e-2f273e80c756"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"D32_1gwq35E.mp4 0\n",
"iRuyZSKhHRg.mp4 1\n",
"oXy-e_P_cAI.mp4 0\n",
"34XczvTaRiI.mp4 1\n",
"h2YqqUhnR34.mp4 0\n",
"O46YA8tI530.mp4 0\n",
"kFC3KY2bOP8.mp4 1\n",
"WWP5HZJsg-o.mp4 1\n",
"phDqGd0NKoo.mp4 1\n",
"yLC9CtWU5ws.mp4 0\n",
"27_CSXByd3s.mp4 1\n",
"IyfILH9lBRo.mp4 1\n",
"T_TMNGzVrDk.mp4 1\n",
"TkkZPZHbAKA.mp4 0\n",
"PnOe3GZRVX8.mp4 1\n",
"soEcZZsBmDs.mp4 1\n",
"FMlSTTpN3VY.mp4 1\n",
"WaS0qwP46Us.mp4 0\n",
"A-wiliK50Zw.mp4 1\n",
"oMrZaozOvdQ.mp4 1\n",
"ZQV4U2KQ370.mp4 0\n",
"DbX8mPslRXg.mp4 1\n",
"h10B9SVE-nk.mp4 1\n",
"P5M-hAts7MQ.mp4 0\n",
"R8HXQkdgKWA.mp4 0\n",
"D92m0HsHjcQ.mp4 0\n",
"RqnKtCEoEcA.mp4 0\n",
"LvcFDgCAXQs.mp4 0\n",
"xGY2dP0YUjA.mp4 0\n",
"Wh_YPQdH1Zg.mp4 0\n"
]
}
],
"source": [
"# After downloading the data, we need to check the annotation format\n",
"!cat kinetics400_tiny/kinetics_tiny_train_video.txt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0bq0mxmEi29H"
},
"source": [
"According to the format defined in [`VideoDataset`](./datasets/video_dataset.py), each line describes one sample video by its file path and its label, separated by whitespace."
]
},
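{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the annotation format concrete, here is a minimal sketch of how such lines could be parsed in plain Python. The helper `parse_video_annotations` is a hypothetical name introduced only for illustration; it is not part of MMAction2 and is not necessarily how `VideoDataset` reads the file.\n",
"\n",
"```python\n",
"from typing import Iterable, List, Tuple\n",
"\n",
"\n",
"def parse_video_annotations(lines: Iterable[str]) -> List[Tuple[str, int]]:\n",
"    # Hypothetical helper (not part of MMAction2): turn '<filepath> <label>'\n",
"    # lines into (filepath, label) pairs.\n",
"    samples = []\n",
"    for line in lines:\n",
"        line = line.strip()\n",
"        if not line:\n",
"            continue  # skip blank lines\n",
"        # Split on the last run of whitespace; everything before it is the path.\n",
"        filepath, label = line.rsplit(maxsplit=1)\n",
"        samples.append((filepath, int(label)))\n",
"    return samples\n",
"\n",
"\n",
"# Two lines copied from the annotation file printed above.\n",
"example_lines = ['D32_1gwq35E.mp4 0', 'iRuyZSKhHRg.mp4 1']\n",
"for filepath, label in parse_video_annotations(example_lines):\n",
"    print(filepath, label)\n",
"```\n",
"\n",
"Splitting on the last whitespace is a design choice of this sketch, so that a file path containing spaces would still be kept in one piece."
]
},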
{
"cell_type": "markdown",
"metadata": {
"id": "Ht_DGJA9jQar"
},
"source": [
"### Modify the config\n",
"\n",
"Next, we need to modify the config for training.\n",
"To speed up the process, we fine-tune a recognizer starting from a pre-trained one."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"id": "LjCcmCKOjktc"
},
"outputs": [],
"source": [
"from mmcv import Config\n",
"cfg = Config.fromfile('./configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tc8YhFFGjp3e"
},
"source": [
"Given a config that trains a TSN model on the full Kinetics-400 dataset, we need to modify some values so that it trains TSN on the Kinetics400-tiny dataset.\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tlhu9byjjt-K",
"outputId": "3b9a3c49-ace0-41d3-dd15-d6c8579755f8"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Config:\n",
"model = dict(\n",
" type='Recognizer2D',\n",
" backbone=dict(\n",
" type='ResNet',\n",
" pretrained='torchvision://resnet50',\n",
" depth=50,\n",
" norm_eval=False),\n",
" cls_head=dict(\n",
" type='TSNHead',\n",
" num_classes=2,\n",
" in_channels=2048,\n",
" spatial_type='avg',\n",
" consensus=dict(type='AvgConsensus', dim=1),\n",
" dropout_ratio=0.4,\n",
" init_std=0.01),\n",
" train_cfg=None,\n",
" test_cfg=dict(average_clips=None))\n",
"optimizer = dict(type='SGD', lr=7.8125e-05, momentum=0.9, weight_decay=0.0001)\n",
"optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))\n",
"lr_config = dict(policy='step', step=[40, 80])\n",
"total_epochs = 10\n",
"checkpoint_config = dict(interval=5)\n",
"log_config = dict(interval=5, hooks=[dict(type='TextLoggerHook')])\n",
"dist_params = dict(backend='nccl')\n",
"log_level = 'INFO'\n",
"load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n",
"resume_from = None\n",
"workflow = [('train', 1)]\n",
"dataset_type = 'VideoDataset'\n",
"data_root = 'kinetics400_tiny/train/'\n",
"data_root_val = 'kinetics400_tiny/val/'\n",
"ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n",
"ann_file_val = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n",
"ann_file_test = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n",
"img_norm_cfg = dict(\n",
" mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)\n",
"train_pipeline = [\n",
" dict(type='DecordInit'),\n",
" dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),\n",
" dict(type='DecordDecode'),\n",
" dict(\n",
" type='MultiScaleCrop',\n",
" input_size=224,\n",
" scales=(1, 0.875, 0.75, 0.66),\n",
" random_crop=False,\n",
" max_wh_scale_gap=1),\n",
" dict(type='Resize', scale=(224, 224), keep_ratio=False),\n",
" dict(type='Flip', flip_ratio=0.5),\n",
" dict(\n",
" type='Normalize',\n",
" mean=[123.675, 116.28, 103.53],\n",
" std=[58.395, 57.12, 57.375],\n",
" to_bgr=False),\n",
" dict(type='FormatShape', input_format='NCHW'),\n",
" dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n",
" dict(type='ToTensor', keys=['imgs', 'label'])\n",
"]\n",
"val_pipeline = [\n",
" dict(type='DecordInit'),\n",
" dict(\n",
" type='SampleFrames',\n",
" clip_len=1,\n",
" frame_interval=1,\n",
" num_clips=8,\n",
" test_mode=True),\n",
" dict(type='DecordDecode'),\n",
" dict(type='Resize', scale=(-1, 256)),\n",
" dict(type='CenterCrop', crop_size=224),\n",
" dict(\n",
" type='Normalize',\n",
" mean=[123.675, 116.28, 103.53],\n",
" std=[58.395, 57.12, 57.375],\n",
" to_bgr=False),\n",
" dict(type='FormatShape', input_format='NCHW'),\n",
" dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n",
" dict(type='ToTensor', keys=['imgs'])\n",
"]\n",
"test_pipeline = [\n",
" dict(type='DecordInit'),\n",
" dict(\n",
" type='SampleFrames',\n",
" clip_len=1,\n",
" frame_interval=1,\n",
" num_clips=25,\n",
" test_mode=True),\n",
" dict(type='DecordDecode'),\n",
" dict(type='Resize', scale=(-1, 256)),\n",
" dict(type='ThreeCrop', crop_size=256),\n",
" dict(\n",
" type='Normalize',\n",
" mean=[123.675, 116.28, 103.53],\n",
" std=[58.395, 57.12, 57.375],\n",
" to_bgr=False),\n",
" dict(type='FormatShape', input_format='NCHW'),\n",
" dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n",
" dict(type='ToTensor', keys=['imgs'])\n",
"]\n",
"data = dict(\n",
" videos_per_gpu=2,\n",
" workers_per_gpu=2,\n",
" train=dict(\n",
" type='VideoDataset',\n",
" ann_file='kinetics400_tiny/kinetics_tiny_train_video.txt',\n",
" data_prefix='kinetics400_tiny/train/',\n",
" pipeline=[\n",
" dict(type='DecordInit'),\n",
" dict(\n",
" type='SampleFrames', clip_len=1, frame_interval=1,\n",
" num_clips=8),\n",
" dict(type='DecordDecode'),\n",
" dict(\n",
" type='MultiScaleCrop',\n",
" input_size=224,\n",
" scales=(1, 0.875, 0.75, 0.66),\n",
" random_crop=False,\n",
" max_wh_scale_gap=1),\n",
" dict(type='Resize', scale=(224, 224), keep_ratio=False),\n",
" dict(type='Flip', flip_ratio=0.5),\n",
" dict(\n",
" type='Normalize',\n",
" mean=[123.675, 116.28, 103.53],\n",
" std=[58.395, 57.12, 57.375],\n",
" to_bgr=False),\n",
" dict(type='FormatShape', input_format='NCHW'),\n",
" dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n",
" dict(type='ToTensor', keys=['imgs', 'label'])\n",
" ]),\n",
" val=dict(\n",
" type='VideoDataset',\n",
" ann_file='kinetics400_tiny/kinetics_tiny_val_video.txt',\n",
" data_prefix='kinetics400_tiny/val/',\n",
" pipeline=[\n",
" dict(type='DecordInit'),\n",
" dict(\n",
" type='SampleFrames',\n",
" clip_len=1,\n",
" frame_interval=1,\n",
" num_clips=8,\n",
" test_mode=True),\n",
" dict(type='DecordDecode'),\n",
" dict(type='Resize', scale=(-1, 256)),\n",
" dict(type='CenterCrop', crop_size=224),\n",
" dict(\n",
" type='Normalize',\n",
" mean=[123.675, 116.28, 103.53],\n",
" std=[58.395, 57.12, 57.375],\n",
" to_bgr=False),\n",
" dict(type='FormatShape', input_format='NCHW'),\n",
" dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n",
" dict(type='ToTensor', keys=['imgs'])\n",
" ]),\n",
" test=dict(\n",
" type='VideoDataset',\n",
" ann_file='kinetics400_tiny/kinetics_tiny_val_video.txt',\n",
" data_prefix='kinetics400_tiny/val/',\n",
" pipeline=[\n",
" dict(type='DecordInit'),\n",
" dict(\n",
" type='SampleFrames',\n",
" clip_len=1,\n",
" frame_interval=1,\n",
" num_clips=25,\n",
" test_mode=True),\n",
" dict(type='DecordDecode'),\n",
" dict(type='Resize', scale=(-1, 256)),\n",
" dict(type='ThreeCrop', crop_size=256),\n",
" dict(\n",
" type='Normalize',\n",
" mean=[123.675, 116.28, 103.53],\n",
" std=[58.395, 57.12, 57.375],\n",
" to_bgr=False),\n",
" dict(type='FormatShape', input_format='NCHW'),\n",
" dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n",
" dict(type='ToTensor', keys=['imgs'])\n",
" ]))\n",
"evaluation = dict(\n",
" interval=5,\n",
" metrics=['top_k_accuracy', 'mean_class_accuracy'],\n",
" save_best='auto')\n",
"work_dir = './tutorial_exps'\n",
"omnisource = False\n",
"seed = 0\n",
"gpu_ids = range(0, 1)\n",
"\n"
]
}
],
"source": [
"from mmcv.runner import set_random_seed\n",
"\n",
"# Modify dataset type and path\n",
"cfg.dataset_type = 'VideoDataset'\n",
"cfg.data_root = 'kinetics400_tiny/train/'\n",
"cfg.data_root_val = 'kinetics400_tiny/val/'\n",
"cfg.ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n",
"cfg.ann_file_val = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n",
"cfg.ann_file_test = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n",
"\n",
"cfg.data.test.type = 'VideoDataset'\n",
"cfg.data.test.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n",
"cfg.data.test.data_prefix = 'kinetics400_tiny/val/'\n",
"\n",
"cfg.data.train.type = 'VideoDataset'\n",
"cfg.data.train.ann_file = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n",
"cfg.data.train.data_prefix = 'kinetics400_tiny/train/'\n",
"\n",
"cfg.data.val.type = 'VideoDataset'\n",
"cfg.data.val.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n",
"cfg.data.val.data_prefix = 'kinetics400_tiny/val/'\n",
"\n",
"# The flag is used to determine whether it is omnisource training\n",
"cfg.setdefault('omnisource', False)\n",
"# Modify num classes of the model in cls_head\n",
"cfg.model.cls_head.num_classes = 2\n",
"# We can use the pre-trained TSN model\n",
"cfg.load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n",
"\n",
"# Set up working dir to save files and logs.\n",
"cfg.work_dir = './tutorial_exps'\n",
"\n",
"# The original learning rate (LR) is set for 8-GPU training.\n",
"# We divide it by 8 since we only use one GPU.\n",
"cfg.data.videos_per_gpu = cfg.data.videos_per_gpu // 16\n",
"cfg.optimizer.lr = cfg.optimizer.lr / 8 / 16\n",
"cfg.total_epochs = 10\n",
"\n",
"# We can set the checkpoint saving interval to reduce the storage cost\n",
"cfg.checkpoint_config.interval = 5\n",
"# We can set the log print interval to reduce the the times of printing log\n",
"cfg.log_config.interval = 5\n",
"\n",
"# Set seed thus the results are more reproducible\n",
"cfg.seed = 0\n",
"set_random_seed(0, deterministic=False)\n",
"cfg.gpu_ids = range(1)\n",
"\n",
"# Save the best\n",
"cfg.evaluation.save_best='auto'\n",
"\n",
"\n",
"# We can initialize the logger for training and have a look\n",
"# at the final config used for training\n",
"print(f'Config:\\n{cfg.pretty_text}')\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tES-qnZ3k38Z"
},
"source": [
"### Train a new recognizer\n",
"\n",
"Finally, lets initialize the dataset and recognizer, then train a new recognizer!"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dDBWkdDRk6oz",
"outputId": "a85d80d7-b3c4-43f1-d49a-057e8036807f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Use load_from_torchvision loader\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2021-07-11 13:00:46,931 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}\n",
"/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n",
" cpuset_checked))\n",
"2021-07-11 13:00:46,980 - mmaction - INFO - load checkpoint from ./checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth\n",
"2021-07-11 13:00:46,981 - mmaction - INFO - Use load_from_local loader\n",
"2021-07-11 13:00:47,071 - mmaction - WARNING - The model and loaded state dict do not match exactly\n",
"\n",
"size mismatch for cls_head.fc_cls.weight: copying a param with shape torch.Size([400, 2048]) from checkpoint, the shape in current model is torch.Size([2, 2048]).\n",
"size mismatch for cls_head.fc_cls.bias: copying a param with shape torch.Size([400]) from checkpoint, the shape in current model is torch.Size([2]).\n",
"2021-07-11 13:00:47,074 - mmaction - INFO - Start running, host: root@b465112b4add, work_dir: /content/mmaction2/tutorial_exps\n",
"2021-07-11 13:00:47,078 - mmaction - INFO - Hooks will be executed in the following order:\n",
"before_run:\n",
"(VERY_HIGH ) StepLrUpdaterHook \n",
"(NORMAL ) CheckpointHook \n",
"(NORMAL ) EvalHook \n",
"(VERY_LOW ) TextLoggerHook \n",
" -------------------- \n",
"before_train_epoch:\n",
"(VERY_HIGH ) StepLrUpdaterHook \n",
"(NORMAL ) EvalHook \n",
"(LOW ) IterTimerHook \n",
"(VERY_LOW ) TextLoggerHook \n",
" -------------------- \n",
"before_train_iter:\n",
"(VERY_HIGH ) StepLrUpdaterHook \n",
"(NORMAL ) EvalHook \n",
"(LOW ) IterTimerHook \n",
" -------------------- \n",
"after_train_iter:\n",
"(ABOVE_NORMAL) OptimizerHook \n",
"(NORMAL ) CheckpointHook \n",
"(NORMAL ) EvalHook \n",
"(LOW ) IterTimerHook \n",
"(VERY_LOW ) TextLoggerHook \n",
" -------------------- \n",
"after_train_epoch:\n",
"(NORMAL ) CheckpointHook \n",
"(NORMAL ) EvalHook \n",
"(VERY_LOW ) TextLoggerHook \n",
" -------------------- \n",
"before_val_epoch:\n",
"(LOW ) IterTimerHook \n",
"(VERY_LOW ) TextLoggerHook \n",
" -------------------- \n",
"before_val_iter:\n",
"(LOW ) IterTimerHook \n",
" -------------------- \n",
"after_val_iter:\n",
"(LOW ) IterTimerHook \n",
" -------------------- \n",
"after_val_epoch:\n",
"(VERY_LOW ) TextLoggerHook \n",
" -------------------- \n",
"2021-07-11 13:00:47,081 - mmaction - INFO - workflow: [('train', 1)], max: 10 epochs\n",
"/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/evaluation.py:190: UserWarning: runner.meta is None. Creating an empty one.\n",
" warnings.warn('runner.meta is None. Creating an empty one.')\n",
"2021-07-11 13:00:51,802 - mmaction - INFO - Epoch [1][5/15]\tlr: 7.813e-05, eta: 0:02:16, time: 0.942, data_time: 0.730, memory: 2918, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7604, loss: 0.7604, grad_norm: 14.8813\n",
"2021-07-11 13:00:52,884 - mmaction - INFO - Epoch [1][10/15]\tlr: 7.813e-05, eta: 0:01:21, time: 0.217, data_time: 0.028, memory: 2918, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6282, loss: 0.6282, grad_norm: 10.1834\n",
"2021-07-11 13:00:53,706 - mmaction - INFO - Epoch [1][15/15]\tlr: 7.813e-05, eta: 0:00:59, time: 0.164, data_time: 0.001, memory: 2918, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7165, loss: 0.7165, grad_norm: 10.8534\n",
"2021-07-11 13:00:57,724 - mmaction - INFO - Epoch [2][5/15]\tlr: 7.813e-05, eta: 0:01:09, time: 0.802, data_time: 0.596, memory: 2918, top1_acc: 0.3000, top5_acc: 1.0000, loss_cls: 0.7001, loss: 0.7001, grad_norm: 11.4311\n",
"2021-07-11 13:00:59,219 - mmaction - INFO - Epoch [2][10/15]\tlr: 7.813e-05, eta: 0:01:00, time: 0.296, data_time: 0.108, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6916, loss: 0.6916, grad_norm: 12.7101\n",
"2021-07-11 13:01:00,040 - mmaction - INFO - Epoch [2][15/15]\tlr: 7.813e-05, eta: 0:00:51, time: 0.167, data_time: 0.004, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6567, loss: 0.6567, grad_norm: 8.8837\n",
"2021-07-11 13:01:04,152 - mmaction - INFO - Epoch [3][5/15]\tlr: 7.813e-05, eta: 0:00:56, time: 0.820, data_time: 0.618, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6320, loss: 0.6320, grad_norm: 11.4025\n",
"2021-07-11 13:01:05,526 - mmaction - INFO - Epoch [3][10/15]\tlr: 7.813e-05, eta: 0:00:50, time: 0.276, data_time: 0.075, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6542, loss: 0.6542, grad_norm: 10.6429\n",
"2021-07-11 13:01:06,350 - mmaction - INFO - Epoch [3][15/15]\tlr: 7.813e-05, eta: 0:00:44, time: 0.165, data_time: 0.001, memory: 2918, top1_acc: 0.2000, top5_acc: 1.0000, loss_cls: 0.7661, loss: 0.7661, grad_norm: 12.8421\n",
"2021-07-11 13:01:10,771 - mmaction - INFO - Epoch [4][5/15]\tlr: 7.813e-05, eta: 0:00:47, time: 0.883, data_time: 0.676, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6410, loss: 0.6410, grad_norm: 10.6697\n",
"2021-07-11 13:01:11,776 - mmaction - INFO - Epoch [4][10/15]\tlr: 7.813e-05, eta: 0:00:42, time: 0.201, data_time: 0.011, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6949, loss: 0.6949, grad_norm: 10.5467\n",
"2021-07-11 13:01:12,729 - mmaction - INFO - Epoch [4][15/15]\tlr: 7.813e-05, eta: 0:00:38, time: 0.190, data_time: 0.026, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6290, loss: 0.6290, grad_norm: 11.2779\n",
"2021-07-11 13:01:16,816 - mmaction - INFO - Epoch [5][5/15]\tlr: 7.813e-05, eta: 0:00:38, time: 0.817, data_time: 0.608, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6011, loss: 0.6011, grad_norm: 9.1335\n",
"2021-07-11 13:01:18,176 - mmaction - INFO - Epoch [5][10/15]\tlr: 7.813e-05, eta: 0:00:35, time: 0.272, data_time: 0.080, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6652, loss: 0.6652, grad_norm: 11.0616\n",
"2021-07-11 13:01:19,119 - mmaction - INFO - Epoch [5][15/15]\tlr: 7.813e-05, eta: 0:00:32, time: 0.188, data_time: 0.017, memory: 2918, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6440, loss: 0.6440, grad_norm: 11.6473\n",
"2021-07-11 13:01:19,120 - mmaction - INFO - Saving checkpoint at 5 epochs\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 4.9 task/s, elapsed: 2s, ETA: 0s"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2021-07-11 13:01:21,673 - mmaction - INFO - Evaluating top_k_accuracy ...\n",
"2021-07-11 13:01:21,677 - mmaction - INFO - \n",
"top1_acc\t0.7000\n",
"top5_acc\t1.0000\n",
"2021-07-11 13:01:21,679 - mmaction - INFO - Evaluating mean_class_accuracy ...\n",
"2021-07-11 13:01:21,682 - mmaction - INFO - \n",
"mean_acc\t0.7000\n",
"2021-07-11 13:01:22,264 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_5.pth.\n",
"2021-07-11 13:01:22,267 - mmaction - INFO - Best top1_acc is 0.7000 at 5 epoch.\n",
"2021-07-11 13:01:22,271 - mmaction - INFO - Epoch(val) [5][5]\ttop1_acc: 0.7000, top5_acc: 1.0000, mean_class_accuracy: 0.7000\n",
"2021-07-11 13:01:26,623 - mmaction - INFO - Epoch [6][5/15]\tlr: 7.813e-05, eta: 0:00:31, time: 0.868, data_time: 0.656, memory: 2918, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6753, loss: 0.6753, grad_norm: 11.8640\n",
"2021-07-11 13:01:27,597 - mmaction - INFO - Epoch [6][10/15]\tlr: 7.813e-05, eta: 0:00:28, time: 0.195, data_time: 0.003, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6715, loss: 0.6715, grad_norm: 11.3347\n",
"2021-07-11 13:01:28,736 - mmaction - INFO - Epoch [6][15/15]\tlr: 7.813e-05, eta: 0:00:25, time: 0.228, data_time: 0.063, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5769, loss: 0.5769, grad_norm: 9.2541\n",
"2021-07-11 13:01:32,860 - mmaction - INFO - Epoch [7][5/15]\tlr: 7.813e-05, eta: 0:00:24, time: 0.822, data_time: 0.620, memory: 2918, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.5379, loss: 0.5379, grad_norm: 8.0147\n",
"2021-07-11 13:01:34,340 - mmaction - INFO - Epoch [7][10/15]\tlr: 7.813e-05, eta: 0:00:22, time: 0.298, data_time: 0.109, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6187, loss: 0.6187, grad_norm: 11.5244\n",
"2021-07-11 13:01:35,165 - mmaction - INFO - Epoch [7][15/15]\tlr: 7.813e-05, eta: 0:00:19, time: 0.165, data_time: 0.002, memory: 2918, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7063, loss: 0.7063, grad_norm: 12.4979\n",
"2021-07-11 13:01:39,435 - mmaction - INFO - Epoch [8][5/15]\tlr: 7.813e-05, eta: 0:00:17, time: 0.853, data_time: 0.641, memory: 2918, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.5369, loss: 0.5369, grad_norm: 8.6545\n",
"2021-07-11 13:01:40,808 - mmaction - INFO - Epoch [8][10/15]\tlr: 7.813e-05, eta: 0:00:15, time: 0.275, data_time: 0.086, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6407, loss: 0.6407, grad_norm: 12.5537\n",
"2021-07-11 13:01:41,627 - mmaction - INFO - Epoch [8][15/15]\tlr: 7.813e-05, eta: 0:00:12, time: 0.164, data_time: 0.001, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6073, loss: 0.6073, grad_norm: 11.4028\n",
"2021-07-11 13:01:45,651 - mmaction - INFO - Epoch [9][5/15]\tlr: 7.813e-05, eta: 0:00:11, time: 0.803, data_time: 0.591, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5596, loss: 0.5596, grad_norm: 10.0821\n",
"2021-07-11 13:01:46,891 - mmaction - INFO - Epoch [9][10/15]\tlr: 7.813e-05, eta: 0:00:08, time: 0.248, data_time: 0.044, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6470, loss: 0.6470, grad_norm: 11.8979\n",
"2021-07-11 13:01:47,944 - mmaction - INFO - Epoch [9][15/15]\tlr: 7.813e-05, eta: 0:00:06, time: 0.211, data_time: 0.041, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6657, loss: 0.6657, grad_norm: 12.0643\n",
"2021-07-11 13:01:52,200 - mmaction - INFO - Epoch [10][5/15]\tlr: 7.813e-05, eta: 0:00:04, time: 0.849, data_time: 0.648, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6310, loss: 0.6310, grad_norm: 11.5690\n",
"2021-07-11 13:01:53,707 - mmaction - INFO - Epoch [10][10/15]\tlr: 7.813e-05, eta: 0:00:02, time: 0.303, data_time: 0.119, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5178, loss: 0.5178, grad_norm: 9.3324\n",
"2021-07-11 13:01:54,520 - mmaction - INFO - Epoch [10][15/15]\tlr: 7.813e-05, eta: 0:00:00, time: 0.162, data_time: 0.001, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6919, loss: 0.6919, grad_norm: 12.6688\n",
"2021-07-11 13:01:54,522 - mmaction - INFO - Saving checkpoint at 10 epochs\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 5.9 task/s, elapsed: 2s, ETA: 0s"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2021-07-11 13:01:56,741 - mmaction - INFO - Evaluating top_k_accuracy ...\n",
"2021-07-11 13:01:56,743 - mmaction - INFO - \n",
"top1_acc\t1.0000\n",
"top5_acc\t1.0000\n",
"2021-07-11 13:01:56,749 - mmaction - INFO - Evaluating mean_class_accuracy ...\n",
"2021-07-11 13:01:56,750 - mmaction - INFO - \n",
"mean_acc\t1.0000\n",
"2021-07-11 13:01:57,267 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_10.pth.\n",
"2021-07-11 13:01:57,269 - mmaction - INFO - Best top1_acc is 1.0000 at 10 epoch.\n",
"2021-07-11 13:01:57,270 - mmaction - INFO - Epoch(val) [10][5]\ttop1_acc: 1.0000, top5_acc: 1.0000, mean_class_accuracy: 1.0000\n"
]
}
],
"source": [
"import os.path as osp\n",
"\n",
"from mmaction.datasets import build_dataset\n",
"from mmaction.models import build_model\n",
"from mmaction.apis import train_model\n",
"\n",
"import mmcv\n",
"\n",
"# Build the dataset\n",
"datasets = [build_dataset(cfg.data.train)]\n",
"\n",
"# Build the recognizer\n",
"model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))\n",
"\n",
"# Create work_dir\n",
"mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))\n",
"train_model(model, datasets, cfg, distributed=False, validate=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zdSd7oTLlxIf"
},
"source": [
"### Understand the log\n",
"From the log, we can have a basic understanding the training process and know how well the recognizer is trained.\n",
"\n",
"Firstly, the ResNet-50 backbone pre-trained on ImageNet is loaded, this is a common practice since training from scratch is more cost. The log shows that all the weights of the ResNet-50 backbone are loaded except the `fc.bias` and `fc.weight`.\n",
"\n",
"Second, since the dataset we are using is small, we loaded a TSN model and finetune it for action recognition.\n",
"The original TSN is trained on original Kinetics-400 dataset which contains 400 classes but Kinetics-400 Tiny dataset only have 2 classes. Therefore, the last FC layer of the pre-trained TSN for classification has different weight shape and is not used.\n",
"\n",
"Third, after training, the recognizer is evaluated by the default evaluation. The results show that the recognizer achieves 100% top1 accuracy and 100% top5 accuracy on the val dataset,\n",
" \n",
"Not bad!"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ryVoSfZVmogw"
},
"source": [
"## Test the trained recognizer\n",
"\n",
"After finetuning the recognizer, let's check the prediction results!"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "eyY3hCMwyTct",
"outputId": "ea54ff0a-4299-4e93-c1ca-4fe597e7516b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ ] 0/10, elapsed: 0s, ETA:"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n",
" cpuset_checked))\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 2.2 task/s, elapsed: 5s, ETA: 0s\n",
"Evaluating top_k_accuracy ...\n",
"\n",
"top1_acc\t1.0000\n",
"top5_acc\t1.0000\n",
"\n",
"Evaluating mean_class_accuracy ...\n",
"\n",
"mean_acc\t1.0000\n",
"top1_acc: 1.0000\n",
"top5_acc: 1.0000\n",
"mean_class_accuracy: 1.0000\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/content/mmaction2/mmaction/datasets/base.py:166: UserWarning: Option arguments for metrics has been changed to `metric_options`, See 'https://github.com/open-mmlab/mmaction2/pull/286' for more details\n",
" 'Option arguments for metrics has been changed to '\n"
]
}
],
"source": [
"from mmaction.apis import single_gpu_test\n",
"from mmaction.datasets import build_dataloader\n",
"from mmcv.parallel import MMDataParallel\n",
"\n",
"# Build a test dataloader\n",
"dataset = build_dataset(cfg.data.test, dict(test_mode=True))\n",
"data_loader = build_dataloader(\n",
" dataset,\n",
" videos_per_gpu=1,\n",
" workers_per_gpu=cfg.data.workers_per_gpu,\n",
" dist=False,\n",
" shuffle=False)\n",
"model = MMDataParallel(model, device_ids=[0])\n",
"outputs = single_gpu_test(model, data_loader)\n",
"\n",
"eval_config = cfg.evaluation\n",
"eval_config.pop('interval')\n",
"eval_res = dataset.evaluate(outputs, **eval_config)\n",
"for name, val in eval_res.items():\n",
" print(f'{name}: {val:.04f}')"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"include_colab_link": true,
"name": "MMAction2 Tutorial.ipynb",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}