**Spatial features** are extracted from the video frames using descriptors that capture the **pose of the pedestrian**. These descriptors are generated by the first sub-network, `HumanPoseNN`, defined in the `human_pose_nn` module. `HumanPoseNN` can also be used as a standalone network for the regular **2D pose estimation problem** on still images (for more info see [this section](#pose-estimation)).
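
As a rough illustration of this standalone use, the sketch below loads the pose network and estimates joints for a single image. The constructor, `restore` and `estimate_joints` names, as well as the input resolution, are assumptions made for this sketch and may not match the actual `human_pose_nn` API.

```python
import numpy as np

# Hypothetical import and method names - the real human_pose_nn API may differ.
from human_pose_nn import HumanPoseNN

net_pose = HumanPoseNN()
net_pose.restore('models/MPII+LSP.ckpt')  # assumed checkpoint-loading method

# A single still image (batch of one) with the whole person roughly centered;
# the input resolution is an assumption.
image = np.zeros((1, 299, 299, 3), dtype=np.uint8)

# Assumed outputs: per-joint y/x coordinates and a confidence for each joint.
joints_y, joints_x, confidences = net_pose.estimate_joints(image)
```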
The second sub-network, `GaitNN`, defined in the `gait_nn` module, is responsible for further processing the generated spatial features into one-dimensional **pose descriptors** using a residual convolutional network. **Temporal features** are then extracted across these *pose descriptors* using multilayer recurrent cells - **LSTM** or **GRU**. All temporal features are finally aggregated with **average temporal pooling** into a one-dimensional **identification vector** with good discriminatory properties. As already mentioned above, the identification vectors of different people are linearly separable and can therefore be classified with e.g. a **linear SVM**.
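
To make the last two steps concrete, the sketch below applies average temporal pooling to per-frame feature sequences and classifies the resulting vectors with a linear SVM. The feature dimensionality and the synthetic features are placeholders; real features come from `GaitNN`.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def identification_vector(sequence_features):
    # Average temporal pooling: collapse the (TIME, FEATURE_DIM) sequence
    # into a single one-dimensional identification vector.
    return sequence_features.mean(axis=0)

# Synthetic gallery: 5 people, 4 sequences each, 40 frames of 128-D features.
X = np.stack([
    identification_vector(rng.normal(loc=person, size=(40, 128)))
    for person in range(5)
    for _ in range(4)
])
y = np.repeat(np.arange(5), 4)

# Linearly separable identification vectors can be classified with a linear SVM.
classifier = LinearSVC().fit(X, y)
print(classifier.score(X, y))
```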
![](./images/architecture.jpg)

## Gait recognition
The dummy code below shows how to generate the identification vector from the input data `video_frames`. For the best results, all frames should include the **whole** person visible from the **profile view**. The person should be located approximately in the center of each frame.
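
A rough sketch of that pipeline is shown here; the constructor arguments and method names are assumptions made for illustration, and the repository's own dummy code is the authoritative reference.

```python
import numpy as np

# Hypothetical class and method names - this only illustrates the data flow.
from human_pose_nn import HumanPoseNN
from gait_nn import GaitNN

net_pose = HumanPoseNN()
net_gait = GaitNN(recurrent_unit='GRU', rnn_layers=2)  # assumed arguments

net_pose.restore('models/MPII+LSP.ckpt')
net_gait.restore('models/M+L-GRU-2.ckpt')

# video_frames: (TIME, HEIGHT, WIDTH, CHANNELS) - whole person, profile view,
# roughly centered in every frame.
video_frames = np.zeros((64, 299, 299, 3), dtype=np.uint8)

# Spatial (pose) features per frame, then one identification vector per video.
spatial_features = net_pose.feed_forward_features(video_frames)
identification_vector = net_gait.feed_forward(spatial_features)
```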
...

#### Dummy pose estimation
If you run the script `dummy_pose_estimation.py`, the pose of the person in the dummy image */images/dummy.jpg* will be estimated and drawn into a newly created image */images/dummy_pose.jpg*. To do this, you must have the `matplotlib` package installed and the pre-trained model `MPII+LSP` stored in */models/MPII+LSP.ckpt* - see the next section for the pre-trained models. The generated image */images/dummy_pose.jpg* should look like this one:

![](./images/dummy_pose.jpg)
Printed probabilities of each estimate:

```
right ankle : 85.80%
...
```

...
### HumanPoseNN: **MPII + LSP**

**Download**: [MPII+LSP.ckpt](https://drive.google.com/file/d/1bNoZkuI0TCqf_DV613SOAng3p6Y0Si6a/view?usp=sharing)

The checkpoint `MPII+LSP.ckpt` was trained on images from the [MPII](http://human-pose.mpi-inf.mpg.de) and [LSP](http://www.comp.leeds.ac.uk/mat4saj/lsp.html) databases. The graph below shows the average distance between the predicted and desired joints on a **validation set of about 6 000 images**.

![](./images/chart-mpii-lsp.jpg)
#### A sample of correctly estimated poses

![Correctly estimated poses](./images/good-estimation.jpg)

#### A sample of incorrectly estimated poses

![Incorrectly estimated poses](./images/bad-estimation.jpg)
### HumanPoseNN: **Human 3.6m**

**Download**: [Human3.6m.ckpt](https://drive.google.com/file/d/1lup13q5lTzsbrRZafpNbVF8uUyblMpZ3/view?usp=sharing) (action *walking*)

The checkpoint `Human3.6m.ckpt` was trained on the [Human 3.6m](http://vision.imar.ro/human3.6m/description.php) database, using only the **walking** sequences of subjects S1, S6, S7, S8, S9 and S11 (48 sequences). Subject S5 (8 sequences) was used for validation; the average distance between the predicted and desired joints is shown in the following graph. As you can see, the errors are smaller than for MPII+LSP. The first reason is that the desired poses in Human 3.6m were labeled more precisely using a motion capture system, so the trained network can estimate the human pose more accurately. The second reason is that the Human 3.6m sequences are very monotonous, which makes pose estimation less challenging.

![](./images/chart-human36m.jpg)
### GaitNN

We use the same standard TUM GAID experiments as described e.g. in [this paper](https://arxiv.org/abs/1601.06931) (section *Experimental results on TUM GAID*) by F.M. Castro et al., which currently achieves state-of-the-art results. In short, there are two main experiments. The goal of the first one is to identify 305 people (100 training, 50 validation, 155 testing) using 10 gait sequences for each person. These sequences capture each person under three different covariate conditions: **normal** walking, walking with a **backpack**, and walking with **coating shoes**. However, the people in all of these video sequences wear the same clothing. To address varying clothing conditions, there is a second experiment: its goal is to identify 32 people (10 training, 6 validation, 16 testing) using 20 gait sequences for each person - the first 10 were recorded in January and the other 10 in April. The people wear different clothing, as is usual for the respective season.

![](./images/tum_gaid_db.jpg)

The best performing model in the first experiment is `H3.6m-GRU-1`, and in the second `M+L-GRU-2`. The graphs below compare the performance of these models with the already mentioned state-of-the-art model [PFM](https://arxiv.org/abs/1601.06931) from F.M. Castro et al. The model `H3.6m-GRU-1` was trained only on the first experiment; the second graph shows how this model performs on the validation set of the second experiment. As you can see, both models outperform PFM in the second experiment by a large margin, which means that these models are much more robust against clothing and elapsed-time factors.

![Experiment 1](./images/chart-gait-experiment-1.jpg)

![Experiment 2](./images/chart-gait-experiment-2.jpg)

**Download**:<br>
[H3.6m-GRU-1.ckpt](models/H3.6m-GRU-1.ckpt)<br>
[M+L-GRU-2.ckpt](models/M+L-GRU-2.ckpt)