Train and test a human body joint estimation network for single humans in a single image, using Lua/Torch7. This method is a modified version of the original stacked hourglass networks. For more information, see our paper.
This code provides an easy way to train a network on a variety of datasets, all available through the dbcollection
package. The available datasets for train/test/benchmark are the following:
Dataset | Train | Test | Benchmark |
---|---|---|---|
Frames Labeled In Cinema (FLIC) | Yes | Yes | Yes |
Leeds Sports Pose (LSP) | Yes | Yes | Yes |
MPII | Yes | Yes | No** |
COCO | Yes | Yes | No** |
Note**: Only the FLIC and LSP datasets are evaluated/benchmarked here. The MPII and COCO datasets have dedicated servers for this purpose on their websites.
The network model used here for human body joint estimation is an enhanced version of Newell's method described in his paper with several modifications:
- replaced ReLUs with RReLUs
- use of spatial dropout (SpatialDropout) between convolutions
- more data augmentation (more rotation, scaling, colour jittering)
- use of wider feature maps (more kernels) as the image resolution decreases
- replaced RMSprop optimization with Adam
- additional tweaks to the basic auto-encoder network
Results on the FLIC dataset:

Method | Elbow | Wrist | Total |
---|---|---|---|
Sapp et al., CVPR'13 | 72.5 | 54.5 | 63.5 |
Chen et al., NIPS'14 | 89.8 | 86.8 | 88.3 |
Yang et al., CVPR'16 | 91.6 | 88.8 | 90.2 |
Wei et al., CVPR'16 | 92.5 | 90.0 | 91.3 |
Newell et al., arXiv'16 | 98.0 | 95.5 | 96.8 |
Ours | 98.3 | 96.0 | 97.2 |
Results on the LSP dataset:

Method | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Total |
---|---|---|---|---|---|---|---|---|
Wang et al., CVPR'13 | 84.7 | 57.1 | 43.7 | 36.7 | 56.7 | 52.4 | 50.8 | 54.6 |
Pishchulin et al., ICCV'13 | 87.2 | 56.7 | 46.7 | 38.0 | 61.0 | 57.5 | 52.7 | 57.1 |
Tompson et al., NIPS'14 | 90.6 | 79.2 | 67.9 | 63.4 | 69.5 | 71.0 | 64.2 | 72.3 |
Fan et al., CVPR'15 | 92.4 | 75.2 | 65.3 | 64.0 | 75.7 | 68.3 | 70.4 | 73.0 |
Chen et al., NIPS'14 | 91.8 | 78.2 | 71.8 | 65.5 | 73.3 | 70.2 | 63.4 | 73.4 |
Yang et al., CVPR'16 | 90.6 | 78.1 | 73.8 | 68.8 | 74.8 | 69.9 | 58.9 | 73.6 |
Rafi et al., BMVC'16 | 95.8 | 86.2 | 79.3 | 75.0 | 86.6 | 83.8 | 79.8 | 83.8 |
Yu et al., ECCV'16 | 87.2 | 88.2 | 82.4 | 76.3 | 91.4 | 85.8 | 78.7 | 84.3 |
Belagiannis et al., arXiv'16 | 95.2 | 89.0 | 81.5 | 77.0 | 83.7 | 87.0 | 82.8 | 85.2 |
Lifshitz et al., ECCV'16 | 96.8 | 89.0 | 82.7 | 79.1 | 90.9 | 86.0 | 82.5 | 86.7 |
Pishchulin et al., CVPR'16 | 97.0 | 91.0 | 83.8 | 78.1 | 91.0 | 86.7 | 82.0 | 87.1 |
Insafutdinov et al., ECCV'16 | 97.4 | 92.7 | 87.5 | 84.4 | 91.5 | 89.9 | 87.2 | 90.1 |
Wei et al., CVPR'16 | 97.8 | 92.5 | 87.0 | 83.9 | 91.5 | 90.8 | 89.9 | 90.5 |
Bulat et al., ECCV'16 | 97.2 | 92.1 | 88.1 | 85.2 | 92.2 | 91.4 | 88.7 | 90.7 |
Ours | 97.7 | 93.0 | 88.9 | 85.5 | 91.5 | 92.0 | 92.1 | 91.5 |
Note: The network was trained with data from the MPII and extended LSP (LSPe) datasets, as are most of the methods in this benchmark.
To run the code in this repository you'll need the following resources:
- Torch7
- Matlab >= 2012a (for running the benchmark code)
- Python >= 2.7 or >= 3.5 (for dbcollection)
- NVIDIA GPU with compute capability 3.5+ (12GB+ of RAM, or two GPUs with 6GB+ of RAM each)
Note: We used two 6GB GPUs to train the network. We recommend at least one GPU with 6GB of RAM for inference and one GPU with 12GB of RAM for training the model.
Also, you'll need to install the following packages:
luarocks install display
luarocks install cudnn
luarocks install inn
luarocks install matio
luarocks install tds
luarocks install torchnet
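If you prefer, the same rocks can be installed in one go with a simple shell loop (a convenience sketch; it assumes luarocks is on your PATH):

```bash
# Install all required rocks in sequence
for rock in display cudnn inn matio tds torchnet; do
    luarocks install "$rock"
done
```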
To install the dbcollection package, do the following:

- install the Python module (Python >= 2.7 or >= 3.5):

  pip install dbcollection==0.1.7

  or

  conda install -c farrajota dbcollection==0.1.7

- install the Lua/Torch7 dbcollection wrapper:

  - download the Lua/Torch7 git repo to disk:

    git clone https://github.com/dbcollection/dbcollection-torch7

  - install the package:

    cd dbcollection-torch7 && luarocks make
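To confirm the wrapper installed correctly, you can try loading it from the Torch REPL (a quick sanity check; it assumes the rock exposes a dbcollection module and that your th supports the -e flag, otherwise run the require from the interactive prompt):

```bash
# Should exit without errors if the package is available
th -e "require 'dbcollection'"
```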
For more information about the dbcollection package see here.
The dbcollection package contains the data annotations necessary to run this code. If the data is not set up, the package will download, extract and preprocess the data annotations and store them to disk. To specify where to store the downloaded/extracted data, use the -data_dir=<path> input arg. If left empty, the data will be stored in ~/dbcollection/<dataset>/data/.
For more information on how to setup a dataset see the dbcollection repo.
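For example, a hypothetical invocation that stores the data in a custom folder (the experiment name and path below are illustrative only):

```bash
# Download/extract the FLIC annotations into a custom directory
th train.lua -expID myexp -dataset flic -data_dir /media/storage/datasets
```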
After installing the necessary requirements, it is advised to set up the data before proceeding. Since the code uses dbcollection for managing datasets, it is best to download/set up the data folders first if you want to specify the dataset's directory manually. Then, to start training a network, simply run th train.lua -expID <net_name>. To use a specific dataset, for example FLIC, specify the -dataset input arg when running the script: th train.lua -expID <net_name> -dataset flic.
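As a quick start, the two commands above combine into a single minimal run (the experiment name is an example):

```bash
# Train on FLIC under the experiment name "flic_base"
th train.lua -expID flic_base -dataset flic
```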
Most of the command-line options are self-explanatory and can be found in options.lua. The -expID option will be used to save important information in a directory like pose-torchnet/exp/<dataset>/<expID>/. This directory will include snapshots of the trained model, training/validation logs with loss and accuracy information, and the options set for that particular experiment.
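For illustration, the layout for a FLIC experiment named "myexp" would look roughly like this (the entries are indicative of the contents described above, not exact file names):

```
pose-torchnet/exp/flic/myexp/
├── <model snapshots (.t7)>
├── <training/validation logs>
└── <experiment options>
```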
To use this code, clone the repo into your home directory:
git clone --recursive https://github.com/farrajota/human_pose_estimation_torch
If you clone the repo into a different directory, please make sure you modify projectdir.lua and point it to the new path before using the code.
To train a network, simply run th train.lua. This will train a network with the default parameters. To train a network with other options, see the options.lua file or look in tests/ for scripts that contain training procedures.
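For example, a run overriding some defaults might look like the following (the -nEpochs and -LR flag names are hypothetical placeholders; check options.lua for the actual option names and defaults):

```bash
# Hypothetical flags -- the real names live in options.lua
th train.lua -expID lsp_run1 -dataset lsp -nEpochs 100 -LR 2.5e-4
```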
When training a network, only a small sample of the overall dataset is used to compute the current accuracy. To use the entire dataset to compute a network's total accuracy, run th test.lua -expID <name_exp> -dataset <name_dataset>. For the MPII dataset, the training set is split in two (train + val) and the evaluation is performed on the val set.
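For example, to evaluate a trained FLIC experiment on the full set (names are examples):

```bash
# Compute accuracy over the entire FLIC test data for experiment "myexp"
th test.lua -expID myexp -dataset flic
```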
To benchmark a trained model on the FLIC or LSP datasets, run th benchmark.lua -expID <name_exp> -dataset <name_dataset>. This will compute predictions of all body joints for all images of the selected dataset and store them to disk in two files (Predictions.t7 and Predictions.mat) inside the experiment's folder.
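For example (the experiment name is illustrative):

```bash
# Generate Predictions.t7 + Predictions.mat for the LSP experiment "myexp"
th benchmark.lua -expID myexp -dataset lsp
```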
Furthermore, for the FLIC and LSP datasets, this repo provides a comparison with other body joint prediction methods whose predictions are available online (LSP results are available here; for the FLIC results, some had to be manually tracked down on GitHub). To evaluate on the MPII and COCO datasets, you will need to use their evaluation servers if you want to compare results with other methods.
For convenience during training, the accuracy function evaluates the PCK (Percentage of Correct Keypoints) metric by comparing the network's output heatmaps with the ground-truth heatmaps: a joint counts as correct if its predicted position falls within a distance threshold of the ground truth. This should provide a good enough performance measure when training a network on any dataset.
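For reference, the standard PCK formulation is shown below (a general definition; the exact variant computed during training may differ):

```latex
% General PCK definition (notation for illustration):
%   p_i, g_i : predicted / ground-truth position of joint i
%   d        : normalizing reference length (e.g. torso size on FLIC)
%   alpha    : matching threshold
\mathrm{PCK} = \frac{1}{N} \sum_{i=1}^{N}
  \mathbf{1}\!\left( \frac{\lVert p_i - g_i \rVert_2}{d} \le \alpha \right)
```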
Storing models to disk requires cleaning temporary buffers (grad/output buffers) with :clearState(), which is problematic, so this option is hidden behind an input flag. In our setup, with GPU memory almost maxed out, calling :clearState() would cause crashes due to insufficient memory when the buffers were re-populated. For networks that are smaller or have fewer parameters (or if you have a GPU with 12GB+ of RAM), this should not be an issue.
If you find this code useful for your research, please cite our paper:

@Inbook{Farrajota2017,
author="Farrajota, M.
and Rodrigues, Jo{\~a}o M. F.
and du Buf, J. M. H.",
editor="Alexandre, Lu{\'i}s A.
and Salvador S{\'a}nchez, Jos{\'e}
and Rodrigues, Jo{\~a}o M. F.",
title="Human Pose Estimation by a Series of Residual Auto-Encoders",
bookTitle="Pattern Recognition and Image Analysis: 8th Iberian Conference, IbPRIA 2017, Faro, Portugal, June 20-23, 2017, Proceedings",
year="2017",
publisher="Springer International Publishing",
address="Cham",
pages="131--139",
isbn="978-3-319-58838-4",
doi="10.1007/978-3-319-58838-4_15",
url="https://doi.org/10.1007/978-3-319-58838-4_15"
}
MIT license (see the LICENSE file)
This repo is based on Newell's code for training a human pose detector.