We provide two ways to set up the Human3.6M dataset on our pipeline. You can either use the dataset preprocessed by Martinez et al. (fastest way) or convert the original dataset from scratch. The two methods produce the same result. After this step, you should end up with two files in the data
directory: data_3d_h36m.npz
for the 3D poses, and data_2d_h36m_gt.npz
for the ground-truth 2D poses.
Download the h36m.zip archive (source: 3D pose baseline repository) to the data
directory, and run the conversion script from the same directory. This step does not require any additional dependency.
cd data
wget https://www.dropbox.com/s/e35qv3n6zlkouki/h36m.zip
python prepare_data_h36m.py --from-archive h36m.zip
cd ..
Alternatively, you can download the dataset from the Human3.6m website and convert it from its original format. This is useful if the other link goes down, or if you want to be sure to use the original source. MATLAB is required for this step.
First, we need to convert the 3D poses from .cdf
to .mat
, so they can be loaded from Python scripts. To this end, we have provided the MATLAB script convert_cdf_to_mat.m
in the data
directory. Extract the archives named Poses_D3_Positions_S*.tgz
(subjects 1, 5, 6, 7, 8, 9, 11) to a directory named pose
, and set up your directory tree so that it looks like this:
/path/to/dataset/convert_cdf_to_mat.m
/path/to/dataset/pose/S1/MyPoseFeatures/D3_Positions/Directions 1.cdf
/path/to/dataset/pose/S1/MyPoseFeatures/D3_Positions/Directions.cdf
...
Then run convert_cdf_to_mat.m
from MATLAB.
Finally, as before, run the Python conversion script specifying the dataset path:
cd data
python prepare_data_h36m.py --from-source /path/to/dataset/pose
cd ..
We provide support for the following 2D detections:
gt
: ground-truth 2D poses, extracted through the camera projection parameters.sh_pt_mpii
: Stacked Hourglass detections, pretrained on MPII.sh_ft_h36m
: Stacked Hourglass detections, fine-tuned on Human3.6M.detectron_ft_h36m
: Detectron (Mask R-CNN) detections, fine-tuned on Human3.6M.cpn_ft_h36m_dbb
: Cascaded Pyramid Network detections, fine-tuned on Human3.6M. Bounding boxes fromdetectron_ft_h36m
.- User-supplied (see below).
The 2D detection source is specified through the --keypoints
parameter, which loads the file data_2d_DATASET_DETECTION.npz
from the data
directory, where DATASET
is the dataset name (e.g. h36m
) and DETECTION
is the 2D detection source (e.g. sh_pt_mpii
). Since all the files are encoded according to the same format, it is trivial to create a custom set of 2D detections.
Ground-truth poses (gt
) have already been extracted by the previous step. The other detections must be downloaded manually (see instructions below). You only need to download the detections you want to use. For reference, our best results on Human3.6M are achieved by cpn_ft_h36m_dbb
.
You can download these from AWS. You just have to put data_2d_h36m_cpn_ft_h36m_dbb.npz
and data_2d_h36m_detectron_ft_h36m.npz
in the data
directory.
cd data
wget https://dl.fbaipublicfiles.com/video-pose-3d/data_2d_h36m_cpn_ft_h36m_dbb.npz
wget https://dl.fbaipublicfiles.com/video-pose-3d/data_2d_h36m_detectron_ft_h36m.npz
cd ..
These detections have been produced by models fine-tuned on Human3.6M. We adopted the usual protocol of fine-tuning on 5 subjects (S1, S5, S6, S7, and S8). We also included detections from the unlabeled subjects S2, S3, S4, which can be loaded by our framework for semi-supervised experimentation.
These detections (both pretrained and fine-tuned) are provided by Martinez et al. in their repository on 3D human pose estimation. The 2D poses produced by the pretrained model are in the same archive as the dataset (h36m.zip). The fine-tuned poses can be downloaded here. Put the two archives in the data
directory and run:
cd data
python prepare_data_2d_h36m_sh.py -pt h36m.zip
python prepare_data_2d_h36m_sh.py -ft stacked_hourglass_fined_tuned_240.tar.gz
cd ..
For HumanEva, you need the original dataset and MATLAB. We provide a MATLAB script to extract the revelant parts of the dataset automatically.
- Download the HumanEva-I dataset and extract it.
- Download the official source code v1.1 beta and extract it where you extracted the dataset.
- Copy the contents of the directory
Release_Code_v1_1_beta\HumanEva_I
to the root of the source tree (Release_Code_v1_1_beta/
). - Download the critical dataset update and apply it.
- Important: for visualization purposes, the original code requires an old library named dxAvi, which is used for decoding XVID videos. A precompiled binary for 32-bit architectures is already included, but if you are running MATLAB on a 64-bit system, the code will not work. You can either recompile dxAvi library for x64, or bypass it entirely, since we are not using visualization features in our conversion script. To this end, you can patch
@sync_stream/sync_stream.m
, replacing line 202:ImageStream(I) = image_stream(image_paths{I}, start_image_offset(I));
withImageStream(I) = 0;
- Now you can copy our script
ConvertHumanEva.m
(fromdata/
) toRelease_Code_v1_1_beta/
, and run it. It will create a directory namedconverted_15j
, which contains the converted 2D/3D ground-truth poses on a 15-joint skeleton. - Optional: if you want to experiment with a 20-joint skeleton, change
N_JOINTS
to 20 inConvertHumanEva.m
, and repeat the process. It will create a directory namedconverted_20j
. Adapt next steps accordingly.
If you get warnings about mocap errors or dropped frames, this is normal. The HumanEva dataset contains some invalid frames due to occlusions, which are simply discarded. Since we work with videos (and not individual frames), we try to minimize the impact of this issue by grouping valid sequences into contiguous chunks.
Finally, run the Python script to produce the final files:
python prepare_data_humaneva.py -p /path/to/dataset/Release_Code_v1_1_beta/converted_15j --convert-3d
You should end up with two files in the data
directory: data_3d_humaneva15.npz
for the 3D poses, and data_2d_humaneva15_gt.npz
for the ground-truth 2D poses.
We provide support for the following 2D detections:
gt
: ground-truth 2D poses, extracted through camera projection.detectron_pt_coco
: Detectron (Mask R-CNN) detections, pretrained on COCO.
Since HumanEva is very small, we do not fine-tune the pretrained models. As before, you can download Mask R-CNN detections from AWS (data_2d_humaneva15_detectron_pt_coco.npz
, which must be copied to data/
). As before, we have included detections for unlabeled subjects/actions. These begin with the prefix Unlabeled/
. Chunks that correspond to corrupted motion capture streams are also marked as unlabeled.
cd data
wget https://dl.fbaipublicfiles.com/video-pose-3d/data_2d_humaneva15_detectron_pt_coco.npz
cd ..