The following datasets are released for non-commercial use; refer to LICENSE for more details.

In all datasets, samples are indexed by identity and frame: each dataset contains approximately 20,000 identities with 5 frames each. Indices are formatted with leading zeros, for example `img_0000123_004.jpg` for identity 123, frame 4.
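For example, a minimal sketch of building these paths in Python (the `sample_path` helper is hypothetical, not part of the release):

```python
from pathlib import Path

def sample_path(root: str, prefix: str, identity: int, frame: int, ext: str) -> Path:
    """Build a sample file path, e.g. img_0000123_004.jpg for identity 123, frame 4."""
    return Path(root) / f"{prefix}_{identity:07d}_{frame:03d}.{ext}"

print(sample_path("synth_body", "img", 123, 4, "jpg"))  # synth_body/img_0000123_004.jpg
```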
Some pose data is sourced from the AMASS and MANO datasets and is not directly redistributed by us. This data will be downloaded and spliced into the full dataset as part of the `download_data.py` script. You therefore need valid logins for https://amass.is.tue.mpg.de/ and https://mano.is.tue.mpg.de/, which you will be prompted for when running the script.
Once downloaded, you can visualize the data, including some ground-truth annotations, with:

```sh
python visualize_data.py [path_to_dataset]
```
First, set up your environment by running `pip install -r requirements.txt` under Python 3.10, and install `wget` on your system if it isn't already present. Our server requires TLSv1.2, which some older versions of wget do not support; we have successfully tested wget 1.21.4 on Windows.
The following command will download the dataset to `YOUR_DATA_DIRECTORY/synth_body/`:

```sh
python download_data.py --dataset body --output-dir YOUR_DATA_DIRECTORY/
```

If you want just a single identity (500KB) you can add the `--single_id` flag, or for a single chunk (380MB) add `--single_chunk`. The total size of the dataset is approximately 10GB.
| Data Type | File Name |
| --- | --- |
| RGB image | `img_XXXXXXX_XXX.jpg` |
| Grayscale beard segmentation | `segm_beard_XXXXXXX_XXX.png` |
| Grayscale eyebrows segmentation | `segm_eyebrows_XXXXXXX_XXX.png` |
| Grayscale eyelashes segmentation | `segm_eyelashes_XXXXXXX_XXX.png` |
| Grayscale facewear segmentation | `segm_facewear_XXXXXXX_XXX.png` |
| Grayscale glasses segmentation | `segm_glasses_XXXXXXX_XXX.png` |
| Grayscale head hair segmentation | `segm_hair_XXXXXXX_XXX.png` |
| Grayscale headwear segmentation | `segm_headwear_XXXXXXX_XXX.png` |
| Integer body parts segmentation | `segm_parts_XXXXXXX_XXX.png` |
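The grayscale segmentations can be treated as soft masks. A minimal sketch using Pillow and NumPy (both assumed to be installed; the file name follows the pattern above):

```python
import numpy as np
from PIL import Image

def load_soft_mask(path: str) -> np.ndarray:
    """Load a grayscale segmentation PNG as a float mask in [0, 1]."""
    return np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0

hair = load_soft_mask("synth_body/segm_hair_0000123_004.png")
print(hair.shape, hair.min(), hair.max())  # e.g. (512, 512) with values in [0, 1]
```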
The integer body parts segmentation uses the following class indices:

| Class | Index |
| --- | --- |
| BACKGROUND | 0 |
| FACE | 1 |
| LEFT_UPPER_TORSO | 2 |
| LEFT_LOWER_TORSO | 3 |
| RIGHT_UPPER_TORSO | 4 |
| RIGHT_LOWER_TORSO | 5 |
| LEFT_UPPER_LEG | 6 |
| LEFT_LOWER_LEG | 7 |
| LEFT_FOOT | 8 |
| RIGHT_UPPER_LEG | 9 |
| RIGHT_LOWER_LEG | 10 |
| RIGHT_FOOT | 11 |
| LEFT_UPPER_ARM | 12 |
| LEFT_LOWER_ARM | 13 |
| LEFT_HAND | 14 |
| RIGHT_UPPER_ARM | 15 |
| RIGHT_LOWER_ARM | 16 |
| RIGHT_HAND | 17 |
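To pick out a single body part from the integer parts map, compare against the indices above. A sketch under the same Pillow/NumPy assumptions:

```python
import numpy as np
from PIL import Image

LEFT_HAND = 14  # class index from the table above

parts = np.asarray(Image.open("synth_body/segm_parts_0000123_004.png"))
left_hand_mask = parts == LEFT_HAND  # boolean mask, True where the left hand is
print(f"left hand covers {left_hand_mask.mean():.1%} of the image")
```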
Ground-truth metadata for each frame is provided in JSON format:

```json
{
    "camera": {
        "world_to_camera": [ "4x4 array of camera extrinsics" ],
        "camera_to_image": [ "3x3 array of camera intrinsics" ],
        "resolution": [ 512, 512 ]
    },
    "pose": [ "52x3 array of SMPL-H pose parameters" ],
    "translation": [ "3 element array for SMPL-H translation" ],
    "body_identity": [ "16 element array of neutral SMPL-H shape parameters" ],
    "landmarks": {
        "3D_world": [ "52x3 array of 3D landmarks in world-space corresponding to SMPL-H joints" ],
        "3D_cam": [ "52x3 array of 3D landmarks in camera-space corresponding to SMPL-H joints" ],
        "2D": [ "52x2 array of 2D landmarks in image-space corresponding to SMPL-H joints" ]
    }
}
```
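The stored 2D landmarks should be reproducible from the 3D landmarks and the camera parameters. A sketch of a pinhole projection with NumPy (the metadata file name and the column-vector, distortion-free convention are assumptions; the final check against the stored `2D` array verifies them):

```python
import json
import numpy as np

with open("synth_body/metadata_0000123_004.json") as f:  # hypothetical file name
    meta = json.load(f)

world_to_cam = np.array(meta["camera"]["world_to_camera"])  # 4x4 extrinsics
cam_to_image = np.array(meta["camera"]["camera_to_image"])  # 3x3 intrinsics
pts_world = np.array(meta["landmarks"]["3D_world"])         # 52x3

# Transform to camera space using homogeneous coordinates, then project.
pts_h = np.hstack([pts_world, np.ones((len(pts_world), 1))])
pts_cam = (world_to_cam @ pts_h.T).T[:, :3]
pts_img = (cam_to_image @ pts_cam.T).T
pts_2d = pts_img[:, :2] / pts_img[:, 2:3]

# Should match the stored 2D landmarks up to rounding if the convention holds.
print(np.abs(pts_2d - np.array(meta["landmarks"]["2D"])).max())
```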
The dataset includes some images with secondary "distractor" people in the background; the ground-truth data does not include annotations for these people, only for the primary person. These images help with robustness to occlusions and to cases where people are close together in real-world scenarios.
As detailed in the paper, clothing is modeled using displacement maps. Segmentation ground-truth includes the effect of these displacements, but landmarks are not displaced and instead lie directly on the surface of the body mesh.
The following command will download the dataset to `YOUR_DATA_DIRECTORY/synth_face/`:

```sh
python download_data.py --dataset face --output-dir YOUR_DATA_DIRECTORY/
```

If you want just a single identity (500KB) you can add the `--single_id` flag, or for a single chunk (500MB) add `--single_chunk`. The total size of the dataset is approximately 11GB.
| Data Type | File Name |
| --- | --- |
| RGB image | `img_XXXXXXX_XXX.jpg` |
| Grayscale beard segmentation | `segm_beard_XXXXXXX_XXX.png` |
| Grayscale clothing segmentation | `segm_clothing_XXXXXXX_XXX.png` |
| Grayscale eyebrows segmentation | `segm_eyebrows_XXXXXXX_XXX.png` |
| Grayscale eyelashes segmentation | `segm_eyelashes_XXXXXXX_XXX.png` |
| Grayscale facewear segmentation | `segm_facewear_XXXXXXX_XXX.png` |
| Grayscale glasses segmentation | `segm_glasses_XXXXXXX_XXX.png` |
| Grayscale head hair segmentation | `segm_hair_XXXXXXX_XXX.png` |
| Grayscale headwear segmentation | `segm_headwear_XXXXXXX_XXX.png` |
| Integer face parts segmentation | `segm_parts_XXXXXXX_XXX.png` |
The integer face parts segmentation uses the following class indices:

| Class | Index |
| --- | --- |
| BACKGROUND | 0 |
| SKIN | 1 |
| NOSE | 2 |
| RIGHT_EYE | 3 |
| LEFT_EYE | 4 |
| RIGHT_BROW | 5 |
| LEFT_BROW | 6 |
| RIGHT_EAR | 7 |
| LEFT_EAR | 8 |
| MOUTH_INTERIOR | 9 |
| TOP_LIP | 10 |
| BOTTOM_LIP | 11 |
| NECK | 12 |
Ground-truth metadata for each frame is provided in JSON format:

```json
{
    "camera": {
        "world_to_camera": [ "4x4 array of camera extrinsics" ],
        "camera_to_image": [ "3x3 array of camera intrinsics" ],
        "resolution": [ 512, 512 ]
    },
    "head_pose": [ "3x3 rotation matrix of the head" ],
    "left_eye_pose": [ "3x3 rotation matrix of the left eye" ],
    "right_eye_pose": [ "3x3 rotation matrix of the right eye" ],
    "landmarks": {
        "2D": [ "70x2 array of landmarks in image space" ]
    }
}
```
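Since the pose fields are plain 3x3 rotation matrices, standard conversions apply. A sketch extracting head Euler angles with SciPy (an extra dependency; the metadata file name and the angle convention are assumptions):

```python
import json
import numpy as np
from scipy.spatial.transform import Rotation

with open("synth_face/metadata_0000123_004.json") as f:  # hypothetical file name
    meta = json.load(f)

head = Rotation.from_matrix(np.array(meta["head_pose"]))
# The intrinsic YXZ order is an assumption; use whatever your pipeline expects.
yaw, pitch, roll = head.as_euler("YXZ", degrees=True)
print(f"yaw {yaw:.1f}, pitch {pitch:.1f}, roll {roll:.1f} degrees")
```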
The dataset includes some images with secondary "distractor" faces in the background; the ground-truth data does not include annotations for these faces, only for the primary face. These images help with robustness to occlusions and to cases where faces are close together in real-world scenarios.
The following command will download the dataset to `YOUR_DATA_DIRECTORY/synth_hand/`:

```sh
python download_data.py --dataset hand --output-dir YOUR_DATA_DIRECTORY/
```

If you want just a single identity (250KB) you can add the `--single_id` flag, or for a single chunk (300MB) add `--single_chunk`. The total size of the dataset is approximately 7GB.
| Data Type | File Name |
| --- | --- |
| RGB image | `img_XXXXXXX_XXX.jpg` |
Ground-truth metadata for each frame is provided in JSON format:

```json
{
    "camera": {
        "world_to_camera": [ "4x4 array of camera extrinsics" ],
        "camera_to_image": [ "3x3 array of camera intrinsics" ],
        "resolution": [ 512, 512 ]
    },
    "pose": [ "52x3 array of SMPL-H pose parameters" ],
    "translation": [ "3 element array for SMPL-H translation" ],
    "body_identity": [ "16 element array of neutral SMPL-H shape parameters" ],
    "landmarks": {
        "3D_world": [ "21x3 array of 3D landmarks in world-space - the first 16 are MANO joints, the last 5 are fingertips" ],
        "3D_cam": [ "21x3 array of 3D landmarks in camera-space - the first 16 are MANO joints, the last 5 are fingertips" ],
        "2D": [ "21x2 array of 2D landmarks in image-space - the first 16 are MANO joints, the last 5 are fingertips" ]
    }
}
```
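The landmark ordering makes it easy to separate joints from fingertips, or to crop the hand. A sketch (the metadata file name and the (x, y) landmark order are assumptions):

```python
import json
import numpy as np

with open("synth_hand/metadata_0000123_004.json") as f:  # hypothetical file name
    meta = json.load(f)

pts_2d = np.array(meta["landmarks"]["2D"])  # 21x2: 16 MANO joints, then 5 fingertips
joints, tips = pts_2d[:16], pts_2d[16:]
print(joints.shape, tips.shape)             # (16, 2) (5, 2)

# A loose square crop around the hand; clamp to the 512x512 image before use.
center = pts_2d.mean(axis=0)
half = 1.2 * np.abs(pts_2d - center).max()
(x0, y0), (x1, y1) = (center - half).astype(int), (center + half).astype(int)
print(f"crop x:[{x0},{x1}] y:[{y0},{y1}]")
```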
Our parametric body model uses a 300-component SMPL-H shape basis and adds the MANO shape basis to the hands, as well as incorporating skin displacement maps. The reposed SMPL-H meshes therefore do not exactly match the rendered images; this difference is only significant for some hand images.
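Reposing the SMPL-H template involves turning the per-joint pose parameters into rotations. SMPL-H pose parameters are axis-angle vectors, so a sketch with SciPy (the metadata file name is an assumption):

```python
import json
import numpy as np
from scipy.spatial.transform import Rotation

with open("synth_body/metadata_0000123_004.json") as f:  # hypothetical file name
    meta = json.load(f)

pose = np.array(meta["pose"])                      # 52x3 axis-angle vectors
rot_mats = Rotation.from_rotvec(pose).as_matrix()  # stacked 52x3x3 rotation matrices
print(rot_mats.shape)  # (52, 3, 3)
```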