PoseEmbroider

This repository is the official PyTorch implementation for the paper "PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation" (ECCV 2024).

Overall idea

PoseEmbroider idea

The PoseEmbroider builds on the idea that all modalities are only partial views of the same concept. Metaphorically, modalities are like 2D shadows of a 3D object: each 2D shadow is informative in its own way, but is also only partial. Put together, they make it possible to reach a better understanding of the core concept. In this project, we train a model on these three complementary, partial yet valuable modalities so as to derive a generic 3D-, visual- and semantic-aware pose representation.

To do so, we use a subset of input modality embeddings obtained from pretrained, frozen modality-specific encoders. We feed these modality embeddings to a transformer, which outputs the generic pose representation through an added special token (step 1). This generic pose representation is then reprojected into uni-modal spaces, where it is compared to the original modality embeddings (step 2). Metaphorically, we approximate the 3D object from the available shadows (step 1), then assess the quality of the reconstruction by checking that the shadows match (step 2).
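
As a rough illustration, here is a conceptual PyTorch sketch of these two steps; the module names, dimensions and layer counts are made up for the example and do not reflect the released implementation:

import torch
import torch.nn as nn

class ToyPoseEmbroider(nn.Module):
    """Conceptual sketch of the two steps described above (not the actual model)."""
    def __init__(self, d=512, n_modalities=3):
        super().__init__()
        # learnable special token whose output is the generic pose representation
        self.special_token = nn.Parameter(torch.zeros(1, 1, d))
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
        self.core = nn.TransformerEncoder(layer, num_layers=4)
        # one reprojection head per modality (step 2)
        self.heads = nn.ModuleList([nn.Linear(d, d) for _ in range(n_modalities)])

    def forward(self, modality_embeddings):
        # modality_embeddings: list of (batch, d) tensors produced by the frozen,
        # pretrained modality-specific encoders (any subset of image / 3D pose / text)
        x = torch.stack(modality_embeddings, dim=1)            # (batch, n_available, d)
        token = self.special_token.expand(x.shape[0], -1, -1)  # (batch, 1, d)
        out = self.core(torch.cat([token, x], dim=1))
        generic = out[:, 0]                                    # step 1: generic pose representation
        reprojected = [head(generic) for head in self.heads]   # step 2: back to uni-modal spaces,
        return generic, reprojected                            #         to be compared to the inputs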

We showcase the potential of our proposed representation on two downstream applications: human mesh recovery and the PoseFix corrective instruction task, where the goal is to generate a text explaining how to modify one pose into another.

For more information, check out the 5-minute video or the paper.

Setup

🐍 Create python environment

Click for details. This code was tested in a Python 3.8 environment.

From the main code directory:

pip install -r requirements.txt
python setup.py develop

If you encounter problems with OpenGL (setting: Linux, Anaconda3), check here.

Please also install & setup the PoseScript/PoseFix repo. No need to follow instructions past the "Create python environment" section, just install the repo, and, based on your system (location preferences), edit './src/text2pose/config.py' to set your preferred values for:

  • TRANSFORMER_CACHE_DIR (see more about it below)
  • SELFCONTACT_ESSENTIALS_DIR (see instructions in the above-mentioned repo)
  • SMPLX_BODY_MODEL_PATH (see more about this below)
  • NB_INPUT_JOINTS (will likely be overwritten by input arguments anyway)

In general, you can define your favorite paths for data storage in ./src/poseembroider/config.py (e.g. DATA_CACHE_DIR, TORCH_CACHE_DIR ...); see more about these below.
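
For illustration, the kind of edits expected in these config files looks like the following; the variable names are those listed above, while the path values are placeholders to adapt to your own system:

# in ./src/text2pose/config.py (PoseScript/PoseFix repo)
TRANSFORMER_CACHE_DIR = "/path/to/transformers_cache"
SELFCONTACT_ESSENTIALS_DIR = "/path/to/selfcontact_essentials"
SMPLX_BODY_MODEL_PATH = "/path/to/body_models"  # smplx/ should be a direct subdirectory
NB_INPUT_JOINTS = 22  # will likely be overwritten by input arguments anyway

# in ./src/poseembroider/config.py
DATA_CACHE_DIR = "/path/to/data_cache"
TORCH_CACHE_DIR = "/path/to/torch_cache"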

📥 Download data & tools

Click for details.

⚠️ Note that in the list below, directories are indicated as <BLAH_BLAH_BLAH>; these refer to the values of the corresponding variables in ./src/poseembroider/config.py. Change the paths following your own preferences.

  • SMPL-X body models
    Download: visit the SMPL-X website, go to the "Download" tab and choose "Download SMPL-X v1.1 (NPZ+PKL, 830 MB) - Use this for SMPL-X Python codebase".
    Where to store: in directory <SMPLX_BODY_MODEL_PATH> (smplx should be a direct subdirectory).

  • BEDLAM data (images + csv)
    Download: visit the BEDLAM website, go to the "Download" tab, then follow the instructions under "Download instructions for synthetic image data (30fps)".
    Where to store: Step 1: move the obtained csv subfolder into <BEDLAM_RAW_DATA_DIR> and untar the included archives. Step 2: define as <BEDLAM_IMG_DIR> the directory containing the obtained training and validation image folders.

  • BEDLAM data (ground truth poses)
    Download: visit the BEDLAM website, go to the "Download" tab, then get the SMPL-X neutral ground truth/animation files.
    Where to store: in directory <BEDLAM_RAW_DATA_DIR> (the unzipped neutral_ground_truth_motioninfo should be a direct subdirectory).

  • BEDLAM data (downsample_mat_smplx.pkl)
    Download: Click to download (origin: BEDLAM official github).
    Where to store: in directory <BEDLAM_RAW_DATA_DIR> (should be a direct subfile).

  • PoseFix data (pose pairs & texts)
    Download: Click to download.
    Where to store: in directory <POSEFIX_DIR> (should contain the .json files).

  • PoseFix data (pose data)
    Download: visit the AMASS website, go to the "Download" tab, then download the SMPL-H sequences for the subdatasets listed in this section.
    Where to store: store them in your favorite location and indicate the path to the global directory under AMASS_FILE_LOCATION_SMPLH in src/poseembroider/datasets/process_posefix/extract_to_obj.py.

  • SMPLer-X image encoder weights
    Download: Click to download (origin: SMPLer-X official github, SMPLer-X-B32 (base)).
    Where to store: in directory <TORCH_CACHE_DIR>/smplerx.

  • PoseText model trained on BEDLAM-Script (provides the text encoder weights)
    Download: Click to download.
    Where to store: store it in your favorite location, and indicate the absolute path to the checkpoint file in src/poseembroider/shortname_2_model_path.json under shortname posetext_model_bedlamscript.

  • VAE model trained on the poses of BEDLAM-Script (provides the pose encoder weights)
    Download: Click to download.
    Where to store: store it in your favorite location, and indicate the absolute path to the checkpoint file in src/poseembroider/shortname_2_model_path.json under shortname posevae_model_bedlamscript.
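
Several entries above and in the following sections must be registered in ./src/poseembroider/shortname_2_model_path.json. Assuming the file is a flat {shortname: absolute_checkpoint_path} mapping, a small hypothetical helper such as the one below can be used to add entries (you can of course also edit the file by hand; the example path is a placeholder):

import json

def register_shortname(shortname, checkpoint_path,
                       json_file="src/poseembroider/shortname_2_model_path.json"):
    """Hypothetical helper: add or overwrite a {shortname: path} entry in the json file."""
    with open(json_file) as f:
        mapping = json.load(f)
    mapping[shortname] = checkpoint_path
    with open(json_file, "w") as f:
        json.dump(mapping, f, indent=2)

# example usage (placeholder path)
register_shortname("posetext_model_bedlamscript",
                   "/path/to/downloaded/posetext_bedlamscript/checkpoint.pth")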

🔨 Format BEDLAM

Click for details.

This project relies on the BEDLAM dataset for training: animation files are parsed for pose extraction, selection, pairing and captioning. To process BEDLAM for this project:

  1. Download BEDLAM data, the SMPL-X body model and the PoseText model (see previous section).
  2. Complete the repo by running:
    cd src/poseembroider
    
    # download the original file version for step 1 of the process
    wget https://raw.githubusercontent.com/pixelite1201/BEDLAM/refs/heads/master/data_processing/df_full_body.py # from commit 099ab0ef8eb4c7737c0ca038e7010238dccbeed6
    mv df_full_body.py datasets/process_bedlam/process_bedlam_step1.py
    
    # modify the step-1 file so that it does everything we want
    sh datasets/process_bedlam/modify_step1_file.sh
  3. Run:
    # (still from src/poseembroider)
    sh datasets/process_bedlam/do.sh

Note: the process is quite long (~35 hours). If you are interested in BEDLAM-Script only, you can comment out steps 6 and 7 in the bash script (saving 5-6 hours). You can also comment out the lines related to the train split if you are interested in testing only (saving 29 hours). Multiple caches are created along the way, so that the whole process does not have to be relaunched from scratch in case something unexpected happens.

You can check dataset samples by running:

cd src/poseembroider
# For BEDLAM-Script examples:
streamlit run datasets/visualize_dataset_script.py
# For BEDLAM-Fix examples:
streamlit run datasets/visualize_dataset_fix.py

🔨 Format PoseFix

Click for details. PoseFix is needed to train the head for human pose instruction generation.

PoseFix is originally given in SMPL-H format. To convert the dataset to SMPL-X format, please follow these indications.

Then, you can check dataset samples by running:

cd src/poseembroider
streamlit run datasets/visualize_dataset_fix.py

🔨 Compute the mean SMPL-X pose (optional)

Click for details. This step is not necessary to use the released pretrained models.

The mean SMPL-X pose is needed to train the SMPL-X regression head, an iterative residual network, mounted on top of the PoseEmbroider. To compute it:

  1. Format BEDLAM (see previous section).
  2. Create BEDLAM caches:
    cd ./src/poseembroider
    python datasets/bedlam_script.py
  3. Compute the mean pose:
    # (from ./src/poseembroider)
    python datasets/other/compute_mean_smplx_pose.py
    Note: to optionally display the mean pose, run streamlit run datasets/other/compute_mean_smplx_pose.py -- --display instead.
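
For intuition, one plausible way to compute such a mean pose is to average each joint rotation over the training poses. The released script may proceed differently; the function below is only a sketch, and its name, input layout and choice of rotation averaging are assumptions:

import numpy as np
from scipy.spatial.transform import Rotation as R

def mean_smplx_pose(poses_axis_angle):
    # poses_axis_angle: (N, J, 3) array of per-joint axis-angle rotations
    n_poses, n_joints, _ = poses_axis_angle.shape
    mean_pose = np.zeros((n_joints, 3))
    for j in range(n_joints):
        rotations = R.from_rotvec(poses_axis_angle[:, j, :])
        mean_pose[j] = rotations.mean().as_rotvec()  # chordal L2 mean of the joint rotations
    return mean_pose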

🔨 Get the SMPLer-X image encoder

Click for details. The PoseEmbroider framework makes use of the pretrained SMPLer-X image encoder (ViT base). Run the following to extract the source code of the ViT backbone useful for this project:
cd ./src/poseembroider

# NOTE: experimented with the file from commit 857a3ecbae76bf4ac50534a8d33c1db49d898190
wget https://raw.githubusercontent.com/caizhongang/SMPLer-X/refs/heads/main/main/transformer_utils/mmpose/models/backbones/vit.py
mv vit.py smplerx_image_backbone.py

# change lines to remove extra dependencies
# (start from the end of the file to avoid confusion with respect to line numbers)
# For information:
# * `sed -i 'Xi something' file.py` adds 'something' before line numbered X
# * `sed -i 'X,Yd' file.py` removes lines X through Y

myfile='smplerx_image_backbone.py'
sed -i '285i \
        else:\
            raise NotImplementedError' $myfile
sed -i '272,273d' $myfile
sed -i '177i \
class ViT(nn.Module):' $myfile
sed -i '175,176d' $myfile
sed -i '11,13d' $myfile
sed -i '7d' $myfile
sed -i '2d' $myfile

📚 Download language models

Click for details.
  • Define where to save the pretrained language models, by modifying the value of TRANSFORMER_CACHE_DIR in ./src/poseembroider/config.py following your own preferences.

  • Download the HuggingFace checkpoints by running the following Python script:

    import os
    from transformers import AutoTokenizer, AutoModel
    import poseembroider.config as config
    
    model_type = "distilbert-base-uncased"
    
    # download the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_type)
    tokenizer.save_pretrained(os.path.join(config.TRANSFORMER_CACHE_DIR, model_type))
    
    # download the encoder
    text_enc = AutoModel.from_pretrained(model_type)
    text_enc.save_pretrained(os.path.join(config.TRANSFORMER_CACHE_DIR, model_type))
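
Once saved, these checkpoints can later be reloaded from the local cache directory (instead of the HuggingFace hub), for instance:

    tokenizer = AutoTokenizer.from_pretrained(os.path.join(config.TRANSFORMER_CACHE_DIR, model_type))
    text_enc = AutoModel.from_pretrained(os.path.join(config.TRANSFORMER_CACHE_DIR, model_type))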

Demos

To visualize the results of the trained models, run commands below from ./src/poseembroider:

  • PoseEmbroider / Aligner
    Command: streamlit run demo_core.py -- --model_path ${model_path}
    Shortnames: poseembroider_model_bedlamscript, aligner_model_bedlamscript

  • HPSEstimator (pose estimation)
    Command: streamlit run downstream/pose_estimation/demo_pose_estimation.py -- --model_path ${model_path}
    Shortnames: hpsestimator_model_poseembroider_bedlamscript, hpsestimator_model_aligner_bedlamscript

  • InstructionGenerator (generation of corrective pose instructions)
    Command: streamlit run downstream/instruction_generation/demo_instruction_generation.py -- --model_path ${model_path}
    Shortnames: instructgen_model_poseembroider_bedlamfix_posefix, instructgen_model_aligner_bedlamfix_posefix

The shortnames indicate possible models to try out in each case; check out the file ./src/poseembroider/shortname_2_model_path.json to get the corresponding ${model_path}.

Evaluate models

In general, you can evaluate each model using its dedicated "evaluate_${model_type}.py" file, as follows:

python ${path_to_file}/evaluate_${model_type}.py --dataset ${dataset_name} --model_path ${model_path} --split ${split} 

See for instance the commands below (all commands and paths are to be run from ./src/poseembroider):

  • PoseEmbroider / Aligner
    Command: python evaluate_core.py --dataset "bedlamscript-overlap30_res224_j16_sf11" --model_path ${your_model_path} --split 'val'
    Inference time: below 4 minutes for a 10k-sized set (9 query types per sample), on a V100.

  • HPSEstimator (pose estimation)
    On BEDLAM-Script: python downstream/pose_estimation/evaluate_pose_estimation.py --dataset "bedlamscript-overlap30_res224_j16_sf11" --split 'val' --model_path ${your_model_path}
    On 3DPW: python downstream/pose_estimation/evaluate_pose_estimation.py --dataset "threedpw-sf1" --split 'test' --model_path ${your_model_path}
    Inference time: about 1 minute for 10k queries on a V100.

  • InstructionGenerator (generation of corrective pose instructions)
    On BEDLAM-Fix: python downstream/instruction_generation/evaluate_instruction_generation.py --dataset "bedlamfix-overlap30_res224_j16_sf11-in15_out20_t05_sim0709" --textret_model 'pairtext_model_ft_bedlamfix_posefix' --split 'val' --model_path ${your_model_path}
    On PoseFix-H (out-of-sequence pairs): python downstream/instruction_generation/evaluate_instruction_generation.py --dataset "posefix-H" --textret_model 'pairtext_model_ft_bedlamfix_posefix' --pair_kind 'out' --split 'test' --model_path ${your_model_path}
    Inference time: about 40 seconds for 709 queries on a V100.

To evaluate the released pretrained models, check out the shortnames of the task-eligible models listed in the previous section ("Demos"), then gather the corresponding ${model_path} from the file ./src/poseembroider/shortname_2_model_path.json.

Train models

Click for details.

All commands and paths below are to be run from ./src/poseembroider.

Pose encoder

python assisting_models/poseVAE/train_poseVAE.py --dataset "bedlamscript-overlap30_res224_j16_sf11" \
--model PoseVAE --latentD 32 --num_body_joints 22 \
--wloss_v2v 1.0 --wloss_rot 1.0 --wloss_jts 1.0 --wloss_kld 1.0 --wloss_kldnpmul 1.0 --kld_epsilon 10.0 \
--lr 1e-05 --wd 0.0001 \
--log_step 100 --val_every 20 \
--batch_size 128 --epochs 2000 --seed 1

Please register the model path in shortname_2_model_path.json, under shortname posevae_model_bedlamscript.

This model gives the "posevae" encoder model used for the PoseEmbroider / Aligner.

Text encoder

python assisting_models/poseText/train_poseText.py --dataset "bedlamscript-overlap30_res224_j16_sf11" \
--model PoseText --latentD 512 --text_encoder_name 'distilbertUncased' --transformer_topping "avgp" \
--num_body_joints 22 \
--retrieval_loss 'symBBC' \
--lr 0.0002 --lr_scheduler "stepLR" --lr_step 400 --lr_gamma 0.5 \
--log_step 20 --val_every 20 \
--batch_size 128 --epochs 500 --seed 1

Please register the model path in shortname_2_model_path.json, under shortname posetext_model_bedlamscript.

This model gives the "posetext" encoder model used for the PoseEmbroider / Aligner. It also gives the semantic features required for building the pose pairs of BEDLAM-Fix.

PairText evaluation model

This model's training is performed in 2 steps; see below for the step-specific additional arguments.

python assisting_models/pairText/train_pairText.py --model PairText \
--num_body_joints 22 --latentD 512 --text_encoder_name 'distilbertUncased' --transformer_topping "avgp" \
--retrieval_loss 'symBBC' \
--lr 0.00005 --lr_scheduler "stepLR" --lr_gamma 0.5 \
--log_step 50 \
--batch_size 128 --seed 1 \
<additional arguments, see table below>

Please complete the command above with the following step-specific arguments:

  • Pretraining step:
    --dataset "bedlamfix-overlap30_res224_j16_sf11-in15_out20_t05_sim0709" \
    --epochs 400 --lr_step 400 --val_every 20
  • Finetuning step:
    --datasets "posefix-H" "bedlamfix-overlap30_res224_j16_sf11-in15_out20_t05_sim0709" \
    --epochs 250 --lr_step 40 --val_every 10 \
    --apply_LR_augmentation --pretrained 'pairtext_model_bedlamfix'

Before launching the finetuning step, please register the path of the model from the pretraining step in shortname_2_model_path.json, under shortname pairtext_model_bedlamfix. Then register the path of the model from the finetuning step in shortname_2_model_path.json, under shortname pairtext_model_ft_bedlamfix_posefix.

This model serves to evaluate text generation models producing (3D) pose corrective instructions.

PoseEmbroider / Aligner, and ablations

python train_core.py --dataset "bedlamscript-overlap30_res224_j16_sf11" \
--latentD 512 --l2normalize --num_body_joints 22 \
--image_encoder_name 'smplerx_vitb32' \
--pose_encoder_name 'posevae' \
--text_encoder_name 'posetext' \
--encoder_projection_type 'layerplus' \
--retrieval_loss 'symBBC' --single_partials --dual_partials \
--lr 0.0002 --lr_scheduler "stepLR" --lr_step 400 --lr_gamma 0.5 \
--log_step 50 --val_every 5 \
--batch_size 128 --epochs 350 --seed 1 \
<additional arguments, see table below>

Please complete the command above depending on the model version you are interested in:

  • Aligner:
    --model 'Aligner' --external_encoder_projection_type 'mediummlpxs'
  • PoseEmbroider:
    --model 'PoseEmbroider' --external_encoder_projection_type 'none' --embroider_core_type 'transformer'
  • PoseEmbroider, without projection heads:
    --model 'PoseEmbroider' --external_encoder_projection_type 'none' --embroider_core_type 'transformer' --no_projection_heads
  • PoseEmbroider, MLP core:
    --model 'PoseEmbroider' --external_encoder_projection_type 'none' --embroider_core_type 'mlp'

Please register the paths of the trained models in shortname_2_model_path.json under desired shortnames for further use.

Downstream: pose estimation

Time/memory trade-off: you can cache the representation features first, then train the neural head on the cached features.

Command for caching the representation features:

for split in 'train' 'val'; do
    python downstream/cache_intermodality_features.py \
    --dataset "bedlamscript-overlap30_res224_j16_sf11" --split $split \
    --model_path <path_to_the_model_(PoseEmbroider/Aligner)_from_which_to_extract_the_intermodality_features> \
    --input_types 'images'
done

Please complete the command above with the desired value; make sure to have a shortname entry in shortname_2_model_path.json for the given model path.

⚠️ Pay attention to the printed filemark (based on the epoch of the model)!

Training command:

python downstream/pose_estimation/train_pose_estimation.py --dataset "bedlamscript-overlap30_res224_j16_sf11" \
--model HPSEstimator --latentD 512 \
--num_body_joints 22 --num_shape_coeffs 10 --predict_bodyshape \
--representation_model_input_types 'images' \
--pretrained_representation_model <shortname_to_pretrained_representation_model> \
--cached_embeddings_file <cached_embeddings_file_mark_(based_on_the_epoch_of_the_model_that_produced_the_features)> \
--no_img_augmentation \
--lr 0.0001 --wd 0.0001 \
--log_step 100 --val_every 1 \
--batch_size 64 --epochs 100 --seed 1

Please complete the command above with the desired values:

  • <shortname_to_pretrained_representation_model>: shortname of the model used to derive the input representation (if using cached embeddings, it is the shortname corresponding to the path given as input to the previous command)
  • <cached_embeddings_file_mark_(based_on_the_epoch_of_the_model_that_produced_the_features)>: makes it possible to identify the version of the cached embeddings, if several exist for the same model. The value is printed by the script producing the cached features, and it looks like "e<epoch_number>".

Note: if you don't want to use cached features, remove the --cached_embeddings_file argument. Then you can also remove --no_img_augmentation if you like.

Downstream: pose correction instruction

Time/memory trade-off: you can cache the representation features first, then train the neural head on the cached features.

Command for caching the representation features:

for dataset in 'posefix-H' 'bedlamfix-overlap30_res224_j16_sf11-in15_out20_t05_sim0709'; do
    for split in 'train' 'val'; do
        python downstream/cache_intermodality_features.py \
        --dataset $dataset --split $split \
        --model_path <path_to_the_model_(PoseEmbroider/Aligner)_from_which_to_extract_the_intermodality_features> \
        --input_types 'poses'
    done
done

Please complete the command above with the desired value.

⚠️ Pay attention to the printed filemark (based on the epoch of the model)!

Training command:

This model's training is performed in 2 steps; see below for the step-specific additional arguments.

python downstream/instruction_generation/train_instruction_generation.py \
--model InstructionGenerator --latentD 512 --num_body_joints 22 \
--text_decoder_name "transformer_distilbertUncased" --transformer_mode 'crossattention' \
--decoder_nhead 8 --decoder_nlayers 4 --comparison_latentD 512 --decoder_latentD 512 \
--comparison_module_mode 'tirg' \
--representation_model_input_types 'poses' \
--pretrained_representation_model <shortname_to_pretrained_representation_model> \
--cached_embeddings_file <cached_embeddings_file_mark_(based_on_the_epoch_of_the_model_that_produced_the_features)> \
--lr 0.0001 --wd 0.0001 \
--log_step 100 --val_every 20 --textret_model 'pairtext_model_ft_bedlamfix_posefix' \
--batch_size 64 --seed 1 \
<additional arguments, see table below>

Please complete the command above with the desired values (as for the pose estimation downstream task), and with the following step-specific arguments:

  • Pretraining step:
    --dataset "bedlamfix-overlap30_res224_j16_sf11-in15_out20_t05_sim0709" --epochs 900
  • Finetuning step:
    --datasets "posefix-H" "bedlamfix-overlap30_res224_j16_sf11-in15_out20_t05_sim0709" \
    --pair_kinds "out" "any" --epochs 300 \
    --pretrained "<shortname_to_model_from_the_pretraining_step>"

Before launching the finetuning step, please register the path of the model from the pretraining step in shortname_2_model_path.json, under your preferred shortname; this shortname is referred to in the finetuning command as <shortname_to_model_from_the_pretraining_step> (replace it with the actual value).

Note: if you don't want to use cached features, remove the --cached_embeddings_file argument.

Disclaimers

The code is shared in the hope it can be useful, but, like many things, it is not perfect:

  • You may encounter issues regarding the environment setup (the instructions may be incomplete).
  • Some small parts of the code are missing (essentially the parsing of 3DPW; hopefully there is enough information online to fill in the gaps).
  • Some processes are not yet perfectly satisfactory (e.g. pose normalization).

Please feel free to share your solutions!

Citation

If you use this code, please cite the corresponding paper:

@inproceedings{delmas2024poseembroider,
  title={PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation},
  author={Delmas, Ginger and Weinzaepfel, Philippe and Moreno-Noguer, Francesc and Rogez, Gr{\'e}gory},
  booktitle={ECCV},
  year={2024}
}

License

This code is distributed under the CC BY-NC-SA 4.0 License. See LICENSE for more information.

Note that some of the software to download and install for this project is subject to separate copyright notices and license terms; its use is subject to the terms and conditions under which it is made available.
