SMPLer Logo

🔥 SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation 🔥

Xiangyu Xu, Lijuan Liu, Shuicheng Yan

arXiv · Xi'an Jiaotong University

Watch the video

Abstract

Existing Transformers for monocular 3D human shape and pose estimation typically have a quadratic computation and memory complexity with respect to the feature length, which hinders the exploitation of fine-grained information in high-resolution features that is beneficial for accurate reconstruction. In this work, we propose an SMPL-based Transformer framework (SMPLer) to address this issue. SMPLer incorporates two key ingredients: a decoupled attention operation and an SMPL-based target representation, which allow effective utilization of high-resolution features in the Transformer. In addition, based on these two designs, we also introduce several novel modules including a multi-scale attention and a joint-aware attention to further boost the reconstruction performance. Extensive experiments demonstrate the effectiveness of SMPLer against existing 3D human shape and pose estimation methods both quantitatively and qualitatively. Notably, the proposed algorithm achieves an MPJPE of 45.2mm on the Human3.6M dataset, improving upon Mesh Graphormer by more than 10% with fewer than one-third of the parameters.
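
The SMPL-based target representation keeps the Transformer's output compact: rather than regressing 6890 × 3 mesh vertex coordinates directly, the model predicts SMPL pose and shape parameters and lets the SMPL model recover the full mesh. The minimal sketch below only illustrates this parameter count using the smplx package pinned in the install instructions; the model path is a placeholder and this is not the repo's actual inference code.

    import torch
    import smplx

    # Hypothetical path: the SMPL model files must be downloaded separately.
    smpl = smplx.create("PATH_to_SMPL_MODELS", model_type="smpl")

    betas = torch.zeros(1, 10)           # shape parameters (10 values)
    body_pose = torch.zeros(1, 69)       # 23 body joints x 3 axis-angle values
    global_orient = torch.zeros(1, 3)    # root orientation

    out = smpl(betas=betas, body_pose=body_pose, global_orient=global_orient)
    print(out.vertices.shape)            # torch.Size([1, 6890, 3]): full mesh
    print(out.joints.shape)              # 3D joints regressed from the mesh
    # 10 + 69 + 3 = 82 SMPL targets vs 6890 * 3 = 20670 per-vertex targets.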

Highlights

  • Decoupled attention design (see the illustrative sketch after this list)
  • SMPL-based target representation
  • Significantly reduced computation and parameter overhead
  • Exploitation of high-resolution features
  • Multi-scale attention module and joint-aware attention module
  • MPJPE of 45.2mm on Human3.6M, surpassing the previous SOTA by over 10%
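
To make the complexity argument behind decoupled attention concrete, here is an axial-style toy example (an assumption for illustration only, not the exact SMPLer operator described in the paper): full self-attention over an H × W feature map costs O((HW)²), while attending along each spatial axis separately costs roughly O(HW·(H+W)), which is what makes high-resolution features affordable.

    import torch

    def axis_attention(x):
        # Plain scaled dot-product self-attention over x of shape (batch, seq, dim).
        attn = torch.softmax(x @ x.transpose(1, 2) / x.shape[-1] ** 0.5, dim=-1)
        return attn @ x

    def decoupled_attention(feat):
        # feat: (B, C, H, W) feature map; attend along H, then along W.
        b, c, h, w = feat.shape
        xh = feat.permute(0, 3, 2, 1).reshape(b * w, h, c)              # W sequences of length H
        xh = axis_attention(xh).reshape(b, w, h, c).permute(0, 3, 2, 1)
        xw = xh.permute(0, 2, 3, 1).reshape(b * h, w, c)                # H sequences of length W
        xw = axis_attention(xw).reshape(b, h, w, c).permute(0, 3, 1, 2)
        return xw

    feat = torch.randn(1, 64, 56, 56)        # high-resolution feature map
    print(decoupled_attention(feat).shape)   # torch.Size([1, 64, 56, 56])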

Results

| Method | Parameters (M) | MPJPE ↓ (Human3.6M) | PA-MPJPE ↓ (Human3.6M) | MPVE ↓ (3DPW) | MPJPE ↓ (3DPW) | PA-MPJPE ↓ (3DPW) |
| --- | --- | --- | --- | --- | --- | --- |
| METRO | 231.8 | 54.0 | 36.7 | 88.2 | 77.1 | 47.9 |
| Mesh Graphormer | 215.7 | 51.2 | 34.5 | 87.7 | 74.7 | 45.6 |
| SMPLer | 35.6 | 47.0 | 32.8 | 84.7 | 75.7 | 45.2 |
| SMPLer-L | 70.2 | 45.2 | 32.4 | 82.0 | 73.7 | 43.4 |

All errors are reported in mm; lower (↓) is better.
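
For reference, the metrics above follow the standard definitions: MPJPE is the mean per-joint position error, PA-MPJPE is the same error after a rigid Procrustes (similarity) alignment, and MPVE is the analogous per-vertex error on the mesh. A minimal NumPy sketch of the joint metrics (the repo's own evaluation code may differ in details):

    import numpy as np

    def mpjpe(pred, gt):
        # pred, gt: (J, 3) joint positions in mm, assumed root-aligned.
        return np.linalg.norm(pred - gt, axis=-1).mean()

    def pa_mpjpe(pred, gt):
        # MPJPE after a similarity (Procrustes) alignment of pred onto gt.
        mu_p, mu_g = pred.mean(0), gt.mean(0)
        p, g = pred - mu_p, gt - mu_g
        U, S, Vt = np.linalg.svd(p.T @ g)     # cross-covariance SVD (Umeyama)
        if np.linalg.det(Vt.T @ U.T) < 0:     # avoid an improper rotation (reflection)
            Vt[-1] *= -1
            S[-1] *= -1
        R = Vt.T @ U.T
        scale = S.sum() / (p ** 2).sum()
        return mpjpe(scale * p @ R.T + mu_g, gt)

    gt = np.random.rand(17, 3) * 1000         # 17 joints, coordinates in mm
    pred = gt + np.random.randn(17, 3) * 20
    print(mpjpe(pred, gt), pa_mpjpe(pred, gt))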


1. Install

  • Hardware requirements

    For Testing: Most modern GPUs are adequate.

    For Training: It is recommended to use 2 NVIDIA A100 GPUs.

  • Create conda environment

    conda create -n smpler python=3.8
    conda activate smpler
  • Install packages

    pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
    
    pip install scipy==1.5.0 scikit-image==0.19.1 opencv-python==4.5.4.58 imageio matplotlib numpy==1.20.3 chumpy==0.70 ipython ipykernel ipdb smplx==0.1.28 tensorboardx==2.4 tensorboard==2.7.0 easydict pillow==8.4.0
  • Install PyTorch3D (an optional sanity check follows these steps)

    conda install -c fvcore -c iopath -c conda-forge fvcore iopath
    conda install -c bottler nvidiacub
    
    wget https://anaconda.org/pytorch3d/pytorch3d/0.5.0/download/linux-64/pytorch3d-0.5.0-py38_cu111_pyt180.tar.bz2 --no-check-certificate
    
    conda install pytorch3d-0.5.0-py38_cu111_pyt180.tar.bz2
    
    rm pytorch3d-0.5.0-py38_cu111_pyt180.tar.bz2
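
Optional sanity check (not part of the repo): confirm that the pinned PyTorch and PyTorch3D builds import correctly and can see the GPU before moving on.

    import torch
    import pytorch3d

    print(torch.__version__, pytorch3d.__version__)   # expect 1.8.0+cu111 and 0.5.0
    print(torch.cuda.is_available())                  # should be True on a GPU machine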

2. How to Run

  • Download meta data and extract it into "PATH_to_SMPLer/meta_data"

  • Download pretrained models and extract them into "PATH_to_SMPLer/pretrained"

  • Run demo

    python demo.py --img_path samples/im01.png

3. Train and Test

3.1 Prepare Datasets

There are two ways to download the datasets: azcopy and wget.

Recommended way: azcopy (faster)

  • Download azcopy from here

  • Download datasets with azcopy:

    cd PATH_to_STORE_DATASET
    azcopy_path=PATH_to_AZCOPY
    bash PATH_to_SMPLer/scripts/download_datasets_azcopy.sh
  • Create a symbolic link:

    cd PATH_to_SMPLer
    ln -s PATH_to_STORE_DATASET ./datasets

Alternative way: wget (usually slower and less stable, but does not require azcopy)

  • Download datasets with wget:
    cd PATH_to_STORE_DATASET
    bash PATH_to_SMPLer/scripts/download_datasets_wget.sh

3.2 Test

  • Test on the Human3.6M dataset (models are trained on mixed data: Human3.6M, COCO, MuCo, UP-3D, and MPII)

    # for SMPLer
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --eval_only --val_batch_size=128 --model_type=smpler --data_mode=h36m --hrnet_type=w32 --load_checkpoint=pretrained/SMPLer_h36m.pt 
    
    # for SMPLer-L 
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --eval_only --val_batch_size=128 --model_type=smpler --data_mode=h36m --hrnet_type=w48 --load_checkpoint=pretrained/SMPLer-L_h36m.pt 
  • Test on the 3DPW dataset (models are fine-tuned on 3DPW)

    # for SMPLer
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --eval_only --val_batch_size=128 --model_type=smpler --data_mode=3dpw --hrnet_type=w32 --load_checkpoint=pretrained/SMPLer_3dpw.pt 
    
    # for SMPLer-L 
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --eval_only --val_batch_size=128 --model_type=smpler --data_mode=3dpw --hrnet_type=w48 --load_checkpoint=pretrained/SMPLer-L_3dpw.pt 

3.3 Train

Training proceeds in three stages.

  • For SMPLer:

    # 1. Train CNN backbone on mixed data
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --exp_name=backbone --batch_size=100 --num_workers=8 --lr=2e-4 --data_mode=h36m --model_type=backbone --num_epochs=50 --hrnet_type=w32  
    
    # 2. Train SMPLer on mixed data
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --exp_name=smpler --batch_size=100 --num_workers=8 --lr=2e-4 --data_mode=h36m --model_type=smpler --num_epochs=100 --hrnet_type=w32 --load_checkpoint=logs/backbone/checkpoints/epoch_049.pt
    
    # 3. Finetune SMPLer on 3DPW
    python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --exp_name=smpler_3dpw --batch_size=32 --num_workers=8 --lr=1e-4 --data_mode=3dpw --model_type=smpler --num_epochs=2 --hrnet_type=w32 --load_checkpoint=logs/smpler/checkpoints/epoch_***.pt --summary_steps=100
  • For SMPLer-L:

    # 1. Train CNN backbone on mixed data
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --exp_name=backbone-L --batch_size=100 --num_workers=8 --lr=2e-4 --data_mode=h36m --model_type=backbone --num_epochs=50 --hrnet_type=w48  
    
    # 2. Train SMPLer-L on mixed data
    python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --exp_name=smpler-L --batch_size=100 --num_workers=8 --lr=2e-4 --data_mode=h36m --model_type=smpler --num_epochs=100 --hrnet_type=w48 --load_checkpoint=logs/backbone-L/checkpoints/epoch_049.pt
    
    # 3. Finetune SMPLer-L on 3DPW
    python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --exp_name=smpler-L_3dpw --batch_size=32 --num_workers=8 --lr=1e-4 --data_mode=3dpw --model_type=smpler --num_epochs=2 --hrnet_type=w48 --load_checkpoint=logs/smpler-L/checkpoints/epoch_***.pt --summary_steps=100

4. BibTex

To cite SMPLer in your work, please use:

@article{xu2024smpler,
  title={SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation},
  author={Xu, Xiangyu and Liu, Lijuan and Yan, Shuicheng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}

5. Related Resources

Please explore these resources to broaden your understanding of 3D human modeling: METRO, Mesh Graphormer, RSC-Net, Texformer, Sewformer, GP-NeRF