This is the official implementation of PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion.
Given a single image of a clothed person, PSHuman recovers detailed geometry and a realistic 3D human appearance across various poses within one minute.
- [2024.11.30]: Released the SMPL-free version, which does not require the SMPL condition for multiview generation and performs well on humans in general poses.
- [2024.12.11]: The Hugging Face demo has been deployed. Special thanks to Sylvain Filoni! Give it a try.
- Q: Minimum VRAM requirement
A: The current model is trained at a resolution of 768, requiring over 40GB of VRAM. We are considering training a new model at a resolution of 512, which would allow it to run on an RTX 4090.
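Before installing everything, you can roughly check whether your GPU clears the "over 40GB" bar. The sketch below is not part of PSHuman. It uses PyTorch's torch.cuda.get_device_properties and treats the FAQ's "40GB" as GiB, which is an approximation. It also reports total device memory, not free memory, so other processes on the same GPU can still cause out-of-memory errors.

```python
"""Rough pre-flight check against the FAQ's >40GB VRAM requirement (hedged sketch)."""

REQUIRED_GB = 40  # from the FAQ: the 768-resolution model needs over 40GB


def meets_vram_requirement(total_bytes: int, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if `total_bytes` of device memory exceeds `required_gb` GiB."""
    return total_bytes / 1024**3 > required_gb


def check_local_gpu(device_index: int = 0) -> None:
    """Print the total memory of one visible CUDA device and whether it passes."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; cannot query the GPU.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible.")
        return
    props = torch.cuda.get_device_properties(device_index)
    total_gib = props.total_memory / 1024**3
    # Total memory only: memory already used by other processes is not subtracted.
    status = "OK" if meets_vram_requirement(props.total_memory) else "likely insufficient"
    print(f"{props.name}: {total_gib:.1f} GiB total ({status})")


if __name__ == "__main__":
    check_local_gpu()
```

A 24GB card such as the RTX 4090 is reported as insufficient for the current 768-resolution model, in line with the answer above.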
conda create -n pshuman python=3.10
conda activate pshuman
# torch
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
# kaolin
pip install kaolin==0.17.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.1.0_cu121.html
# other dependencies
pip install -r requirements.txt
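Since torch and kaolin wheels must match (kaolin 0.17.0 is installed from the torch-2.1.0_cu121 index), it can help to confirm the pins after installation. The following is a minimal, unofficial sketch using only the standard library's importlib.metadata. The pinned versions are copied from the commands above. Local version labels such as "+cu121" are ignored when comparing.

```python
"""Compare installed package versions with the pins used in the install commands."""
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands above.
PINNED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "kaolin": "0.17.0",
}


def base_version(v: str) -> str:
    """Drop a PEP 440 local version label such as '+cu121'."""
    return v.split("+", 1)[0]


def find_mismatches(installed: dict, pinned: dict = PINNED) -> dict:
    """Return {package: (expected, found)} for every pin that is missing or differs."""
    mismatches = {}
    for pkg, expected in pinned.items():
        found = installed.get(pkg)
        if found is None or base_version(found) != expected:
            mismatches[pkg] = (expected, found)
    return mismatches


def installed_versions(packages) -> dict:
    """Look up installed distribution versions; None means not installed."""
    result = {}
    for pkg in packages:
        try:
            result[pkg] = version(pkg)
        except PackageNotFoundError:
            result[pkg] = None
    return result


if __name__ == "__main__":
    problems = find_mismatches(installed_versions(PINNED))
    if not problems:
        print("All pinned packages match.")
    for pkg, (expected, found) in problems.items():
        print(f"{pkg}: expected {expected}, found {found}")
```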
This project also relies on SMPL-X. We borrowed the related models from ECON and SIFU and reorganized them; they can be downloaded from OneDrive.
- Given a human image, we use Clipdrop or rembg to remove the background. For the latter, we provide a simple script:
python utils/remove_bg.py --path $DATA_PATH$
Then, put the RGBA images in $DATA_PATH$.
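Inference expects images with an alpha channel, so a quick check of $DATA_PATH$ can catch images that still lack one. This is a hypothetical, stdlib-only helper, not part of the repository. It assumes the inputs are PNG files placed directly in $DATA_PATH$. It reads the color type from the PNG IHDR chunk (4 = grayscale + alpha, 6 = RGBA) and does not consider transparency supplied through a tRNS chunk.

```python
"""Flag images in a folder that do not carry an alpha channel (stdlib only)."""
import sys
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG color types with a built-in alpha channel: 4 = grayscale+alpha, 6 = RGBA.
ALPHA_COLOR_TYPES = {4, 6}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def png_has_alpha(header: bytes) -> bool:
    """Return True if the PNG header's IHDR color type includes alpha."""
    if len(header) < 26 or not header.startswith(PNG_SIGNATURE):
        return False
    if header[12:16] != b"IHDR":
        return False
    # Offset 25 = 8 (signature) + 4 (chunk length) + 4 (chunk type)
    #             + 8 (width, height) + 1 (bit depth).
    return header[25] in ALPHA_COLOR_TYPES


def images_without_alpha(data_path: str) -> list:
    """List image files directly inside `data_path` that lack an alpha channel."""
    missing = []
    for path in sorted(Path(data_path).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if path.suffix.lower() != ".png":
            missing.append(path.name)  # JPEG cannot store an alpha channel
            continue
        with path.open("rb") as f:
            if not png_has_alpha(f.read(26)):
                missing.append(path.name)
    return missing


if __name__ == "__main__":
    for folder in sys.argv[1:]:
        bad = images_without_alpha(folder)
        print(f"{folder}: {len(bad)} image(s) without alpha", *bad, sep="\n  ")
```

For example, `python check_rgba.py $DATA_PATH$` (the file name is hypothetical) lists any files that should be passed through background removal again.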
- Running inference.py saves the textured mesh and rendered video in out.
CUDA_VISIBLE_DEVICES=$GPU python inference.py --config configs/inference-768-6view.yaml \
pretrained_model_name_or_path='pengHTYX/PSHuman_Unclip_768_6views' \
validation_dataset.crop_size=740 \
with_smpl=false \
validation_dataset.root_dir=$DATA_PATH$ \
seed=600 \
num_views=7 \
save_mode='rgb'
You can adjust crop_size (720 or 740) and seed (42 or 600) to obtain the best results in some cases.
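To try all four combinations of the suggested crop_size and seed values, the command above can be generated from Python. This is an unofficial sketch: it reproduces the documented overrides and varies only crop_size and seed. The quotes in the shell version are stripped by the shell, so they are omitted from the argument list here. By default it only prints the commands (dry_run=True). It is not stated whether successive runs overwrite each other in out, so copy results aside between runs if needed. The data path in the example is a placeholder.

```python
"""Build (and optionally run) inference commands for the suggested crop_size/seed values."""
import itertools
import os
import subprocess

CROP_SIZES = (720, 740)  # values suggested in the README
SEEDS = (42, 600)


def build_command(data_path: str, crop_size: int, seed: int) -> list:
    """Mirror the documented inference command, varying only crop_size and seed."""
    return [
        "python", "inference.py",
        "--config", "configs/inference-768-6view.yaml",
        "pretrained_model_name_or_path=pengHTYX/PSHuman_Unclip_768_6views",
        f"validation_dataset.crop_size={crop_size}",
        "with_smpl=false",
        f"validation_dataset.root_dir={data_path}",
        f"seed={seed}",
        "num_views=7",
        "save_mode=rgb",
    ]


def run_sweep(data_path: str, gpu: str = "0", dry_run: bool = True) -> list:
    """Print every command; execute them one after another unless dry_run is True."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    commands = [
        build_command(data_path, crop, seed)
        for crop, seed in itertools.product(CROP_SIZES, SEEDS)
    ]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)
            # Results land in `out`; move them elsewhere here if needed.
    return commands


if __name__ == "__main__":
    run_sweep("path/to/your/data")  # placeholder path; prints commands only
```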
For data preparation and preprocessing, please refer to our paper. Once the data is ready, start training by running:
bash scripts/train_768.sh
You should modify some parameters, such as data_common.root_dir and data_common.object_list.
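Dotted names such as data_common.root_dir refer to nested keys in a YAML config (a data_common section containing root_dir). The inference command above uses the same key=value override style. The helper below is a hypothetical illustration of how such an override maps onto a nested dictionary. Unlike OmegaConf, it keeps values as strings. It has not been verified that scripts/train_768.sh forwards command-line overrides, so editing the YAML config that the script loads is the safer route. The paths in the example are placeholders.

```python
"""Illustrate how dotted overrides such as data_common.root_dir=... map onto a nested config."""
import copy


def apply_dotlist(config: dict, overrides: list) -> dict:
    """Return a copy of `config` with each 'a.b.c=value' override applied (values stay strings)."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections on the way down
        node[leaf] = value
    return result


if __name__ == "__main__":
    cfg = {"data_common": {"root_dir": "old/root", "object_list": "old/list"}}
    print(apply_dotlist(cfg, [
        "data_common.root_dir=path/to/training_data",    # placeholder
        "data_common.object_list=path/to/object_list",   # placeholder
    ]))
```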
We borrow code from the following projects. We thank the open-source community for their contributions!
ECON and SIFU recover a human mesh from a single human image.
Era3D and Unique3D generate consistent multiview images from a single color image.
Continuous-Remeshing for Inverse Rendering.
If you find this codebase useful, please consider citing our work.
@article{li2024pshuman,
title={PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion},
author={Li, Peng and Zheng, Wangguandong and Liu, Yuan and Yu, Tao and Li, Yangguang and Qi, Xingqun and Li, Mengfei and Chi, Xiaowei and Xia, Siyu and Xue, Wei and others},
journal={arXiv preprint arXiv:2409.10141},
year={2024}
}