PSHuman

This is the official implementation of PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion.

Project Page | Paper | Demo | Hugging Face Model

[Demo videos: result_clr_scale4_pexels-barbara-olsen-7869640.mp4, result_clr_scale4_pexels-zdmit-6780091.mp4]

Given a single image of a clothed person, PSHuman reconstructs detailed geometry and a realistic 3D appearance across various poses within one minute.

📝 Update

  • [2024.11.30]: Released the SMPL-free version, which does not require an SMPL condition for multiview generation and performs well on humans in general poses.
  • [2024.12.11]: The Hugging Face demo has been deployed here. Special thanks to Sylvain Filoni! Give it a try.

Common issues

  • Q: Minimum VRAM requirement
    A: The current model is trained at a resolution of 768, requiring over 40GB of VRAM. We are considering training a new model at a resolution of 512, which would allow it to run on an RTX 4090.
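
As a rough pre-flight check for the requirement above, the following sketch reports whether each visible GPU has at least 40 GiB of memory. The constant and helper names are illustrative assumptions and not part of this repository. It reads the "over 40GB" figure as 40 GiB, and it returns an empty list when PyTorch or CUDA is unavailable.

```python
REQUIRED_GIB = 40  # assumption: interprets the README's "over 40GB" as 40 GiB


def enough_vram(total_bytes: int, required_gib: float = REQUIRED_GIB) -> bool:
    """Return True if a device with total_bytes of memory meets the threshold."""
    return total_bytes / 2**30 >= required_gib


def report_gpus():
    """List (index, name, total_bytes, ok) for every visible CUDA device."""
    try:
        import torch  # imported lazily so the helper above works without it
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    rows = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        rows.append((index, props.name, props.total_memory,
                     enough_vram(props.total_memory)))
    return rows


if __name__ == "__main__":
    for index, name, total, ok in report_gpus():
        status = "ok" if ok else "below threshold"
        print(f"GPU {index} ({name}): {total / 2**30:.1f} GiB, {status}")
```

This only compares total device memory against the threshold. It does not measure what inference actually allocates.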

Installation

conda create -n pshuman python=3.10
conda activate pshuman

# torch
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# kaolin
pip install kaolin==0.17.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.1.0_cu121.html

# other dependencies
pip install -r requirements.txt
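
After installing, it may help to confirm that the pinned versions above are the ones actually present in the environment. The sketch below uses only the standard library. The `PINS` table simply restates the versions from the commands above, and `check_pins` is a hypothetical helper name, not something shipped with this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the install commands above.
PINS = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "kaolin": "0.17.0",
}


def check_pins(pins):
    """Return {package: problem} for packages that are missing or mismatched."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Ignore a local build suffix such as "+cu121" when comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINS)
    for name, message in issues.items():
        print(f"{name}: {message}")
    if not issues:
        print("All pinned packages match.")
```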

This project is also based on SMPLX. We borrowed the related models from ECON and SIFU and reorganized them; they can be downloaded from OneDrive.

Inference

  1. Given a human image, we use Clipdrop or rembg to remove the background. For the latter, we provide a simple script.
python utils/remove_bg.py --path $DATA_PATH$

Then place the resulting RGBA images in $DATA_PATH$.
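
Since the inference step expects RGBA inputs, a quick check that every image in the folder really has an alpha channel can catch a missed background-removal pass. The sketch below assumes the images are PNG files, which the text above does not state. It reads the color type from the PNG IHDR header using only the standard library, and the function names are hypothetical.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
RGBA_COLOR_TYPE = 6  # PNG color type for truecolor with alpha


def png_is_rgba(path):
    """Return True if the file is a PNG whose IHDR color type is RGBA."""
    with open(path, "rb") as handle:
        header = handle.read(26)  # signature + chunk length/type + 10 IHDR bytes
    if len(header) < 26 or header[:8] != PNG_SIGNATURE:
        return False
    if header[12:16] != b"IHDR":
        return False
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return color_type == RGBA_COLOR_TYPE


def non_rgba_pngs(data_dir):
    """Return sorted names of .png files in data_dir that are not RGBA."""
    return sorted(p.name for p in Path(data_dir).glob("*.png")
                  if not png_is_rgba(p))
```

Calling `non_rgba_pngs` on the data folder returns an empty list when every PNG there is RGBA. Files in other formats are not inspected.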

  2. Running inference.py saves the textured mesh and rendered video in out.
CUDA_VISIBLE_DEVICES=$GPU python inference.py --config configs/inference-768-6view.yaml \
    pretrained_model_name_or_path='pengHTYX/PSHuman_Unclip_768_6views' \
    validation_dataset.crop_size=740 \
    with_smpl=false \
    validation_dataset.root_dir=$DATA_PATH$ \
    seed=600 \
    num_views=7 \
    save_mode='rgb' 

You can adjust crop_size (720 or 740) and seed (42 or 600) to obtain the best results in some cases.
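
One way to try the suggested crop_size and seed values is to generate the four command lines and run them one at a time. The sketch below only builds the argument lists from the overrides shown above and does not launch anything. `build_command` and `sweep` are hypothetical helpers, and `./data` is a placeholder path. Whether successive runs overwrite each other's results in out is not covered here, so you may need to move outputs between runs.

```python
import shlex
from itertools import product


def build_command(data_path, crop_size, seed):
    """Return the inference command as an argument list for one setting."""
    return [
        "python", "inference.py",
        "--config", "configs/inference-768-6view.yaml",
        "pretrained_model_name_or_path=pengHTYX/PSHuman_Unclip_768_6views",
        f"validation_dataset.crop_size={crop_size}",
        "with_smpl=false",
        f"validation_dataset.root_dir={data_path}",
        f"seed={seed}",
        "num_views=7",
        "save_mode=rgb",
    ]


def sweep(data_path, crop_sizes=(720, 740), seeds=(42, 600)):
    """Return one command per (crop_size, seed) combination."""
    return [build_command(data_path, crop, seed)
            for crop, seed in product(crop_sizes, seeds)]


if __name__ == "__main__":
    # "./data" is a placeholder; substitute your own $DATA_PATH$.
    for command in sweep("./data"):
        print(shlex.join(command))
```

Each printed line can be pasted into a shell, optionally prefixed with CUDA_VISIBLE_DEVICES as in the command above.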

Training

For data preparation and preprocessing, please refer to our paper. Once the data is ready, start training by running

bash scripts/train_768.sh

You should modify some parameters, such as data_common.root_dir and data_common.object_list.

Related projects

We borrow code from the following projects. We thank the open-source community for its contributions!

ECON and SIFU recover a human mesh from a single human image.
Era3D and Unique3D generate consistent multiview images from a single color image.
Continuous-Remeshing for Inverse Rendering.

Citation

If you find this codebase useful, please consider citing our work.

@article{li2024pshuman,
  title={PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion},
  author={Li, Peng and Zheng, Wangguandong and Liu, Yuan and Yu, Tao and Li, Yangguang and Qi, Xingqun and Li, Mengfei and Chi, Xiaowei and Xia, Siyu and Xue, Wei and others},
  journal={arXiv preprint arXiv:2409.10141},
  year={2024}
}
