Creating digital avatars from textual prompts has long been a desirable yet challenging task. Although recent works obtain promising results with 2D diffusion priors, current methods struggle to produce avatars that are both high-quality and animatable. In this paper, we present HeadStudio, a novel framework that uses 3D Gaussian splatting to generate realistic, animatable avatars from text prompts. Our method drives 3D Gaussians semantically through the intermediate FLAME representation to create a flexible and achievable appearance. Specifically, we incorporate FLAME into both the 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, which drives 3D Gaussian points by rigging each point to a FLAME mesh; 2) FLAME-based score distillation sampling, which uses FLAME-based fine-grained control signals to guide score distillation from the text prompt. Extensive experiments demonstrate that HeadStudio generates animatable, visually appealing avatars from textual prompts. The avatars render high-quality novel views in real time (≥40 fps) at a resolution of 1024 and can be smoothly controlled by real-world speech and video. We hope that HeadStudio can advance digital avatar creation and that the method can be widely applied across various domains.
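To make the idea of rigging a Gaussian point to a mesh concrete, the following is a minimal sketch of one common way such a binding can work. It is an illustrative assumption, not the formulation confirmed by this abstract. Each point stores a position in the local frame of one mesh triangle, and its world position is recomputed whenever the triangle moves. The helper names (`triangle_frame`, `rigged_position`) and the choice of centroid origin and mean-edge-length scale are hypothetical.

```python
import math

# Small 3-vector helpers on plain tuples, so the sketch needs no third-party packages.
def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def normalize(a):
    return scale(a, 1.0 / norm(a))

def triangle_frame(v0, v1, v2):
    """Build a local frame for a triangle: (origin, (e1, e2, n), size).

    Assumed convention: the origin is the centroid, e1 follows the first
    edge, n is the face normal, and size is the mean edge length.
    """
    origin = tuple((a + b + c) / 3.0 for a, b, c in zip(v0, v1, v2))
    e1 = normalize(sub(v1, v0))
    n = normalize(cross(sub(v1, v0), sub(v2, v0)))
    e2 = cross(n, e1)  # completes a right-handed orthonormal basis
    size = (norm(sub(v1, v0)) + norm(sub(v2, v1)) + norm(sub(v0, v2))) / 3.0
    return origin, (e1, e2, n), size

def rigged_position(local, triangle):
    """Map a point stored in triangle-local coordinates to world space."""
    origin, (e1, e2, n), size = triangle_frame(*triangle)
    offset = add(add(scale(e1, local[0]), scale(e2, local[1])), scale(n, local[2]))
    return add(origin, scale(offset, size))

# Usage: the same local coordinates follow the triangle as the mesh is re-posed.
rest_triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
posed_triangle = ((0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0))
local_point = (0.1, 0.0, 0.5)  # hypothetical learned local offset
print(rigged_position(local_point, rest_triangle))
print(rigged_position(local_point, posed_triangle))
```

Because the point is expressed relative to its triangle, any expression or pose change applied to the mesh carries the point along without re-optimizing it, which is the property that makes a rigged representation animatable.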