Radiance fields have demonstrated impressive performance in synthesizing lifelike 3D talking heads. However, the prevailing paradigm, which represents facial motions by directly modifying point appearance, struggles to fit steep appearance changes and may therefore cause distortions in dynamic regions. To tackle this challenge, we introduce TalkingGaussian, a deformation-based radiance fields framework for high-fidelity talking head synthesis. Leveraging point-based Gaussian Splatting, our method represents facial motions by applying smooth and continuous deformations to persistent Gaussian primitives, without requiring the difficult learning of appearance changes as in previous methods. Owing to this simplification, precise facial motions can be synthesized while keeping the facial features highly intact. Under this deformation paradigm, we further identify a face-mouth motion inconsistency that hampers the learning of detailed speaking motions. To resolve this conflict, we decompose the model into two branches, one for the face and one for the inside of the mouth, which simplifies the learning tasks and helps reconstruct more accurate motion and structure in the mouth region. Extensive experiments demonstrate that our method renders high-quality lip-synchronized talking head videos with better facial fidelity and higher efficiency than previous methods.
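To make the deformation paradigm concrete, the sketch below shows one plausible shape of a per-primitive deformation field: an MLP that, conditioned on an audio feature, predicts geometric offsets (position, rotation, scale) for persistent Gaussian primitives while leaving appearance attributes (color, opacity) untouched. All names, network sizes, and the audio-conditioning interface here are hypothetical illustrations under our own assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GaussianDeformationField(nn.Module):
    """Hypothetical sketch of a deformation branch: predicts per-primitive
    geometric offsets for persistent 3D Gaussians, conditioned on an audio
    feature. Appearance (color, opacity) is untouched, so motion is expressed
    purely as deformation rather than appearance change."""

    def __init__(self, audio_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # delta position (3) + delta rotation quaternion (4) + delta scale (3)
            nn.Linear(hidden, 3 + 4 + 3),
        )

    def forward(self, xyz, audio_feat):
        # xyz: (N, 3) canonical Gaussian centers; audio_feat: (D,) per-frame feature
        cond = audio_feat.expand(xyz.shape[0], -1)
        delta = self.mlp(torch.cat([xyz, cond], dim=-1))
        d_xyz, d_rot, d_scale = delta.split([3, 4, 3], dim=-1)
        # Deformed geometry; color and opacity of each primitive stay fixed.
        return xyz + d_xyz, d_rot, d_scale

# Usage sketch: two independent branches mirroring the face / inside-mouth
# decomposition described in the abstract (hypothetical setup).
face_field = GaussianDeformationField()
mouth_field = GaussianDeformationField()
xyz = torch.randn(1000, 3)   # canonical Gaussian centers
audio = torch.randn(64)      # hypothetical per-frame audio feature
new_xyz, d_rot, d_scale = face_field(xyz, audio)
```

Instantiating the face and inside-mouth regions as two separate fields, as above, reflects the abstract's decomposition: each branch only has to learn the motions of its own region, avoiding the face-mouth motion inconsistency.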