EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

We present a novel approach for synthesizing 3D talking heads with controllable emotion, featuring enhanced lip synchronization and rendering quality. Despite significant progress in the field, prior methods still suffer from poor multi-view consistency and a lack of emotional expressiveness. To address these issues, we collect the EmoTalk3D dataset, which provides calibrated multi-view videos, emotional annotations, and per-frame 3D geometry. Trained on the EmoTalk3D dataset, our Speech-to-Geometry-to-Appearance mapping framework first predicts a faithful 3D geometry sequence from audio features, then synthesizes the appearance of the 3D talking head, represented by 4D Gaussians, from the predicted geometry. The appearance is further disentangled into canonical and dynamic Gaussians, which are learned from multi-view videos and fused to render free-view talking head animation. Moreover, our model enables controllable emotion in the generated talking heads, which can be rendered from a wide range of viewpoints. Our method exhibits improved rendering quality and stability in lip motion generation while capturing dynamic facial details such as wrinkles and subtle expressions. Experiments demonstrate the effectiveness of our approach in generating high-fidelity and emotion-controllable 3D talking heads.

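We propose a novel method for synthesizing 3D talking head models with controllable emotion, offering improved lip synchronization and rendering quality. Despite significant progress in the field, existing methods still suffer from poor multi-view consistency and insufficient emotional expressiveness. To solve these problems, we collect the EmoTalk3D dataset, which contains calibrated multi-view videos, emotion annotations, and per-frame 3D geometry data. Trained on the EmoTalk3D dataset, our speech-to-geometry-to-appearance mapping framework first predicts a faithful 3D geometry sequence from audio features, then synthesizes, from the predicted geometry, the appearance of a 3D talking head represented by 4D Gaussians. The appearance is further decomposed into canonical and dynamic Gaussians learned from multi-view videos, which are fused to render free-view talking head animation. In addition, our model achieves controllable emotion in the generated talking heads and can render them across a wide range of views. Our method performs well in rendering quality and in the stability of lip motion generation, while capturing dynamic facial details such as wrinkles and subtle expressions. Experiments show that our method is effective at generating high-fidelity, emotion-controllable 3D talking heads.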
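
The two-stage data flow described in the abstract (speech to geometry, then geometry to appearance with canonical and dynamic Gaussians) can be illustrated with a shape-level sketch. This is not the authors' implementation: the function names, array sizes, random linear maps standing in for the learned networks, and the choice of color as the only dynamic attribute are all assumptions made for illustration, and the Gaussian rendering step is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: frames, audio feature dim, 3D points, emotion classes.
T, F, N, E = 4, 16, 100, 8


def speech_to_geometry(audio_feats, emotion_id, W_audio, W_emotion, template):
    """Stage 1 stand-in: audio features + emotion label -> per-frame 3D points."""
    emotion = np.eye(E)[emotion_id]                         # one-hot code, shape (E,)
    offsets = audio_feats @ W_audio + emotion @ W_emotion   # shape (T, N*3)
    return template[None] + offsets.reshape(-1, N, 3)       # shape (T, N, 3)


def geometry_to_appearance(geometry, canonical, W_dyn):
    """Stage 2 stand-in: fuse static (canonical) and motion-driven (dynamic) attributes."""
    # Per-frame deviation from the mean shape drives the dynamic part.
    motion = geometry - geometry.mean(axis=0, keepdims=True)  # shape (T, N, 3)
    dynamic_color = np.tanh(motion @ W_dyn)                   # shape (T, N, 3)
    return {
        "position": geometry,
        "color": np.clip(canonical["color"][None] + dynamic_color, 0.0, 1.0),
        "opacity": np.broadcast_to(canonical["opacity"], (geometry.shape[0], N)),
    }


# Random placeholders for learned weights and captured data (assumptions).
audio_feats = rng.normal(size=(T, F))
W_audio = rng.normal(scale=0.01, size=(F, N * 3))
W_emotion = rng.normal(scale=0.01, size=(E, N * 3))
W_dyn = rng.normal(scale=0.1, size=(3, 3))
template = rng.normal(size=(N, 3))
canonical = {"color": rng.uniform(size=(N, 3)), "opacity": rng.uniform(size=(N,))}

geometry = speech_to_geometry(audio_feats, 2, W_audio, W_emotion, template)
gaussians = geometry_to_appearance(geometry, canonical, W_dyn)
print(gaussians["position"].shape, gaussians["color"].shape, gaussians["opacity"].shape)
```

The intermediate `geometry` array is the point of the design: emotion and audio only influence appearance through the predicted 3D points, which mirrors the Speech-to-Geometry-to-Appearance ordering in the abstract. A real system would pass the fused per-frame Gaussians to a differentiable renderer to produce free-view images.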