Learning 3D head priors from large 2D image collections is an important step towards high-quality 3D-aware human modeling. A core requirement is an efficient architecture that scales well to large datasets and high image resolutions. Unfortunately, existing 3D GANs struggle to generate samples at high resolutions because of their relatively slow training and rendering speeds, and they typically have to rely on 2D super-resolution networks at the expense of global 3D consistency. To address these challenges, we propose Generative Gaussian Heads (GGHead), which adopts the recent 3D Gaussian Splatting representation within a 3D GAN framework. To generate a 3D representation, we employ a powerful 2D CNN generator to predict Gaussian attributes in the UV space of a template head mesh. This way, GGHead exploits the regularity of the template's UV layout, substantially facilitating the challenging task of predicting an unstructured set of 3D Gaussians. We further improve the geometric fidelity of the generated 3D representations with a novel total variation loss on rendered UV coordinates. Intuitively, this regularization encourages neighboring rendered pixels to stem from neighboring Gaussians in the template's UV space. Taken together, our pipeline, trained only from single-view 2D image observations, can efficiently generate 3D heads. Our proposed framework matches the quality of existing 3D head GANs on FFHQ while being both substantially faster and fully 3D consistent. As a result, we demonstrate real-time generation and rendering of high-quality 3D-consistent heads at 1024² resolution for the first time.
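The intuition behind the total variation loss on rendered UV coordinates can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `uv_total_variation`, the anisotropic L1 form, the optional foreground mask, and the array layout `(H, W, 2)` are all assumptions made for illustration, since the abstract does not specify the exact formulation.

```python
import numpy as np


def uv_total_variation(uv_map, mask=None):
    """Anisotropic L1 total variation of a rendered UV-coordinate image.

    uv_map: (H, W, 2) array holding the UV coordinate rendered at each pixel.
    mask:   optional (H, W) boolean array marking foreground pixels
            (hypothetical; used here to ignore differences across the silhouette).
    """
    uv_map = np.asarray(uv_map, dtype=np.float64)

    # Absolute UV difference between horizontally / vertically adjacent pixels,
    # summed over the two UV channels.
    dx = np.abs(uv_map[:, 1:] - uv_map[:, :-1]).sum(axis=-1)  # (H, W-1)
    dy = np.abs(uv_map[1:, :] - uv_map[:-1, :]).sum(axis=-1)  # (H-1, W)

    if mask is None:
        return dx.mean() + dy.mean()

    # Only count pixel pairs where both neighbors are foreground.
    mask = np.asarray(mask, dtype=bool)
    mx = mask[:, 1:] & mask[:, :-1]
    my = mask[1:, :] & mask[:-1, :]
    tv_x = dx[mx].mean() if mx.any() else 0.0
    tv_y = dy[my].mean() if my.any() else 0.0
    return tv_x + tv_y


# A smooth UV ramp: neighboring pixels come from neighboring UV locations.
h = w = 8
u, v = np.meshgrid(np.linspace(0.0, 1.0, w), np.linspace(0.0, 1.0, h))
smooth = np.stack([u, v], axis=-1)

# The same UV values scattered at random: neighboring pixels now come from
# unrelated locations in UV space, which the loss should penalize.
rng = np.random.default_rng(0)
scrambled = rng.permutation(smooth.reshape(-1, 2)).reshape(h, w, 2)

print("smooth TV:   ", uv_total_variation(smooth))
print("scrambled TV:", uv_total_variation(scrambled))
```

In this toy setting the smooth ramp yields a small value while the scrambled map yields a much larger one, which mirrors the stated goal that neighboring rendered pixels should stem from neighboring Gaussians in the template's UV space. In an actual training loop the same quantity would be computed on differentiable tensors rather than NumPy arrays.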