Recent progress in neural rendering has produced pioneering methods, such as NeRF and Gaussian Splatting, which are revolutionizing view rendering across domains like AR/VR, gaming, and content creation. While these methods excel at interpolating {\em within the training data}, the challenge of generalizing to new scenes and objects from very sparse views persists. In particular, modeling 3D humans from sparse views is especially difficult due to the inherent complexity of human geometry, leading to inaccurate reconstructions of both geometry and texture. To tackle this challenge, this paper builds on recent advances in Gaussian Splatting and introduces a new method for learning generalizable human Gaussians that enables photorealistic and accurate view rendering of a new human subject from a limited set of sparse views in a feed-forward manner. A pivotal innovation of our approach is to reformulate the learning of 3D Gaussian parameters as a regression process defined on the 2D UV space of a human template, which allows us to exploit the strong geometric prior of the template and the advantages of 2D convolutions. In addition, a multi-scaffold representation is proposed to effectively capture offset details. Our method outperforms recent methods in both within-dataset and cross-dataset generalization settings.
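To make the core idea of UV-space regression concrete, the following is a minimal, hypothetical sketch (not the paper's actual architecture): a small 2D convolutional network takes features laid out on the template's UV map and regresses per-texel 3D Gaussian parameters (surface offset, scale, rotation, opacity, color). The channel layout, head names, and feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UVGaussianRegressor(nn.Module):
    """Illustrative sketch: regress per-texel 3D Gaussian parameters on the
    2D UV space of a human template (e.g., SMPL) using ordinary 2D convolutions.
    All layer sizes and head names are assumptions, not the paper's design."""

    def __init__(self, in_channels: int = 64, hidden: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
        )
        # One 1x1 head per Gaussian attribute, predicted at every UV texel.
        self.offset_head = nn.Conv2d(hidden, 3, 1)   # 3D offset from the template surface
        self.scale_head = nn.Conv2d(hidden, 3, 1)    # anisotropic scales (log-space)
        self.rot_head = nn.Conv2d(hidden, 4, 1)      # rotation as a quaternion
        self.opacity_head = nn.Conv2d(hidden, 1, 1)  # opacity logit
        self.color_head = nn.Conv2d(hidden, 3, 1)    # RGB (or SH DC term)

    def forward(self, uv_features: torch.Tensor) -> dict:
        # uv_features: (B, C, H, W) image features unprojected onto the UV map.
        h = self.backbone(uv_features)
        rot = self.rot_head(h)
        return {
            "offset": self.offset_head(h),
            "scale": torch.exp(self.scale_head(h)),
            "rotation": rot / rot.norm(dim=1, keepdim=True).clamp(min=1e-8),
            "opacity": torch.sigmoid(self.opacity_head(h)),
            "color": torch.sigmoid(self.color_head(h)),
        }

# Usage: a 256x256 UV map of fused multi-view features -> one Gaussian per texel.
model = UVGaussianRegressor()
params = model(torch.randn(1, 64, 256, 256))
print({k: v.shape for k, v in params.items()})
```

Because every texel of the UV map corresponds to a point on the template surface, the regressed offsets can be added to the posed template to place the Gaussians in 3D, which is how the geometric prior enters the pipeline in this sketch.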