2412.06273.md

Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction

Prior works employing pixel-based Gaussian representations have demonstrated efficacy in feed-forward sparse-view reconstruction. However, such a representation requires cross-view overlap for accurate depth estimation, and is challenged by object occlusions and frustum truncations. As a result, these methods require scene-centric data acquisition to maintain cross-view overlap and complete scene visibility to circumvent occlusions and truncations, which limits their applicability to scene-centric reconstruction. In contrast, in autonomous driving scenarios, a more practical paradigm is ego-centric reconstruction, which is characterized by minimal cross-view overlap and frequent occlusions and truncations. The limitations of the pixel-based representation thus hinder the utility of prior works in this task. In light of this, this paper conducts an in-depth analysis of different representations, and introduces the Omni-Gaussian representation with a tailored network design to complement their strengths and mitigate their drawbacks. Experiments show that our method significantly surpasses state-of-the-art methods, pixelSplat and MVSplat, in ego-centric reconstruction, and achieves comparable performance to prior works in scene-centric reconstruction. Furthermore, we extend our method with diffusion models, pioneering feed-forward multi-modal generation of 3D driving scenes.

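Pixel-based Gaussian representations have been shown in prior research to be highly effective for feed-forward sparse-view reconstruction. However, this kind of representation needs cross-view overlap to ensure accurate depth estimation, and it struggles with object occlusions and view-frustum truncations. These methods therefore usually require scene-centric data acquisition to maintain view overlap and complete scene visibility, thereby sidestepping occlusions and truncations, but this also restricts them to scene-centric reconstruction tasks. By contrast, in autonomous driving scenarios the more practical paradigm is ego-centric reconstruction, which is characterized by minimal view overlap together with frequent occlusions and truncations. The limitations of the pixel-based representation thus constrain the use of prior methods in this task. To address this, the paper analyzes different representations in depth and proposes a new one, the Omni-Gaussian representation, combined with a tailored network design, to complement the strengths of these representations and mitigate their weaknesses. Experimental results show that the method significantly surpasses state-of-the-art methods such as pixelSplat and MVSplat in ego-centric reconstruction, while achieving performance comparable to prior methods in scene-centric reconstruction. In addition, the method is extended with diffusion models, achieving for the first time feed-forward multi-modal generation of 3D autonomous driving scenes.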
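
To make the limitation of pixel-based representations concrete, the following is a minimal sketch of how such a representation is commonly understood to place Gaussian centers: each pixel is unprojected along its camera ray to a predicted depth. This is an illustration under stated assumptions, not the paper's implementation; the function name `unproject_pixels`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the toy depth map are all hypothetical.

```python
def unproject_pixels(depth, fx, fy, cx, cy):
    """Return one 3D center per pixel, in camera coordinates.

    depth is a list of rows (H x W) of per-pixel depths. A pinhole camera
    model is assumed; fx, fy, cx, cy are hypothetical intrinsics.
    """
    centers = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            x = (u - cx) / fx * z          # scale the ray direction by depth
            y = (v - cy) / fy * z
            centers.append((x, y, z))
    return centers


# Hypothetical 2x2 depth map and intrinsics, for illustration only.
depth = [[2.0, 2.0],
         [4.0, 4.0]]
centers = unproject_pixels(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Because every center in this sketch lies on a ray through some input pixel, regions that no camera frustum covers, or that sit behind an occluder, receive no Gaussians, and the result is only as good as the per-pixel depth. This is one way to read the abstract's point about occlusions, truncations, and the need for cross-view overlap.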