2412.01807.md

Occam's LGS: A Simple Approach for Language Gaussian Splatting

3D Gaussian Splatting (3DGS) is a widely adopted approach for 3D scene representation that offers efficient, high-quality 3D reconstruction and rendering. A major reason for the success of 3DGS is the simplicity of representing a scene with a set of Gaussians, which makes it easy to interpret and adapt. To enhance scene understanding beyond the visual representation, approaches have been developed that extend 3DGS with semantic vision-language features, in particular to support open-set tasks. In this setting, the language features of 3DGS are often aggregated from multiple 2D views. Existing works address this aggregation problem using cumbersome techniques that lead to high computational cost and long training times. In this work, we show that these sophisticated techniques for language-grounded 3DGS are simply unnecessary. Instead, we apply Occam's razor to the task at hand and perform weighted multi-view feature aggregation using the weights derived from the standard rendering process, followed by a simple heuristic-based filtration of noisy Gaussians. Doing so gives us state-of-the-art results with a speed-up of two orders of magnitude. We showcase our results on two commonly used benchmark datasets: LERF and 3D-OVS. Our simple approach allows us to perform reasoning directly in the language features, without any compression whatsoever. Unlike existing methods, such modeling in turn makes scene manipulation easy, which we illustrate with an application that inserts objects into the scene. Furthermore, we provide a thorough discussion of the significance of our contributions within the context of the current literature.

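Gaussian Splatting is a widely used method for 3D scene representation, recognized for its efficiency and its high-quality 3D reconstruction and rendering. The main reason for the success of 3DGS is its simplicity: representing a scene with a set of Gaussians makes it easy to understand and adapt. To enhance scene understanding beyond visual appearance, several methods extend 3DGS with semantic vision-language features, in particular to support open-set tasks. In this setting, the language features of 3DGS are usually aggregated from multiple 2D views. Existing methods, however, rely on complex techniques to handle this feature aggregation, which keeps computational cost and training time high. In this work, we show that these complex techniques are entirely unnecessary for language-grounded 3DGS. Instead, following Occam's razor, we use weighted multi-view feature aggregation based on the weights of the standard rendering process, combined with a simple heuristic filtering of noisy Gaussians. This method not only achieves state-of-the-art performance but also delivers a speed-up of two orders of magnitude. We present our results on two commonly used benchmark datasets, LERF and 3D-OVS. Our simple method allows reasoning directly in the language features, without any compression. Unlike existing methods, this modeling also supports easy scene manipulation, which we illustrate with an example application of inserting objects into the scene. In addition, we discuss in depth the significance of our contributions within the current literature, offering insights for the further development of semantic and language extensions of 3DGS.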
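
To make the weighted multi-view aggregation described in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The function name `aggregate_language_features`, the dense `weights` array layout, and the `min_total_weight` threshold are all hypothetical. In particular, the abstract does not specify the filtering heuristic, so the sketch stands in a simple one: discard Gaussians whose accumulated rendering weight is negligible.

```python
import numpy as np

def aggregate_language_features(weights, pixel_features, min_total_weight=1e-3):
    """Weighted multi-view aggregation of 2D features onto Gaussians (sketch).

    weights:        (V, P, G) assumed rendering weight of Gaussian g at pixel p
                    of view v (e.g. opacity times transmittance).
    pixel_features: (V, P, D) 2D language features per pixel.
    Returns (features of shape (G, D), boolean keep mask of shape (G,)).
    """
    weights = np.asarray(weights, dtype=float)
    pixel_features = np.asarray(pixel_features, dtype=float)

    # Sum of weight * feature over all views and pixels, per Gaussian.
    weighted_sum = np.einsum("vpg,vpd->gd", weights, pixel_features)
    total_weight = weights.sum(axis=(0, 1))  # shape (G,)

    # Hypothetical filtering heuristic: drop Gaussians that barely
    # contribute to any rendered pixel.
    keep = total_weight > min_total_weight

    features = np.zeros_like(weighted_sum)
    features[keep] = weighted_sum[keep] / total_weight[keep, None]

    # L2-normalise so features can be compared by cosine similarity;
    # filtered Gaussians keep an all-zero feature.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    features = np.divide(features, norms,
                         out=np.zeros_like(features), where=norms > 0)
    return features, keep

# Toy usage: 1 view, 2 pixels, 3 Gaussians, 2-dimensional features.
toy_weights = np.zeros((1, 2, 3))
toy_weights[0, 0, 0] = 0.5   # Gaussian 0 contributes to both pixels
toy_weights[0, 1, 0] = 0.5
toy_weights[0, 1, 1] = 1.0   # Gaussian 1 contributes to pixel 1 only
toy_pixel_features = np.array([[[1.0, 0.0], [0.0, 1.0]]])
toy_features, toy_keep = aggregate_language_features(toy_weights, toy_pixel_features)
```

In the toy example, the third Gaussian never contributes to a pixel, so it is filtered out. Because the aggregation is a single weighted average with no learned parameters, it requires no per-scene optimization of the features, which is consistent with the speed-up the abstract reports.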