InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
3D scene understanding has become an essential area of research with applications in autonomous driving, robotics, and augmented reality. Recently, 3D Gaussian Splatting (3DGS) has emerged as a powerful approach, combining explicit modeling with neural adaptability to provide efficient and detailed scene representations. However, three major challenges remain in leveraging 3DGS for scene understanding: 1) an imbalance between appearance and semantics, where dense Gaussian usage for fine-grained texture modeling does not align with the minimal requirements for semantic attributes; 2) inconsistencies between appearance and semantics, as purely appearance-based Gaussians often misrepresent object boundaries; and 3) reliance on top-down instance segmentation methods, which struggle with uneven category distributions, leading to over- or under-segmentation. In this work, we propose InstanceGaussian, a method that jointly learns appearance and semantic features while adaptively aggregating instances. Our contributions include: i) a novel Semantic-Scaffold-GS representation balancing appearance and semantics to improve feature representations and boundary delineation; ii) a progressive appearance-semantic joint training strategy to enhance stability and segmentation accuracy; and iii) a bottom-up, category-agnostic instance aggregation approach that addresses segmentation challenges through farthest point sampling and connected component analysis. Our approach achieves state-of-the-art performance in category-agnostic, open-vocabulary 3D point-level segmentation, highlighting the effectiveness of the proposed representation and training strategies.
3D 场景理解已成为一个重要的研究领域,广泛应用于自动驾驶、机器人和增强现实等领域。近年来,3D 高斯点云表示(3D Gaussian Splatting, 3DGS)作为一种强大的方法脱颖而出,它将显式建模与神经网络的适应性相结合,提供高效且细致的场景表示。然而,在利用 3DGS 进行场景理解时,仍然存在三大挑战:1)外观与语义之间的不平衡,细粒度纹理建模所需的高斯点云密度与语义属性的最低需求之间存在差异;2)外观与语义之间的不一致,单纯基于外观的高斯点云通常会错误地表示物体边界;以及 3)对自上而下实例分割方法的依赖,这种方法在类别分布不均时表现不佳,导致过分割或不足分割。 为了解决这些问题,我们提出了 InstanceGaussian 方法,该方法能够联合学习外观和语义特征,同时自适应地聚合实例。我们的贡献包括: i)提出一种新颖的语义支架高斯点云表示(Semantic-Scaffold-GS),在外观和语义之间取得平衡,以改善特征表示和边界刻画; ii)设计了一种渐进式外观-语义联合训练策略,以增强稳定性和分割准确性; iii)提出一种自下而上、类别无关的实例聚合方法,利用最远点采样和连通分量分析解决分割挑战。 我们的方法在类别无关的开放词汇 3D 点级分割任务中达到了最新的性能水平,验证了所提出表示方法和训练策略的有效性。