DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
We present DeSiRe-GS, a self-supervised Gaussian Splatting representation that enables effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios. Our approach employs a two-stage optimization pipeline of dynamic street Gaussians. In the first stage, we extract 2D motion masks based on the observation that 3D Gaussian Splatting can inherently reconstruct only the static regions of dynamic environments. In the second stage, these extracted 2D motion priors are mapped into Gaussian space in a differentiable manner, leveraging an efficient formulation of dynamic Gaussians. Combined with the introduced geometric regularizations, our method is able to address the over-fitting caused by data sparsity in autonomous driving, reconstructing physically plausible Gaussians that align with object surfaces rather than floating in the air. Furthermore, we introduce temporal cross-view consistency to ensure coherence across time and viewpoints, resulting in high-quality surface reconstruction. Comprehensive experiments demonstrate the efficiency and effectiveness of DeSiRe-GS, which surpasses prior self-supervised methods and achieves accuracy comparable to methods that rely on external 3D bounding box annotations.
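The first-stage observation can be illustrated with a minimal sketch. If a static-only reconstruction cannot explain moving objects, then pixels where its render disagrees with the captured frame are candidate dynamic regions. The function name `motion_mask_from_residual`, the per-pixel color residual, and the fixed `threshold` are all assumptions made for illustration; the abstract does not specify how the masks are actually extracted, so this should not be read as the method itself.

```python
import numpy as np


def motion_mask_from_residual(rendered, observed, threshold=0.1):
    """Flag pixels that a static-only render fails to explain.

    rendered, observed: float arrays of shape (H, W, C) with values in [0, 1].
    threshold: hypothetical cutoff on the mean absolute color error.
    Returns a boolean (H, W) mask, True where motion is suspected.
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    observed = np.asarray(observed, dtype=np.float64)
    if rendered.shape != observed.shape:
        raise ValueError("rendered and observed must have the same shape")
    # Mean absolute error over the channel axis gives one residual per pixel.
    residual = np.abs(rendered - observed).mean(axis=-1)
    return residual > threshold


if __name__ == "__main__":
    # Toy frame: a uniform gray background that the static render reproduces.
    rendered = np.full((4, 4, 3), 0.5)
    observed = rendered.copy()
    # A bright 2x2 patch stands in for a moving object missing from the render.
    observed[1:3, 1:3] = 1.0
    mask = motion_mask_from_residual(rendered, observed)
    print(mask.astype(int))
```

A raw color residual like this is sensitive to lighting changes and reconstruction noise, so a practical extractor would likely need a more robust signal than a single fixed threshold.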