Recent advances in 3D Gaussian Splatting have shown promising results for novel view synthesis. However, existing methods typically assume static scenes and/or multiple images with prior poses. Dynamics, sparse views, and unknown poses significantly increase the problem complexity due to insufficient geometric constraints. To overcome this challenge, we propose a method that fits Gaussians in dynamic environments from only two images without prior poses. To achieve this, we introduce two technical contributions. First, we propose an object-level two-view bundle adjustment. This strategy decomposes dynamic scenes into piecewise rigid components and jointly estimates the camera pose and the motions of dynamic objects. Second, we design an SE(3) field-driven Gaussian training method that enables fine-grained motion modeling through learnable per-Gaussian transformations. Our method achieves high-fidelity novel view synthesis of dynamic scenes while accurately preserving temporal consistency and object motion. Experiments on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art approaches that assume static environments, multiple input images, and/or known poses.
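To make the first contribution concrete, the following is a minimal sketch of an object-level two-view bundle adjustment under simplifying assumptions: 3D points in the first camera's frame are already available (e.g., lifted from a monocular depth estimate), correspondences are grouped into rigid clusters (cluster 0 is the static background, clusters 1..K are dynamic objects), and each motion is parametrized as axis-angle plus translation. All names (`pts1_cam`, `uv2`, `labels`, `K_mat`, `two_view_object_ba`) are hypothetical placeholders, not the paper's actual interface; the optimizer jointly refines one camera motion and one rigid motion per dynamic object by minimizing pinhole reprojection error.

```python
# Hypothetical sketch: object-level two-view BA with piecewise-rigid motions.
# Inputs (all assumed, not from the paper):
#   pts1_cam : (N,3) 3D points in the first camera frame
#   uv2      : (N,2) matched pixel observations in the second image
#   labels   : (N,) rigid-cluster id; 0 = background, 1..K = dynamic objects
#   K_mat    : (3,3) camera intrinsics
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def unpack(params, n_obj):
    """Split the flat parameter vector into camera + per-object (rotvec, t)."""
    return params[:6], params[6:].reshape(n_obj, 6)

def apply_rt(rt, pts):
    """Apply an axis-angle + translation rigid transform to (N,3) points."""
    R = Rotation.from_rotvec(rt[:3]).as_matrix()
    return pts @ R.T + rt[3:6]

def residuals(params, pts1_cam, uv2, labels, K_mat, n_obj):
    cam, objs = unpack(params, n_obj)
    pts = pts1_cam.copy()
    # Move each dynamic object by its own rigid motion; background stays put.
    for k in range(n_obj):
        mask = labels == (k + 1)
        pts[mask] = apply_rt(objs[k], pts[mask])
    # Apply the shared camera motion, then project with a pinhole model.
    pts = apply_rt(cam, pts)
    proj = pts @ K_mat.T
    proj = proj[:, :2] / proj[:, 2:3]
    return (proj - uv2).ravel()

def two_view_object_ba(pts1_cam, uv2, labels, K_mat):
    n_obj = int(labels.max())        # number of dynamic clusters
    x0 = np.zeros(6 + 6 * n_obj)     # identity initialization for all motions
    sol = least_squares(residuals, x0, method="lm",
                        args=(pts1_cam, uv2, labels, K_mat, n_obj))
    return unpack(sol.x, n_obj)
```

The key design point this illustrates is the joint estimation: because dynamic points would otherwise corrupt the camera-pose estimate, each cluster carries its own rigid motion while all clusters share one camera motion, so the static background constrains the pose and the per-object parameters absorb the dynamics.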
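For the second contribution, the sketch below illustrates what learnable per-Gaussian SE(3) transformations can look like in PyTorch, assuming Gaussians are stored as means (N,3) and orientation matrices (N,3,3). The axis-angle + translation parametrization and the class name `PerGaussianSE3` are illustrative assumptions, not necessarily the paper's exact field design.

```python
# Hypothetical sketch: learnable per-Gaussian SE(3) transforms (PyTorch).
import torch
import torch.nn as nn

class PerGaussianSE3(nn.Module):
    def __init__(self, num_gaussians: int):
        super().__init__()
        # 6 DoF per Gaussian: 3 axis-angle rotation + 3 translation,
        # initialized at the identity transform.
        self.rotvec = nn.Parameter(torch.zeros(num_gaussians, 3))
        self.trans = nn.Parameter(torch.zeros(num_gaussians, 3))

    @staticmethod
    def rodrigues(rotvec: torch.Tensor) -> torch.Tensor:
        """Batched axis-angle -> rotation matrix via Rodrigues' formula."""
        theta = rotvec.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        axis = rotvec / theta
        # Skew-symmetric cross-product matrix of the rotation axis.
        K = torch.zeros(rotvec.shape[0], 3, 3, device=rotvec.device)
        K[:, 0, 1], K[:, 0, 2] = -axis[:, 2], axis[:, 1]
        K[:, 1, 0], K[:, 1, 2] = axis[:, 2], -axis[:, 0]
        K[:, 2, 0], K[:, 2, 1] = -axis[:, 1], axis[:, 0]
        theta = theta.unsqueeze(-1)
        eye = torch.eye(3, device=rotvec.device).expand_as(K)
        return eye + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

    def forward(self, means, rots):
        """Transform Gaussian means (N,3) and orientations (N,3,3)."""
        R = self.rodrigues(self.rotvec)
        new_means = torch.einsum("nij,nj->ni", R, means) + self.trans
        new_rots = R @ rots   # rotate each Gaussian's local orientation
        return new_means, new_rots
```

Because every Gaussian owns its own 6-DoF parameters, the gradients from the rendering loss can refine motion at a much finer granularity than the per-object rigid motions of the bundle-adjustment stage, which is the intuition behind the fine-grained motion modeling claimed in the abstract.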