Realistic scene reconstruction in driving scenarios poses significant challenges due to fast-moving objects. Most existing methods rely on labor-intensive manual labeling of object poses to reconstruct dynamic objects in canonical space and move them according to these poses during rendering. While some approaches attempt to replace manual annotation with 3D object trackers, the limited generalization of 3D trackers, caused by the scarcity of large-scale 3D datasets, results in inferior reconstructions in real-world settings. In contrast, 2D foundation models demonstrate strong generalization capabilities. To eliminate the reliance on 3D trackers and enhance robustness across diverse environments, we propose a stable object tracking module that leverages associations from 2D deep trackers within a 3D object fusion strategy. We address the inevitable tracking errors by further introducing a motion learning strategy in an implicit feature space that autonomously corrects trajectory errors and recovers missed detections. Experimental results on the Waymo-NOTR dataset show that our method achieves state-of-the-art performance, significantly outperforming existing approaches on dynamic object reconstruction.
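To make the overall idea concrete, below is a minimal Python sketch of one way to fuse 2D tracker associations into 3D object trajectories: 2D boxes (already associated across frames by a 2D deep tracker) are lifted to 3D centroids by aggregating LiDAR points that project inside each box, grouped by the 2D track ID, and then gap-filled. All function names, data layouts, and parameters here are assumptions for illustration, and the linear interpolation is only a crude stand-in for the paper's learned motion refinement in an implicit feature space, not the actual method.

```python
# Illustrative sketch only -- not the paper's implementation.
# Assumes calibrated LiDAR points in the camera frame and 2D boxes with
# track IDs produced by an off-the-shelf 2D deep tracker.
import numpy as np

def lift_box_to_3d(points_cam, K, box2d):
    """Average the 3D points whose image projection falls inside a 2D box.

    points_cam: (N, 3) LiDAR points in the camera frame (z forward).
    K:          (3, 3) camera intrinsics.
    box2d:      (x1, y1, x2, y2) in pixels.
    Returns a (3,) centroid, or None if no points project into the box.
    """
    z = points_cam[:, 2]
    valid = z > 0.1                        # keep points in front of the camera
    uv = (K @ points_cam[valid].T).T       # project to the image plane
    uv = uv[:, :2] / uv[:, 2:3]
    x1, y1, x2, y2 = box2d
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
             (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    pts = points_cam[valid][inside]
    return pts.mean(axis=0) if len(pts) else None

def build_trajectories(frames):
    """Group lifted centroids by the 2D tracker's ID to form 3D tracks.

    frames: list of dicts {track_id: centroid (3,) or None}, one per frame.
    Returns {track_id: (T, 3) array, NaN rows where detections were missed}.
    """
    T = len(frames)
    ids = {tid for f in frames for tid in f}
    tracks = {tid: np.full((T, 3), np.nan) for tid in ids}
    for t, f in enumerate(frames):
        for tid, c in f.items():
            if c is not None:
                tracks[tid][t] = c
    return tracks

def fill_gaps(track):
    """Linearly interpolate missed detections along each axis.

    A placeholder for trajectory correction; the paper instead learns to
    correct errors and recover missed detections in an implicit feature space.
    """
    t = np.arange(len(track))
    for d in range(3):
        ok = ~np.isnan(track[:, d])
        if ok.sum() >= 2:
            track[:, d] = np.interp(t, t[ok], track[ok, d])
    return track
```

The design point this sketch mirrors is that cross-frame association happens entirely in 2D, where foundation models generalize well, while 3D geometry is only used to place each already-associated object, so no 3D tracker is needed.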