
Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Recent advancements in static feed-forward scene reconstruction have demonstrated significant progress in high-quality novel view synthesis. However, these models often struggle with generalizability across diverse environments and fail to effectively handle dynamic content. We present BTimer (short for BulletTimer), the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes. Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames. Such a formulation allows BTimer to gain scalability and generalization by leveraging both static and dynamic scene datasets. Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150ms while reaching state-of-the-art performance on both static and dynamic scene datasets, even compared with optimization-based approaches.
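
The abstract's central architectural idea, aggregating all context frames in a single feed-forward pass to predict 3D Gaussians for the scene at a queried "bullet" timestamp, can be made concrete with a small sketch. Everything below (the function names, the toy stand-in model, and the Gaussian parameterization) is a hypothetical illustration of that interface under assumed conventions, not the authors' released code.

```python
import numpy as np

def reconstruct_bullet_time(context_frames: np.ndarray,
                            frame_times: np.ndarray,
                            bullet_time: float,
                            model) -> dict:
    """One feed-forward pass: aggregate every context frame, conditioned on
    the target ('bullet') timestamp, into a single 3D Gaussian Splatting
    scene. `model` stands in for the trained network (hypothetical)."""
    # Conditioning each frame on its offset from the bullet time is what
    # lets one model serve both static and dynamic data: for a static
    # scene the offsets carry no extra information.
    time_offsets = frame_times - bullet_time
    return model(context_frames, time_offsets)

def toy_model(frames, time_offsets):
    """Placeholder network emitting one Gaussian per context pixel.
    A real model would predict these parameters from the inputs."""
    n = frames.shape[0] * frames.shape[1] * frames.shape[2]
    return {
        "means":      np.zeros((n, 3)),               # 3D centers
        "log_scales": np.zeros((n, 3)),               # per-axis extents
        "rotations":  np.tile([1, 0, 0, 0], (n, 1)),  # unit quaternions
        "colors":     frames.reshape(n, 3),           # RGB per Gaussian
        "opacities":  np.ones(n),
    }

# Usage: 12 context frames of a 4x4 toy video, queried at the midpoint.
frames = np.random.rand(12, 4, 4, 3).astype(np.float32)
times = np.linspace(0.0, 1.0, 12)
splats = reconstruct_bullet_time(frames, times, bullet_time=0.5,
                                 model=toy_model)
print(splats["means"].shape)  # (192, 3): one Gaussian per context pixel
```

Note that `bullet_time` need not coincide with any input frame's timestamp; rendering a sweep of bullet timestamps from the same context frames is what yields the bullet-time effect.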
