This paper presents GGRt, a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses, reduces the complexity of processing high-resolution images, and avoids lengthy optimization, making 3D Gaussian Splatting (3D-GS) more applicable to real-world scenarios. Specifically, we design a joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model. With the joint learning mechanism, the framework can estimate robust relative pose information directly from image observations and thus largely alleviates the requirement for real camera poses. Moreover, we implement a deferred back-propagation mechanism that enables high-resolution training and inference, overcoming the resolution constraints of previous methods. To improve speed and efficiency, we further introduce a progressive Gaussian cache module that dynamically adjusts during training and inference. As the first pose-free generalizable 3D-GS framework, GGRt achieves inference at ≥ 5 FPS and real-time rendering at ≥ 100 FPS. Through extensive experiments, we demonstrate that our method outperforms existing NeRF-based pose-free techniques in both inference speed and effectiveness, and approaches the performance of 3D-GS methods that rely on real poses. Our contributions advance the integration of computer vision and computer graphics into practical applications, offering state-of-the-art results on the LLFF, KITTI, and Waymo Open datasets and enabling real-time rendering for immersive experiences.