The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Fields (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of training and inference speed, as well as reconstruction quality. Although 3DGS works well for dense input images, the unstructured, point-cloud-like representation quickly overfits in the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their positions, and prevent them from moving independently during optimization. Specifically, we introduce single- and multi-view constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements over state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.
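To make the depth-based initialization concrete, below is a minimal sketch (not the authors' code) of how monocular depth estimates at an input view could seed one Gaussian per pixel: each pixel is back-projected through the camera intrinsics and pose to a 3D position, with the image colour attached. The function name, argument layout, and the depth-proportional isotropic scale heuristic are illustrative assumptions; the paper's exact initialization (e.g., how the monocular depth is scaled or aligned) may differ.

```python
import numpy as np

def init_gaussians_from_depth(depth, rgb, K, c2w):
    """Sketch of per-pixel Gaussian initialization from a monocular depth map.

    depth: (H, W) monocular depth estimate (assumed z-depth)
    rgb:   (H, W, 3) input image in [0, 1]
    K:     (3, 3) camera intrinsics
    c2w:   (4, 4) camera-to-world pose
    Returns per-pixel Gaussian centres, scales, and colours.
    """
    H, W = depth.shape
    # Pixel grid at pixel centres in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)   # (HW, 3)
    # Back-project: X_cam = depth * K^{-1} [u, v, 1]^T.
    rays = pix @ np.linalg.inv(K).T                                   # (HW, 3), z = 1
    pts_cam = rays * depth.reshape(-1, 1)                             # (HW, 3)
    # Transform camera-space points to world space.
    pts_world = pts_cam @ c2w[:3, :3].T + c2w[:3, 3]                  # (HW, 3)
    # Assumed heuristic: isotropic scale roughly matching one pixel's
    # footprint at that depth (depth / focal length), so neighbouring
    # Gaussians initially cover the image plane without large gaps.
    scales = np.repeat(depth.reshape(-1, 1) / K[0, 0], 3, axis=1)     # (HW, 3)
    colors = rgb.reshape(-1, 3)                                       # (HW, 3)
    return pts_world, scales, colors
```

Because every Gaussian is tied to a pixel of a specific input view in this way, the per-view structure needed for the single- and multi-view regularizers described above (the implicit convolutional decoder and the total variation loss) is available directly in 2D image space.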