Sparse-view 3D reconstruction is a formidable challenge in computer vision, aiming to build complete three-dimensional models from a limited set of viewing perspectives. The task faces several difficulties: 1) the small number of input images provides little cross-view-consistent information; 2) reconstruction quality depends heavily on the quality of the input images; and 3) models carry a substantial number of parameters. To address these challenges, we propose a self-augmented coarse-to-fine Gaussian splatting paradigm, enhanced with a structure-aware mask, for sparse-view 3D reconstruction. In particular, our method first employs a coarse Gaussian model to obtain a basic 3D representation from sparse-view inputs. Subsequently, we develop a fine Gaussian network that enhances the consistency and detail of the output through both 3D geometry augmentation and perceptual view augmentation. During training, we design a structure-aware masking strategy to further improve the model's robustness to sparse inputs and noise. Experimental results on the MipNeRF360 and OmniObject3D datasets demonstrate that the proposed method achieves state-of-the-art performance for sparse input views in both perceptual quality and efficiency.
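The coarse-to-fine, structure-aware pipeline described above can be caricatured in a minimal sketch. This is not the authors' implementation: the mean-based "coarse" stage, the blend-based "fine" refinement, and the gradient-based mask are all hypothetical stand-ins, shown only to illustrate the control flow (coarse estimate, then masked refinement biased toward structured regions).

```python
import numpy as np

def structure_aware_mask(image, keep_ratio=0.75):
    """Hypothetical masking rule: retain pixels with strong local
    gradients (edges/structure), drop roughly the flattest regions."""
    gy, gx = np.gradient(image.astype(float))
    strength = np.hypot(gx, gy)
    # Threshold chosen so that about `keep_ratio` of pixels survive,
    # biased toward high-gradient (structural) pixels.
    thresh = np.quantile(strength, 1.0 - keep_ratio)
    return strength >= thresh

def coarse_to_fine(views):
    """Toy stand-in for the coarse-to-fine idea:
    coarse = per-pixel mean over the sparse views,
    fine   = coarse estimate refined toward each view, but only
             where the structure-aware mask fires."""
    coarse = np.mean(views, axis=0)
    fine = coarse.copy()
    for v in views:  # self-augmentation loop over available views
        mask = structure_aware_mask(v)
        fine[mask] = 0.5 * fine[mask] + 0.5 * v[mask]
    return coarse, fine
```

The key design point the sketch mirrors is that refinement is gated by structure: flat, low-information regions are left at the coarse estimate, which is one plausible way a mask can add robustness to sparse, noisy inputs.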