We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering approach that integrates an efficient volume rendering design for novel view synthesis. 3) To support fast fine-tuning on specific scenes, we introduce a multi-view geometrically consistent aggregation strategy that effectively aggregates the point clouds generated by the generalizable model, serving as the initialization for per-scene optimization. Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds of rendering per image, MVSGaussian achieves real-time rendering with better synthesis quality for each scene. Compared with the vanilla 3D-GS, MVSGaussian achieves better view synthesis at a lower training computational cost. Extensive experiments on the DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets validate that MVSGaussian attains state-of-the-art performance with convincing generalizability, real-time rendering speed, and fast per-scene optimization.
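To make the hybrid Gaussian rendering in point 2) more concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation: it assumes the hybrid step simply blends a rasterized (splatted) color map with a per-pixel volume-rendered color map computed from sampled colors and densities along depth. The function names, tensor layout, and the 0.5 blending weight are illustrative placeholders.

```python
# Minimal hedged sketch of a hybrid splatting + volume rendering step.
# Assumption: the splatted image and the volume-rendered image are averaged per pixel.
import torch

def volume_render(colors, densities, deltas):
    """Standard volume rendering over D depth samples per pixel.

    colors:    (H, W, D, 3) per-sample RGB
    densities: (H, W, D)    per-sample density
    deltas:    (H, W, D)    spacing between adjacent depth samples
    """
    alpha = 1.0 - torch.exp(-densities * deltas)                       # (H, W, D)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[..., :-1]                                              # accumulated transmittance
    weights = alpha * trans                                            # (H, W, D)
    return (weights.unsqueeze(-1) * colors).sum(dim=-2)                # (H, W, 3)

def hybrid_render(splatted_rgb, colors, densities, deltas):
    """Blend the rasterized (splatted) image with the volume-rendered image."""
    volume_rgb = volume_render(colors, densities, deltas)
    return 0.5 * (splatted_rgb + volume_rgb)                           # illustrative 50/50 blend

# Toy usage with random inputs, only to show the expected shapes.
H, W, D = 4, 4, 8
out = hybrid_render(torch.rand(H, W, 3), torch.rand(H, W, D, 3),
                    torch.rand(H, W, D), torch.full((H, W, D), 0.1))
print(out.shape)  # torch.Size([4, 4, 3])
```

The volume rendering part follows the standard alpha-compositing formulation; how MVSGaussian actually fuses the two branches should be taken from the paper and its released code rather than from this sketch.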