We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by the 3D Gaussian Splatting (3DGS). Our method first renders a collection of images by using the 3DGS and edits them by using a pre-trained 2D diffusion model (ControlNet) based on the input prompt, which is then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image while updating the 3D model as in previous works. It leads to faster editing as well as higher visual quality. This is achieved by the two terms: (a) depth-conditioned editing that enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps. (b) attention-based latent code alignment that unifies the appearance of edited images by conditioning their editing to several reference views through self and cross-view attention between images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods.
我们提出了GaussCtrl,这是一种基于文本的方法,用于编辑由3D高斯溅射(3DGS)重建的3D场景。我们的方法首先使用3DGS渲染一系列图像,并根据输入提示,使用预训练的2D扩散模型(ControlNet)编辑它们,然后用于优化3D模型。我们的关键贡献是多视图一致性编辑,它允许同时编辑所有图像,而不是像以前的工作那样,迭代编辑一个图像同时更新3D模型。这导致编辑速度更快以及更高的视觉质量。这通过两个条款实现: (a)深度条件编辑,通过利用自然一致的深度图来强制多视图图像间的几何一致性。 (b)基于注意力的潜码对齐,通过将编辑的图像条件化到几个参考视图上,并通过图像潜在表示之间的自注意力和交叉视图注意力,统一编辑图像的外观。实验表明,我们的方法比以往的最先进方法实现了更快的编辑和更好的视觉结果。