diff --git a/README.md b/README.md
index c8dc43c..909b79b 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ Our method takes in a dataset of camera poses and training images, a trained 3DG
 Our process is similar to Instruct-NeRF2NeRF where for a given training camera view, we set the original training image as the conditioning image, the noisy image input as the NeRF rendered from the camera combined with some randomly selected noise, and receive an edited image respecting the text conditioning. With this method we are able to propagate the edited changes to the GS scene. We are able to maintain grounded edits by conditioning Instruct-Pix2Pix on the original unedited training image.
 
 ### Implementation
-We use Nerfstudio's gsplat library for our underlying gaussian splatting model. We adapt similar parameters for the diffusion model from Instruct-NeRF2NeRF. Among these are the values for $[t_\text{min}, t_\text{max}] = [0.70,0.98]$, which define the amount of noise (and therefore the amount of signal retained from the original images). We vary the classifier-free guidance scales per edit and scene, using a range of values from $s_I=(1.2,1.5)$ and $s_T=(7.5,12.5)$. We edit the entire dataset and then train the scene for 2.5k iterations. For GS training, we use L1 and LPIPS losses. We train our method for a maximum of 30k iterations (starting with a GS scene trained for 20k iterations). However, in practice we stop training once the edit has converged. In many cases, the optimal training length is a subjective decision — a user may prefer more subtle or more extreme edits that are best found at different stages of training.
+We use Nerfstudio's gsplat library for our underlying gaussian splatting model. We adapt similar parameters for the diffusion model from Instruct-NeRF2NeRF. Among these are the values that define the amount of noise (and therefore the amount of signal retained from the original images). We vary the classifier-free guidance scales per edit and scene, using a range of values. We edit the entire dataset and then train the scene for 2.5k iterations. For GS training, we use L1 and LPIPS losses. We train our method for a maximum of 30k iterations (starting with a GS scene trained for 20k iterations). However, in practice we stop training once the edit has converged. In many cases, the optimal training length is a subjective decision — a user may prefer more subtle or more extreme edits that are best found at different stages of training.
 
 # Results
 Our qualitative results are shown in our first video and the following results. For each edit, we show multiple views to illustrate the 3D consistency. On the portrait capture in the first video, we are able to perform the same edits as Instruct-NeRF2NeRF, as well as new edits like "turn him into a Lego Man." In certain cases, the results look more 3D consistent and higher quality, and we provide a comparison below. However, the gaussian splatting representation makes it challenging to add entirely new geometry. These edits also extend to subjects other than people, like changing a bear statue into a real polar bear, panda, and grizzly bear. We are able to edit large-scale scenes just like Instruct-NeRF2NeRF, while maintaining the same level of 3D consistency.
@@ -57,4 +57,4 @@
 Repo: [https://github.com/cvachha/instruct-gs2gs](https://github.com/cvachha/instruct-gs2gs)
 
 ## Acknowledgements
-We thank our instructors Alexei A. Efros and Angjoo Kanazawa for their support on this project. We would also like to thank the Nerfstudio and gsplat team for providing the 3D Gaussian Splatting implementation.
\ No newline at end of file
+We thank our instructors Alexei A. Efros and Angjoo Kanazawa for their support on this project. We would also like to thank the Nerfstudio and gsplat team for providing the 3D Gaussian Splatting implementation.
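
The Implementation paragraph in the first hunk describes an alternating loop: edit the entire dataset with Instruct-Pix2Pix, then train the Gaussian splat on the edited images. The Python sketch below only illustrates that loop under assumed interfaces and is not the project's actual code: `gs_model`, `ip2p`, `view`, and `lpips_fn` are hypothetical stand-ins for the gsplat scene, an Instruct-Pix2Pix wrapper, a training view, and an LPIPS metric, and the defaults mirror the numbers quoted in the removed line ($[t_\text{min}, t_\text{max}] = [0.70,0.98]$, the $s_I$/$s_T$ guidance scales, 2.5k iterations per editing round, 30k iterations maximum).

```python
# Hypothetical sketch of the iterative dataset-update loop described in the
# Implementation section. All object interfaces here are assumptions, not the
# real nerfstudio/gsplat or Instruct-Pix2Pix APIs.
import torch
import torch.nn.functional as F


def edit_scene(gs_model, ip2p, dataset, prompt, lpips_fn,
               t_range=(0.70, 0.98),     # [t_min, t_max]: amount of noise (and thus retained signal)
               s_image=1.5, s_text=7.5,  # classifier-free guidance scales (s_I, s_T)
               iters_per_round=2500,     # GS training iterations between dataset edits
               max_iters=30_000):
    """Alternate between re-editing every training image and fitting the splat."""
    step = 0
    while step < max_iters:
        # (1) Edit the entire dataset: render each training view, pick a noise
        #     level in [t_min, t_max], and let Instruct-Pix2Pix produce an edited
        #     image, conditioned on the ORIGINAL capture so edits stay grounded.
        for view in dataset:
            render = gs_model.render(view.camera).detach()
            t = float(torch.empty(()).uniform_(*t_range))
            view.image = ip2p.edit(render,
                                   condition_image=view.original_image,
                                   prompt=prompt,
                                   noise_level=t,
                                   image_guidance_scale=s_image,
                                   text_guidance_scale=s_text)

        # (2) Train the Gaussian splat against the edited images for one round,
        #     using an L1 + LPIPS loss.
        for _ in range(iters_per_round):
            view = dataset[step % len(dataset)]
            pred = gs_model.render(view.camera)
            loss = F.l1_loss(pred, view.image) + lpips_fn(pred, view.image).mean()
            loss.backward()
            gs_model.optimizer_step()
            step += 1
            if step >= max_iters:
                break
        # In practice, training is stopped earlier once the edit has converged.
    return gs_model
```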