Many recent developments in how robots represent their environments have focused on photorealistic reconstructions. This paper focuses on generating, from photorealistic Gaussian Splatting models, sequences of images that match user-provided language instructions. We contribute a novel framework, SplaTraj, which formulates the generation of images within photorealistic environment representations as a continuous-time trajectory optimization problem. Costs are designed so that a camera following the trajectory poses smoothly traverses the environment and renders the specified spatial information in a photogenic manner. This is achieved by querying a photorealistic representation with language embeddings to isolate the regions that correspond to the user-specified inputs. These regions are then projected into the camera's view as it moves over time, and a cost is constructed from these projections. We can then apply gradient-based optimization, differentiating through the rendering, to optimize the trajectory with respect to the defined cost. The resulting trajectory moves so as to view each of the specified objects photogenically. We empirically evaluate our approach on a suite of environments and instructions and demonstrate the quality of the generated image sequences.
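The pipeline summarized above (language query, region selection, projection into a moving camera, cost, gradient-based trajectory update) can be illustrated with a toy sketch. Everything in it is an assumption rather than SplaTraj's actual implementation: per-point feature vectors stand in for a language-embedded Gaussian Splatting field, a plain pinhole projection of point centers replaces differentiable splat rendering, the camera orientation is fixed, the continuous-time trajectory is a single cubic Bézier curve, "photogenic" is reduced to "centered in view at a fixed distance", and gradients come from central finite differences instead of automatic differentiation. The helper names (`select_region`, `bezier`, `project`, `trajectory_cost`, `optimize`), the threshold, and the cost weights are all hypothetical.

```python
import numpy as np

# Toy sketch only: the features, projection, cost terms, and weights below are
# illustrative assumptions, not the SplaTraj implementation.

def select_region(features, query, threshold=0.8):
    """Mask of points whose (hypothetical) language feature has cosine similarity >= threshold with the query."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return f @ q >= threshold

def bezier(ctrl, t):
    """Continuous-time camera position p(t) from a cubic Bezier with control points ctrl of shape (4, 3)."""
    t = np.asarray(t, dtype=float)[:, None]
    return ((1 - t) ** 3 * ctrl[0] + 3 * (1 - t) ** 2 * t * ctrl[1]
            + 3 * (1 - t) * t ** 2 * ctrl[2] + t ** 3 * ctrl[3])

def project(points, cam_pos):
    """Normalized pinhole projection for a camera at cam_pos looking down +z (fixed orientation, assumed)."""
    rel = points - cam_pos
    depth = rel[:, 2]
    return rel[:, :2] / depth[:, None], depth

def trajectory_cost(ctrl, regions, times, depth_goal=2.0, w_smooth=0.1):
    cost = 0.0
    for pts, t in zip(regions, times):
        cam = bezier(ctrl, [t])[0]
        uv, depth = project(pts, cam)
        # Stand-in "photogenic" terms: region centered in the image, viewed from depth_goal.
        cost += np.sum(uv.mean(axis=0) ** 2) + (depth.mean() - depth_goal) ** 2
    # Smoothness proxy: squared second differences of the control points.
    acc = ctrl[2:] - 2 * ctrl[1:-1] + ctrl[:-2]
    return cost + w_smooth * np.sum(acc ** 2)

def numerical_grad(f, x, eps=1e-5):
    """Central finite differences (a stand-in for differentiating through a renderer)."""
    g = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def optimize(ctrl, regions, times, steps=1000, lr=0.2):
    """Plain gradient descent on the Bezier control points."""
    f = lambda c: trajectory_cost(c, regions, times)
    for _ in range(steps):
        ctrl = ctrl - lr * numerical_grad(f, ctrl)
    return ctrl

# Toy scene: two 4-point "objects" with hypothetical 3-D language features.
square = np.array([[-0.2, -0.2, 5.0], [0.2, -0.2, 5.0], [-0.2, 0.2, 5.0], [0.2, 0.2, 5.0]])
points = np.vstack([square, square + [3.0, 0.0, 0.0]])
features = np.array([[1.0, 0.1, 0.0]] * 4 + [[0.1, 1.0, 0.0]] * 4)
queries = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]  # stand-ins for text embeddings
regions = [points[select_region(features, q)] for q in queries]
times = [0.25, 0.75]  # when each queried object should be in view

ctrl0 = np.tile([1.5, 0.0, 0.0], (4, 1))
ctrl = optimize(ctrl0, regions, times)
print(trajectory_cost(ctrl0, regions, times), trajectory_cost(ctrl, regions, times))
```

In this toy setup the printed cost should fall from its initial value to near zero, with the camera passing in front of the first object at t = 0.25 and the second at t = 0.75. A real system would instead differentiate the cost through the Gaussian Splatting renderer and optimize full camera poses, including orientation.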