Object Gaussian for Monocular 6D Pose Estimation from Sparse Views

Monocular object pose estimation, a pivotal task in computer vision and robotics, depends heavily on accurate 2D-3D correspondences, which often require costly CAD models that may not be readily available. Object 3D reconstruction methods offer an alternative, and among them recent advances in 3D Gaussian Splatting (3DGS) show compelling potential. Yet 3DGS performance degrades, and the model tends to overfit, when fewer input views are available. To address this challenge, we introduce SGPose, a novel Gaussian-based framework for sparse-view object pose estimation. Given as few as ten views, SGPose generates a geometry-aware representation starting from a random cuboid initialization, eschewing the reliance on Structure-from-Motion (SfM) pipeline-derived geometry that traditional 3DGS methods require. SGPose removes the dependence on CAD models by regressing dense 2D-3D correspondences between images and the model reconstructed from sparse input and random initialization; geometrically consistent depth supervision and online synthetic view warping are key to its success. Experiments on typical benchmarks, especially the Occlusion LM-O dataset, demonstrate that SGPose outperforms existing methods even under sparse-view constraints, underscoring its potential for real-world applications.
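The abstract names two mechanics without detailing them: initializing the Gaussians from a random cuboid instead of an SfM point cloud, and recovering pose from dense 2D-3D correspondences. The NumPy sketch below illustrates both under stated assumptions and is not SGPose's implementation. `random_cuboid_centers` is a hypothetical helper that samples Gaussian centers uniformly inside an assumed axis-aligned box (other Gaussian attributes such as scale, rotation, opacity and color are omitted). Because the abstract does not say which solver turns correspondences into a pose, `pnp_dlt` uses a swapped-in, textbook linear DLT Perspective-n-Point solve on noise-free synthetic correspondences. The box size, intrinsics and ground-truth pose are illustrative values only.

```python
import numpy as np


def random_cuboid_centers(n, half_extent, rng):
    """Hypothetical helper: sample n Gaussian centers uniformly in a cuboid.

    half_extent is an assumed (3,) half-size of a box centered at the
    object origin; no SfM point cloud is needed for this initialization.
    """
    half_extent = np.asarray(half_extent, dtype=float)
    return rng.uniform(-half_extent, half_extent, size=(n, 3))


def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * kx + (1.0 - np.cos(theta)) * (kx @ kx)


def project(points, R, t, K):
    """Pinhole projection of (n, 3) object points to (n, 2) pixel coordinates."""
    cam = points @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]


def pnp_dlt(points, pixels, K):
    """Linear (DLT) PnP: estimate [R|t] from >= 6 non-coplanar 2D-3D pairs."""
    n = points.shape[0]
    homog = np.hstack([pixels, np.ones((n, 1))])
    xn = homog @ np.linalg.inv(K).T             # normalized image coordinates
    Xh = np.hstack([points, np.ones((n, 1))])   # homogeneous 3D points
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh                            # x-row: P1.X - x * P3.X = 0
    A[0::2, 8:12] = -xn[:, 0:1] * Xh
    A[1::2, 4:8] = Xh                            # y-row: P2.X - y * P3.X = 0
    A[1::2, 8:12] = -xn[:, 1:2] * Xh
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)    # null vector = [R|t] up to scale
    if np.linalg.det(P[:, :3]) < 0:              # resolve the sign ambiguity
        P = -P
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt                                   # orthogonal factor, det(R) = +1
    t = P[:, 3] / S.mean()                       # undo the unknown positive scale
    return R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = random_cuboid_centers(2000, [0.1, 0.08, 0.06], rng)
    K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
    R_gt = rodrigues([0.3, -0.5, 0.2])
    t_gt = np.array([0.02, -0.01, 0.6])
    # Stand-in "dense correspondences": a subset of points and their projections.
    obj = centers[:200]
    R_est, t_est = pnp_dlt(obj, project(obj, R_gt, t_gt, K), K)
    print(np.allclose(R_est, R_gt, atol=1e-6), np.allclose(t_est, t_gt, atol=1e-6))
```

With noise-free correspondences the linear solve recovers the pose up to floating-point error; with regressed, noisy correspondences one would typically wrap such a solver in RANSAC and refine the result iteratively.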
