Identifying affordance regions on 3D objects from semantic cues is essential for robotics and human-machine interaction. However, existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data and a reliance on 3D backbones focused on geometric encoding, which often lack resilience to real-world noise and data corruption. We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models. We employ a dual-branch architecture with Gaussian splatting to establish consistent mappings between 3D point clouds and 2D representations, enabling realistic 2D renderings from sparse point clouds. A granularity-adaptive fusion module and a 2D-3D consistency alignment module further strengthen cross-modal alignment and knowledge transfer, allowing the 3D branch to benefit from the rich semantics and generalization capacity of 2D models. To holistically assess robustness, we introduce two new corruption-based benchmarks: PIAD-C and LASO-C. Extensive experiments on public datasets and our benchmarks show that GEAL consistently outperforms existing methods across seen and novel object categories, as well as on corrupted data, demonstrating robust and adaptable affordance prediction under diverse conditions.