3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for 3D understanding directly in this representation space. To facilitate research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset, ShapeSplat, consists of 65K objects from 87 unique categories, whose labels are in accordance with the respective datasets. Creating this dataset required the compute equivalent of 2 GPU years on a TITAN XP GPU. We use the dataset for unsupervised pretraining and supervised finetuning on classification and segmentation tasks. To this end, we introduce Gaussian-MAE, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. In particular, we show that (1) the distribution of the optimized GS centroids differs significantly from that of the uniformly sampled point cloud used for initialization; (2) this change in distribution degrades classification but improves segmentation when only the centroids are used; (3) to leverage the additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space together with a splats pooling layer, a tailored solution that effectively groups and embeds similar Gaussians and yields notable improvements in the finetuning tasks.
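To make the idea in point (3) concrete, below is a minimal, hypothetical sketch of grouping Gaussians by similarity in a normalized parameter space and embedding each group with a PointNet-style "splats pooling" step (shared MLP followed by max pooling). All function names, dimensions, and design choices here are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch: group 3D Gaussians in a normalized feature space,
# then embed each group with a simple splats-pooling layer.
import torch
import torch.nn as nn


def normalize_features(feats: torch.Tensor) -> torch.Tensor:
    """Per-dimension zero-mean / unit-variance normalization of Gaussian
    parameters (centroid, scale, rotation, opacity, colour) so that no
    single parameter dominates the grouping distance."""
    mean = feats.mean(dim=0, keepdim=True)
    std = feats.std(dim=0, keepdim=True).clamp(min=1e-6)
    return (feats - mean) / std


def group_gaussians(feats: torch.Tensor, num_groups: int, group_size: int):
    """Farthest point sampling picks group centers in the normalized
    feature space; k-NN around each center forms one group."""
    n = feats.shape[0]
    centers = torch.zeros(num_groups, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    centers[0] = torch.randint(n, (1,))
    for i in range(1, num_groups):
        dist = torch.minimum(dist, (feats - feats[centers[i - 1]]).pow(2).sum(-1))
        centers[i] = dist.argmax()
    d2 = torch.cdist(feats[centers], feats)           # (num_groups, n)
    idx = d2.topk(group_size, largest=False).indices  # (num_groups, group_size)
    return idx


class SplatsPooling(nn.Module):
    """Embed each group of splats: shared per-splat MLP, then max pooling
    over the splats in the group (PointNet-style aggregation)."""

    def __init__(self, in_dim: int, embed_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, embed_dim), nn.GELU(), nn.Linear(embed_dim, embed_dim)
        )

    def forward(self, grouped: torch.Tensor) -> torch.Tensor:
        # grouped: (num_groups, group_size, in_dim) -> (num_groups, embed_dim)
        return self.mlp(grouped).max(dim=1).values


if __name__ == "__main__":
    # Toy example: 2048 Gaussians with 14-D parameters (3 xyz + 3 scale +
    # 4 quaternion + 1 opacity + 3 DC colour); dimensions are illustrative.
    gaussians = torch.randn(2048, 14)
    feats = normalize_features(gaussians)
    idx = group_gaussians(feats, num_groups=64, group_size=32)
    tokens = SplatsPooling(in_dim=14, embed_dim=256)(feats[idx])
    print(tokens.shape)  # torch.Size([64, 256])
```

The resulting group tokens could then serve as input to a masked-autoencoder backbone in the spirit of Gaussian-MAE; the specific parameterization and grouping criterion used in the paper may differ.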