GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning
Self-supervised learning of point clouds aims to leverage unlabeled 3D data to learn meaningful representations without reliance on manual annotations. However, current approaches face challenges such as limited data diversity and inadequate augmentation for effective feature learning. To address these challenges, we propose GS-PT, which integrates 3D Gaussian Splatting (3DGS) into point cloud self-supervised learning for the first time. Our pipeline uses transformers as the backbone for self-supervised pre-training and introduces novel contrastive learning tasks through 3DGS. Specifically, the transformer backbone reconstructs masked point clouds, while 3DGS takes multi-view rendered images as input to generate enhanced point cloud distributions and novel view images, facilitating data augmentation and cross-modal contrastive learning. Additionally, we incorporate features from depth maps. By optimizing these tasks jointly, our method enriches the tri-modal self-supervised learning process, enabling the model to exploit correlations between 3D point clouds and 2D images across modalities. We freeze the encoder after pre-training and evaluate the model on multiple downstream tasks. Experimental results indicate that GS-PT outperforms off-the-shelf self-supervised learning methods on various downstream tasks, including 3D object classification, real-world object classification, few-shot learning, and segmentation.
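The cross-modal contrastive objective described in the abstract can be illustrated with a symmetric InfoNCE loss that aligns point-cloud embeddings with embeddings of their rendered views. The sketch below is a minimal NumPy illustration under assumed embedding shapes, not the authors' implementation; the function name `info_nce` and the temperature value are hypothetical choices for exposition.

```python
import numpy as np

def info_nce(pc_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings.

    pc_emb, img_emb: (batch, dim) arrays; row i of each forms a positive pair
    (a point cloud and a rendering of the same object), all other rows in the
    batch serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities
    pc = pc_emb / np.linalg.norm(pc_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = pc @ img.T / temperature      # (batch, batch) similarity matrix
    idx = np.arange(len(pc))               # positives lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)   # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # average the point->image and image->point directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Toy usage: perfectly aligned embeddings yield a lower loss than mismatched pairs.
rng = np.random.default_rng(0)
pc = rng.normal(size=(4, 16))
aligned_loss = info_nce(pc, pc)
shuffled_loss = info_nce(pc, pc[::-1].copy())
```

In GS-PT this kind of alignment term would be combined with the masked point cloud reconstruction loss and the depth-map branch; the relative weighting of those objectives is specific to the paper and not reproduced here.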