GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving
Self-supervised learning has made substantial strides in image processing, while visual pre-training for autonomous driving remains in its infancy. Existing methods often focus on learning geometric scene information while neglecting texture, or treat both aspects separately, hindering comprehensive scene understanding. In this context, we introduce GaussianPretrain, a novel pre-training paradigm that achieves a holistic understanding of the scene by uniformly integrating geometric and texture representations. By conceptualizing 3D Gaussian anchors as volumetric LiDAR points, our method learns a deepened understanding of scenes, enhancing pre-training performance with detailed spatial structure and texture, while running 40.6% faster than the NeRF-based method UniPAD and using only 70% of its GPU memory. We demonstrate the effectiveness of GaussianPretrain across multiple 3D perception tasks, showing significant performance improvements: a 7.05% increase in NDS for 3D object detection, a 1.9% mAP gain in HD map construction, and a 0.8% improvement in occupancy prediction. These substantial gains highlight GaussianPretrain's theoretical innovation and strong practical potential, advancing visual pre-training for autonomous driving.