SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

Recognizing arbitrary or previously unseen categories is essential for comprehensive real-world 3D scene understanding. Currently, all existing methods rely on 2D or textual modalities during training, or combine both at inference. This highlights a clear gap: no model can process 3D data alone to learn semantics end-to-end, and the data needed to train such a model is also missing. Meanwhile, 3D Gaussian Splatting (3DGS) has emerged as the de facto standard for 3D scene representation across various vision tasks. However, effectively integrating semantic reasoning into 3DGS in a generalizable fashion remains an open challenge. To address these limitations, we introduce SceneSplat, to our knowledge the first large-scale 3D indoor scene understanding approach that operates natively on 3DGS. Furthermore, we propose a self-supervised learning scheme that unlocks rich 3D feature learning from unlabeled scenes. To power the proposed methods, we introduce SceneSplat-7K, the first large-scale 3DGS dataset for indoor scenes, comprising 6868 scenes derived from 7 established datasets, including ScanNet and Matterport3D. Generating SceneSplat-7K required computational resources equivalent to 119 GPU-days on an L4 GPU, and the resulting dataset enables standardized benchmarking of 3DGS-based reasoning for indoor scenes. Our exhaustive experiments on SceneSplat-7K demonstrate the significant benefit of the proposed methods over established baselines.