VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow

Abstract

Scene flow estimation aims to recover per-point motion from two adjacent LiDAR scans. In real-world applications such as autonomous driving, however, points rarely move independently: nearby points belonging to the same object usually share the same motion. Incorporating this local rigidity constraint has been a key challenge in self-supervised scene flow estimation, and it is typically addressed through post-processing or additional regularization. Although these approaches improve the rigidity of the predicted flow, they provide no inductive bias for local rigidity in the model architecture itself, which leads to inefficient learning and inferior performance. In contrast, we enforce local rigidity through a lightweight add-on module in the network design, enabling end-to-end learning. We design a discretized voting space that covers all possible translations and identify the translation shared by nearby points through differentiable voting. To keep computation efficient, we operate on pillars rather than points and learn a representative feature per pillar for voting. We plug the Voting Module into popular model designs and evaluate its benefit on the Argoverse 2 and Waymo datasets, where it outperforms baseline methods with only marginal compute overhead. Code is available at https://github.com/tudelft-iv/VoteFlow.
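To make the idea of voting over a discretized translation space concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `soft_vote`, the dictionary-based pillar layout, the dot-product similarity, and the `radius` and `temperature` parameters are all assumptions chosen for illustration. Each neighboring source pillar votes for every candidate translation by comparing its feature with the target pillar it would land on, the votes are summed over the neighborhood, and a softmax followed by a weighted average (a soft-argmax) keeps the selection differentiable in principle.

```python
import math


def soft_vote(src_feats, tgt_feats, neighbors, radius=1, temperature=0.1):
    """Illustrative soft voting over a discretized 2D translation grid.

    src_feats, tgt_feats: dict mapping (x, y) pillar index -> feature list.
    neighbors: list of (x, y) source pillars assumed to share one motion.
    Returns the soft-argmax translation (dx, dy) in pillar units.
    All names and the similarity measure are hypothetical choices.
    """
    # Enumerate every candidate translation inside the voting window.
    candidates = [
        (dx, dy)
        for dx in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
    ]

    # Accumulate votes: each neighbor scores a candidate by the dot product
    # between its own feature and the feature of the shifted target pillar.
    votes = []
    for dx, dy in candidates:
        total = 0.0
        for x, y in neighbors:
            f_tgt = tgt_feats.get((x + dx, y + dy))
            if f_tgt is not None:  # empty target pillars cast no vote
                f_src = src_feats[(x, y)]
                total += sum(a * b for a, b in zip(f_src, f_tgt))
        votes.append(total)

    # Numerically stable softmax over the voting space.
    peak = max(votes)
    exps = [math.exp((v - peak) / temperature) for v in votes]
    norm = sum(exps)
    weights = [e / norm for e in exps]

    # Soft-argmax: expected translation under the vote distribution.
    flow_x = sum(w * dx for w, (dx, _) in zip(weights, candidates))
    flow_y = sum(w * dy for w, (_, dy) in zip(weights, candidates))
    return flow_x, flow_y


# Toy example: two pillars of one object shift by (+1, +1) between scans.
src = {(0, 0): [1.0, 0.0], (1, 0): [0.0, 1.0]}
tgt = {(1, 1): [1.0, 0.0], (2, 1): [0.0, 1.0]}
flow = soft_vote(src, tgt, neighbors=[(0, 0), (1, 0)])
```

In this toy case only the candidate (1, 1) receives votes from both pillars, so the returned translation is very close to (1, 1). A lower `temperature` sharpens the vote distribution toward a hard argmax, while a higher one blends neighboring candidates.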