ETAP: Event-based Tracking of Any Point

Abstract

Tracking any point (TAP) recently shifted the motion estimation paradigm from tracking individual salient points with local templates to tracking arbitrary points with global image context. However, research has mostly focused on improving model accuracy in nominal settings, while scenarios with difficult lighting conditions and high-speed motion remain out of reach due to sensor limitations. This work addresses this challenge with the first event camera-based TAP method. It leverages the high temporal resolution and high dynamic range of event cameras for robust high-speed tracking, and the global context used in TAP methods to handle asynchronous and sparse event measurements. We further extend the TAP framework to handle motion-induced variations in event features, an open challenge in purely event-based tracking, with a novel feature-alignment loss that ensures the learning of motion-robust features. Our method is trained with data from a new data generation pipeline and systematically ablated across all design decisions. It shows strong cross-dataset generalization and performs 136% better than the baselines on the average Jaccard metric. Moreover, on an established feature tracking benchmark, it achieves a 20% improvement over the previous best event-only method and even surpasses the previous best events-and-frames method by 4.1%. Our code is available at https://github.com/tub-rip/ETAP