
Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba

Abstract

Event cameras draw inspiration from biological systems, offering low latency and high dynamic range while consuming minimal power. The most common current approach to processing the Event Cloud converts it into frame-based representations, which neglects the sparsity of events, loses fine-grained temporal information, and increases the computational burden. In contrast, Point Cloud is a popular representation for processing three-dimensional data and offers an alternative way to exploit local and global spatial features. Nevertheless, previous point-based methods perform unsatisfactorily on spatio-temporal event streams compared with frame-based methods. To bridge this gap, we propose EventMamba, an efficient and effective framework based on the Point Cloud representation, which rethinks the distinction between Event Cloud and Point Cloud and emphasizes vital temporal information. The Event Cloud is then fed into a hierarchical structure with staged modules that process both implicit and explicit temporal features. Specifically, we redesign the global extractor to enhance explicit temporal extraction over long event sequences using temporal aggregation and a Mamba module based on the State Space Model (SSM). In our experiments, the model consumes minimal computational resources while still achieving state-of-the-art (SOTA) performance among point-based methods on six action recognition datasets of different scales. It even outperforms all frame-based methods on both the Camera Pose Relocalization (CPR) and eye-tracking regression tasks. Our code is available at: https://github.com/rhwxmx/EventMamba.
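The two ideas the abstract relies on, treating events as a time-ordered point cloud and extracting explicit temporal features with an SSM-based Mamba module, can be illustrated with a minimal sketch. It rests on several assumptions: events are `(x, y, t, p)` tuples; `events_to_points` and `selective_scan` are hypothetical helpers, not code from the EventMamba repository; polarity is dropped for simplicity; and the scan is a toy single-channel, diagonal selective SSM with the simplified discretization `B_bar = delta * B` used in the spirit of Mamba. It omits the hierarchical stages and the temporal aggregation step and should not be read as the authors' module.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical event layout: (x, y, timestamp, polarity).
Event = Tuple[int, int, float, int]


def events_to_points(events: Sequence[Event], width: int, height: int) -> List[Tuple[float, float, float]]:
    """Map raw events to time-ordered 3-D points (x, y, t), each axis scaled to [0, 1].

    Sorting by timestamp keeps the temporal order explicit, which a sequence
    model such as an SSM depends on. Polarity is ignored in this sketch.
    """
    if not events:
        return []
    ordered = sorted(events, key=lambda e: e[2])
    t0, t1 = ordered[0][2], ordered[-1][2]
    span = (t1 - t0) or 1.0  # avoid division by zero when all timestamps match
    sx, sy = max(width - 1, 1), max(height - 1, 1)
    return [(x / sx, y / sy, (t - t0) / span) for x, y, t, _ in ordered]


def softplus(v: float) -> float:
    """Numerically safe softplus: log(1 + exp(v))."""
    return v if v > 20.0 else math.log1p(math.exp(v))


def selective_scan(u: Sequence[float], A: Sequence[float], B: Sequence[float],
                   C: Sequence[float], w_delta: float) -> List[float]:
    """Toy single-channel selective SSM scan (hypothetical, Mamba-inspired).

    For each step k:
        delta_k = softplus(w_delta * u_k)            # input-dependent step size
        h_k[n] = exp(delta_k * A[n]) * h_{k-1}[n] + delta_k * B[n] * u_k
        y_k    = sum_n C[n] * h_k[n]
    Negative entries in A make exp(delta_k * A[n]) < 1, so the state decays.
    The scan is causal: y_k depends only on u_0..u_k.
    """
    h = [0.0] * len(A)
    out: List[float] = []
    for uk in u:
        delta = softplus(w_delta * uk)
        h = [math.exp(delta * a) * hn + delta * b * uk for a, b, hn in zip(A, B, h)]
        out.append(sum(c * hn for c, hn in zip(C, h)))
    return out


if __name__ == "__main__":
    raw = [(10, 5, 300.0, 1), (0, 0, 100.0, -1), (63, 31, 200.0, 1)]
    points = events_to_points(raw, width=64, height=32)
    # Feed the normalized time coordinate through the toy scan as a 1-D sequence.
    y = selective_scan([p[2] for p in points], A=[-1.0, -0.5], B=[1.0, 1.0],
                       C=[0.5, 0.5], w_delta=1.0)
    print(points)
    print(y)
```

Sorting by timestamp before the scan is the point of the example: unlike a static 3-D point cloud, where point order is arbitrary, a recurrent SSM produces different outputs for different orderings, so the temporal axis must be made explicit.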