Abstract
Event cameras rely on motion to obtain information about scene appearance. In other words, for event cameras, motion and appearance are either both observed or both absent, and both are encoded in the output event stream. Previous works treat the recovery of these two visual quantities as separate tasks, which is at odds with the nature of event cameras and neglects the inherent relation between the two. In this paper, we propose an unsupervised learning framework that jointly estimates optical flow (motion) and image intensity (appearance) with a single network. Starting from the event generation model, we derive the event-based photometric error as a function of optical flow and image intensity, and combine it with the contrast maximization framework. The result is a comprehensive loss function that provides proper constraints for both flow and intensity estimation. Extensive experiments show that our model achieves state-of-the-art performance in both optical flow estimation (improving EPE and AE by 20% and 25%, respectively, in the unsupervised learning category) and intensity estimation (producing results competitive with other baselines, particularly in high dynamic range scenarios). Last but not least, our model has a shorter inference time than all the other optical flow models and many of the image reconstruction models, even though those models output only one of the two quantities. Project page: https://github.com/tub-rip/e2fai