Hyperspectral Adapter for Object Tracking based on Hyperspectral Video

Abstract

Object tracking based on hyperspectral video is attracting increasing attention because hyperspectral videos contain rich material and motion information. The prevailing hyperspectral methods adapt pretrained RGB-based object tracking networks to hyperspectral tasks by fine-tuning the entire network on hyperspectral datasets, which achieves impressive results in challenging scenarios. However, the performance of these trackers is limited by the loss of spectral information during the transformation, and fine-tuning the entire pretrained network is inefficient for practical applications. To address these issues, a new hyperspectral object tracking method, the hyperspectral adapter for tracking (HyA-T), is proposed in this work. The hyperspectral adapter for self-attention (HAS) and the hyperspectral adapter for the multilayer perceptron (HAM) generate adaptation information and augment it into the computation of the multi-head self-attention (MSA) module and the multilayer perceptron (MLP), thereby transferring the MSA and MLP of the pretrained network to the hyperspectral object tracking task. Additionally, the hyperspectral enhancement of input (HEI) is proposed to augment the input of the tracking network with the original spectral information. The proposed methods extract spectral information directly from the hyperspectral images, which prevents the loss of spectral information. Moreover, only the parameters of the proposed modules are fine-tuned, which is more efficient than fine-tuning the entire network as existing methods do. Extensive experiments on four datasets with different spectral bands verify the effectiveness of the proposed methods. HyA-T achieves state-of-the-art performance on all four datasets.
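
To give a rough sense of why fine-tuning only adapter parameters can be cheaper than fine-tuning a whole pretrained network, the following sketch counts trainable parameters in both settings. It is an illustration under stated assumptions, not the HyA-T architecture: the backbone dimensions are those of a standard ViT-Base encoder (width 768, depth 12, MLP ratio 4), the adapter is a generic down-projection/up-projection bottleneck, and the bottleneck width of 64 and the placement of one adapter beside each MSA and each MLP are hypothetical choices. The abstract does not specify the internal design or size of HAS, HAM, or HEI.

```python
# Illustrative only: counts trainable parameters for full fine-tuning versus
# adapter-only fine-tuning of a ViT-style encoder. The adapter layout below is
# a generic bottleneck design, NOT the HAS/HAM design described in the paper.


def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return d_in * d_out + d_out


def encoder_block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Parameters of one transformer block: MSA, MLP, and two LayerNorms."""
    # MSA: fused query/key/value projection plus output projection.
    msa = linear_params(dim, 3 * dim) + linear_params(dim, dim)
    # MLP: expand to mlp_ratio * dim, then project back to dim.
    mlp = linear_params(dim, mlp_ratio * dim) + linear_params(mlp_ratio * dim, dim)
    # Each LayerNorm has a scale and a shift vector of length dim.
    norms = 2 * (2 * dim)
    return msa + mlp + norms


def bottleneck_adapter_params(dim: int, bottleneck: int) -> int:
    """Down-projection followed by up-projection (hypothetical adapter)."""
    return linear_params(dim, bottleneck) + linear_params(bottleneck, dim)


# ViT-Base width and depth; the bottleneck width is an assumed value.
DIM, DEPTH, BOTTLENECK = 768, 12, 64

full = DEPTH * encoder_block_params(DIM)
# Assumed placement: two adapters per block, one beside the MSA and one beside the MLP.
adapters = DEPTH * 2 * bottleneck_adapter_params(DIM, BOTTLENECK)

print(f"full fine-tuning : {full:,} trainable parameters")
print(f"adapter-only     : {adapters:,} trainable parameters")
print(f"ratio            : {adapters / full:.2%}")
```

Under these assumptions the adapter-only setting trains only a few percent of the encoder's parameters. The actual saving for HyA-T depends on the sizes of its modules, which are not given here.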
