SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking
Abstract
Most state-of-the-art trackers adopt a one-stream paradigm, using a single Vision Transformer for joint feature extraction and relation modeling of template and search region images. However, the relation modeling required between different image patches varies significantly. For instance, background regions dominated by target-irrelevant information require reduced attention allocation, while foreground regions, particularly boundary areas, need to be emphasized. A single model may not handle all of these kinds of relation modeling effectively at the same time. In this paper, we propose a novel tracker called SPMTrack, built on a mixture of experts tailored for the visual tracking task (TMoE), which combines the capabilities of multiple experts to handle diverse relation modeling more flexibly. Benefiting from TMoE, we extend relation modeling from image pairs to spatio-temporal context, further improving tracking accuracy with a minimal increase in model parameters. Moreover, we employ TMoE as a parameter-efficient fine-tuning method, substantially reducing the number of trainable parameters. This enables us to train SPMTrack at varying scales efficiently while preserving the generalization ability of pretrained models, thereby achieving superior performance. We conduct experiments on seven datasets, and the results demonstrate that our method significantly outperforms current state-of-the-art trackers. The source code is available at https://github.com/WenRuiCai/SPMTrack.