arXiv

Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation

Yumin Suh ,

Abstract

A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions. With efficiency being a high priority for scaling such models, we observe that the state-of-the-art method Mask2Former uses 50% of its compute on the transformer encoder alone. This is because every encoder layer retains a full-length token-level representation of all backbone feature scales. Based on this observation, we propose a strategy termed PROgressive Token Length SCALing for Efficient transformer encoders (PRO-SCALE) that can be plugged into the Mask2Former segmentation architecture to significantly reduce the computational cost. The underlying principle of PRO-SCALE is to progressively scale the length of the tokens with the layers of the encoder. This allows PRO-SCALE to reduce computation by a large margin with minimal sacrifice in performance (~52% encoder and ~27% overall GFLOPs reduction with no drop in performance on the COCO dataset). Experiments conducted on public benchmarks demonstrate PRO-SCALE's flexibility in architectural configurations and exhibit potential for extension beyond segmentation tasks to object detection. Code here: https://github.com/abhishekaich27/proscale-pytorch
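To make the principle concrete, the sketch below compares how many tokens an encoder touches when every layer sees all feature scales versus when token length grows with depth. It is a rough illustration rather than the authors' implementation. The strides (8, 16, 32), the six-layer encoder, the coarse-to-fine ordering, and the (3, 2, 1) layer split are all assumptions, and the function names are hypothetical. Token-layer counts are used as a crude stand-in for cost, assuming per-layer cost grows linearly with the number of tokens, so the resulting figure is not expected to match the GFLOPs reductions reported in the abstract.

```python
# Hypothetical sketch: only token counts are tallied, no attention is computed.

def tokens_per_scale(height, width, strides=(8, 16, 32)):
    """Number of tokens each backbone feature scale contributes (assumed strides)."""
    return [(height // s) * (width // s) for s in strides]


def full_length_cost(token_counts, num_layers):
    """Baseline: every encoder layer processes the tokens of all scales."""
    return num_layers * sum(token_counts)


def progressive_cost(token_counts, layers_per_stage):
    """Assumed schedule: start with the coarsest scale and add one finer
    scale per stage, so token length grows with encoder depth."""
    coarse_to_fine = sorted(token_counts)
    cost = 0
    active_tokens = 0
    for scale_tokens, n_layers in zip(coarse_to_fine, layers_per_stage):
        active_tokens += scale_tokens      # one more scale joins the sequence
        cost += n_layers * active_tokens   # layers in this stage see it all
    return cost


counts = tokens_per_scale(512, 512)
baseline = full_length_cost(counts, num_layers=6)
progressive = progressive_cost(counts, layers_per_stage=(3, 2, 1))
saving = 1 - progressive / baseline
print(f"token-layer products: {baseline} vs {progressive} ({saving:.0%} fewer)")
```

Under these assumptions most of the saving comes from keeping the finest scale, which holds the bulk of the tokens, out of the early layers. How many layers to assign to each stage is a design choice that the abstract does not specify.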