MDP: Multidimensional Vision Model Pruning with Latency Constraint

Abstract

Current structural pruning methods face two significant limitations: (i) they often limit pruning to finer-grained levels such as channels, making aggressive parameter reduction challenging, and (ii) they focus heavily on parameter and FLOP reduction, with existing latency-aware methods frequently relying on simplistic, suboptimal linear models that fail to generalize well to transformers, where multiple interacting dimensions impact latency. In this paper, we address both limitations by introducing Multi-Dimensional Pruning (MDP), a novel paradigm that jointly optimizes across a variety of pruning granularities, including channels, queries, keys, heads, embeddings, and blocks. MDP employs an advanced latency modeling technique to accurately capture latency variations across all prunable dimensions, achieving an optimal balance between latency and accuracy. By reformulating pruning as a Mixed-Integer Nonlinear Program (MINLP), MDP efficiently identifies the optimal pruned structure across all prunable dimensions while respecting latency constraints. This versatile framework supports both CNNs and transformers. Extensive experiments demonstrate that MDP significantly outperforms previous methods, especially at high pruning ratios. On ImageNet, for ResNet50 pruning, MDP achieves a 28% speed increase together with a +1.4-point Top-1 accuracy improvement over prior work such as HALP. Against the latest transformer pruning method, Isomorphic, MDP delivers an additional 37% acceleration with a +0.7-point Top-1 accuracy improvement.
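To make the latency-constrained formulation concrete, the following is a minimal toy sketch, not the paper's method or solver. It assumes two prunable dimensions (heads and embedding groups), hypothetical integer importance scores, and a hypothetical latency function `toy_latency` with an interaction term `h * e` that a model linear in each dimension cannot represent. Exhaustive search stands in for a real MINLP solver, which is only feasible at this tiny scale.

```python
from itertools import product


def prune_under_latency(head_scores, embed_scores, latency, budget):
    """Choose how many heads (h) and embedding groups (e) to keep.

    Units are kept in descending order of importance, so keeping h heads
    means keeping the h highest-scoring ones. Returns (total_importance, h, e)
    for the best configuration with latency(h, e) <= budget, or None if no
    configuration fits the budget.
    """
    heads = sorted(head_scores, reverse=True)
    embeds = sorted(embed_scores, reverse=True)
    best = None
    # Enumerate every integer configuration (brute force replaces the solver).
    for h, e in product(range(1, len(heads) + 1), range(1, len(embeds) + 1)):
        if latency(h, e) > budget:
            continue  # violates the latency constraint
        value = sum(heads[:h]) + sum(embeds[:e])
        if best is None or value > best[0]:
            best = (value, h, e)
    return best


def toy_latency(h, e):
    # Hypothetical latency in arbitrary integer units: a fixed overhead plus
    # a term where the two dimensions interact multiplicatively.
    return 5 + 2 * h * e


# Hypothetical importance scores (arbitrary integer units).
head_scores = [9, 4, 3, 1]
embed_scores = [8, 6, 2]

result = prune_under_latency(head_scores, embed_scores, toy_latency, budget=18)
print(result)  # (30, 3, 2): keep 3 heads and 2 embedding groups
```

In this toy, keeping 2 heads and 3 embedding groups has the same latency as keeping 3 heads and 2 groups, but a lower total importance (29 versus 30), so the joint search over both dimensions picks the latter.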
