MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning
Abstract
Parameter-Efficient Fine-Tuning (PEFT) has become an essential approach for adapting large-scale pre-trained models at reduced computational cost. Among PEFT methods, LoRA significantly reduces the number of trainable parameters by decomposing weight updates into low-rank matrices. However, conventional LoRA applies a fixed rank to every layer and does not account for how the complexity of information varies across the layer hierarchy, which leads to inefficient adaptation and redundancy. To address this, we propose MSPLoRA (Multi-Scale Pyramid LoRA), which introduces Global Shared LoRA, Mid-Level Shared LoRA, and Layer-Specific LoRA to capture global patterns, mid-level features, and fine-grained information, respectively. This hierarchical structure reduces inter-layer redundancy while maintaining strong adaptation capability. Experiments on various NLP tasks show that MSPLoRA adapts more efficiently and achieves better performance while significantly reducing the number of trainable parameters. Further analyses based on Singular Value Decomposition (SVD) validate its information decoupling ability, highlighting MSPLoRA as a scalable and effective optimization strategy for parameter-efficient fine-tuning of large language models. Our code is available at https://github.com/Oblivioniss/MSPLoRA.
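To make the parameter-saving argument concrete, the following is a minimal counting sketch rather than the authors' implementation. It compares a per-layer LoRA budget with a three-level pyramid in which one global pair of low-rank matrices is shared by all layers, one mid-level pair is shared by each group of adjacent layers, and each layer keeps a small pair of its own. The abstract does not state ranks, group sizes, or model dimensions, so every numeric value below, as well as the function names `lora_param_count` and `pyramid_param_count` and the equal-sized grouping scheme, is an illustrative assumption.

```python
def lora_param_count(d_in, d_out, rank, num_layers):
    """Trainable parameters when every layer owns its own rank-`rank` pair (A, B).

    A has shape (rank, d_in) and B has shape (d_out, rank), so each layer
    contributes rank * (d_in + d_out) parameters.
    """
    return num_layers * rank * (d_in + d_out)


def pyramid_param_count(d_in, d_out, num_layers, group_size,
                        r_global, r_mid, r_layer):
    """Trainable parameters for a three-level pyramid (hypothetical layout).

    Assumed layout, not taken from the paper:
      - one global pair of rank r_global, shared by all layers
      - one mid-level pair of rank r_mid per group of `group_size` layers
      - one layer-specific pair of rank r_layer per layer
    """
    if num_layers % group_size != 0:
        raise ValueError("num_layers must be divisible by group_size")
    num_groups = num_layers // group_size
    per_rank = d_in + d_out  # cost of one unit of rank: a row of A plus a column of B
    total_rank = r_global + num_groups * r_mid + num_layers * r_layer
    return total_rank * per_rank


if __name__ == "__main__":
    # Assumed example: 12 layers, 768x768 projection, groups of 4 layers.
    baseline = lora_param_count(768, 768, rank=8, num_layers=12)
    pyramid = pyramid_param_count(768, 768, num_layers=12, group_size=4,
                                  r_global=8, r_mid=4, r_layer=2)
    print(baseline, pyramid)  # 147456 67584
```

Under these assumed settings the pyramid uses fewer than half the trainable parameters of the fixed-rank baseline, because rank shared across layers is paid for once rather than once per layer. The actual savings depend on the ranks and grouping chosen in the paper, which the abstract does not give.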