Abstract
The rapid expansion of large language models (LLMs) has underscored the need for parameter-efficient fine-tuning methods, with LoRA (Low-Rank Adaptation) emerging as a popular solution. Although LoRA reduces the number of trainable parameters, serving multiple task-specific or user-specific LoRA modules on top of a base model still creates significant storage challenges. To address this, we introduce LoRA-XS (Low-Rank Adaptation with eXtremely Small number of parameters), a novel low-rank adaptation method motivated by a theoretical derivation, which considerably reduces the number of trainable parameters while showing superior or competitive performance. LoRA-XS achieves this by inserting a small, trainable r × r weight matrix between frozen low-rank matrices, which are constructed from the Singular Value Decomposition (SVD) of the original weight matrix. This lightweight matrix drastically reduces the storage required per fine-tuned model, making it feasible to deploy millions of personalized models while minimizing memory overhead. For instance, LoRA-XS reduces the number of trainable parameters by over 100x in 7B models compared to LoRA. Our evaluations across various benchmarks (including GLUE, GSM8K, MATH, and eight commonsense reasoning datasets) demonstrate that LoRA-XS performs competitively with or better than LoRA and other recent methods such as VeRA, while being significantly more parameter-efficient. We also provide an extensive ablation study on the importance of singular vectors in transformer weights, shedding light on the underlying mechanisms driving LoRA-XS's enhanced efficiency. These findings suggest that LoRA-XS is not only a storage-efficient alternative, but also a powerful tool for scaling and personalizing LLMs.
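The mechanism described above, a trainable r × r matrix placed between frozen factors taken from the SVD of a pretrained weight, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the toy dimensions, the choice to fold the singular values into the left factor, and the zero initialization of the trainable matrix are all assumptions made for illustration.

```python
import numpy as np


def lora_xs_init(W, r):
    """Build frozen rank-r factors from the SVD of W plus a trainable r x r matrix.

    Assumptions (not stated in the abstract): singular values are folded
    into the left factor, and R starts at zero.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # frozen, shape (m, r): top-r left vectors scaled by singular values
    B = Vt[:r, :]          # frozen, shape (r, n): top-r right singular vectors
    R = np.zeros((r, r))   # the only trainable parameters (zero init is an assumption)
    return A, R, B


def adapted_weight(W, A, R, B):
    """Effective weight after adaptation: the frozen W plus the low-rank update A R B."""
    return W + A @ R @ B


def trainable_params(m, n, r):
    """Trainable parameters per adapted m x n weight matrix at rank r."""
    return {"lora": r * (m + n), "lora_xs": r * r}


# Toy example with hypothetical dimensions.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))
A, R, B = lora_xs_init(W, r=4)
W_adapted = adapted_weight(W, A, R, B)
counts = trainable_params(64, 48, 4)
```

In this sketch only `R` would be updated during fine-tuning and stored per task or user. The per-matrix trainable count is r² instead of LoRA's r(m + n), so it no longer grows with the model's hidden dimensions.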