Resource-Efficient Federated Fine-Tuning Large Language Models for Heterogeneous Data
Abstract
Fine-tuning large language models (LLMs) via federated learning, i.e., FedLLM, has been proposed to adapt LLMs to various downstream applications in a privacy-preserving way. To reduce the fine-tuning cost on resource-constrained devices, FedLoRA integrates low-rank adaptation (LoRA) into FedLLM so that only a small subset of model parameters is fine-tuned. However, beyond resource constraints, a second critical challenge, data heterogeneity, severely hinders the deployment of FedLoRA in practical applications. Inspired by the group-based federated learning paradigm, we propose a hierarchical FedLoRA framework, termed HierFedLoRA, to address both challenges. Specifically, HierFedLoRA partitions all devices into multiple near-IID groups and adjusts the intra-group aggregation frequency of each group to eliminate the negative effects of non-IID data. Meanwhile, to reduce computation and communication costs, HierFedLoRA dynamically assigns each group its own suitable fine-tuning depth (i.e., the number of consecutive layers, counted from the output, that are fine-tuned). Because aggregation frequency and depth are coupled, HierFedLoRA optimizes them jointly to further improve the performance of FedLoRA. Extensive experiments are conducted on a physical platform with 80 commercial devices. The results show that, compared with strong baselines, HierFedLoRA improves the final model accuracy by 1.6% to 4.2% and speeds up the fine-tuning process by at least 2.1$\times$.
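To make the notion of fine-tuning depth concrete, the following minimal sketch selects the layers nearest the output and counts the LoRA parameters that would be trained. It is an illustration under stated assumptions, not the HierFedLoRA implementation: the function names, the square `hidden x hidden` weight shape, and the default of two adapted matrices per layer are all hypothetical. The only fact relied on is that a rank-`r` LoRA adapter for a `d x k` matrix adds `r * (d + k)` parameters.

```python
def trainable_layers(num_layers: int, depth: int) -> list[int]:
    """Return indices of the `depth` layers closest to the output.

    Layer 0 is assumed to be nearest the input, so the selected
    indices are the last `depth` entries of range(num_layers).
    """
    if not 0 <= depth <= num_layers:
        raise ValueError("depth must be between 0 and num_layers")
    return list(range(num_layers - depth, num_layers))


def lora_param_count(depth: int, hidden: int, rank: int,
                     matrices_per_layer: int = 2) -> int:
    """Count trainable LoRA parameters when `depth` layers are adapted.

    Assumption: every adapted weight is hidden x hidden, so its two
    low-rank factors (hidden x rank and rank x hidden) together add
    2 * hidden * rank parameters.
    """
    return depth * matrices_per_layer * 2 * hidden * rank


if __name__ == "__main__":
    # Hypothetical 12-layer model, hidden size 768, LoRA rank 8.
    print(trainable_layers(12, 3))
    print(lora_param_count(3, hidden=768, rank=8))
```

Under these assumptions the trainable parameter count grows linearly with depth, which is one reason a smaller depth can lower both the computation and the communication cost for a resource-constrained group.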