PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches

Abstract

As large language models (LLMs) increasingly shape the AI landscape, fine-tuning pretrained models to achieve optimal performance on domain-specific tasks has become more popular than in the pre-LLM era. However, pretrained LLMs such as ChatGPT evolve periodically (i.e., their model parameters are frequently updated), making it challenging for downstream users with limited resources to keep up with fine-tuning the newest LLMs for their domain applications. Although parameter-efficient fine-tuning methods such as LoRA have reduced fine-tuning costs, not all downstream users have adequate computing resources for frequent personalization. Moreover, access to fine-tuning datasets, particularly in sensitive domains such as healthcare, may be time-restricted, making it crucial to retain the knowledge encoded in earlier fine-tuning rounds for future adaptation. In this paper, we present PortLLM, a training-free framework that (i) creates an initial lightweight model update patch to capture domain-specific knowledge, and (ii) allows that patch to be subsequently and seamlessly plugged into evolved LLMs for continual personalization at minimal cost. Our extensive experiments cover seven representative datasets, from easier question-answering tasks {BoolQ, SST2} to harder reasoning tasks {WinoGrande, GSM8K}, and models including {Mistral-7B, Llama2, Llama3.1, and Gemma2}, validating the portability of our designed model patches and showcasing the effectiveness of our proposed framework. For instance, PortLLM achieves performance comparable to LoRA fine-tuning while reducing GPU memory usage by up to 12.2x. Finally, we provide theoretical justifications for the portability of our model update patches, which offer new insights into the theoretical dimension of LLM personalization.
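The two-step workflow described in the abstract can be illustrated with a toy sketch. The abstract does not specify the form of the patch, so this sketch assumes it is an additive low-rank weight delta, similar to a merged LoRA update. The function names `extract_patch` and `apply_patch`, the matrix sizes, and the simulated "evolved" weights are hypothetical and chosen only for illustration.

```python
import numpy as np

def extract_patch(lora_B, lora_A):
    """Collapse LoRA-style factors into one additive weight delta.

    Assumption: the 'model update patch' is a low-rank delta B @ A.
    """
    return lora_B @ lora_A

def apply_patch(base_weight, patch):
    """Plug the patch into a base weight matrix; no gradient steps are taken."""
    return base_weight + patch

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2  # toy dimensions, not taken from the paper

# Step (i): fine-tune the OLD model once and keep only the lightweight patch.
w_old = rng.normal(size=(d_out, d_in))
lora_B = rng.normal(size=(d_out, rank))  # stand-in for trained factors
lora_A = rng.normal(size=(rank, d_in))
patch = extract_patch(lora_B, lora_A)

# The provider releases an evolved model (simulated here as a small drift).
w_new = w_old + 0.01 * rng.normal(size=(d_out, d_in))

# Step (ii): reuse the same patch on the evolved weights without retraining.
w_new_personalized = apply_patch(w_new, patch)
```

Under this assumption, personalizing the evolved model costs a single matrix addition per patched layer, and only the two low-rank factors need to be stored between model releases.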