Model Assembly Learning with Heterogeneous Layer Weight Merging

Abstract

Model merging acquires general capabilities without extra data or training by combining multiple models' parameters. Previous approaches achieve linear mode connectivity by aligning parameters into the same loss basin using permutation invariance. In this paper, we introduce Model Assembly Learning (MAL), a novel paradigm for model merging that iteratively integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities. Unlike previous works that require identical architectures, MAL allows the merging of heterogeneous architectures and selective parameters across layers. Specifically, the base model can incorporate parameters from different layers of multiple pre-trained models. We systematically investigate the conditions and fundamental settings of heterogeneous parameter merging, addressing all possible mismatches in layer widths between the base and target models. Furthermore, we establish key laws and provide practical guidelines for effectively implementing MAL.
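The two ideas in the abstract, permutation invariance and merging layers of mismatched width, can be illustrated with a small sketch. The abstract does not specify the actual MAL merging rule, so `merge_mismatched` below is a hypothetical stand-in: it interpolates only on the block where the base and target weight matrices overlap and keeps the base weights elsewhere. The function names, the bias-free two-layer ReLU network, and the coefficient `alpha` are all assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def forward(w1: Matrix, w2: Matrix, x: List[float]) -> List[float]:
    """Two-layer ReLU network without biases: y = w2 @ relu(w1 @ x)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def permute_hidden_units(w1: Matrix, w2: Matrix, perm: List[int]) -> Tuple[Matrix, Matrix]:
    """Reorder hidden units: rows of w1 and columns of w2 move together.

    Because both sides are permuted consistently, the network computes
    the same function. This is the permutation invariance that alignment
    methods exploit.
    """
    new_w1 = [w1[p] for p in perm]
    new_w2 = [[row[p] for p in perm] for row in w2]
    return new_w1, new_w2


def merge_mismatched(base: Matrix, target: Matrix, alpha: float) -> Matrix:
    """Hypothetical rule (not taken from the paper): interpolate on the
    overlapping block of the two matrices and keep base weights elsewhere.
    The result always has the shape of the base layer.
    """
    merged = [row[:] for row in base]
    for i in range(min(len(base), len(target))):
        for j in range(min(len(base[i]), len(target[i]))):
            merged[i][j] = (1.0 - alpha) * base[i][j] + alpha * target[i][j]
    return merged


if __name__ == "__main__":
    w1 = [[1.0, -2.0], [3.0, 4.0], [0.0, 1.0]]  # 3 hidden units, 2 inputs
    w2 = [[1.0, 2.0, 3.0]]                      # 1 output
    p1, p2 = permute_hidden_units(w1, w2, [2, 0, 1])
    print(forward(w1, w2, [1.0, 1.0]) == forward(p1, p2, [1.0, 1.0]))

    base = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]      # 2 x 3 base layer
    target = [[3.0, 3.0], [3.0, 3.0], [3.0, 3.0]]  # 3 x 2 target layer
    print(merge_mismatched(base, target, 0.5))
```

Keeping the output in the base layer's shape reflects the abstract's framing that the base model absorbs parameters from other models. How the non-overlapping weights should be handled in practice is left open here.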

Model Assembly Learning with Heterogeneous Layer Weight Merging - arXiv