Structure Is Not Enough: Leveraging Behavior for Neural Network Weight Reconstruction
Abstract
The weights of neural networks (NNs) have recently gained prominence as a new data modality in machine learning, with applications ranging from accuracy and hyperparameter prediction to representation learning and weight generation. One approach to leveraging NN weights is to train autoencoders (AEs) with contrastive and reconstruction losses. Such models can be applied to a wide variety of downstream tasks, where they achieve strong predictive performance and low reconstruction error. However, despite the low reconstruction error, the NN models these AEs reconstruct perform worse than the original ones, which limits their usefulness for model weight generation. In this paper, we identify a limitation of weight-space AEs: a structural loss that uses the Euclidean distance between original and reconstructed weights fails to capture some features that are critical for reconstructing high-performing models. We analyze the addition of a behavioral loss for training AEs in weight space, which compares the output of the reconstructed model with that of the original one when both receive the same inputs. We show a strong synergy between structural and behavioral signals, which improves performance on all evaluated downstream tasks, in particular NN weight reconstruction and generation.
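The abstract describes the two training signals only in words. As a rough illustration of why a purely structural loss can miss errors that matter for model behavior, the Python sketch below compares both signals on a toy model. Everything concrete here is an assumption, not the paper's setup: the 2x2 linear "model", the flattened parameter layout, mean squared error for both terms (so the structural term is the squared Euclidean distance divided by the number of parameters), the probe inputs, and the weighting factor `gamma`. In the paper the reconstructed weights come from an autoencoder; here they are hand-written perturbations.

```python
from typing import Sequence, Tuple

# Hypothetical toy model: a 2x2 linear layer y = W x + b whose parameters are
# flattened as [w00, w01, w10, w11, b0, b1]. Not the architecture from the paper.
Params = Sequence[float]
Vec = Tuple[float, float]


def forward(params: Params, x: Vec) -> Vec:
    """Apply the toy linear layer to a 2-d input."""
    w00, w01, w10, w11, b0, b1 = params
    return (w00 * x[0] + w01 * x[1] + b0,
            w10 * x[0] + w11 * x[1] + b1)


def structural_loss(original: Params, reconstructed: Params) -> float:
    """Mean squared difference between weight vectors (squared Euclidean distance / n)."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def behavioral_loss(original: Params, reconstructed: Params,
                    probes: Sequence[Vec]) -> float:
    """Mean squared difference between both models' outputs on the same probe inputs."""
    total = 0.0
    count = 0
    for x in probes:
        for y_orig, y_rec in zip(forward(original, x), forward(reconstructed, x)):
            total += (y_orig - y_rec) ** 2
            count += 1
    return total / count


def combined_loss(original: Params, reconstructed: Params,
                  probes: Sequence[Vec], gamma: float = 1.0) -> float:
    """Structural term plus the behavioral term scaled by the hypothetical factor gamma."""
    return (structural_loss(original, reconstructed)
            + gamma * behavioral_loss(original, reconstructed, probes))


if __name__ == "__main__":
    original = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # identity map
    recon_a = [1.0, 0.0, 0.0, 1.0, 0.1, 0.0]   # bias b0 off by 0.1
    recon_b = [1.1, 0.0, 0.0, 1.0, 0.0, 0.0]   # weight w00 off by 0.1
    probes = [(2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

    # Both reconstructions are equally close in weight space...
    print(structural_loss(original, recon_a), structural_loss(original, recon_b))
    # ...but recon_b deviates more in output space on these probes.
    print(behavioral_loss(original, recon_a, probes),
          behavioral_loss(original, recon_b, probes))
```

Under these assumptions the two reconstructions tie on the structural term, while `recon_b` incurs roughly 2.7 times the behavioral term of `recon_a`, a difference a weight-only objective cannot see. The choice of probes matters: if every probe had `x[0] == 0`, the error in `w00` would be invisible to the behavioral term as well.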