arXiv

As easy as PIE: understanding when pruning causes language models to disagree

Abstract

Language model (LM) pruning compresses a model by removing weights, nodes, or other parts of its architecture. Pruning typically trades some effectiveness for efficiency gains. However, looking at how individual data points are affected reveals that a particular subset of them consistently bears most of the accuracy loss caused by pruning, an effect that goes unnoticed when only the mean accuracy over all data points is reported. These data points, called Pruning Identified Exemplars (PIEs), have been studied in image processing but not in NLP. In a study spanning several NLP datasets, pruning methods, and levels of compression, we find that PIEs considerably impact inference quality regardless of class frequency, and that BERT is more prone to this effect than BiLSTM. We also find that PIEs contain a high proportion of the data points that most influence how well the model generalises to unseen data. This means that pruning, while seemingly causing only a moderate loss of accuracy across all data points, in fact severely hurts the data points that matter most. We trace what makes PIEs both hard for inference and impactful to their generally longer and more semantically complex text. These findings are novel and contribute to understanding how LMs are affected by pruning. The code is available at: https://github.com/pietrotrope/AsEasyAsPIE
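The abstract does not spell out how PIEs are identified. As a hedged illustration, the sketch below assumes the definition from the image-classification work that introduced PIEs (Hooker et al.): an example is a PIE when the most frequent (modal) prediction across a population of pruned models differs from the modal prediction across a population of unpruned models. The function names (`modal_label`, `find_pies`, `accuracy`) and the toy predictions are hypothetical and are not taken from the AsEasyAsPIE repository. The toy data shows how a small drop in mean accuracy can hide a complete collapse on the PIE subset.

```python
from collections import Counter
from typing import List, Optional, Sequence


def modal_label(predictions: Sequence[int]) -> int:
    """Most frequent label in a population of predictions.

    Ties are broken by first occurrence (Counter.most_common ordering).
    """
    return Counter(predictions).most_common(1)[0][0]


def find_pies(dense_preds: Sequence[Sequence[int]],
              pruned_preds: Sequence[Sequence[int]]) -> List[int]:
    """Indices of examples whose modal prediction changes after pruning.

    dense_preds[m][i] is the prediction of unpruned model m on example i;
    pruned_preds[m][i] is the same for pruned model m (assumed layout).
    """
    n_examples = len(dense_preds[0])
    pies = []
    for i in range(n_examples):
        dense_mode = modal_label([model[i] for model in dense_preds])
        pruned_mode = modal_label([model[i] for model in pruned_preds])
        if dense_mode != pruned_mode:
            pies.append(i)
    return pies


def accuracy(preds: Sequence[int], labels: Sequence[int],
             idx: Optional[Sequence[int]] = None) -> float:
    """Accuracy over all examples, or only over the indices in idx."""
    indices = list(range(len(labels))) if idx is None else list(idx)
    if not indices:
        return float("nan")  # undefined on an empty subset
    return sum(preds[i] == labels[i] for i in indices) / len(indices)


# Hypothetical toy data: 10 examples, 3 unpruned and 3 pruned models.
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
dense = [list(labels) for _ in range(3)]  # unpruned models are all correct
pruned = [
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 0],  # flips example 9
    [0, 1, 0, 0, 0, 1, 0, 1, 0, 0],  # flips examples 3 and 9
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # unchanged
]

pies = find_pies(dense, pruned)
print(pies)                                # [9]
print(accuracy(pruned[0], labels))         # 0.9
print(accuracy(pruned[0], labels, pies))   # 0.0
```

Using the modal prediction of a model population, rather than a single model, is what separates examples that pruning systematically changes (example 9) from ones affected by run-to-run noise (example 3, flipped by only one pruned model).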