Eliminating Position Bias of Language Models: A Mechanistic Approach

arXiv

Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji


Abstract

Position bias has proven to be a prevalent issue in modern language models (LMs): the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. Based on these analyses, we propose to eliminate position bias (e.g., the order of retrieved documents in QA affecting performance) with a training-free, zero-shot approach. Our method changes the causal attention between documents to bidirectional attention and uses the model's attention values, rather than the order given in the input prompt, to decide the relative order of documents, thereby enabling Position-INvariant inferencE (PINE) at the document level. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning. Notably, PINE is especially useful when adapting LMs to evaluate reasoning pairs: it consistently provides gains of 8 to 10 percentage points, making Llama-3-70B-Instruct perform even better than GPT-4-0125-preview and GPT-4o-2024-08-06 on the RewardBench reasoning set.
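The two ideas named in the abstract, bidirectional attention between documents and an ordering derived from attention values rather than prompt order, can be illustrated with a minimal sketch. It is based only on the abstract's wording and is not the authors' implementation. The function names `build_mask` and `rank_documents`, the `(kind, length)` segment representation, the choice to keep attention causal inside each document, and the per-document scores are all hypothetical assumptions made for illustration.

```python
def build_mask(segments):
    """Build a token-level attention mask (hypothetical helper).

    segments: list of (kind, length) pairs in prompt order, where kind is
    "text" (ordinary prompt text) or "doc" (one retrieved document).
    Returns mask[i][j] == True when token i may attend to token j.
    """
    # Record, for every token, the kind and index of its segment.
    owner = []
    for seg_id, (kind, length) in enumerate(segments):
        owner.extend([(kind, seg_id)] * length)

    n = len(owner)
    # Start from a standard causal mask: each token sees itself and the past.
    mask = [[j <= i for j in range(n)] for i in range(n)]

    # Assumption: tokens of different documents may attend to each other in
    # both directions; attention inside a document stays causal.
    for i in range(n):
        for j in range(n):
            kind_i, seg_i = owner[i]
            kind_j, seg_j = owner[j]
            if kind_i == "doc" and kind_j == "doc" and seg_i != seg_j:
                mask[i][j] = True
    return mask


def rank_documents(doc_scores):
    """Order documents by a score instead of by prompt position.

    doc_scores: list of (doc_text, score) pairs. The score stands in for an
    attention-derived importance value; how such a value is computed is not
    specified here. Ties are broken by text so the result does not depend
    on the input order.
    """
    ranked = sorted(doc_scores, key=lambda pair: (-pair[1], pair[0]))
    return [text for text, _ in ranked]


if __name__ == "__main__":
    # Two prompt tokens, a 2-token document, a 1-token document, one question token.
    layout = [("text", 2), ("doc", 2), ("doc", 1), ("text", 1)]
    for row in build_mask(layout):
        print("".join("1" if allowed else "." for allowed in row))
    print(rank_documents([("doc A", 0.2), ("doc B", 0.7)]))
```

Because the ranking depends only on the scores and the document text, shuffling the input list leaves the output unchanged, which is the document-level invariance the abstract describes. In a real model the scores would come from the attention computation itself, a detail this sketch leaves out.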