arXiv

Order Independence With Finetuning

Abstract

Large language models (LLMs) perform remarkably well on many NLP tasks, yet they often exhibit order dependence: simply reordering semantically identical tokens (e.g., the answer choices in a multiple-choice question) can lead to inconsistent predictions. Recent work proposes Set-Based Prompting (SBP) as a way to remove order information from designated token subsets, thereby mitigating positional biases. However, applying SBP to base models produces an out-of-distribution input format, which can degrade in-distribution performance. We introduce a fine-tuning strategy that integrates SBP into the training process, "pulling" these set-formatted prompts closer to the model's training manifold, and show that SBP can be incorporated into a model in this way. Our experiments on in-distribution (MMLU) and out-of-distribution (CSQA, ARC Challenge) multiple-choice tasks show that SBP fine-tuning significantly improves accuracy and robustness to answer-order permutations while preserving broader language modeling capabilities. We discuss the broader implications of order-invariant modeling and outline future directions for building fairer, more consistent LLMs.
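
The abstract does not spell out how order information is removed from a token subset. As a rough illustration only, the sketch below assumes one common construction: every answer option restarts at the same position index, and tokens in one option are masked from attending to tokens in any other option. The function name `set_based_layout`, its parameters, and the segment encoding are hypothetical and are not taken from the paper or from any library.

```python
from typing import List, Tuple


def set_based_layout(
    prefix_len: int, option_lens: List[int], suffix_len: int
) -> Tuple[List[int], List[List[bool]]]:
    """Return (position_ids, attention_mask) for prefix + parallel options + suffix.

    mask[i][j] is True when token i may attend to token j.
    Hypothetical helper: an assumed layout, not the paper's implementation.
    """
    # Segment label per token: -1 = prefix, k >= 0 = option k, -2 = suffix.
    segments: List[int] = [-1] * prefix_len
    position_ids: List[int] = list(range(prefix_len))
    for k, n in enumerate(option_lens):
        segments += [k] * n
        # Every option restarts at the same position, so none is "first".
        position_ids += list(range(prefix_len, prefix_len + n))
    # The suffix continues after the longest option.
    suffix_start = prefix_len + max(option_lens, default=0)
    segments += [-2] * suffix_len
    position_ids += list(range(suffix_start, suffix_start + suffix_len))

    total = len(segments)
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(i + 1):  # causal: only the token itself and earlier tokens
            same_segment = segments[i] == segments[j]
            sees_prefix = segments[j] == -1
            is_suffix = segments[i] == -2
            # Options see the prefix and themselves; the suffix sees everything.
            mask[i][j] = same_segment or sees_prefix or is_suffix
    return position_ids, mask


# Two-token question prefix, options of length 2 and 1, one-token suffix.
position_ids, mask = set_based_layout(prefix_len=2, option_lens=[2, 1], suffix_len=1)
print(position_ids)  # [0, 1, 2, 3, 2, 4]
print(mask[4][2])    # False: option 1 cannot attend to option 0
```

Under this assumed layout, swapping the two options changes only the storage order of their tokens. Each option token keeps the same position index and sees the same context. That is the sense in which the designated subset is treated as a set rather than a sequence. A base model never saw such inputs during pretraining, which is the out-of-distribution concern the abstract raises.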