Abstract
Large language models (LLMs) perform remarkably well on many natural language processing (NLP) tasks, yet they often exhibit order dependence: simply reordering semantically identical tokens (e.g., the answer choices in a multiple-choice question) can lead to inconsistent predictions. Recent work proposes Set-Based Prompting (SBP) as a way to remove order information from designated token subsets, thereby mitigating positional biases. However, applying SBP to base models produces an out-of-distribution input format, which can degrade in-distribution performance. We introduce a fine-tuning strategy that integrates SBP into the training process, "pulling" these set-formatted prompts closer to the model's training manifold, and show that SBP can be incorporated into a model in this way. Our experiments on in-distribution (MMLU) and out-of-distribution (CSQA, ARC Challenge) multiple-choice tasks show that SBP fine-tuning significantly improves both accuracy and robustness to answer-order permutations while preserving broader language modeling capabilities. We discuss the broader implications of order-invariant modeling and outline future directions for building fairer, more consistent LLMs.
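To make the notion of robustness to answer-order permutations concrete, the following is a minimal sketch of one way such consistency could be measured. It is not the evaluation protocol used in this work: the function `permutation_consistency`, the `predict` callback interface, and the two toy predictors are all hypothetical names introduced for illustration, standing in for a real model.

```python
from itertools import permutations
from typing import Callable, Sequence


def permutation_consistency(
    question: str,
    options: Sequence[str],
    predict: Callable[[str, Sequence[str]], int],
) -> float:
    """Fraction of option orderings on which `predict` picks the modal answer.

    `predict` is a hypothetical stand-in for a model: it receives the
    question and one ordering of the options, and returns the index of
    the option it selects within that ordering.
    """
    picks = []
    for order in permutations(options):
        idx = predict(question, order)
        picks.append(order[idx])  # map the index back to the option text
    modal = max(set(picks), key=picks.count)
    return picks.count(modal) / len(picks)


# Toy predictor with a pure positional bias: always chooses the first slot.
def first_option_bias(question: str, options: Sequence[str]) -> int:
    return 0


# Toy predictor that depends only on option content, not on position.
def order_invariant(question: str, options: Sequence[str]) -> int:
    return options.index("Paris")


opts = ["Paris", "Rome", "Oslo", "Cairo"]
biased_score = permutation_consistency("Capital of France?", opts, first_option_bias)
invariant_score = permutation_consistency("Capital of France?", opts, order_invariant)
print(biased_score, invariant_score)  # prints: 0.25 1.0
```

With four options there are 24 orderings. The position-biased predictor selects each option on exactly 6 of them, so its score is 0.25, whereas a predictor that ignores order scores 1.0. Enumerating all orderings is only practical for small option sets; sampling a subset of permutations would be a reasonable substitute for longer lists.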