arXiv

Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences

Xiangyang Liu, Junliang He, Xipeng Qiu

Abstract

Large language models (LLMs) can perform complex reasoning by generating intermediate thoughts under zero-shot or few-shot settings. However, zero-shot prompting consistently suffers from low performance, while the superior performance of few-shot prompting hinges on manually crafted demonstrations. In this paper, we present RoSE (Reasoning with Orchestrated Streaming Experiences), a general framework for solving reasoning tasks that can self-improve without complex external effort. To enable RoSE, we describe an architecture that extends an LLM to store all answered questions and their thoughts in a streaming experience pool, and then orchestrates helpful questions from the pool to assist in answering new questions. To set up a question-aware orchestration mechanism, RoSE first calculates the similarity between each question in the pool and a new test question. Since the solution to each answered question is not always correct, RoSE sorts the questions by their similarity to the new question, uniformly divides them into multiple buckets, and then extracts one question from each bucket so that the extracted questions are more diverse. To make these extracted questions as helpful as possible for answering new questions, we introduce two additional attributes for each question: uncertainty and complexity. From each bucket, RoSE preferentially selects questions with low uncertainty and high complexity. We evaluate the versatility of RoSE across various reasoning tasks, LLMs, and chain-of-thought (CoT) methods.
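The orchestration steps described above (similarity scoring, sorting, uniform bucketing, one pick per bucket) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Experience`, `ExperiencePool`, `cosine_bow`, `split_evenly`, and `orchestrate` are hypothetical. A bag-of-words cosine similarity stands in for whatever similarity measure RoSE actually uses. Uncertainty and complexity are treated as precomputed numbers. The within-bucket rule (lowest uncertainty first, higher complexity to break ties) is only one possible way to combine the two attributes.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Experience:
    """One answered question in a (hypothetical) streaming experience pool."""
    question: str
    thought: str        # the generated chain of thought
    answer: str
    uncertainty: float  # assumed precomputed, e.g. disagreement among sampled answers
    complexity: int     # assumed precomputed, e.g. number of reasoning steps


class ExperiencePool:
    def __init__(self) -> None:
        self.items: list[Experience] = []

    def add(self, exp: Experience) -> None:
        # Streaming: every answered question is stored, whether or not it is correct.
        self.items.append(exp)


def cosine_bow(a: str, b: str) -> float:
    """Bag-of-words cosine similarity, a stand-in for an embedding model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def split_evenly(items: list, k: int) -> list[list]:
    """Divide items into k contiguous buckets whose sizes differ by at most one."""
    if k < 1:
        raise ValueError("k must be at least 1")
    base, extra = divmod(len(items), k)
    buckets, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        buckets.append(items[start:start + size])
        start += size
    return [b for b in buckets if b]  # drop empty buckets when the pool is small


def orchestrate(pool: ExperiencePool, new_question: str, k: int) -> list[Experience]:
    # 1. Score every stored question against the new question.
    scored = [(cosine_bow(e.question, new_question), e) for e in pool.items]
    # 2. Sort by similarity (most similar first) and split into k buckets.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    buckets = split_evenly([e for _, e in scored], k)
    # 3. Take one question per bucket: lowest uncertainty, then highest complexity
    #    (a hypothetical lexicographic rule).
    return [min(b, key=lambda e: (e.uncertainty, -e.complexity)) for b in buckets]


# Toy usage with made-up pool entries.
pool = ExperiencePool()
pool.add(Experience("how many apples are left after eating 2 of 5", "5 - 2 = 3", "3", 0.0, 1))
pool.add(Experience("how many apples remain if you eat 2 of 5 apples",
                    "Start with 5.\nEat 2.\n5 - 2 = 3", "3", 0.0, 3))
pool.add(Experience("what is the capital of france", "It is Paris.", "Paris", 0.0, 1))
pool.add(Experience("what is the speed of a car that drives 60 km in 2 hours",
                    "Speed = distance / time.\n60 / 2 = 30", "30 km/h", 0.2, 2))
demos = orchestrate(pool, "how many pears are left after eating 3 of 7", k=2)
for d in demos:
    print(d.question)
```

In this toy run, the first bucket holds the two apple questions. The more complex one is chosen even though the other is slightly more similar. The second bucket contributes a dissimilar question. This shows how one-per-bucket selection trades some similarity for diversity.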