VerifiAgent: A Unified Verification Agent in Language Model Reasoning

Abstract

Large language models demonstrate remarkable reasoning capabilities but often produce unreliable or incorrect responses. Existing verification methods are typically model-specific or domain-restricted, require significant computational resources, and do not scale across diverse reasoning tasks. To address these limitations, we propose VerifiAgent, a unified verification agent that integrates two levels of verification: meta-verification, which assesses the completeness and consistency of model responses, and tool-based adaptive verification, in which VerifiAgent autonomously selects appropriate verification tools based on the reasoning type (mathematical, logical, or commonsense reasoning). This adaptive approach ensures both efficiency and robustness across different verification scenarios. Experimental results show that VerifiAgent outperforms baseline verification methods (e.g., deductive verifier, backward verifier) across all reasoning tasks. It can also further improve reasoning accuracy by leveraging feedback from the verification results. Finally, VerifiAgent can be applied effectively to inference scaling: in the mathematical reasoning domain, it achieves better results than existing process reward models with fewer generated samples and lower cost. Code is available at https://github.com/Jiuzhouh/VerifiAgent