Abstract
Large Language Models excel at reasoning yet often rely on Chain-of-Thought prompts, which limits performance on tasks that demand more nuanced topological structures. We present SOLAR (Scalable Optimization of Large-scale Architecture for Reasoning), a framework that dynamically optimizes Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT) topologies to boost accuracy and efficiency. Our Topological-Annotation-Generation (TAG) system automates dataset creation, annotation, and difficulty segmentation, leading to stronger post-training and test-time performance. We also propose Topological-Scaling, a curriculum-learning-based approach that adaptively combines post-training and inference scaling for each task. On MATH and GSM8K, SOLAR delivers notable gains: +5% accuracy with Topological Tuning, +9% with Topological Rewarding, and +10.02% with Hybrid Scaling, while reducing response length by over 5% and lowering inference latency. To further enhance efficiency, we introduce a multi-task Topological Reward Model (M-TRM) that selects both the optimal reasoning topology and the final answer in a single pass, eliminating the need for multiple single-task TRMs. Remarkably, M-TRM also surpasses all single-task TRMs, improving accuracy by +10% and rank correlation by +9%. Overall, SOLAR establishes a new benchmark for scalable, high-precision LLM reasoning and introduces a fully automated, dynamic topology competition mechanism.