arXiv

SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning

Chen Li, Yinyi Luo, Anudeep Bolimera, Uzair Ahmed, Shri Kiran Srinivasan, Hrishikesh Gokhale, Marios Savvides

Abstract

Large Language Models excel at reasoning yet often rely on Chain-of-Thought prompts, which limits performance on tasks that demand more nuanced topological structures. We present SOLAR (Scalable Optimization of Large-scale Architecture for Reasoning), a framework that dynamically optimizes Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT) topologies to boost accuracy and efficiency. Our Topological-Annotation-Generation (TAG) system automates dataset creation, annotation, and difficulty segmentation, leading to stronger post-training and test-time performance. We also propose Topological-Scaling, a curriculum-learning-based approach that adaptively combines post-training and inference scaling for each task. On MATH and GSM8K, SOLAR delivers notable gains: +5% accuracy with Topological Tuning, +9% with Topological Rewarding, and +10.02% with Hybrid Scaling, while reducing response length by over 5% and lowering inference latency. To further enhance efficiency, we introduce a multi-task Topological Reward Model (M-TRM) that selects both the optimal reasoning topology and the final answer in a single pass, eliminating the need for multiple single-task TRMs. Remarkably, M-TRM also surpasses all single-task TRMs, improving accuracy by +10% and rank correlation by +9%. Overall, SOLAR establishes a new benchmark for scalable, high-precision LLM reasoning and introduces a fully automated, dynamic topology competition mechanism.
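
The idea of letting CoT, ToT, and GoT responses compete under a reward model can be illustrated with a minimal sketch. This is not the paper's M-TRM: the `Candidate` type, the `select_best` function, and the fixed toy scores are hypothetical names and values chosen only for illustration, and a real system would replace the score lookup with a learned reward model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """One generated response (hypothetical container, not from the paper)."""
    topology: str  # "CoT", "ToT", or "GoT"
    response: str  # full reasoning trace
    answer: str    # extracted final answer


def select_best(
    candidates: Sequence[Candidate],
    reward_fn: Callable[[Candidate], float],
) -> Candidate:
    """Return the highest-reward candidate; on ties the earliest one wins."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward_fn)


# Toy stand-in for a reward model: fixed, made-up scores per topology.
toy_scores = {"CoT": 0.2, "ToT": 0.9, "GoT": 0.5}

candidates = [
    Candidate("CoT", "step 1 -> step 2", "4"),
    Candidate("ToT", "branch A / branch B -> merge", "5"),
    Candidate("GoT", "node graph with shared subresults", "5"),
]

best = select_best(candidates, lambda c: toy_scores[c.topology])
print(best.topology, best.answer)
```

Because a single scoring pass ranks every candidate, the chosen topology and the final answer come out of the same selection step, which is the property the abstract attributes to the multi-task model.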