Abstract
A critical question about Large Language Models (LLMs) is whether their apparent deficiency in mathematical reasoning is inherent, or merely a result of insufficient exposure to high-quality mathematical data. To explore this, we developed an automated method for generating high-quality, supervised mathematical datasets. The method carefully mutates existing math problems, ensuring that the newly generated problems are both diverse and valid. This is achieved by a neuro-symbolic data generation framework that combines the intuitive informalization strengths of LLMs with the precise symbolic reasoning of math solvers, together with projected Markov chain Monte Carlo sampling in the highly irregular symbolic space. Empirical experiments demonstrate the high quality of the data generated by the proposed method, and show that LLMs, specifically LLaMA-2 and Mistral, surpass their state-of-the-art counterparts when realigned with the generated data.