Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Abstract
We propose an inference-time scaling approach for pretrained flow models. Recently, inference-time scaling has gained significant attention in large language models (LLMs) and diffusion models, where leveraging additional computation improves sample quality or better aligns outputs with user preferences. For diffusion models, particle sampling has enabled more efficient scaling because the intermediate denoising steps are stochastic. In contrast, flow models have gained popularity as an alternative to diffusion models, offering faster generation and high-quality outputs in state-of-the-art image and video generative models. However, because their generative process is deterministic, the efficient inference-time scaling methods used for diffusion models cannot be applied to them directly. To enable efficient inference-time scaling for flow models, we propose three key ideas: 1) generation based on stochastic differential equations (SDEs), which enables particle sampling in flow models; 2) interpolant conversion, which broadens the search space and enhances sample diversity; and 3) Rollover Budget Forcing (RBF), which adaptively allocates computational resources across timesteps to maximize budget utilization. Our experiments show that SDE-based generation, particularly generation based on the variance-preserving (VP) interpolant, improves the performance of particle sampling methods for inference-time scaling in flow models. Additionally, we demonstrate that RBF with VP-SDE achieves the best performance, outperforming all previous inference-time scaling approaches.
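The first idea rests on a standard identity. For a linear interpolant, the learned velocity determines the score. This means the deterministic flow ODE can be replaced by a reverse-time SDE that has the same marginals, and the injected noise gives particle methods something to branch on.

The sketch below is illustrative only and rests on several assumptions:

- It uses a rectified-flow interpolant x_t = (1 - t) x0 + t eps, with data at t = 0.
- Every name (`score_from_velocity`, `sde_step`, `rollover_search`, `toy_velocity`, `DATA_STD`) is hypothetical.
- The diffusion coefficient g is a constant, chosen for simplicity.
- `rollover_search` is one plausible reading of "rollover budget forcing" based only on the abstract: accept the first child that beats the parent's reward and carry unused draws to the next timestep. It is not the authors' algorithm.
- Interpolant conversion (e.g. to VP) is omitted.

```python
import math
import random
import statistics

# Assumed convention (illustrative only): rectified-flow interpolant
# x_t = alpha(t) * x0 + sigma(t) * eps, data x0 at t = 0 and noise eps ~ N(0, 1) at t = 1.
def alpha(t):
    return 1.0 - t


def sigma(t):
    return t


D_ALPHA, D_SIGMA = -1.0, 1.0  # time derivatives of alpha(t) and sigma(t)


def score_from_velocity(v, x, t):
    """Recover the score grad log p_t(x) from the flow velocity v(x, t)."""
    a, s = alpha(t), sigma(t)
    return (v - (D_ALPHA / a) * x) / (D_ALPHA * s * s / a - s * D_SIGMA)


def sde_step(velocity, x, t, h, g, rng):
    """One Euler-Maruyama step of the reverse-time SDE, from t to t - h.

    dx = [v - (g**2 / 2) * score] dt + g dw_bar has the same marginals p_t as the
    flow ODE dx/dt = v for any diffusion g, but adds fresh noise to branch on.
    """
    v = velocity(x, t)
    drift = v - 0.5 * g * g * score_from_velocity(v, x, t)
    return x - h * drift + g * math.sqrt(h) * rng.gauss(0.0, 1.0)


def time_grid(n_steps, t_max=0.99, t_min=0.01):
    """Decreasing grid; stopping short of t = 1 and t = 0 keeps alpha and sigma nonzero."""
    return [t_max - k * (t_max - t_min) / n_steps for k in range(n_steps + 1)]


def sample(velocity, rng, n_steps=200, g=1.0):
    """Plain SDE sampling of one particle; N(0, 1) approximates p_t at t_max."""
    ts = time_grid(n_steps)
    x = rng.gauss(0.0, 1.0)
    for t, t_next in zip(ts, ts[1:]):
        x = sde_step(velocity, x, t, t - t_next, g, rng)
    return x


def rollover_search(velocity, reward, rng, budget_per_step=4, n_steps=200, g=1.0):
    """Hypothetical rollover-style search over SDE children (not the authors' code).

    Children are drawn one at a time; the first one whose reward beats the parent's
    is accepted early, and the unused draws roll over to the next step's quota.
    """
    if budget_per_step < 1:
        raise ValueError("budget_per_step must be at least 1")
    ts = time_grid(n_steps)
    x, carry, total_used = rng.gauss(0.0, 1.0), 0, 0
    for t, t_next in zip(ts, ts[1:]):
        quota, used = budget_per_step + carry, 0
        parent_reward = reward(x)  # toy: scores the noisy state directly
        best_x, best_r = x, -math.inf
        while used < quota:
            child = sde_step(velocity, x, t, t - t_next, g, rng)
            used += 1
            r = reward(child)
            if r > best_r:
                best_x, best_r = child, r
            if r > parent_reward:
                break
        carry, total_used = quota - used, total_used + used
        x = best_x  # accepted child, or the best one drawn if none beat the parent
    return x, total_used


# Toy stand-in for a pretrained flow model: for data ~ N(0, DATA_STD**2)
# the velocity E[dx_t/dt | x_t = x] has a closed form.
DATA_STD = 2.0


def toy_velocity(x, t):
    a, s = alpha(t), sigma(t)
    var_t = a * a * DATA_STD**2 + s * s  # marginal variance of x_t
    return (D_ALPHA * a * DATA_STD**2 + D_SIGMA * s) * x / var_t


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [sample(toy_velocity, rng) for _ in range(1000)]
    print("sample variance:", statistics.pvariance(xs), "target:", DATA_STD**2)
    x, used = rollover_search(toy_velocity, lambda y: -abs(y - 3.0), rng)
    print("rollover sample:", x, "draws used:", used)
```

The sampler only ever calls `velocity`, so the same loop would in principle wrap any flow model that exposes v(x, t). The Gaussian toy is used only because its velocity and score have closed forms, which makes it possible to check that the SDE samples match the target variance. A practical search would more likely score a predicted clean sample than the noisy state.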