Abstract
Non-autoregressive language models generate all tokens simultaneously, which offers a potential speed advantage over traditional autoregressive models, but they struggle to model the complex dependencies inherent in text. In this work, we investigate a conditional flow matching approach for text generation. We represent tokens as one-hot vectors in a \(V\)-dimensional simplex and use geodesics under the Kullback-Leibler (KL) divergence, which correspond to linear interpolation in logit space. We provide a theoretical justification that maximizing the conditional likelihood \(P_{\theta}(x_1 \mid x_t, t)\) yields the exact flow matching velocity under logit interpolation. Because basic inference performs suboptimally, we propose a novel empirical sampling scheme that iteratively samples from the conditional distribution and injects additional noise. Although this scheme lacks a complete theoretical foundation, it significantly improves results. Furthermore, we propose a hybrid inference method that combines the basic approach with the sampling scheme. In both conditional and unconditional text generation experiments, this method outperforms the previous state-of-the-art (SOTA) method for discrete flow matching.
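The link between the KL geodesic and logit-space interpolation can be checked numerically. For categorical distributions, applying softmax to linearly interpolated logits gives the normalized geometric mixture \(p_0^{1-t} p_1^{t}\). The sketch below rests on three hypothetical assumptions:

- the source logits are standard-normal noise;
- the target token uses finite "sharpened" one-hot logits, since an exact one-hot vector corresponds to infinite logits;
- the helper names are illustrative only.

It does not reproduce the paper's training objective or its sampling scheme.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_interpolate(z0, z1, t):
    # Linear interpolation in logit space between source logits z0 and target logits z1
    return (1.0 - t) * z0 + t * z1

def geometric_mixture(p0, p1, t):
    # Normalized geometric mixture p0^(1-t) * p1^t of two categorical distributions
    w = p0 ** (1.0 - t) * p1 ** t
    return w / w.sum(axis=-1, keepdims=True)

V = 5                              # vocabulary size (illustrative)
rng = np.random.default_rng(0)
z0 = rng.normal(size=V)            # hypothetical noise logits at t = 0
target = 2                         # hypothetical target token index
z1 = np.zeros(V)
z1[target] = 8.0                   # hypothetical finite stand-in for a one-hot target

t = 0.3
p_t = softmax(logit_interpolate(z0, z1, t))
q_t = geometric_mixture(softmax(z0), softmax(z1), t)
print(np.allclose(p_t, q_t))
```

As \(t\) approaches 1, the interpolated distribution concentrates on the target token. Increasing the hypothetical sharpening constant brings `softmax(z1)` closer to the exact one-hot vector.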