Abstract
While text-to-image generative models can synthesize diverse and faithful content, subject variation across multiple creations limits their use in long-form content generation. Existing approaches require time-consuming tuning, references for all subjects, or access to other creations. We introduce Contrastive Concept Instantiation (CoCoIns), a framework that effectively synthesizes consistent subjects across multiple independent creations. The framework consists of a generative model and a mapping network that transforms input latent codes into pseudo-words associated with specific instances of concepts. Users can generate consistent subjects by reusing the same latent codes. To construct these associations, we propose a contrastive learning approach that trains the network to distinguish between combinations of prompts and latent codes. Extensive evaluations on single-subject human faces show that CoCoIns performs comparably to existing methods while maintaining higher flexibility. We also demonstrate the potential of extending CoCoIns to multiple subjects and other object categories.
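The core idea, that a fixed latent code maps to a fixed pseudo-word and that a contrastive objective separates different codes, can be illustrated with a toy sketch. Everything below is an assumption made for illustration only: the abstract does not specify the network architecture or the loss, so the single `tanh` layer, the InfoNCE-style loss, and the names `make_mapping_network` and `info_nce` are hypothetical stand-ins, not the CoCoIns implementation.

```python
import math
import random


def make_mapping_network(latent_dim, embed_dim, seed=0):
    """Toy stand-in for a mapping network: latent code -> pseudo-word embedding.

    Assumption: a single fixed random linear layer followed by tanh.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
               for _ in range(embed_dim)]

    def mapping(latent_code):
        return [math.tanh(sum(w * z for w, z in zip(row, latent_code)))
                for row in weights]

    return mapping


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed here, not taken from the paper)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


mapping = make_mapping_network(latent_dim=8, embed_dim=16)
rng = random.Random(1)
z_a = [rng.gauss(0.0, 1.0) for _ in range(8)]  # latent code for subject A
z_b = [rng.gauss(0.0, 1.0) for _ in range(8)]  # latent code for subject B

word_a1 = mapping(z_a)  # first creation using code A
word_a2 = mapping(z_a)  # independent creation reusing code A
word_b = mapping(z_b)   # creation using a different code

# Pairing the two uses of code A as positives gives a lower loss
# than pairing code A with code B.
loss_same = info_nce(word_a1, word_a2, [word_b])
loss_diff = info_nce(word_a1, word_b, [word_a2])
```

In this sketch, reusing `z_a` yields an identical pseudo-word embedding, which mirrors the abstract's claim that the same latent code produces a consistent subject; in the actual framework the pseudo-word would be inserted into the prompt of the generative model rather than compared directly.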