Meta ControlNet: Enhancing Task Adaptation via Meta Learning

Abstract

Diffusion-based image synthesis has recently attracted extensive attention. In particular, ControlNet, which uses image-based prompts, exhibits powerful capability in image tasks such as Canny edge detection and generates images well aligned with these prompts. However, vanilla ControlNet generally requires extensive training of around 5000 steps to achieve desirable control for a single task. Recent context-learning approaches have improved its adaptability, but mainly for edge-based tasks, and they rely on paired examples. Thus, two important open issues remain to be addressed before ControlNet reaches its full potential: (i) zero-shot control for certain tasks and (ii) faster adaptation for non-edge-based tasks. In this paper, we introduce a novel Meta ControlNet method, which adopts the task-agnostic meta-learning technique and features a new layer-freezing design. Meta ControlNet significantly reduces the number of learning steps needed to attain control ability from 5000 to 1000. Further, Meta ControlNet exhibits direct zero-shot adaptability in edge-based tasks without any finetuning, and achieves control within only 100 finetuning steps in more complex non-edge tasks such as Human Pose, outperforming all existing methods. The code is available at https://github.com/JunjieYang97/Meta-ControlNet.