From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis

Abstract

Differentially private (DP) image synthesis aims to generate synthetic images from a sensitive dataset, alleviating the privacy leakage concerns of organizations that share and use synthetic images. Although previous methods have made significant progress, especially in training diffusion models on sensitive images with DP Stochastic Gradient Descent (DP-SGD), their performance remains unsatisfactory. In this work, inspired by curriculum learning, we propose a two-stage DP image synthesis framework in which diffusion models learn to generate DP synthetic images from easy to hard. Unlike existing methods that train diffusion models directly with DP-SGD, we introduce an easy stage at the beginning, in which diffusion models learn simple features of the sensitive images. To facilitate this easy stage, we propose to use "central images", which are simple aggregations of random samples of the sensitive dataset. Intuitively, although these central images do not show details, they capture useful characteristics shared by all images and incur only minimal privacy cost, thus helping early-phase model training. Experiments show that, averaged over the four image datasets we investigate, the fidelity and utility metrics of our synthetic images are 33.1% and 2.1% better, respectively, than those of the state-of-the-art method.
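
To make the idea of a central image concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the aggregation is a pixel-wise mean over a random subset and that privacy comes from adding Gaussian noise to that mean; the abstract specifies neither. The function name `central_image` and the parameters `sample_size` and `noise_std` are hypothetical, and calibrating the noise to a target privacy budget is out of scope here.

```python
import numpy as np

def central_image(images, sample_size, noise_std, rng):
    """Return a noisy pixel-wise mean of a random subset of `images`.

    images: array of shape (N, H, W, C) with pixel values in [0, 1].
    noise_std: hypothetical noise scale applied to the mean; choosing it to
    meet a given (epsilon, delta) guarantee is not covered by this sketch.
    """
    # Draw a random subset of the dataset without replacement.
    idx = rng.choice(len(images), size=sample_size, replace=False)
    # Aggregate the subset: fine details average out, coarse structure stays.
    mean = images[idx].mean(axis=0)
    # Assumed privatization step: Gaussian noise on the aggregate.
    noisy = mean + rng.normal(0.0, noise_std, size=mean.shape)
    # Keep the result in the valid pixel range.
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
sensitive = rng.random((1000, 8, 8, 1))  # random stand-in for a sensitive dataset
centers = np.stack([central_image(sensitive, 50, 0.05, rng) for _ in range(4)])
```

Under these assumptions, `centers` would serve as the training set for the easy stage, before the model is trained further with DP-SGD on the sensitive images themselves.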