Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images

Abstract

The emergence of advanced AI-based tools to generate realistic images poses significant challenges for forensic detection and source attribution, especially as new generative techniques appear rapidly. Traditional methods often fail to generalize to unseen generators due to reliance on features specific to known sources during training. To address this problem, we propose a novel approach that explicitly models forensic microstructures: subtle, pixel-level patterns unique to the image creation process. Using only real images in a self-supervised manner, we learn a set of diverse predictive filters to extract residuals that capture different aspects of these microstructures. By jointly modeling these residuals across multiple scales, we obtain a compact model whose parameters constitute a unique forensic self-description for each image. This self-description enables us to perform zero-shot detection of synthetic images, open-set source attribution of images, and clustering based on source without prior knowledge. Extensive experiments demonstrate that our method achieves superior accuracy and adaptability compared to competing techniques, advancing the state of the art in synthetic media forensics.
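
To make the pipeline in the abstract more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a single linear predictive filter fitted by least squares (the paper learns a diverse set of filters), an 8-neighbour 3x3 window, and dilations 1, 2, and 4 as the "multiple scales". The function names `neighbours`, `fit_predictor`, `residual`, and `self_description` are hypothetical and chosen only for illustration.

```python
import numpy as np

# 8-neighbour offsets of a 3x3 window (centre pixel excluded).
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def neighbours(img, dilation=1):
    """Return (targets, features): interior pixels and their 8 dilated neighbours."""
    d = dilation
    h, w = img.shape
    centre = img[d:h - d, d:w - d]
    cols = [
        img[d + dy * d:h - d + dy * d, d + dx * d:w - d + dx * d].ravel()
        for dy, dx in OFFSETS
    ]
    return centre.ravel(), np.stack(cols, axis=1)


def fit_predictor(images, dilation=1):
    """Least-squares linear predictor of a pixel from its neighbours.

    Assumption: stands in for the learned predictive filters in the abstract.
    """
    ys, xs = zip(*(neighbours(im, dilation) for im in images))
    coef, *_ = np.linalg.lstsq(np.concatenate(xs), np.concatenate(ys), rcond=None)
    return coef


def residual(img, coef, dilation=1):
    """Prediction error map: what the filter fails to explain."""
    d = dilation
    y, x = neighbours(img, dilation)
    return (y - x @ coef).reshape(img.shape[0] - 2 * d, img.shape[1] - 2 * d)


def self_description(img, coef, scales=(1, 2, 4)):
    """Toy descriptor: per-image model parameters of the residual at each scale."""
    res = residual(img, coef)
    return np.concatenate([fit_predictor([res], s) for s in scales])


# Usage with random arrays standing in for real grayscale images.
rng = np.random.default_rng(0)
real_images = [rng.normal(size=(32, 32)) for _ in range(4)]
shared_filter = fit_predictor(real_images)                       # learned from real images only
descriptor = self_description(real_images[0], shared_filter)    # 8 coefficients x 3 scales
```

In this sketch the descriptor is simply the concatenated per-scale coefficients. Under the same assumptions, zero-shot detection could then be framed as scoring how far a descriptor lies from the distribution of descriptors computed on real images, while attribution and clustering would operate on the descriptor vectors directly.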