Evolving from Single-modal to Multi-modal Facial Deepfake Detection: Progress and Challenges

Abstract

As synthetic media, including video, audio, and text, become increasingly indistinguishable from real content, the risks of misinformation, identity fraud, and social manipulation escalate. This survey traces the evolution of deepfake detection from early single-modal methods to sophisticated multi-modal approaches that integrate audio-visual and text-visual cues. We present a structured taxonomy of detection techniques and analyze the transition from GAN-based to diffusion-model-driven deepfakes, which pose new challenges because of their heightened realism and robustness against detection. Unlike prior surveys, which focus primarily on single-modal detection or earlier deepfake techniques, this work provides the most comprehensive study to date, covering the latest advances in multi-modal deepfake detection, generalization challenges, proactive defense mechanisms, and emerging datasets designed specifically to support new interpretability and reasoning tasks. We further examine the role of Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs) in strengthening detection robustness against increasingly sophisticated deepfake attacks. By systematically categorizing existing methods and identifying emerging research directions, this survey lays a foundation for future advances in combating AI-generated facial forgeries. A curated list of all related papers is available at \href{https://github.com/qiqitao77/Comprehensive-Advances-in-Deepfake-Detection-Spanning-Diverse-Modalities}{https://github.com/qiqitao77/Awesome-Comprehensive-Deepfake-Detection}.