arXiv

MVFNet: Multipurpose Video Forensics Network using Multiple Forms of Forensic Evidence

Abstract

While videos can be falsified in many different ways, most existing forensic networks are specialized to detect only a single manipulation type (e.g., deepfakes or inpainting). This poses a significant problem because the manipulation used to falsify a video is not known a priori. To address this problem, we propose MVFNet, a multipurpose video forensics network capable of detecting multiple types of manipulations, including inpainting, deepfakes, splicing, and editing. Our network does this by extracting and jointly analyzing a broad set of forensic feature modalities that capture both spatial and temporal anomalies in falsified videos. To reliably detect and localize fake content of all shapes and sizes, our network employs a novel Multi-Scale Hierarchical Transformer module to identify forensic inconsistencies across multiple spatial scales. Experimental results show that our network obtains state-of-the-art performance in general scenarios where multiple different manipulations are possible, and rivals specialized detectors in targeted scenarios.
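The abstract does not spell out the internals of the Multi-Scale Hierarchical Transformer. As a rough, hypothetical illustration of the two ideas it names (jointly analyzing several forensic feature modalities, and looking for inconsistencies at several spatial scales), the NumPy sketch here fuses per-modality feature maps by channel concatenation, average-pools them into patch tokens at several patch sizes, and scores each patch by how poorly it matches what the *other* patches predict for it. It swaps in a weight-free cosine-similarity attention instead of learned scaled dot-product attention. Every name (`patch_tokens`, `context_attention`, `multiscale_inconsistency`), the patch sizes, and the temperature `tau` are assumptions for illustration; this is not the authors' architecture.

```python
import numpy as np


def patch_tokens(feat: np.ndarray, p: int):
    """Average-pool an (H, W, C) feature map into non-overlapping p x p patch tokens."""
    h, w, c = feat.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by the patch size"
    grid = feat.reshape(h // p, p, w // p, p, c).mean(axis=(1, 3))
    return grid.reshape(-1, c), (h // p, w // p)


def context_attention(tokens: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Cosine-similarity attention with Q = K = V = tokens (hypothetical, no learned weights).

    Each token's own entry is masked out, so the output is the context
    predicted for that token by all other tokens.
    """
    unit = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    logits = unit @ unit.T / tau
    np.fill_diagonal(logits, -np.inf)             # a patch may not explain itself
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens


def multiscale_inconsistency(modalities, patch_sizes=(2, 4, 8), tau=0.1) -> np.ndarray:
    """Fuse (H, W, C_i) modality maps, then average per-scale inconsistency maps."""
    feat = np.concatenate(modalities, axis=-1)    # joint analysis: stack modality channels
    maps = []
    for p in patch_sizes:
        tokens, (gh, gw) = patch_tokens(feat, p)
        context = context_attention(tokens, tau)
        score = np.linalg.norm(tokens - context, axis=1).reshape(gh, gw)
        maps.append(np.kron(score, np.ones((p, p))))  # nearest-neighbour upsample to (H, W)
    return np.mean(maps, axis=0)


# Usage: two hypothetical modalities, with an inconsistent 4x4 region planted in one of them.
rgb = np.ones((16, 16, 3))
noise = np.ones((16, 16, 2))
rgb[4:8, 4:8] = 5.0
heat = multiscale_inconsistency([rgb, noise])
mask = np.zeros((16, 16), dtype=bool)
mask[4:8, 4:8] = True
print(heat.shape, heat[mask].min() > heat[~mask].max())  # (16, 16) True
```

Averaging the upsampled per-scale maps lets small regions (resolved at fine patch sizes) and larger regions (resolved at coarse ones) both show up in one localization map. Masking each patch's own entry means its score reflects only how well the rest of the frame predicts it.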