Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Abstract
Recent studies have highlighted a practical setting of unsupervised anomaly detection (UAD) in which a single unified model is built for images of multiple classes. Despite various advances on this challenging task, detection performance in the multi-class setting still lags far behind state-of-the-art class-separated models. Our research aims to bridge this substantial performance gap. In this paper, we introduce a minimalistic reconstruction-based anomaly detection framework, namely Dinomaly, which leverages pure Transformer architectures without relying on complex designs, additional modules, or specialized tricks. Within this powerful framework, which consists only of Attentions and MLPs, we identify four simple components that are essential to multi-class anomaly detection: (1) Foundation Transformers that extract universal and discriminative features, (2) a Noisy Bottleneck in which pre-existing Dropouts do all the noise injection tricks, (3) Linear Attention that naturally cannot focus, and (4) Loose Reconstruction that does not force layer-to-layer and point-by-point reconstruction. Extensive experiments are conducted on popular anomaly detection benchmarks including MVTec-AD, VisA, and Real-IAD. Dinomaly achieves image-level AUROC of 99.6%, 98.7%, and 89.3% on the three datasets, respectively, which not only exceeds state-of-the-art multi-class UAD methods but also reaches the most advanced results recorded for class-separated UAD methods.
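To make components (2) to (4) more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify these details, so everything here is an assumption made for illustration: the function names, the elu(x) + 1 feature map used for linear attention, the summing of layer features before comparison, and the use of cosine distance as the reconstruction error.

```python
import numpy as np

def dropout(x, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest.

    Assumption: stands in for the "Noisy Bottleneck", where Dropout alone
    injects the noise.
    """
    if p <= 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def linear_attention(q, k, v):
    """Softmax-free attention with the positive feature map elu(x) + 1.

    Assumption: one common form of linear attention; the abstract does not
    say which variant is used. Shapes: q, k are (n, d); v is (n, d_v).
    """
    def phi(x):
        # elu(x) + 1 equals x + 1 for x > 0 and exp(x) otherwise.
        return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

    q, k = phi(q), phi(k)
    kv = k.T @ v               # (d, d_v) summary shared by every query
    z = q @ k.sum(axis=0)      # (n,) per-query normalizer, strictly positive
    return (q @ kv) / z[:, None]

def cosine_distance_map(a, b, eps=1e-8):
    """Per-token cosine distance between two (n, d) feature maps."""
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return 1.0 - num / den

def loose_reconstruction_map(enc_feats, dec_feats):
    """Compare summed encoder layers with summed decoder layers.

    Assumption: "loose" is illustrated by grouping layers before comparing
    them, instead of matching layer i to layer i.
    """
    return cosine_distance_map(sum(enc_feats), sum(dec_feats))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(16, 8))           # 16 patch tokens, 8 channels
    noisy = dropout(tokens, p=0.2, rng=rng)     # noise injection
    decoded = linear_attention(noisy, noisy, tokens)
    scores = loose_reconstruction_map([tokens, tokens], [decoded, decoded])
    print(scores.shape)                         # one anomaly score per token
```

Because the weights in this form of linear attention are positive and sum to one over all tokens, each output row is a weighted average of the value rows. In this sketch, tokens whose decoded features diverge from the encoder features receive a larger cosine distance and would be treated as more anomalous.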