Towards Efficient and General-Purpose Few-Shot Misclassification Detection for Vision-Language Models
Abstract
Reliable prediction is crucial for deploying classifiers in high-security and dynamically changing scenarios. However, modern neural networks are often overconfident in their misclassified predictions, which highlights the need for confidence estimation to detect errors. Despite the progress of existing methods on small-scale datasets, they all require training from scratch, and no efficient and effective misclassification detection (MisD) method exists, which hinders practical application to large-scale and ever-changing datasets. In this paper, we exploit vision-language models (VLMs), leveraging text information, to establish an efficient and general-purpose misclassification detection framework. Harnessing the power of VLMs, we construct FSMisD, a few-shot prompt learning framework for MisD that avoids training from scratch and thereby improves tuning efficiency. To enhance misclassification detection ability, we use adaptive pseudo-sample generation and a novel negative loss that mitigates overconfidence by pushing category prompts away from pseudo features. We conduct comprehensive experiments with prompt learning methods and validate generalization across various datasets with domain shift. Significant and consistent improvements demonstrate the effectiveness, efficiency, and generalizability of our approach.
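To make the ideas of confidence estimation and a negative loss more concrete, the following is a minimal illustrative sketch, not the FSMisD implementation. It assumes a CLIP-style setup in which class prompts and images are embedded as feature vectors, the confidence is the maximum softmax probability over cosine similarities, and the negative term is simply the mean cosine similarity between class prompts and pseudo features. The function names, the temperature of 0.1, the detection threshold, and the exact form of the loss are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(image_feat, prompt_feats, temperature=0.1):
    """Return (predicted class index, max softmax probability).

    The temperature value is an assumption chosen for this toy example.
    """
    logits = [cosine(image_feat, p) / temperature for p in prompt_feats]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]

def is_suspected_misclassification(confidence, threshold=0.9):
    """Flag a prediction as a possible error when confidence is low.

    The threshold is hypothetical; in practice it would be tuned.
    """
    return confidence < threshold

def negative_loss(prompt_feats, pseudo_feats):
    """Hypothetical negative term: mean cosine similarity between every
    class prompt and every pseudo feature. Minimizing it pushes the
    prompts away from the pseudo features."""
    sims = [cosine(p, q) for p in prompt_feats for q in pseudo_feats]
    return sum(sims) / len(sims)

# Toy 2-D features standing in for real VLM embeddings.
prompts = [[1.0, 0.0], [0.0, 1.0]]
clear_label, clear_conf = predict_with_confidence([0.9, 0.1], prompts)
amb_label, amb_conf = predict_with_confidence([0.51, 0.49], prompts)
pseudo = [[1.0, 1.0]]
loss_near = negative_loss(prompts, pseudo)
loss_far = negative_loss([[1.0, -1.0], [-1.0, 1.0]], pseudo)
```

In this toy setting the image close to the first prompt receives a high confidence, while the image lying between the two prompts receives a low one and is flagged for review. The negative term is larger when the prompts point toward the pseudo feature than when they are orthogonal to it, which is the direction of pressure the abstract describes.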