arXiv

OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations

Caixin Kang, Yubo Chen, Shouwei Ruan, Shiji Zhao, Ruochen Zhang, Jiayi Wang, Shan Fu, Xingxing Wei

Abstract

With the rise of deep learning, facial recognition technology has seen extensive research and rapid development. Although facial recognition is considered a mature technology, we find that existing open-source models and commercial algorithms lack robustness in certain complex Out-of-Distribution (OOD) scenarios, raising concerns about the reliability of these systems. In this paper, we introduce OODFace, which explores the OOD challenges faced by facial recognition models from two perspectives: common corruptions and appearance variations. We systematically design 30 OOD scenarios across 9 major categories tailored for facial recognition. By simulating these challenges on public datasets, we establish three robustness benchmarks: LFW-C/V, CFP-FP-C/V, and YTF-C/V. We then conduct extensive experiments on 19 facial recognition models and 3 commercial APIs, along with extended physical experiments on face masks to assess their robustness. Next, we explore potential solutions from two perspectives: defense strategies and Vision-Language Models (VLMs). Based on the results, we draw several key insights, highlighting the vulnerability of facial recognition systems to OOD data and suggesting possible solutions. Additionally, we offer a unified toolkit that includes all corruption and variation types, easily extendable to other datasets. We hope that our benchmarks and findings can provide guidance for future improvements in facial recognition model robustness.
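
The benchmarking idea summarized in the abstract (simulate a corruption on a public verification dataset, then measure how accuracy changes) can be illustrated with a small sketch. This is not the OODFace toolkit or its API. The severity table `NOISE_SIGMA`, the stand-in model `toy_embed`, the threshold of 0.5, and the choice to corrupt only one image per pair are all assumptions made for illustration, and Gaussian noise is used only as a familiar example of a common corruption.

```python
import math
import random

# Hypothetical severity levels (not taken from the paper): noise standard
# deviations for pixel intensities scaled to [0, 1].
NOISE_SIGMA = (0.04, 0.08, 0.12, 0.16, 0.20)


def gaussian_noise(pixels, severity, rng):
    """Add zero-mean Gaussian noise to a flat list of pixels in [0, 1]."""
    sigma = NOISE_SIGMA[severity - 1]
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def toy_embed(pixels):
    """Stand-in for a real face model: mean-centred pixel values."""
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]


def verification_accuracy(embed, pairs, labels, threshold, corrupt=None):
    """Fraction of pairs whose same/different decision matches the label."""
    correct = 0
    for (img_a, img_b), same in zip(pairs, labels):
        if corrupt is not None:
            img_b = corrupt(img_b)  # corrupt only the second image of each pair
        score = cosine_similarity(embed(img_a), embed(img_b))
        correct += int((score >= threshold) == same)
    return correct / len(pairs)


# Synthetic "faces": one matching pair and one non-matching pair.
rng = random.Random(0)
face_a = [rng.random() for _ in range(64)]
face_b = [rng.random() for _ in range(64)]
pairs = [(face_a, list(face_a)), (face_a, face_b)]
labels = [True, False]

clean = verification_accuracy(toy_embed, pairs, labels, threshold=0.5)
for severity in range(1, 6):
    acc = verification_accuracy(
        toy_embed, pairs, labels, threshold=0.5,
        corrupt=lambda img: gaussian_noise(img, severity, rng),
    )
    print(f"severity {severity}: accuracy {acc:.2f} (clean {clean:.2f})")
```

In a real evaluation, `toy_embed` would be replaced by an actual face recognition model, and the synthetic pairs by the verification pairs of a dataset such as LFW. The gap between clean and corrupted accuracy is one simple way to report robustness.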