DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Abstract
Clothes-changing person re-identification (CC-ReID) aims to recognize individuals across different clothing scenarios. Current CC-ReID approaches either model body shape using additional modalities such as silhouette, pose, and body mesh, which can cause the model to overlook other critical biometric traits such as gender, age, and style, or they add supervision through extra labels, such as clothing or personal attributes, that the model tries to disregard or emphasize. However, these annotations are discrete in nature and do not capture comprehensive descriptions. In this work, we propose DIFFER (Disentangle Identity Features From Entangled Representations), a novel adversarial learning method that leverages textual descriptions to disentangle identity features. Recognizing that image features inherently mix inseparable information, DIFFER introduces NBDetach, a mechanism that performs feature disentanglement by using the separable nature of text descriptions as supervision. It partitions the feature space into distinct subspaces and, through gradient reversal layers, separates identity-related features from non-biometric features. We evaluate DIFFER on four benchmark datasets (LTCC, PRCC, CelebReID-Light, and CCVID), where it achieves state-of-the-art performance on every benchmark. DIFFER consistently outperforms the baseline method, improving top-1 accuracy by 3.6% on LTCC, 3.4% on PRCC, 2.5% on CelebReID-Light, and 1% on CCVID. Our code can be found here.
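To make the two mechanisms named in the abstract concrete, the following is a minimal, dependency-free sketch of subspace partitioning and a gradient reversal layer. It is not the authors' implementation: the abstract does not specify the subspace sizes, where the reversal layers sit, or how the text supervision enters, so the names `split_subspaces`, `GradientReversal`, and `adversary_input_grad`, the fixed split index, and the linear adversary with a squared-error loss are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


def split_subspaces(features: Vector, id_dim: int) -> Tuple[Vector, Vector]:
    """Partition one feature vector into (identity, non-biometric) parts.

    Assumption: a fixed contiguous split; the abstract does not say how the
    subspaces are actually formed.
    """
    if not 0 < id_dim < len(features):
        raise ValueError("id_dim must leave both subspaces non-empty")
    return features[:id_dim], features[id_dim:]


@dataclass
class GradientReversal:
    """Identity in the forward pass; negates and scales gradients on the way back."""

    scale: float = 1.0

    def forward(self, x: Vector) -> Vector:
        return list(x)  # activations pass through unchanged

    def backward(self, grad_output: Vector) -> Vector:
        return [-self.scale * g for g in grad_output]


def adversary_input_grad(x: Vector, weights: Vector, target: float) -> Vector:
    """Gradient of 0.5 * (w.x - target)^2 with respect to x.

    Hypothetical adversary: a linear head that tries to predict a
    non-biometric target from the identity subspace.
    """
    score = sum(w * v for w, v in zip(weights, x))
    residual = score - target
    return [residual * w for w in weights]


# Toy example with made-up numbers.
features = [1.0, 2.0, 3.0, 4.0]
identity_part, non_biometric_part = split_subspaces(features, id_dim=2)

grl = GradientReversal(scale=1.0)
adversary_in = grl.forward(identity_part)

# The adversary would descend along -grad_adv to improve its prediction.
grad_adv = adversary_input_grad(adversary_in, weights=[0.5, -1.0], target=0.5)

# The encoder receives the reversed gradient, so its update makes the
# identity subspace less predictive of the non-biometric target.
grad_encoder = grl.backward(grad_adv)
```

In this toy setup the adversary and the encoder receive gradients of opposite sign for the same loss, which is the general idea behind adversarial disentanglement with gradient reversal. A practical version would express the layer as a custom backward function in an automatic differentiation framework.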