SChanger: Change Detection from a Semantic Change and Spatial Consistency Perspective

Abstract

Change detection is a key task in Earth observation applications. Recently, deep learning methods have demonstrated strong performance and have been widely applied. However, change detection suffers from data scarcity, because accurately aligning remote sensing images of the same area is labor-intensive, and this scarcity limits the performance of deep learning algorithms. To address it, we develop a fine-tuning strategy called the Semantic Change Network (SCN). We first pre-train the model on single-temporal supervised tasks to acquire prior knowledge of instance feature extraction. The model then uses a shared-weight Siamese architecture and an extended Temporal Fusion Module (TFM) to preserve this prior knowledge while it is fine-tuned on change detection tasks. In this way, the learned semantics for identifying all instances are adapted to identify only the changes. Meanwhile, we observe that changes occupy the same spatial locations in both images, a property we call spatial consistency. We introduce this inductive bias through an attention map that is generated by large-kernel convolutions and applied to the features from both time points. This improves the modeling of multi-scale changes and helps capture underlying relationships in change detection semantics. Using these two strategies, we develop a binary change detection model. Validated against state-of-the-art methods on six datasets, the model surpasses all benchmark methods, achieving F1 scores of 92.87%, 86.43%, 68.95%, 97.62%, 84.58%, and 93.20% on the LEVIR-CD, LEVIR-CD+, S2Looking, CDD, SYSU-CD, and WHU-CD datasets, respectively.
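As a rough illustration of the two ideas above (one shared-weight encoder applied to both time points, and a single attention map produced by a large-kernel convolution that reweights the features of both time points), the following pure-Python sketch works on tiny single-channel grids. Everything in it is an assumption for illustration: the names `shared_encoder`, `large_kernel_conv`, and `spatial_consistency_attention` are hypothetical, the learned large kernel is replaced by a uniform k x k mean filter with zero padding, and the map is derived from the absolute feature difference followed by a sigmoid. It is not the SChanger implementation.

```python
import math
from typing import List

Grid = List[List[float]]


def shared_encoder(image: Grid, weight: float = 1.0, bias: float = 0.0) -> Grid:
    """Hypothetical stand-in for the shared-weight Siamese encoder.

    The SAME weight and bias are applied to both time points, which is the
    property a shared-weight Siamese design guarantees. ReLU activation.
    """
    return [[max(0.0, weight * v + bias) for v in row] for row in image]


def large_kernel_conv(x: Grid, k: int) -> Grid:
    """k x k mean filter with zero padding (assumption: uniform weights
    stand in for a learned large kernel)."""
    h, w = len(x), len(x[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj]
            out[i][j] = s / (k * k)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def spatial_consistency_attention(f1: Grid, f2: Grid, k: int = 7):
    """Build ONE attention map and multiply it into both time points.

    Assumption: the map is derived from |f1 - f2| (a hypothetical choice),
    so both features are reweighted at exactly the same spatial locations.
    """
    diff = [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(f1, f2)]
    ctx = large_kernel_conv(diff, k)  # wide receptive field per pixel
    attn = [[sigmoid(v) for v in row] for row in ctx]  # values in (0, 1)
    g1 = [[a * f for a, f in zip(ar, fr)] for ar, fr in zip(attn, f1)]
    g2 = [[a * f for a, f in zip(ar, fr)] for ar, fr in zip(attn, f2)]
    return attn, g1, g2


# Usage: an 8 x 8 scene where a 3 x 3 object appears at time 2.
t1 = [[0.0] * 8 for _ in range(8)]
t2 = [row[:] for row in t1]
for i in range(2, 5):
    for j in range(2, 5):
        t2[i][j] = 1.0
f1, f2 = shared_encoder(t1), shared_encoder(t2)
attn, g1, g2 = spatial_consistency_attention(f1, f2, k=7)
```

Because the same `attn` grid multiplies both `f1` and `f2`, a location emphasized at one time point is emphasized identically at the other, which mirrors the spatial consistency bias described above; the large kernel lets each attention value aggregate context from a wide neighborhood, so the map is strongest around the changed region rather than only on its exact pixels.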