M$^2$CD: A Unified Multimodal Change Detection Framework Combining Mixture of Experts and Self-Distillation

arXiv

M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation

Ziyuan Liu, Jiawei Zhang, Wenyu Wang, Yuantao Gu

Abstract

Most existing change detection (CD) methods focus on optical images captured at different times, and deep learning (DL) has achieved remarkable success in this domain. However, in extreme scenarios such as disaster response, synthetic aperture radar (SAR), with its active imaging capability, is more suitable for providing post-event data. This introduces new challenges for CD methods, as existing weight-sharing Siamese networks struggle to effectively learn the cross-modal data distribution between optical and SAR images. To address this challenge, we propose a unified MultiModal CD framework, M$^2$CD. We integrate Mixture of Experts (MoE) modules into the backbone to explicitly handle diverse modalities, thereby enhancing the model's ability to learn multimodal data distributions. Additionally, we propose an Optical-to-SAR guided path (O2SP) and apply self-distillation during training to reduce the feature space discrepancy between the two modalities, further alleviating the model's learning burden. We design multiple variants of M$^2$CD based on both CNN and Transformer backbones. Extensive experiments validate the effectiveness of the proposed framework, with the MiT-b1 version of M$^2$CD outperforming all state-of-the-art (SOTA) methods in optical-SAR CD tasks.
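
The abstract names two ingredients without giving their details: MoE modules inside a shared backbone, and a self-distillation term that pulls the features of the two modalities closer together. The NumPy sketch below illustrates the generic form of each idea only. It is not the authors' implementation: the class `ToyMoE`, the function `distillation_loss`, top-2 routing over linear experts, the MSE objective, and the choice of the optical features as the fixed target are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyMoE:
    """Hypothetical token-wise mixture of linear experts with top-k gating."""

    def __init__(self, dim, num_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Gating matrix: maps each token to one logit per expert.
        self.gate = rng.normal(0.0, 0.02, (dim, num_experts))
        # One (dim x dim) linear map per expert.
        self.experts = rng.normal(0.0, 0.02, (num_experts, dim, dim))

    def __call__(self, x):
        # x has shape (tokens, dim).
        logits = x @ self.gate                                   # (tokens, E)
        top_idx = np.argsort(logits, axis=-1)[:, -self.top_k:]   # (tokens, k)
        top_logits = np.take_along_axis(logits, top_idx, axis=-1)
        weights = softmax(top_logits, axis=-1)                   # (tokens, k)

        out = np.zeros_like(x)
        for slot in range(self.top_k):
            chosen = top_idx[:, slot]                            # (tokens,)
            # Apply each token's chosen expert: x[t] @ experts[chosen[t]].
            y = np.einsum("td,tde->te", x, self.experts[chosen])
            out += weights[:, slot:slot + 1] * y
        return out, top_idx


def distillation_loss(student_feat, teacher_feat):
    """MSE between two feature sets; the teacher is treated as a constant."""
    return float(np.mean((student_feat - teacher_feat) ** 2))


# Stand-in token features for the two modalities (random, for shape only).
rng = np.random.default_rng(1)
optical_tokens = rng.normal(size=(8, 16))
sar_tokens = rng.normal(size=(8, 16))

# The same MoE layer serves both inputs; the gate decides per token
# which experts handle it.
moe = ToyMoE(dim=16)
optical_feat, optical_route = moe(optical_tokens)
sar_feat, sar_route = moe(sar_tokens)

# Assumed direction: SAR-path features are pulled toward optical-path features.
loss = distillation_loss(sar_feat, optical_feat)
print("expert choices for first SAR token:", sar_route[0])
print("distillation loss:", loss)
```

In a real training loop the distillation term would be added to the change detection loss and the target features would be excluded from gradient computation; how O2SP actually routes features is specified in the paper, not here.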