Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization

Abstract

Test-time adaptation (TTA) aims to adapt a source-domain model to testing data at the inference stage, and has been successful in adapting to unseen corruptions. However, these attempts may fail under more challenging real-world scenarios. Existing works mainly consider real-world test-time adaptation under non-i.i.d. data streams and continual domain shift. In this work, we first complement the existing real-world TTA protocol with a globally class-imbalanced testing set. We demonstrate that combining all settings poses new challenges to existing methods. We argue that the failure of state-of-the-art methods is caused first by indiscriminately adapting normalization layers to imbalanced testing data. To remedy this shortcoming, we propose a balanced batchnorm layer to replace the regular batchnorm at the inference stage. The new batchnorm layer is capable of adapting without biasing towards majority classes. Further inspired by the success of self-training (ST) in learning from unlabeled data, we adapt ST for test-time adaptation. However, ST alone is prone to over-adaptation, which is responsible for its poor performance under continual domain shift. Hence, we propose to improve self-training under continual domain shift by regularizing model updates with an anchored loss. The final TTA model, termed TRIBE, is built upon a tri-net architecture with balanced batchnorm layers. We evaluate TRIBE on four datasets representing real-world TTA settings. TRIBE consistently achieves state-of-the-art performance across multiple evaluation protocols. The code is available at https://github.com/Gorilla-Lab-SCUT/TRIBE.
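The abstract does not spell out how the balanced batchnorm layer computes its statistics. As a rough illustration of why class imbalance biases a regular batchnorm, the sketch below compares plain batch statistics with statistics in which every class is weighted equally. It assumes per-class statistics are grouped by pseudo-labels and averaged uniformly over classes, with the variance combined through the law of total variance. The function names `plain_stats` and `class_balanced_stats` and the toy data are hypothetical and are not taken from the TRIBE code.

```python
from collections import defaultdict


def plain_stats(features):
    """Mean and variance over all samples, as a regular batchnorm would estimate."""
    n = len(features)
    mean = sum(features) / n
    var = sum((x - mean) ** 2 for x in features) / n
    return mean, var


def class_balanced_stats(features, pseudo_labels):
    """Mean and variance in which every observed class gets equal weight.

    Assumption (not confirmed by the abstract): samples are grouped by
    pseudo-label, per-class statistics are averaged uniformly over classes,
    and the variance is combined with the law of total variance.
    """
    by_class = defaultdict(list)
    for x, k in zip(features, pseudo_labels):
        by_class[k].append(x)

    class_stats = [plain_stats(xs) for xs in by_class.values()]
    num_classes = len(class_stats)

    mean = sum(m for m, _ in class_stats) / num_classes
    var = sum(v + (m - mean) ** 2 for m, v in class_stats) / num_classes
    return mean, var


# Toy 1-D feature: class 0 is the majority (8 samples), class 1 the minority (2).
features = [0.0] * 8 + [10.0] * 2
labels = [0] * 8 + [1] * 2

plain_mean, plain_var = plain_stats(features)
balanced_mean, balanced_var = class_balanced_stats(features, labels)
```

On this toy input the plain mean is pulled towards the majority class (2.0), while the class-balanced mean sits midway between the two class centers (5.0). This is the kind of majority bias the balanced batchnorm layer is meant to avoid.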
