Masking and Supervision Join Forces: The Masked Sub-branch Method

arXiv

Masking meets Supervision: A Strong Learning Alliance

Sangdoo Yun

Abstract

Pre-training with randomly masked inputs has emerged as a new trend in self-supervised training. However, supervised learning still struggles to adopt masking augmentations, primarily because they make training unstable. In this paper, we propose a novel way to incorporate masking augmentations, dubbed Masked Sub-branch (MaskSub). MaskSub consists of a main-branch and a sub-branch, the latter being a part of the former. During training, the main-branch follows a conventional training recipe, while the sub-branch receives intensive masking augmentations. MaskSub mitigates the adverse effects of masking through a relaxed loss function similar to a self-distillation loss. Our analysis shows that MaskSub improves performance and that its training loss converges faster than in standard training, which suggests that the method stabilizes the training process. We further validate MaskSub across diverse training scenarios and models, including DeiT-III training, MAE finetuning, CLIP finetuning, BERT training, and hierarchical architectures (ResNet and Swin Transformer). MaskSub consistently achieves significant performance gains in all of these cases. MaskSub provides a practical and effective way to introduce additional regularization under various training recipes. Code is available at https://github.com/naver-ai/augsub
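The two-branch objective described in the abstract can be sketched in a few lines. This is a minimal, framework-free illustration under several assumptions that the abstract does not spell out: the sub-branch is taken to share all weights with the main-branch and to see an input whose tokens are randomly dropped (MAE-style); the "relaxed" self-distillation-style loss is taken to be cross-entropy against the main-branch's softmax prediction; and the two loss terms are averaged with equal weight. The 50% mask ratio, the mean-pooling `toy_model`, and every function name here are hypothetical, not taken from the official augsub code.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    # Cross-entropy of softmax(logits) against a (possibly soft) target distribution.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(p * (z - log_z) for p, z in zip(target, logits))


def random_token_mask(tokens: Sequence[Vector], mask_ratio: float,
                      rng: random.Random) -> List[Vector]:
    # Drop a random subset of tokens (MAE-style), keeping survivors in original order.
    n_keep = max(1, round(len(tokens) * (1.0 - mask_ratio)))
    keep = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in keep]


def toy_model(tokens: Sequence[Vector], weights: Sequence[Vector]) -> Vector:
    # Hypothetical stand-in network: mean-pool token features, then a linear head.
    dim = len(tokens[0])
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]


def masksub_loss(main_logits: Sequence[float], sub_logits: Sequence[float],
                 label: int) -> float:
    one_hot = [1.0 if c == label else 0.0 for c in range(len(main_logits))]
    # Main-branch: ordinary supervised cross-entropy with the hard label.
    main_loss = soft_cross_entropy(main_logits, one_hot)
    # Sub-branch: relaxed, self-distillation-style target = main-branch prediction.
    # In an autodiff framework this target would be detached (stop-gradient).
    teacher = softmax(main_logits)
    sub_loss = soft_cross_entropy(sub_logits, teacher)
    # Assumed equal weighting of the two terms.
    return 0.5 * (main_loss + sub_loss)


def masksub_step(tokens: Sequence[Vector], label: int, weights: Sequence[Vector],
                 mask_ratio: float = 0.5, seed: int = 0) -> float:
    rng = random.Random(seed)
    main_logits = toy_model(tokens, weights)  # main-branch: full input
    masked = random_token_mask(tokens, mask_ratio, rng)
    sub_logits = toy_model(masked, weights)   # sub-branch: same weights, masked input
    return masksub_loss(main_logits, sub_logits, label)


if __name__ == "__main__":
    data_rng = random.Random(42)
    tokens = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
    weights = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
    print(masksub_step(tokens, label=1, weights=weights))
```

Because the sub-branch is supervised by the main-branch's own soft prediction rather than by the hard label, a heavily masked input is not forced to reproduce a possibly unrecoverable class decision; this is one plausible reading of why such a relaxed loss could tame the instability the abstract attributes to masking. The sketch computes only the loss value; gradients and the optimizer step would come from a real framework.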