Adversarial Example Soups: Improving Transferability and Stealthiness for Free

Abstract

Transferable adversarial examples pose practical security risks because they can mislead a target model without any knowledge of its internals. The conventional recipe for maximizing transferability is to keep only the best adversarial example among all those obtained in the optimization pipeline. In this paper, for the first time, we revisit this convention and demonstrate that the discarded, sub-optimal adversarial examples can be reused to boost transferability. Specifically, we propose ``Adversarial Example Soups'' (AES), with AES-tune averaging the adversarial examples discarded during hyperparameter tuning and AES-rand averaging those discarded during stability testing. AES is inspired by ``model soups'', which average the weights of multiple fine-tuned models to improve accuracy without increasing inference time. Extensive experiments validate the global effectiveness of AES, boosting the attack success rate of 10 state-of-the-art transfer attacks and their combinations by up to 13\% against 10 diverse (defensive) target models. We also show that AES can be generalized to other settings, \textit{e.g.}, directly averaging multiple in-the-wild adversarial examples that yield comparable success. A promising byproduct of AES is improved stealthiness of the adversarial examples, since averaging naturally reduces the perturbation variance.
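
The core averaging idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper name `average_adversarial_examples`, the $L_\infty$ budget `eps`, the pixel range $[0, 1]$, and the use of random sign perturbations as stand-ins for real adversarial examples are all assumptions made for illustration. Because the stand-in perturbations are independent, the variance reduction shown here is only indicative of the effect described above.

```python
import numpy as np


def average_adversarial_examples(x_clean, x_advs, eps):
    """Average several adversarial examples of one clean image.

    Hypothetical helper: assumes an L_inf budget `eps` and pixels in [0, 1].
    """
    x_soup = np.asarray(x_advs, dtype=np.float64).mean(axis=0)
    # The mean of points inside the eps-ball stays inside it (convexity);
    # the clips below are only a numerical safeguard.
    x_soup = np.clip(x_soup, x_clean - eps, x_clean + eps)
    return np.clip(x_soup, 0.0, 1.0)


rng = np.random.default_rng(0)
eps = 8 / 255
x_clean = rng.uniform(0.2, 0.8, size=(3, 8, 8))

# Stand-ins for the discarded adversarial examples (assumption: in practice
# these would come from different hyperparameter settings or random runs).
x_advs = [
    x_clean + eps * rng.choice([-1.0, 1.0], size=x_clean.shape)
    for _ in range(5)
]

x_soup = average_adversarial_examples(x_clean, x_advs, eps)

single_var = np.var(x_advs[0] - x_clean)
soup_var = np.var(x_soup - x_clean)
print(f"perturbation variance: single={single_var:.2e}, soup={soup_var:.2e}")
```

The averaged example needs no extra gradient computation and stays within the same perturbation budget as its inputs, which is consistent with the ``for free'' framing in the title.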