Research

arXiv

Generating Realistic, Diverse, and Fault-Revealing Inputs with Latent Space Interpolation for Testing Deep Neural Networks

Bin Duan, Matthew B. Dwyer, Guowei Yang

Abstract

Deep Neural Networks (DNNs) have been widely employed across various domains, including safety-critical systems, necessitating comprehensive testing to ensure their reliability. Although numerous DNN testing methods have been proposed to generate adversarial samples capable of revealing faults, existing methods typically perturb samples in the input space and then mutate them based on feedback from the DNN model. These methods often produce test samples that are unrealistic and have a low probability of revealing faults. To address these limitations, we propose ARGUS, a black-box DNN test input generation method that generates realistic, diverse, and fault-revealing test inputs. ARGUS first compresses samples into a continuous latent space and then perturbs the original samples by interpolating them with samples of different classes. Subsequently, we employ a vector quantizer and decoder to reconstruct adversarial samples back into the input space. Additionally, we employ discriminators in both the latent space and the input space to ensure the realism of the generated samples. An evaluation of ARGUS against state-of-the-art black-box and white-box testing methods shows that ARGUS excels at generating adversarial samples that are realistic and diverse relative to the target dataset, successfully perturbs all original samples, and achieves an error rate up to 4 times higher than the best baseline method. Furthermore, using these adversarial samples for model retraining can improve model classification accuracy.
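
The two latent-space steps named in the abstract, interpolation with a sample of a different class and vector quantization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 2-D latent vectors, the three-entry codebook, the linear interpolation rule, and the weight `alpha` are all assumptions chosen for illustration, and the encoder, decoder, and discriminators are omitted entirely.

```python
# Illustrative sketch only; all names and values here are hypothetical.

def interpolate(z_seed, z_other, alpha):
    """Linearly mix two latent vectors: (1 - alpha) * z_seed + alpha * z_other."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_seed, z_other)]

def quantize(z, codebook):
    """Snap a latent vector to its nearest codebook entry (squared Euclidean distance)."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(codebook, key=lambda entry: sq_dist(z, entry))

# Assumed latents for a seed sample and for a sample of a different class.
z_seed = [0.0, 0.0]
z_other = [1.0, 1.0]

# Assumed codebook; a trained vector quantizer would learn its entries.
codebook = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]

z_mix = interpolate(z_seed, z_other, alpha=0.4)  # moves 40% of the way toward z_other
z_q = quantize(z_mix, codebook)                  # nearest codebook entry to z_mix
```

In a full pipeline, a decoder would map the quantized latent back to the input space, and the resulting sample would be checked against the model under test. The sketch stops before that step because the abstract does not specify those components in detail.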