arXiv

Activation Functions Considered Harmful: Recovering Neural Network Weights through Controlled Channels

Mark Ryan

Abstract

With high-stakes machine learning applications increasingly moving to untrusted end-user or cloud environments, safeguarding pre-trained model parameters becomes essential for protecting intellectual property and user privacy. Recent advancements in hardware-isolated enclaves, notably Intel SGX, promise to secure the internal state of machine learning applications even against compromised operating systems. However, we show that privileged software adversaries can exploit input-dependent memory access patterns in common neural network activation functions to extract secret weights and biases from an SGX enclave. Our attack leverages the SGX-Step framework to obtain a noise-free, instruction-granular page-access trace. In a case study of an 11-input regression network using the TensorFlow Microlite library, we demonstrate complete recovery of all first-layer weights and biases, as well as partial recovery of parameters from deeper layers under specific conditions. Our novel attack technique requires only 20 queries per input per weight to obtain all first-layer weights and biases with an average absolute error of less than 1%, improving over prior model stealing attacks. Additionally, a broader ecosystem analysis reveals the widespread use of activation functions with input-dependent memory access patterns in popular machine learning frameworks (either directly or via underlying math libraries). Our findings highlight the limitations of deploying confidential models in SGX enclaves and emphasise the need for stricter side-channel validation of machine learning implementations, akin to the vetting efforts applied to secure cryptographic libraries.
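
To make the leakage mechanism described in the abstract more concrete, the following toy simulation may help. It is a simplified illustration under stated assumptions, not the attack from the paper: no SGX enclave or SGX-Step tooling is involved, and the page-access trace is modelled as a plain string label. The single-input neuron, the names `leaky_activation`, `observe_page`, `find_boundary` and `recover_neuron`, the thresholds `T_LOW` and `T_HIGH`, the search range, and the assumption that the secret weight is positive are all hypothetical choices made for this sketch. The idea shown is only that an activation function which branches on its pre-activation value reveals, through which branch was taken, where `w * x + b` crosses a known threshold, and that two such crossings are enough to solve for `w` and `b`.

```python
# Toy simulation only. The neuron, thresholds, and "page" labels are
# hypothetical and do not come from the paper or from any real library.

SECRET_W = 0.75   # secret weight, never read directly by the attacker code
SECRET_B = -1.5   # secret bias, never read directly by the attacker code

# Assumed publicly known branch thresholds inside the activation function.
T_LOW, T_HIGH = -2.0, 3.0


def leaky_activation(x):
    """Return (output, page); `page` models which code page was touched."""
    z = SECRET_W * x + SECRET_B
    if z < T_LOW:
        return 0.0, "page_low"
    if z < T_HIGH:
        return z, "page_mid"
    return T_HIGH, "page_high"


def observe_page(x):
    """Attacker's view: only the page label, never the output or z."""
    return leaky_activation(x)[1]


def find_boundary(below_pages, lo=-100.0, hi=100.0, steps=30):
    """Bisect for the input at which the observed page leaves `below_pages`.

    Assumes a positive weight, so the pre-activation grows with x.
    The step count is an arbitrary choice for this sketch.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if observe_page(mid) in below_pages:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def recover_neuron():
    """Solve w*x_low + b = T_LOW and w*x_high + b = T_HIGH for w and b."""
    x_low = find_boundary({"page_low"})
    x_high = find_boundary({"page_low", "page_mid"})
    w_est = (T_HIGH - T_LOW) / (x_high - x_low)
    b_est = T_LOW - w_est * x_low
    return w_est, b_est


if __name__ == "__main__":
    w_est, b_est = recover_neuron()
    print(f"estimated w={w_est:.6f}, b={b_est:.6f}")
```

In this sketch a single branch would reveal only the ratio between bias and weight; it is the second known threshold that supplies the second linear equation. A real network has many inputs and neurons per layer, which this toy model deliberately ignores.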