VESTA: A Versatile SNN-Based Transformer Accelerator with Unified PEs for Multiple Computational Layers
Abstract
Spiking Neural Networks (SNNs) and transformers are two powerful paradigms in neural computation, known respectively for their low power consumption and for their ability to capture feature dependencies. Transformer architectures, however, typically involve several types of computational layers: linear layers in the MLP modules and classification head, convolution layers in the tokenizer, and dot-product computations in the self-attention mechanism. These diverse operations pose significant challenges for hardware accelerator design, and to our knowledge no existing hardware solution leverages spike-form data from SNNs for transformer architectures. In this paper, we introduce VESTA, a novel hardware design that combines the two technologies through unified Processing Elements (PEs) capable of efficiently performing all three types of computation required by transformer structures. VESTA benefits from the spike-form outputs of the Spike Neuron Layers \cite{zhou2024spikformer}: each multiplication is reduced from an operation on two 8-bit integers to an operation on one 8-bit integer and a binary spike. This reduction enables the use of multiplexers in the PE module, significantly improving computational efficiency while preserving the low-power advantage of SNNs. Experimental results show that VESTA has a core area of \(0.844\,\mathrm{mm}^2\), operates at 500 MHz, and is capable of real-time image classification at 30 fps.
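The multiplication simplification described in the abstract can be illustrated with a minimal software sketch. It is not the VESTA hardware design: the function names `mux_pe`, `spike_dot`, and `reference_dot` are hypothetical, and the sketch only assumes that one operand is a signed 8-bit integer and the other is a binary spike, so that the product reduces to selecting either the integer or zero.

```python
from typing import Sequence


def mux_pe(value: int, spike: int) -> int:
    """Hypothetical PE step: a 2-to-1 multiplexer instead of a multiplier.

    Because the spike is 0 or 1, value * spike is either 0 or value itself,
    so a select operation is enough.
    """
    if spike not in (0, 1):
        raise ValueError("spike must be 0 or 1")
    if not -128 <= value <= 127:
        raise ValueError("value must fit in a signed 8-bit integer")
    return value if spike == 1 else 0


def spike_dot(values: Sequence[int], spikes: Sequence[int]) -> int:
    """Accumulate the selected values; no multiplication is performed."""
    if len(values) != len(spikes):
        raise ValueError("values and spikes must have the same length")
    acc = 0
    for v, s in zip(values, spikes):
        acc += mux_pe(v, s)
    return acc


def reference_dot(values: Sequence[int], spikes: Sequence[int]) -> int:
    """Ordinary multiply-accumulate, used only to check equivalence."""
    return sum(v * s for v, s in zip(values, spikes))


values = [-128, 127, 5, -7]   # 8-bit integer operands (illustrative data)
spikes = [1, 0, 1, 1]         # binary spikes (illustrative data)
assert spike_dot(values, spikes) == reference_dot(values, spikes)
```

The same select-and-accumulate pattern applies whether the integer operand comes from a linear layer, a convolution kernel, or a dot product, which is the sense in which a single PE structure can plausibly serve all three computation types.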