arXiv

Hardware Efficient Accelerator for Spiking Transformer With Reconfigurable Parallel Time Step Computing

Abstract

This paper introduces the first low-power hardware accelerator for Spiking Transformers, an emerging alternative to traditional artificial neural networks. The base Spikformer model is modified to use IAND instead of residual addition, so that the model relies exclusively on spike computation. The hardware combines a fully parallel tick-batching dataflow with a time-step-reconfigurable neuron architecture to address the latency and power costs of multi-time-step processing in spiking neural networks. Because the outputs of all time steps are processed in parallel, computation latency is reduced and membrane memory is eliminated, which lowers energy consumption. The accelerator supports 3×3 and 1×1 convolutions and matrix operations through vectorized processing, covering the operations the model requires. Implemented in TSMC's 28 nm process, it achieves 3.456 TSOPS (tera spike operations per second) at 500 MHz with a power efficiency of 38.334 TSOPS/W, using 198.46K logic gates and 139.25 KB of SRAM.
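
Two points in the abstract can be made concrete with a short Python sketch. The first is why replacing residual addition with IAND keeps the model spike-only. The abstract does not define IAND, so the sketch assumes the definition commonly used for spike-element-wise residual connections, (NOT branch) AND shortcut. The second is what the reported figures imply arithmetically: the average number of spike operations per clock cycle and the power draw. All function names are hypothetical and chosen for illustration; none of this is the authors' implementation.

```python
def add_residual(shortcut, branch):
    """Conventional residual: element-wise addition of two binary spike vectors."""
    return [s + b for s, b in zip(shortcut, branch)]


def iand_residual(shortcut, branch):
    """IAND residual, ASSUMED here to be (NOT branch) AND shortcut.

    Both inputs are binary (0/1), and so is the result, which means the
    next layer still receives pure spikes.
    """
    return [(1 - b) & s for s, b in zip(shortcut, branch)]


def spike_ops_per_cycle(ops_per_second, clock_hz):
    """Average spike operations completed per clock cycle."""
    return ops_per_second / clock_hz


def implied_power_watts(tsops, tsops_per_watt):
    """Power implied by a throughput figure and an efficiency figure."""
    return tsops / tsops_per_watt


if __name__ == "__main__":
    shortcut = [0, 0, 1, 1]
    branch = [0, 1, 0, 1]

    # Addition can produce the value 2, which is no longer a spike.
    print("ADD :", add_residual(shortcut, branch))
    # IAND stays within {0, 1}.
    print("IAND:", iand_residual(shortcut, branch))

    # Figures reported in the abstract: 3.456 TSOPS at 500 MHz, 38.334 TSOPS/W.
    print("ops per cycle:", spike_ops_per_cycle(3.456e12, 500e6))
    print("implied power (W):", implied_power_watts(3.456, 38.334))
```

Under these assumptions, the reported throughput corresponds to 6912 spike operations per cycle, and the two headline figures together imply a power draw of roughly 90 mW.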