Research

arXiv

A Low-Power Sparse Deep Learning Accelerator with Optimized Data Reuse

Abstract

Sparse deep learning significantly reduces computation, but the irregular distribution of non-zero data complicates the dataflow and hinders data reuse, which increases on-chip SRAM accesses and, in turn, the power consumption of the chip. This paper addresses these issues by maximizing data reuse to reduce SRAM accesses, using two techniques. First, we propose Effective Index Matching (EIM), which efficiently searches for and arranges non-zero operations in compressed data. Second, we propose Shared Index Data Reuse (SIDR), which coordinates operations across Processing Elements (PEs) to regularize their SRAM data accesses, enabling all data to be reused efficiently. Compared with the previous design, SparTen, our approach reduces SRAM buffer accesses by 86%. As a result, our design achieves a 2.5× improvement in power efficiency over state-of-the-art methods while maintaining a simpler dataflow.
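
The abstract does not describe how EIM works internally, so the following is only a generic illustration of what "index matching" on compressed data means, not the paper's method. It assumes a bitmask compression format (one bit per position plus a packed list of non-zero values); the function names `compress`, `match_indices`, and `sparse_dot` are hypothetical.

```python
def compress(dense):
    """Return (mask, values): mask[i] is 1 where dense[i] is non-zero,
    and values holds the non-zero entries in their original order."""
    mask = [1 if x != 0 else 0 for x in dense]
    values = [x for x in dense if x != 0]
    return mask, values


def match_indices(mask_a, mask_b):
    """Find positions where both operands are non-zero.

    Returns a list of (offset_a, offset_b) pairs, where each offset indexes
    into the corresponding packed value list. The running counters play the
    role of a prefix sum over each mask.
    """
    pairs = []
    offset_a = offset_b = 0
    for bit_a, bit_b in zip(mask_a, mask_b):
        if bit_a and bit_b:
            pairs.append((offset_a, offset_b))
        offset_a += bit_a
        offset_b += bit_b
    return pairs


def sparse_dot(dense_a, dense_b):
    """Dot product that multiplies only the matched non-zero pairs."""
    mask_a, vals_a = compress(dense_a)
    mask_b, vals_b = compress(dense_b)
    return sum(vals_a[i] * vals_b[j] for i, j in match_indices(mask_a, mask_b))


activations = [0, 3, 0, 2, 5, 0]
weights = [1, 4, 0, 0, 2, 7]
result = sparse_dot(activations, weights)
```

Only two of the six positions hold a non-zero value in both operands, so only two multiplications are performed. Which packed values are fetched depends on the data, and this is the kind of irregular access pattern the abstract identifies as the obstacle to data reuse.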