arXiv

Lightweight Embedded FPGA Deployment of Learned Image Compression with Knowledge Distillation and Hybrid Quantization

Alaa Mazouz

Abstract

Learnable Image Compression (LIC) has shown the potential to outperform standardized video codecs in rate-distortion (RD) efficiency, prompting research into hardware-friendly implementations. Most existing LIC hardware implementations prioritize latency over RD efficiency and rely on an extensive exploration of the hardware design space. We present a novel design paradigm in which the burden of tuning the design for a specific hardware platform is shifted towards model dimensioning, without compromising RD efficiency. First, we design a framework for distilling a leaner student LIC model from a reference teacher: by tuning a single model hyperparameter, we can meet the constraints of different hardware platforms without a complex hardware design exploration. Second, we propose a hardware-friendly implementation of the Generalized Divisive Normalization (GDN) activation that preserves RD efficiency even after parameter quantization. Third, we design a pipelined FPGA configuration that takes full advantage of the available FPGA resources by leveraging parallel processing and optimizing resource allocation. Our experiments with a state-of-the-art LIC model show that we outperform all existing FPGA implementations while performing very close to the original model.
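For readers unfamiliar with GDN, the following minimal sketch shows the standard floating-point definition commonly used in the LIC literature, y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2), applied across channels at each spatial position. It is not the hardware-friendly variant proposed in this work, whose details are not given in the abstract. The function name `gdn` and the (C, H, W) array layout are illustrative assumptions.

```python
import numpy as np


def gdn(x, beta, gamma):
    """Reference (floating-point) GDN across channels.

    Hypothetical helper for illustration only.
    x:     activations, shape (C, H, W)
    beta:  per-channel offsets, shape (C,), assumed positive
    gamma: channel-coupling weights, shape (C, C), assumed non-negative
    """
    x = np.asarray(x, dtype=np.float64)
    beta = np.asarray(beta, dtype=np.float64)
    gamma = np.asarray(gamma, dtype=np.float64)

    # norm[i, h, w] = beta[i] + sum_j gamma[i, j] * x[j, h, w] ** 2
    norm = beta[:, None, None] + np.tensordot(gamma, x ** 2, axes=([1], [0]))

    # Divide each activation by the square root of its normalization pool.
    return x / np.sqrt(norm)


if __name__ == "__main__":
    x = np.ones((2, 1, 1))
    y = gdn(x, beta=np.ones(2), gamma=np.full((2, 2), 0.5))
    print(y.ravel())
```

The division and square root in the last line are the operations that tend to be costly in fixed-point hardware, which is presumably why a hardware-friendly reformulation is of interest.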