Lightweight Embedded FPGA Deployment of Learned Image Compression with Knowledge Distillation and Hybrid Quantization
Abstract
Learnable Image Compression (LIC) has shown the potential to outperform standardized video codecs in rate-distortion (RD) efficiency, prompting research into hardware-friendly implementations. Most existing LIC hardware implementations prioritize latency over RD efficiency, and achieve it through an extensive exploration of the hardware design space. We present a novel design paradigm in which the burden of tuning the design for a specific hardware platform is shifted towards model dimensioning, without compromising RD efficiency. First, we design a framework for distilling a leaner student LIC model from a reference teacher: by tuning a single model hyperparameter, we can meet the constraints of different hardware platforms without a complex hardware design exploration. Second, we propose a hardware-friendly implementation of the Generalized Divisive Normalization (GDN) activation that preserves RD efficiency even after parameter quantization. Third, we design a pipelined FPGA configuration that takes full advantage of the available FPGA resources by leveraging parallel processing and optimizing resource allocation. Our experiments with a state-of-the-art LIC model show that our approach outperforms all existing FPGA implementations while performing very close to the original model.
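For readers unfamiliar with GDN, the sketch below shows the standard definition from the LIC literature, y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2), applied across the channels of a single pixel, and how rounding its parameters to a fixed-point grid perturbs the output. It is only an illustration under stated assumptions: the helper names `gdn` and `quantize`, the 8 fractional bits, and the toy values are hypothetical, and this is not the hardware-friendly implementation proposed in this work.

```python
import math

def gdn(x, beta, gamma):
    """Standard GDN across the channels of one pixel.

    x: list of C channel values; beta: list of C positive offsets;
    gamma: C x C list of non-negative weights.
    """
    out = []
    for i in range(len(x)):
        # Normalization pool: offset plus weighted sum of squared channels.
        norm = beta[i] + sum(gamma[i][j] * x[j] ** 2 for j in range(len(x)))
        out.append(x[i] / math.sqrt(norm))
    return out

def quantize(value, frac_bits=8):
    """Round to a fixed-point grid with `frac_bits` fractional bits (assumed format)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

# Toy inputs and parameters (hypothetical, for illustration only).
x = [0.5, -1.0, 2.0]
beta = [1.0, 1.0, 1.0]
gamma = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]

# Quantize the parameters only; activations stay in floating point here.
beta_q = [quantize(b) for b in beta]
gamma_q = [[quantize(g) for g in row] for row in gamma]

y_float = gdn(x, beta, gamma)
y_quant = gdn(x, beta_q, gamma_q)
max_err = max(abs(a - b) for a, b in zip(y_float, y_quant))
print(y_float, y_quant, max_err)
```

The square root and division in the normalization are typically the costly operations in fixed-point hardware, which is one reason a hardware-friendly GDN variant is of interest.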