arXiv

S2CFormer: Revisiting the RD-Latency Trade-off in Transformer-based Learned Image Compression

Yunuo Chen, Qian Li, Bing He, Donghui Feng, Ronghua Wu, Qi Wang, Li Song, Guo Lu, Wenjun Zhang

Abstract

Transformer-based Learned Image Compression (LIC) suffers from a suboptimal trade-off between decoding latency and rate-distortion (R-D) performance. Moreover, the critical role of the FeedForward Network (FFN)-based channel aggregation module has been largely overlooked. Our research reveals that efficient channel aggregation, rather than complex and time-consuming spatial operations, is the key to achieving competitive LIC models. Based on this insight, we propose the "S2CFormer" paradigm, a general architecture that simplifies spatial operations and enhances channel operations to overcome this trade-off. We present two instances of S2CFormer: S2C-Conv and S2C-Attention. Both models achieve state-of-the-art (SOTA) R-D performance with significantly faster decoding. Furthermore, we introduce S2C-Hybrid, an enhanced variant that combines the strengths of different S2CFormer instances to achieve a better performance-latency trade-off. This model outperforms all existing methods on the Kodak, Tecnick, and CLIC Professional Validation datasets, setting a new benchmark for efficient and high-performance LIC. The code is available at https://github.com/YunuoChen/S2CFormer.
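
To make the "simple spatial operation, stronger channel operation" idea concrete, the following is a minimal sketch of one block in that spirit. It is not the authors' implementation (see the repository linked above for that). The function names, the 3x3 mean filter standing in for a lightweight spatial operation, the plain two-layer ReLU FFN, and the omission of normalization layers are all assumptions made for illustration.

```python
def zeros_like(x):
    """Return a C x H x W nested list of zeros with the same shape as x."""
    return [[[0.0] * len(row) for row in ch] for ch in x]

def add(a, b):
    """Element-wise sum of two C x H x W nested lists (residual connection)."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def spatial_mix(x):
    """Assumed spatial operation: per-channel 3x3 mean filter, zero padding.
    It mixes neighbouring pixels but never mixes channels."""
    out = zeros_like(x)
    H, W = len(x[0]), len(x[0][0])
    for c, ch in enumerate(x):
        for i in range(H):
            for j in range(W):
                s = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < H and 0 <= jj < W:
                            s += ch[ii][jj]
                out[c][i][j] = s / 9.0
    return out

def channel_ffn(x, w1, w2):
    """Assumed channel aggregation: per-pixel FFN, C -> hidden -> C, with ReLU.
    w1 has shape hidden x C, w2 has shape C x hidden."""
    out = zeros_like(x)
    C, H, W = len(x), len(x[0]), len(x[0][0])
    for i in range(H):
        for j in range(W):
            v = [x[c][i][j] for c in range(C)]
            h = [max(0.0, sum(wk[c] * v[c] for c in range(C))) for wk in w1]
            for c in range(C):
                out[c][i][j] = sum(w2[c][k] * h[k] for k in range(len(h)))
    return out

def s2c_style_block(x, w1, w2):
    """One block: cheap spatial mixing, then channel aggregation, each residual."""
    x = add(x, spatial_mix(x))
    x = add(x, channel_ffn(x, w1, w2))
    return x
```

In this sketch the spatial step has no learned cross-channel weights at all, so all channel interaction happens in `channel_ffn`; widening the hidden dimension of `w1` and `w2` is one hypothetical way to spend more capacity on channel aggregation while keeping the spatial step cheap.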