S2CFormer: Revisiting the RD-Latency Trade-off in Transformer-based Learned Image Compression

Abstract

Transformer-based Learned Image Compression (LIC) suffers from a suboptimal trade-off between decoding latency and rate-distortion (R-D) performance. Moreover, the critical role of the Feed-Forward Network (FFN)-based channel aggregation module has been largely overlooked. Our research reveals that efficient channel aggregation, rather than complex and time-consuming spatial operations, is the key to achieving competitive LIC models. Based on this insight, we propose the ``S2CFormer'' paradigm, a general architecture that simplifies spatial operations and enhances channel operations to overcome the previous trade-off. We present two instances of S2CFormer: S2C-Conv and S2C-Attention. Both models demonstrate state-of-the-art (SOTA) R-D performance and significantly faster decoding. Furthermore, we introduce S2C-Hybrid, an enhanced variant that exploits the strengths of different S2CFormer instances to achieve a better performance-latency trade-off. This model outperforms all existing methods on the Kodak, Tecnick, and CLIC Professional Validation datasets, setting a new benchmark for efficient, high-performance LIC. The code is available at \href{https://github.com/YunuoChen/S2CFormer}{https://github.com/YunuoChen/S2CFormer}.
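
The abstract does not specify the block design, so the following is only an illustrative NumPy sketch of the general idea of pairing a cheap spatial operation with a heavier channel-aggregation step. Everything in it is an assumption made for illustration: the function names (`depthwise_conv3x3`, `channel_ffn`, `s2c_style_block`), the 3x3 depthwise filter, the ReLU activation, the expansion ratio of 4, and the omission of normalization layers. It is not the authors' implementation; see the linked repository for that.

```python
import numpy as np


def depthwise_conv3x3(x, kernels):
    """Cheap spatial mixing: one 3x3 filter per channel, zero padding.

    x has shape (H, W, C); kernels has shape (3, 3, C).
    """
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for di in range(3):
        for dj in range(3):
            # Shifted view of the input, weighted per channel.
            out += padded[di:di + H, dj:dj + W, :] * kernels[di, dj, :]
    return out


def channel_ffn(x, w1, w2):
    """Channel aggregation: a per-pixel two-layer MLP over the channel axis."""
    hidden = np.maximum(x @ w1, 0.0)  # ReLU chosen only for simplicity
    return hidden @ w2


def s2c_style_block(x, kernels, w1, w2):
    """Hypothetical block: simple spatial step, then channel step, both residual."""
    x = x + depthwise_conv3x3(x, kernels)
    x = x + channel_ffn(x, w1, w2)
    return x


rng = np.random.default_rng(0)
H, W, C, ratio = 8, 8, 16, 4  # toy sizes, assumed for the example
x = rng.standard_normal((H, W, C))
kernels = rng.standard_normal((3, 3, C)) * 0.1
w1 = rng.standard_normal((C, ratio * C)) * 0.1
w2 = rng.standard_normal((ratio * C, C)) * 0.1

y = s2c_style_block(x, kernels, w1, w2)

spatial_params = kernels.size          # 3 * 3 * C
channel_params = w1.size + w2.size     # 2 * ratio * C * C
print(y.shape, spatial_params, channel_params)
```

In this toy configuration the channel step holds far more parameters than the spatial step (2048 versus 144), which mirrors the abstract's emphasis on channel aggregation over spatial operations. Parameter counts alone say nothing about decoding latency or R-D performance.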