Abstract
Data volumes from scientific simulations and instruments (supercomputers, accelerators, telescopes) keep growing and often exceed network, storage, and analysis capabilities. The scientific community's response to this challenge is scientific data reduction, which can take many forms, such as triggering, sampling, filtering, quantization, and dimensionality reduction. This report focuses on one specific technique: lossy compression. Lossy compression retains all data points and achieves compression by exploiting correlations and reducing accuracy in a controlled way. Quality constraints, especially on quantities of interest, are crucial for preserving scientific discoveries. User requirements also include compression ratio and speed. Although many papers have been published on lossy compression techniques and the community shares reference datasets, detailed specifications of application needs that could guide lossy compression researchers and developers are lacking. This report fills that gap by documenting the requirements and constraints of nine scientific applications spanning a broad range of domains (climate, combustion, cosmology, fusion, light sources, molecular dynamics, quantum circuit simulation, seismology, and system logs). It also details key lossy compression technologies (SZ, ZFP, MGARD, LC, SPERR, DCTZ, TEZip, LibPressio), discussing their history, principles, error control, hardware support, features, and impact. By presenting both application needs and compression technologies, the report aims to inspire new research to fill existing gaps.