Optimising the Processing and Storage of Visibilities using lossy compression

Abstract

The next generation of radio astronomy instruments provides a massive increase in sensitivity and coverage through more stations in the array and a wider frequency span. Two primary problems encountered when processing the resulting avalanche of data are the need for abundant storage and for I/O capacity. For example, the data deluge expected from the SKA Telescopes exceeds 60 PB per day, all of which must be stored on the buffer filesystem. Compressing the data is an obvious solution. We used MGARD, an error-controlled compressor, and applied it to simulated and real visibility data in noise-free and noise-dominated regimes. As the data have an implicit error level set by the system temperature, an error bound provides a natural metric for compression. By measuring the degradation of images reconstructed from the lossy compressed data through a series of experiments, we explore the trade-off between these error bounds and the corresponding compression ratios, as well as the impact on the science quality of the derived data products. We studied both global and local impacts on the output images. We found that relative error bounds as large as $10\%$, which provide compression ratios of about 20, have a limited impact on continuum imaging, as the added noise is less than the image RMS. For extremely sensitive observations and very precious data, we recommend a $0.1\%$ error bound, which gives compression ratios of about 4. The noise impact at this bound is two orders of magnitude below the image RMS; at these levels, the limits are due to instabilities in the deconvolution methods. We compared the results with the alternative compression tool DYSCO, in terms of both the impact on the images and relative flexibility. MGARD provides better compression for similar error bounds and has a host of potentially powerful additional features.
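
To make the trade-off between error bound and compression ratio concrete, the following is a minimal sketch in Python. It is not MGARD and does not reproduce the paper's pipeline: it stands in a uniform quantiser followed by `zlib` for the error-controlled compressor, it assumes the relative bound is measured against the peak sample amplitude, and the toy signal (a unit fringe buried in Gaussian noise) and all function names are hypothetical. The ratios it prints will not match the values of about 20 and about 4 quoted above; it only illustrates that a looser bound yields a higher ratio while the pointwise error stays within the bound.

```python
import math
import random
import struct
import zlib


def simulate_visibilities(n, noise_sigma=5.0, seed=0):
    """Toy noise-dominated data: a unit-amplitude fringe plus Gaussian noise.

    Returns 2*n floats with real and imaginary parts interleaved.
    """
    rng = random.Random(seed)
    samples = []
    for k in range(n):
        phase = 2.0 * math.pi * k / 64.0
        samples.append(math.cos(phase) + rng.gauss(0.0, noise_sigma))
        samples.append(math.sin(phase) + rng.gauss(0.0, noise_sigma))
    return samples


def compress_with_bound(samples, rel_bound):
    """Quantise so that every sample is off by at most rel_bound * peak.

    Assumption: the relative bound is taken against the peak amplitude.
    """
    peak = max(abs(s) for s in samples)
    step = 2.0 * rel_bound * peak  # rounding error is at most step / 2
    codes = [round(s / step) for s in samples]
    # Codes are packed as 32-bit integers, so very tight bounds would overflow.
    payload = zlib.compress(struct.pack(f"<{len(codes)}i", *codes), 9)
    return payload, step


def decompress(payload, step):
    """Invert compress_with_bound, up to the quantisation error."""
    raw = zlib.decompress(payload)
    codes = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [c * step for c in codes]


def evaluate(samples, rel_bound):
    """Return (compression ratio, max pointwise error, absolute error limit)."""
    payload, step = compress_with_bound(samples, rel_bound)
    restored = decompress(payload, step)
    max_err = max(abs(a - b) for a, b in zip(samples, restored))
    raw_bytes = 4 * len(samples)  # baseline: samples stored as float32
    return raw_bytes / len(payload), max_err, step / 2.0


if __name__ == "__main__":
    data = simulate_visibilities(4096)
    for rel_bound in (0.1, 0.001):
        ratio, max_err, limit = evaluate(data, rel_bound)
        print(f"bound {rel_bound:.1%}: ratio {ratio:.1f}, "
              f"max error {max_err:.3g} (limit {limit:.3g})")
```

The quantiser step is twice the absolute bound, so rounding to the nearest step keeps each sample within the bound; the lossless `zlib` stage then exploits the reduced number of distinct codes.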
