Galaxy Imaging with Generative Models: Insights from a Two-Models Framework

Research

arXiv

Abstract

Generative models have recently transformed image generation across diverse domains, including galaxy image synthesis. This study investigates the statistical learning and consistency of three generative models: light-weight-gan (a GAN-based model), Glow (a normalizing-flow-based model), and a diffusion model based on a U-Net denoiser, each trained on non-overlapping subsets of the SDSS dataset of 64×64 grayscale images. While all models produce visually realistic images with well-preserved distributions of morphological variables, we focus on their ability to learn and generalize the underlying data distribution. The diffusion model shows a transition from memorization to generalization as the dataset size increases, confirming previous findings: smaller datasets lead to overfitting, while larger datasets enable the generation of novel samples, supported by the theoretical basis of the denoising process. For the flow-based model, we propose an "inversion test" that leverages its bijective nature. The GAN-based model achieves comparable morphological consistency but lacks bijectivity; for it we introduce a "discriminator test", which shows successful learning for larger datasets but poorer confidence for smaller ones. For all models, learning becomes difficult when the dataset size falls below O(100,000). Throughout our experiments, the "two-models" framework enables robust evaluations, highlighting both the potential and the limitations of these models. These findings provide insights into statistical learning in generative modeling, with applications extending beyond galaxy image generation.
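
The abstract does not spell out how memorization is told apart from generalization. One common diagnostic, shown here as a minimal sketch and not as the authors' procedure, is to measure how close each generated image lies to its nearest training image: samples that are near-copies of training images suggest memorization, while samples far from every training image suggest novel generation. The function names `nearest_train_distance` and `memorized_fraction`, the choice of the L2 metric, the threshold value, and the random stand-in data are all assumptions introduced for illustration.

```python
import numpy as np


def nearest_train_distance(generated, train):
    """Return, for each generated image, the L2 distance to its closest training image.

    Both inputs are arrays of shape (n_images, height, width).
    """
    g = generated.reshape(len(generated), -1).astype(np.float64)
    t = train.reshape(len(train), -1).astype(np.float64)
    # Pairwise squared distances via ||g||^2 - 2 g.t + ||t||^2.
    d2 = (g ** 2).sum(axis=1)[:, None] - 2.0 * (g @ t.T) + (t ** 2).sum(axis=1)[None, :]
    # Clip tiny negative values caused by floating-point round-off before the sqrt.
    return np.sqrt(np.clip(d2, 0.0, None)).min(axis=1)


def memorized_fraction(generated, train, threshold):
    """Fraction of generated images closer than `threshold` to some training image."""
    return float((nearest_train_distance(generated, train) < threshold).mean())


# Hypothetical stand-in data: random 64x64 arrays, not SDSS images.
rng = np.random.default_rng(0)
train = rng.random((50, 64, 64))
copied_samples = train[:5].copy()        # mimics a model that memorized its training set
novel_samples = rng.random((5, 64, 64))  # mimics a model that generates new images

copied_fraction = memorized_fraction(copied_samples, train, threshold=1.0)
novel_fraction = memorized_fraction(novel_samples, train, threshold=1.0)
```

In this toy setup the threshold of 1.0 is arbitrary. With real images it would have to be calibrated, for example against the typical distance between distinct training images.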