The CatSouth Quasar Candidate Catalog for the Southern Sky and a Unified All-Sky Catalog Based on Gaia DR3
Abstract
Gaia DR3 provides a sample of more than 6.6 million quasar candidates with high completeness but low purity. Previous work on the CatNorth quasar candidate catalog showed that combining external multiband data with machine-learning methods can efficiently purify the original Gaia DR3 quasar candidate catalog and improve its redshift estimates. In this paper, we extend the Gaia DR3 quasar candidate selection to the southern hemisphere using data from the SkyMapper, CatWISE, and VISTA surveys. We train an XGBoost classifier on a unified set of high-confidence stars and spectroscopically confirmed quasars and galaxies. For sources with available Gaia BP/RP spectra, we derive spectroscopic redshifts using a pre-trained convolutional neural network (RegNet). We also train an ensemble photometric redshift estimation model based on XGBoost, TabNet, and FT-Transformer, which achieves a root-mean-square error (RMSE) of 0.2256 and a normalized median absolute deviation (NMAD) of 0.0187 on the validation set. By merging CatSouth with the previously published CatNorth catalog, we construct the unified all-sky CatGlobe catalog, which contains nearly 1.9 million sources at $G<21$ and provides a comprehensive, high-purity quasar candidate sample for future spectroscopic and cosmological investigations.
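The two redshift metrics quoted above can be illustrated with a minimal sketch. The abstract does not spell out the exact definitions used, so the code below assumes one common convention: the RMSE is taken over the raw residuals $\Delta z = z_{\rm pred} - z_{\rm true}$, and the NMAD is $1.4826 \times \mathrm{median}(|x - \mathrm{median}(x)|)$ with $x = \Delta z / (1 + z_{\rm true})$. The function name `photoz_metrics` and its arguments are hypothetical and are not taken from the paper.

```python
import math
import statistics


def photoz_metrics(z_true, z_pred):
    """Return (rmse, nmad) for paired reference and estimated redshifts.

    Assumed definitions (not confirmed by the source):
      rmse = sqrt(mean(dz^2)),             dz = z_pred - z_true
      nmad = 1.4826 * median(|x - median(x)|),  x = dz / (1 + z_true)
    """
    if len(z_true) != len(z_pred) or len(z_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Raw redshift residuals.
    dz = [zp - zt for zt, zp in zip(z_true, z_pred)]
    rmse = math.sqrt(sum(d * d for d in dz) / len(dz))

    # Residuals normalized by (1 + z_true), then a scaled median absolute deviation.
    norm = [d / (1.0 + zt) for d, zt in zip(dz, z_true)]
    center = statistics.median(norm)
    nmad = 1.4826 * statistics.median([abs(x - center) for x in norm])
    return rmse, nmad
```

Under these assumptions, an RMSE much larger than the NMAD (as with 0.2256 versus 0.0187) would be consistent with a tight core of accurate estimates plus a minority of large outliers, since the median-based NMAD is far less sensitive to outliers than the RMSE.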