Abstract
This paper examines the role of Graph Neural Networks (GNNs) in preparing data for generative artificial intelligence (GenAI) systems, with a particular focus on identifying and mitigating bias. We compare three bias mitigation methods: data sparsification, feature modification, and synthetic data augmentation. Using the German Credit dataset, we evaluate each method with multiple fairness metrics, including statistical parity, equality of opportunity, and false positive rate. All three methods improve the fairness metrics relative to the original dataset; stratified sampling and synthetic data augmentation with GraphSAGE are particularly effective at balancing demographic representation while maintaining model performance. These results offer practical guidance for developing more equitable AI systems.
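For readers unfamiliar with the fairness metrics named above, the following is a minimal sketch of how they are commonly computed as gaps between two demographic groups. It assumes binary labels, binary predictions, and a binary protected attribute, and it assumes every subgroup is non-empty. The helper name `fairness_gaps` and the absolute-difference formulation are illustrative assumptions, not necessarily the exact definitions used in this paper's experiments.

```python
import numpy as np

def fairness_gaps(y_true, y_pred, group):
    """Hypothetical helper: absolute gaps between group 0 and group 1.

    Assumes binary y_true, y_pred, and group, with non-empty subgroups.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))

    def positive_rate(mask):
        # Fraction of positive predictions among the selected samples.
        return y_pred[mask].mean()

    g0, g1 = group == 0, group == 1
    pos, neg = y_true == 1, y_true == 0
    return {
        # Gap in P(pred = 1) between the groups.
        "statistical_parity": abs(positive_rate(g0) - positive_rate(g1)),
        # Gap in true positive rate, P(pred = 1 | y = 1).
        "equal_opportunity": abs(positive_rate(g0 & pos) - positive_rate(g1 & pos)),
        # Gap in false positive rate, P(pred = 1 | y = 0).
        "false_positive_rate": abs(positive_rate(g0 & neg) - positive_rate(g1 & neg)),
    }

# Toy data for illustration only; not drawn from the German Credit dataset.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
gaps = fairness_gaps(y_true, y_pred, group)
```

Under this formulation a gap of 0 means the two groups are treated identically on that metric, and larger values indicate greater disparity.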