Fair Sufficient Representation Learning

Abstract

The main objective of fair statistical modeling and machine learning is to reduce or eliminate biases that may arise from the data or from the model itself, so that predictions and decisions are not unjustly influenced by sensitive attributes such as race, gender, age, or other protected characteristics. In this paper, we introduce a Fair Sufficient Representation Learning (FSRL) method that balances sufficiency and fairness. Sufficiency requires that the representation capture all necessary information about the target variables, while fairness requires that the learned representation be independent of the sensitive attributes. FSRL is based on a convex combination of an objective function for learning a sufficient representation and an objective function that ensures fairness. Our approach manages fairness and sufficiency at the representation level, offering a novel perspective on fair representation learning. We implement the method using distance covariance, which is effective for characterizing independence between random variables. We also establish convergence properties of the learned representations. Experiments on healthcare and text datasets with diverse structures demonstrate that FSRL achieves a better trade-off between fairness and accuracy than existing approaches.
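To make the idea of a convex combination of a sufficiency term and a fairness term concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the paper's implementation: the abstract does not give the exact objective, so the function `fsrl_style_objective`, its weight `alpha`, and the choice to reward dependence between representation and target while penalizing dependence between representation and sensitive attribute are all hypothetical. The only standard ingredient is the biased (V-statistic) sample estimate of squared distance covariance.

```python
import numpy as np


def _centered_distances(a):
    """Double-centered pairwise Euclidean distance matrix of the rows of `a`."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D array as n scalar observations
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def dcov2(a, b):
    """Biased (V-statistic) sample estimate of squared distance covariance."""
    return float((_centered_distances(a) * _centered_distances(b)).mean())


def fsrl_style_objective(rep, y, s, alpha=0.5):
    """Hypothetical convex combination of two distance-covariance terms.

    Assumption (not taken from the paper): sufficiency is encouraged by
    maximizing dependence between `rep` and the target `y`, and fairness by
    minimizing dependence between `rep` and the sensitive attribute `s`.
    Lower values are better.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    sufficiency_loss = -dcov2(rep, y)  # more dependence on y lowers the loss
    fairness_loss = dcov2(rep, s)      # more dependence on s raises the loss
    return alpha * sufficiency_loss + (1.0 - alpha) * fairness_loss
```

In this sketch, `alpha` close to 1 favors predictive information about the target and `alpha` close to 0 favors independence from the sensitive attribute. In practice `rep` would be the output of a trainable encoder and the objective would be minimized with a differentiable framework; plain NumPy is used here only to keep the example self-contained.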

Fair Sufficient Representation Learning - arXiv