arXiv

Consistent Multigroup Low-Rank Approximation

Abstract

We consider the problem of consistent low-rank approximation for multigroup data: we seek a sequence of $k$ basis vectors such that projecting the data onto the subspace they span treats all groups as equally as possible, in the sense of minimizing the maximum error over the groups. Additionally, we require that the sequence of basis vectors satisfy a natural consistency property: when looking for the best $k$ vectors, the first $d<k$ vectors are themselves the best possible solution to the problem of finding $d$ basis vectors. This multigroup low-rank approximation thus naturally generalizes the singular value decomposition (SVD) and reduces to SVD for data with a single group. We give an iterative algorithm for this task that sequentially adds to the basis the vector giving the best rank-1 projection according to the min-max criterion, and then projects the data onto the orthogonal complement of that vector. To find the best rank-1 projection, we use primal-dual approaches or semidefinite programming. We analyze the theoretical properties of the algorithms and demonstrate empirically that the proposed methods compare favorably to existing methods for multigroup (or fair) PCA.
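
The iterative scheme described in the abstract (pick the best rank-1 direction under the min-max criterion, then deflate) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-group loss is assumed to be the squared error of projecting that group's rows onto the chosen direction, and the rank-1 step uses a simple reweighting heuristic in place of the primal-dual or semidefinite-programming solvers mentioned above, so it carries no optimality guarantee. The names `group_losses`, `best_rank1_minmax`, and `consistent_basis`, and the parameters `n_iter` and `eta`, are hypothetical.

```python
import numpy as np


def group_losses(groups, v):
    # Assumed loss: squared error of projecting each group's rows onto span{v}.
    return np.array([np.sum(A * A) - np.sum((A @ v) ** 2) for A in groups])


def best_rank1_minmax(groups, n_iter=200, eta=1.0):
    # Heuristic stand-in for the rank-1 step (NOT the paper's solver):
    # reweight the groups, take the top eigenvector of the weighted sum of
    # Gram matrices, and keep the direction with the smallest maximum loss.
    grams = [A.T @ A for A in groups]
    w = np.full(len(groups), 1.0 / len(groups))
    best_v, best_val = None, np.inf
    for _ in range(n_iter):
        M = sum(wi * G for wi, G in zip(w, grams))
        # eigh returns eigenvalues in ascending order; take the last eigenvector.
        v = np.linalg.eigh(M)[1][:, -1]
        losses = group_losses(groups, v)
        if losses.max() < best_val:
            best_v, best_val = v, losses.max()
        # Shift weight toward the groups that are currently worst off.
        scale = losses.max() if losses.max() > 0 else 1.0
        w = w * np.exp(eta * losses / scale)
        w /= w.sum()
    return best_v


def consistent_basis(groups, k):
    # Greedy loop: add one direction at a time, then deflate every group.
    groups = [np.asarray(A, dtype=float) for A in groups]
    basis = []
    for _ in range(k):
        v = best_rank1_minmax(groups)
        basis.append(v)
        # Project each group onto the orthogonal complement of v.
        groups = [A - np.outer(A @ v, v) for A in groups]
    return np.column_stack(basis)
```

Because each vector is fixed before the next one is computed, the first $d$ columns returned for a larger $k$ coincide with the result for $k=d$, which mirrors the consistency property. With a single group the weights are trivially 1, so each step returns the top right singular vector of the deflated data, matching the reduction to SVD.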