Antithetic Sampling for Top-k Shapley Identification

Abstract

Additive feature explanations rely primarily on game-theoretic notions such as the Shapley value, viewing features as cooperating players. The Shapley value's popularity within and beyond explainable AI stems from its axiomatic uniqueness. However, its computational complexity severely limits its practicability. Most works investigate the uniform approximation of all features' Shapley values, needlessly consuming samples for insignificant features. In contrast, identifying the $k$ most important features can already be sufficiently insightful and opens up algorithmic opportunities connected to the field of multi-armed bandits. We propose Comparable Marginal Contributions Sampling (CMCS), a method for the top-$k$ identification problem that uses a new sampling scheme taking advantage of correlated observations. We conduct experiments to showcase the efficacy of our method compared to competitive baselines. Our empirical findings reveal that estimation quality for the approximate-all problem does not necessarily transfer to top-$k$ identification and vice versa.
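To make the problem setting concrete, the following minimal sketch contrasts exact Shapley computation with a generic permutation-sampling estimate, followed by a naive top-$k$ selection. It is not CMCS and does not reproduce the sampling scheme proposed here; the toy game `toy_value`, its weights, and all function names are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random

# Hypothetical toy game (an assumption for illustration): additive weights
# plus a bonus of 2 when players 0 and 1 are both in the coalition.
WEIGHTS = [5.0, 3.0, 1.0, 0.5]


def toy_value(coalition):
    total = sum(WEIGHTS[i] for i in coalition)
    if 0 in coalition and 1 in coalition:
        total += 2.0
    return total


def exact_shapley(n, value):
    """Exact Shapley values by enumerating all coalitions (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def sampled_shapley(n, value, num_permutations, rng):
    """Generic permutation-sampling estimate (a baseline, not CMCS)."""
    phi = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        coalition = frozenset()
        previous = value(coalition)
        for i in order:
            coalition = coalition | {i}
            current = value(coalition)
            phi[i] += current - previous  # marginal contribution of player i
            previous = current
    return [p / num_permutations for p in phi]


def top_k(phi, k):
    """Indices of the k largest estimates, ties broken by lower index."""
    return sorted(range(len(phi)), key=lambda i: phi[i], reverse=True)[:k]


if __name__ == "__main__":
    n = len(WEIGHTS)
    exact = exact_shapley(n, toy_value)
    approx = sampled_shapley(n, toy_value, 200, random.Random(0))
    print("exact:", exact, "top-2:", top_k(exact, 2))
    print("approx:", approx, "top-2:", top_k(approx, 2))
```

In this toy game the two leading players are clearly separated from the rest, so even a coarse estimate recovers the correct top-2 set. This illustrates why spending samples uniformly to approximate every value precisely can be unnecessary when only the top-$k$ set is required.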

Antithetic Sampling for Top-k Shapley Identification - arXiv