Score matching through the roof: linear, nonlinear, and latent variables causal discovery

Abstract

Causal discovery from observational data holds great promise, but existing methods rely on strong assumptions about the underlying causal structure, often requiring that all relevant variables be observed. We tackle these challenges by leveraging the score function $\nabla \log p(X)$ of the observed variables for causal discovery, and make the following contributions. First, we refine the existing score-based identifiability results for additive noise models, showing that their assumption that the causal mechanisms are nonlinear is not necessary. Second, we establish conditions for inferring causal relations from the score even in the presence of hidden variables. This result is twofold: we demonstrate that the score can be used to infer the equivalence class of causal graphs with hidden variables (whereas previous results are restricted to the fully observable setting), and we provide sufficient conditions for identifying direct causes in latent variable models. Building on these insights, we propose a flexible algorithm suited to causal discovery on linear, nonlinear, and latent variable models, which we validate empirically.
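To make the central object concrete, the following is a minimal sketch of the score function $\nabla \log p(X)$ for a toy linear additive noise model. It is not the algorithm proposed in the paper. The two-variable model $X_1 = N_1$, $X_2 = a X_1 + N_2$ with Gaussian noise, the parameter values, and the function names `log_density`, `score`, and `numerical_score` are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy model (an assumption, not taken from the paper):
#   X1 = N1,            N1 ~ Normal(0, s1^2)
#   X2 = a * X1 + N2,   N2 ~ Normal(0, s2^2)
A, S1, S2 = 2.0, 1.0, 0.5


def log_density(x1, x2):
    """Joint log density log p(x1, x2) = log p(x1) + log p(x2 | x1)."""
    log_p1 = -0.5 * (x1 / S1) ** 2 - math.log(S1 * math.sqrt(2 * math.pi))
    residual = x2 - A * x1
    log_p2 = -0.5 * (residual / S2) ** 2 - math.log(S2 * math.sqrt(2 * math.pi))
    return log_p1 + log_p2


def score(x1, x2):
    """Closed-form gradient of log_density with respect to (x1, x2)."""
    residual = x2 - A * x1
    d_x1 = -x1 / S1**2 + A * residual / S2**2
    d_x2 = -residual / S2**2
    return d_x1, d_x2


def numerical_score(x1, x2, eps=1e-5):
    """Central finite differences, used only to cross-check the closed form."""
    d_x1 = (log_density(x1 + eps, x2) - log_density(x1 - eps, x2)) / (2 * eps)
    d_x2 = (log_density(x1, x2 + eps) - log_density(x1, x2 - eps)) / (2 * eps)
    return d_x1, d_x2


if __name__ == "__main__":
    point = (0.3, -1.2)
    print("closed form:", score(*point))
    print("numerical:  ", numerical_score(*point))
```

In this toy model the score entry for the sink variable $X_2$ depends on the data only through the residual $X_2 - a X_1$, whereas the entry for $X_1$ mixes both terms. In practice the density is unknown and the score has to be estimated from samples, which this sketch does not attempt.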
