Constraint-based causal discovery with tiered background knowledge and latent variables in single or overlapping datasets

Abstract

In this paper we consider the use of tiered background knowledge within constraint-based causal discovery. Our focus is on settings that relax causal sufficiency, i.e., settings that allow for latent variables, which may arise because relevant information could not be measured at all, or could not be measured jointly, as in the case of multiple overlapping datasets. We first present novel insights into the properties of the 'tiered FCI' (tFCI) algorithm. Building on this, we introduce a new extension of the IOD (integrating overlapping datasets) algorithm that incorporates tiered background knowledge, the 'tiered IOD' (tIOD) algorithm. We show that, under full usage of the tiered background knowledge, tFCI and tIOD are sound, while simple versions of tIOD and tFCI are sound and complete. We further show that the tIOD algorithm can often be expected to be considerably more efficient and informative than the IOD algorithm, even beyond the obvious restriction of the Markov equivalence classes. We provide a formal result on the conditions for this gain in efficiency and informativeness. Our results are accompanied by a series of examples illustrating the exact role and usefulness of tiered background knowledge.
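As a rough illustration of what tiered background knowledge contributes, the sketch below assumes the usual reading of a tiered ordering: variables are partitioned into ordered tiers (for example, by measurement time), and a variable in a later tier cannot cause a variable in an earlier tier. Under that assumption, any adjacency between two variables in different tiers can be given an arrowhead at the later-tier variable. The function name `orient_cross_tier`, the variable names, and the tier assignment are hypothetical and chosen only for this example; this is not the tFCI or tIOD algorithm itself, only the orientation constraint that such knowledge implies.

```python
from typing import Dict, Iterable, List, Tuple


def orient_cross_tier(
    adjacencies: Iterable[Tuple[str, str]],
    tier: Dict[str, int],
) -> List[Tuple[str, str]]:
    """Return (earlier, later) pairs for adjacencies that cross tiers.

    Each returned pair means "arrowhead at `later`": the later-tier
    variable cannot be a cause of the earlier-tier variable.
    Adjacencies within a single tier are skipped, because the tiered
    ordering says nothing about them.
    """
    oriented = []
    for a, b in adjacencies:
        if tier[a] < tier[b]:
            oriented.append((a, b))
        elif tier[b] < tier[a]:
            oriented.append((b, a))
    return oriented


# Hypothetical tier assignment: a smaller number means an earlier tier.
tier = {
    "smoking_age20": 1,
    "bmi_age40": 2,
    "blood_pressure_age40": 2,
    "stroke_age60": 3,
}

# Hypothetical undirected adjacencies, e.g. from a skeleton search.
adjacencies = [
    ("bmi_age40", "smoking_age20"),
    ("bmi_age40", "blood_pressure_age40"),
    ("stroke_age60", "blood_pressure_age40"),
]

arrowheads = orient_cross_tier(adjacencies, tier)
```

Here the two cross-tier adjacencies receive an arrowhead at the later variable, while the adjacency between the two tier-2 variables is left unoriented and would have to be resolved by the discovery algorithm's own rules.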