基于HC-Stable和Tabu-Stable算法的稳定结构学习

Stable Structure Learning with HC-Stable and Tabu-Stable Algorithms

摘要 Abstract

许多贝叶斯网络结构学习算法存在不稳定性,所学到的图对数据集的任意特征(如变量顺序)非常敏感。PC-Stable试图解决广泛应用的PC算法中的这一问题,促使研究者改用“稳定”版本。然而,这一问题似乎在基于评分的算法中被忽视了。在这项研究中,我们表明一些广泛使用的基于评分的算法,包括混合算法和约束条件算法(如PC-Stable),都存在相同的问题。我们提出了一种新的解决方案,通过确定稳定的节点顺序来消除基于评分的贪婪爬山算法的不稳定性,从而无论变量顺序如何都能得到一致的结果。介绍了两种实现方式:HC-Stable和Tabu-Stable。Tabu-Stable在所有网络中获得了最高的BIC评分,并且在分类网络中达到了最高的准确性。这些结果突显了解决结构学习中不稳定性的重要性,并为未来应用提供了一个稳健实用的方法。这项工作通过引入连续变量,扩展并增强了我们在2024年Probabilistic Graphical Models会议上展示的工作的影响。该实现及其使用说明已免费发布在GitHub上,网址为https://github.com/causal-iq/discovery。

Many Bayesian Network structure learning algorithms are unstable, with the learned graph sensitive to arbitrary dataset artifacts, such as the ordering of columns (i.e., variable order). PC-Stable attempts to address this issue for the widely-used PC algorithm, prompting researchers to use the "stable" version instead. However, this problem seems to have been overlooked for score-based algorithms. In this study, we show that some widely-used score-based algorithms, as well as hybrid and constraint-based algorithms, including PC-Stable, suffer from the same issue. We propose a novel solution for score-based greedy hill-climbing that eliminates instability by determining a stable node order, leading to consistent results regardless of variable ordering. Two implementations, HC-Stable and Tabu-Stable, are introduced. Tabu-Stable achieves the highest BIC scores across all networks, and the highest accuracy for categorical networks. These results highlight the importance of addressing instability in structure learning and provide a robust and practical approach for future applications. This extends the scope and impact of our previous work presented at Probabilistic Graphical Models 2024 by incorporating continuous variables. The implementation, along with usage instructions, is freely available on GitHub at https://github.com/causal-iq/discovery.