Hierarchical Clustering Algorithms on Poisson and Cox Point Processes

Abstract

Clustering is a widely used technique in unsupervised learning for identifying groups within a dataset based on the similarities between its elements. This paper introduces three new hierarchical clustering models, Clustroid Hierarchical Nearest Neighbor ($\mathrm{CHN}^2$), Single Linkage Hierarchical Nearest Neighbor ($\mathrm{SHN}^2$), and Hausdorff (Complete Linkage) Hierarchical Nearest Neighbor ($\mathrm{H}^2\mathrm{N}^2$), all designed for datasets with a countably infinite number of points. These algorithms proceed through multiple levels of clustering and build clusters by connecting nearest-neighbor points or clusters, but they differ in the distance they use between clusters (clustroid, single linkage, or Hausdorff, respectively). Each method is first applied to the homogeneous Poisson point process on Euclidean space, where it defines a phylogenetic forest that is a factor of the point process and is therefore unimodular. For the $\mathrm{CHN}^2$ algorithm, the results established include the almost-sure finiteness of the clusters and bounds on the mean cluster size at each level of the algorithm. The mean size of the typical cluster is shown to be infinite. Moreover, the limiting structure of all three algorithms is examined as the number of levels tends to infinity, and properties such as the one-endedness of the limiting connected components are derived. In the specific case of $\mathrm{SHN}^2$ on the Poisson point process, the limiting graph is shown to be a subgraph of the minimal spanning forest. The $\mathrm{CHN}^2$ algorithm is also extended beyond the Poisson setting to certain stationary Cox point processes, for which similar finite-cluster properties are shown to hold. Finally, it is shown that this clustering algorithm can efficiently detect Cox-triggered aggregation.
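The abstract describes the level-by-level construction only in words, so the following is a rough finite-window sketch of a clustroid-based hierarchical nearest-neighbor procedure, not the paper's definition. Several points are assumptions made for illustration: the function names are hypothetical, the clustroid is taken to be the cluster member minimizing the sum of distances to the other members, it is computed among the previous level's representatives, and the points are sampled in a bounded square, which introduces boundary effects that are absent from the stationary infinite setting studied in the paper.

```python
import math
import random


def sample_poisson_window(intensity, side, seed=0):
    """Sample a homogeneous Poisson point process in the square [0, side]^2."""
    rng = random.Random(seed)
    mean = intensity * side * side
    # The number of unit-rate exponential arrivals in [0, mean] is Poisson(mean).
    n, t = 0, rng.expovariate(1.0)
    while t <= mean:
        n += 1
        t += rng.expovariate(1.0)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def nearest_neighbor_components(points):
    """Group indices by connected component of the nearest-neighbor graph."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Link each point to its nearest other point (ties have probability zero).
        j = min((k for k in range(n) if k != i),
                key=lambda k: math.dist(points[i], points[k]))
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def clustroid(members):
    """Assumed definition: the member minimizing the sum of distances to the others."""
    return min(members, key=lambda p: sum(math.dist(p, q) for q in members))


def hierarchical_nn_levels(points, max_levels=10):
    """Return, for each level, the cluster sizes counted in original points."""
    reps = list(points)
    sizes = [1] * len(reps)
    history = []
    for _ in range(max_levels):
        if len(reps) < 2:
            break
        groups = nearest_neighbor_components(reps)
        sizes = [sum(sizes[i] for i in g) for g in groups]
        # Each cluster is represented at the next level by its clustroid.
        reps = [clustroid([reps[i] for i in g]) for g in groups]
        history.append(sizes)
    return history


if __name__ == "__main__":
    pts = sample_poisson_window(intensity=1.0, side=15.0, seed=0)
    for level, sizes in enumerate(hierarchical_nn_levels(pts), start=1):
        print(level, len(sizes), sum(sizes) / len(sizes))
```

Swapping `clustroid` for a set-to-set distance (minimum pairwise distance for single linkage, Hausdorff distance for complete linkage) would give finite-window analogues of the other two variants, again only as an illustration. In a bounded window the procedure necessarily ends in a single cluster, so it says nothing about the limiting infinite-volume structure.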

Hierarchical Clustering Algorithms on Poisson and Cox Point Processes - arXiv