RelDenClu: A Relative Density based Biclustering Method for identifying non-linear feature relations

Abstract

The existing biclustering algorithms for finding feature-relation-based biclusters often depend on assumptions such as monotonicity or linearity. Although a few algorithms overcome this problem by using density-based methods, they tend to miss many biclusters because they use global criteria to identify dense regions. The proposed method, RelDenClu, uses the local variations in the marginal and joint densities of each pair of features to find the subset of observations that forms the basis of the relation between them. It then finds the set of features connected by a common set of observations, which yields a bicluster. To show the effectiveness of the proposed methodology, experiments have been carried out on fifteen types of simulated datasets. The method has also been applied to six real-life datasets: for three of them it is used for unsupervised learning, and for the other three it is used as an aid to supervised learning. For all the datasets, the performance of the proposed method is compared with that of seven state-of-the-art algorithms, and the proposed algorithm is seen to produce better results. The efficacy of the proposed algorithm is further demonstrated by its use on a COVID-19 dataset to identify some features (genetic, demographic and others) that are likely to affect the spread of COVID-19.
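The abstract states the idea only at a high level. Each feature pair is scored by how its joint density compares with its marginal densities. The observations in the locally dense regions are kept, and features that share those observations are then joined.

The Python sketch below illustrates that idea under stated assumptions; it is not the authors' RelDenClu algorithm. It makes the following choices:

- It uses equal-width histogram bins.
- It estimates the relative density p(x, y) / (p(x) p(y)) for each grid cell.
- It keeps the observations in cells where that ratio reaches a fixed cutoff. This cutoff is a simple stand-in for the paper's local-variation criterion, which the abstract does not specify.
- It grows a bicluster greedily from a seed pair by intersecting observation sets.

All names (`find_biclusters`, `dense_observations`, `grow_bicluster`, `bin_index`) and parameters (`bins`, `tau`, `min_obs`) are hypothetical.

```python
import itertools

import numpy as np


def bin_index(x, bins):
    """Equal-width histogram bin of each value (assumed binning scheme)."""
    lo, hi = x.min(), x.max()
    width = (hi - lo) / bins if hi > lo else 1.0
    return np.minimum(((x - lo) / width).astype(int), bins - 1)


def dense_observations(x, y, bins=10, tau=2.0):
    """Observations lying in cells where p(x, y) / (p(x) p(y)) >= tau.

    Hypothetical helper: a fixed cutoff `tau` stands in for the paper's
    local-variation criterion, which the abstract does not specify.
    """
    n = len(x)
    a, b = bin_index(x, bins), bin_index(y, bins)
    joint = np.zeros((bins, bins))
    np.add.at(joint, (a, b), 1)          # joint histogram counts
    mx = np.bincount(a, minlength=bins)  # marginal counts of x
    my = np.bincount(b, minlength=bins)  # marginal counts of y
    # (c / n) / ((mx / n) * (my / n)) simplifies to c * n / (mx * my).
    rel = joint[a, b] * n / (mx[a] * my[b])
    return set(np.flatnonzero(rel >= tau).tolist())


def grow_bicluster(seed, supports, p, min_obs):
    """Greedily add features whose pairwise dense sets share >= min_obs rows."""
    rows, cols = set(supports[seed]), list(seed)
    for k in range(p):
        if k in cols:
            continue
        common = set(rows)
        for c in cols:
            common &= supports[(min(c, k), max(c, k))]
        if len(common) >= min_obs:
            rows, cols = common, cols + [k]
    return sorted(rows), sorted(cols)


def find_biclusters(X, bins=10, tau=2.0, min_obs=30):
    """Return (rows, cols) pairs; all names and defaults are hypothetical."""
    p = X.shape[1]
    supports = {
        (i, j): dense_observations(X[:, i], X[:, j], bins, tau)
        for i, j in itertools.combinations(range(p), 2)
    }
    found = []
    # Seed from the feature pairs with the most dense observations first.
    for seed in sorted(supports, key=lambda s: -len(supports[s])):
        if len(supports[seed]) < min_obs:
            continue
        bic = grow_bicluster(seed, supports, p, min_obs)
        if bic not in found:
            found.append(bic)
    return found


# Toy data: rows 0-199 relate features 0, 1, 2 non-linearly; feature 3 is noise.
rng = np.random.default_rng(0)
t = rng.uniform(size=200)
related = np.column_stack(
    [t, np.abs(2 * t - 1), (t + 0.5) % 1.0, rng.uniform(size=200)]
)
X = np.vstack([related, rng.uniform(size=(200, 4))])

biclusters = find_biclusters(X)
for rows, cols in biclusters:
    print(cols, len(rows))
```

In this toy data, rows 0 to 199 tie features 0, 1 and 2 through non-monotone relations (a V shape and a circular shift), while feature 3 is pure noise. The sketch should therefore report a bicluster on columns [0, 1, 2] made up mostly of those rows. Chance fluctuations on pure-noise pairs such as (0, 3) can still push a few cells over `tau`, so small spurious biclusters may also appear. Raising `tau` or `min_obs` reduces these at the cost of recall.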

RelDenClu: A Relative Density based Biclustering Method for identifying non-linear feature relations - arXiv