多个标记图中所有连通最大尺寸公共子图的寻找方法
On Finding All Connected Maximum-Sized Common Subgraphs in Multiple Labeled Graphs
摘要 Abstract
我们提出了一种精确算法,用于计算多个图中具有最多顶点的所有公共子图。我们的方法进一步扩展到处理连通的最大公共子图(MCS),即在多个图中识别基于顶点或边的最大公共子图,其中边或顶点可能附加标签以考虑可能的原子类型或键类型,这是分子图中常用的经典标记方法。我们的方法利用模积图和修改后的Bron-Kerbosch算法枚举极大团,并确保保留所有中间解。一种剪枝启发式方法有效减小模积图的规模,提高了计算可行性。此外,我们引入了一种基于图核相似性度量的图排序策略,以优化搜索过程。我们的方法对生物信息学和化学信息学尤为重要,在这些领域中识别分子图中的保守结构基序至关重要。在分子数据集上的实证结果表明,我们的方法具有可扩展性和高效性。
We present an exact algorithm for computing all common subgraphs with the maximum number of vertices across multiple graphs. Our approach is further extended to handle the connected Maximum Common Subgraph (MCS), identifying the largest common subgraph in terms of either vertices or edges across multiple graphs, where edges or vertices may additionally be labeled to account for possible atom types or bond types, a classical labeling used in molecular graphs. Our approach leverages modular product graphs and a modified Bron-Kerbosch algorithm to enumerate maximal cliques, ensuring all intermediate solutions are retained. A pruning heuristic efficiently reduces the modular product size, improving computational feasibility. Additionally, we introduce a graph ordering strategy based on graph-kernel similarity measures to optimize the search process. Our method is particularly relevant for bioinformatics and cheminformatics, where identifying conserved structural motifs in molecular graphs is crucial. Empirical results on molecular datasets demonstrate that our approach is scalable and fast.