Abstract
The syntactic structure of a sentence can be described as a tree that indicates the syntactic relationships between words. In spite of significant progress in unsupervised methods that retrieve the syntactic structure of sentences, guessing the right direction of edges is still a challenge. Because edges in a syntactic dependency structure are oriented away from the root, the challenge of guessing the right direction can be reduced to finding an undirected tree and its root. The limited performance of current unsupervised methods demonstrates the lack of a proper understanding, from first principles, of what a root vertex is. We consider an ensemble of centrality scores: some take into account only the free tree (non-spatial scores), and others take into account the position of vertices (spatial scores). We test the hypothesis that the root vertex is an important or central vertex of the syntactic dependency structure. We confirm that hypothesis and find that the best performance in guessing the root is achieved by novel scores that take into account only the position of a vertex and that of its neighbours. We provide theoretical and empirical foundations towards a universal notion of rootness from a network science perspective.
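The reduction described above (undirected tree plus root gives all edge directions) can be sketched in a few lines. The sketch below is illustrative only: the function names, the toy sentence, and the choice of score are assumptions. The score used, the sum of topological distances to all other vertices, is a standard non-spatial centrality chosen for simplicity; it is not necessarily one of the scores evaluated in this work, and it is not the novel spatial score mentioned above.

```python
from collections import deque


def build_adjacency(n, edges):
    """Adjacency lists of an undirected tree on vertices 0..n-1."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def distances_from(adj, source):
    """Topological (hop) distances from `source` to every vertex, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def guess_root(n, edges):
    """Hypothetical root guesser: pick the vertex with the smallest sum of
    distances to all other vertices (ties broken by lowest index)."""
    adj = build_adjacency(n, edges)
    return min(range(n), key=lambda v: (sum(distances_from(adj, v).values()), v))


def orient_from_root(n, edges, root):
    """Orient every edge away from `root`, yielding (head, dependent) pairs."""
    adj = build_adjacency(n, edges)
    directed, seen, queue = [], {root}, deque([root])
    while queue:
        head = queue.popleft()
        for dep in adj[head]:
            if dep not in seen:
                seen.add(dep)
                directed.append((head, dep))
                queue.append(dep)
    return directed


# Toy example (assumed, not from the paper): "The dog chased a cat",
# with vertices labelled by word position 0..4.
edges = [(0, 1), (1, 2), (2, 4), (3, 4)]
root = guess_root(5, edges)
arcs = orient_from_root(5, edges, root)
```

In this toy tree the guessed root is vertex 2 ("chased"), and once the root is fixed every edge direction follows without further guessing.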