关于Word RAM模型中$O(n \sqrt{\log n})$复杂度的硬度层次研究

On the Hardness Hierarchy for the $O(n \sqrt{\log n})$ Complexity in the Word RAM

摘要 Abstract

在本文中,我们研究了当前具有最先进的Word RAM算法的核心问题的相对难度,这些算法对于由$\Theta(n)$个机器字(即$\Theta(n\log n)$位)描述的实例运行时间为$O(n\sqrt{\log n})$。这一复杂度类别是Chan和P\u{a}tra\c{s}cu [SODA 2010]识别出的六个硬度级别之一,涵盖了来自多个领域的多样化问题:计数逆序对、字符串处理问题(如后缀数组构造、LZ77因子分解、最长公共子串、批量最长前缀因子查询、批量逆后缀数组查询)、以及计算几何任务(如正交范围计数、正交段相交)。我们的主要贡献有两点:我们建立了上述字符串问题与经典任务Dictionary Matching之间的新联系,该任务可以通过Aho-Corasick自动机解决。我们将Dictionary Matching限制为具有$O(n)$个长度为$m=O(\log n)$的二进制模式的实例,并证明,除非这些实例可以在$o(n\sqrt{\log n})$时间内解决,否则上述字符串问题也无法更快地解决。通过进一步的归约,我们将这种难度扩展到计数逆序对(这是几何算法中的一个基本组成部分)以及正交范围计数和正交段相交。这依赖于String Nesting,这是一个新的问题,它等价于Dictionary Matching,并且可以通过三个步骤归约为计数逆序对。我们的结果揭示了一个单一的问题,其有两个等价形式,构成了几乎所有目前占据$O(n\sqrt{\log n})$硬度级别的主要问题的难点所在。这些结果极大地缩小了改进接近线性问题复杂性的进一步努力方向。作为我们框架的一个辅助结果,我们还证明了在若干中心字符串问题中,字母表可以被高效地减少到二进制。

In this work, we study the relative hardness of fundamental problems with state-of-the-art word RAM algorithms that take $O(n\sqrt{\log n})$ time for instances described in $\Theta(n)$ machine words ($\Theta(n\log n)$ bits). This complexity class, one of six hardness levels identified by Chan and P\u{a}tra\c{s}cu [SODA 2010], includes diverse problems from several domains: Counting Inversions, string processing problems (BWT Construction, LZ77 Factorization, Longest Common Substring, Batched Longest Previous Factor Queries, Batched Inverse Suffix Array Queries), and computational geometry tasks (Orthogonal Range Counting, Orthogonal Segment Intersection). We offer two main contributions: We establish new links between the above string problems and Dictionary Matching, a classic task solvable using the Aho-Corasick automaton. We restrict Dictionary Matching to instances with $O(n)$ binary patterns of length $m = O(\log n)$ each, and we prove that, unless these instances can be solved in $o(n\sqrt{\log n})$ time, the aforementioned string problems cannot be solved faster either. Via further reductions, we extend this hardness to Counting Inversions (a fundamental component in geometric algorithms) and thus to Orthogonal Range Counting and Orthogonal Segment Intersection. This hinges on String Nesting, a new problem which is equivalent to Dictionary Matching and can be reduced to Counting Inversions in three steps. Together, our results unveil a single problem, with two equivalent formulations, that underlies the hardness of nearly all major problems currently occupying the $O(n\sqrt{\log n})$ level of hardness. These results drastically funnel further efforts to improve the complexity of near-linear problems. As an auxiliary outcome of our framework, we also prove that the alphabet in several central string problems can be efficiently reduced to binary.

关于Word RAM模型中$O(n \sqrt{\log n})$复杂度的硬度层次研究 - arXiv