CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Abstract

This paper proposes CoMatch, a novel semi-dense image matcher with dynamic covisibility awareness and bilateral subpixel accuracy. First, observing that modeling context interaction over the entire coarse feature map incurs highly redundant computation because neighboring tokens have similar representations, we introduce a covisibility-guided token condenser that adaptively aggregates tokens according to dynamically estimated covisibility scores, which keeps computation efficient while improving the representational capacity of the aggregated tokens. Second, because feature interaction with large non-covisible areas is distracting and may degrade feature distinctiveness, we deploy a covisibility-assisted attention mechanism that selectively suppresses irrelevant message broadcast from non-covisible reduced tokens, yielding robust and compact attention over relevant tokens rather than all of them. Third, we find that in the fine-level stage, current methods adjust only the target view's keypoints to subpixel level, while those in the source view remain at the coarse level and are thus not informative enough, which is detrimental to applications that are sensitive to keypoint location. We therefore develop a simple yet potent fine correlation module that refines the matching candidates in both the source and target views to subpixel level, yielding a notable performance improvement. Extensive experiments on several public benchmarks demonstrate CoMatch's advantages in accuracy, efficiency, and generalizability.
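The abstract does not give the exact formulation of the token condenser or the attention mechanism, so the following is only an illustrative sketch of the two ideas, not the method itself. It assumes covisibility scores in [0, 1], pools each window of tokens by a covisibility-weighted average, and suppresses low-covisibility keys by adding the log of their score to the attention logits. The function names `condense_tokens` and `covis_attention`, the window pooling, and the log-bias are all assumptions made for illustration.

```python
import math


def condense_tokens(tokens, covis, window):
    """Pool each window x window patch of a token grid into one token,
    weighting every token by its covisibility score (assumed in [0, 1]).

    tokens: H x W x C nested lists; covis: H x W nested lists.
    Returns the pooled tokens and the mean covisibility of each patch.
    """
    h, w = len(tokens), len(tokens[0])
    dim = len(tokens[0][0])
    pooled, pooled_covis = [], []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            cells = [(r, c)
                     for r in range(r0, min(r0 + window, h))
                     for c in range(c0, min(c0 + window, w))]
            total = sum(covis[r][c] for r, c in cells)
            if total > 0:
                weights = [covis[r][c] / total for r, c in cells]
            else:
                # Nothing covisible in this patch: fall back to average pooling.
                weights = [1.0 / len(cells)] * len(cells)
            pooled.append([
                sum(wt * tokens[r][c][d] for wt, (r, c) in zip(weights, cells))
                for d in range(dim)
            ])
            pooled_covis.append(total / len(cells))
    return pooled, pooled_covis


def covis_attention(query, keys, values, key_covis, eps=1e-6):
    """Single-query softmax attention whose logits are biased by
    log(covisibility), so low-covisibility keys get little attention mass."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        scale * sum(q * k for q, k in zip(query, key)) + math.log(cv + eps)
        for key, cv in zip(keys, key_covis)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    norm = sum(exps)
    attn = [e / norm for e in exps]
    out = [sum(a * v[d] for a, v in zip(attn, values))
           for d in range(len(values[0]))]
    return out, attn


# Toy usage: a 2 x 2 grid where only the top-left token is covisible.
grid = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
scores = [[1.0, 0.0],
          [0.0, 0.0]]
reduced, reduced_covis = condense_tokens(grid, scores, window=2)

# Two identical keys; the second is non-covisible and should be suppressed.
out, attn = covis_attention([1.0, 0.0],
                            [[1.0, 0.0], [1.0, 0.0]],
                            [[1.0, 0.0], [0.0, 1.0]],
                            [1.0, 0.0])
```

In this toy case the condensed token is dominated by the single covisible input, and almost all attention mass goes to the covisible key even though both keys match the query equally well. A learned covisibility predictor and multi-head attention would replace the hand-set scores and the single-query loop in practice.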