MESA: Effective Matching Redundancy Reduction by Semantic Area Segmentation

Abstract

We propose MESA and DMESA, two novel feature matching methods that use the Segment Anything Model (SAM) to effectively mitigate matching redundancy. The key insight of our methods is to establish implicit-semantic area matching prior to point matching, based on the advanced image understanding of SAM. Informative area matches with consistent internal semantics can then undergo dense feature comparison, which facilitates precise inside-area point matching. Specifically, MESA adopts a sparse matching framework and first obtains candidate areas from SAM results through a novel Area Graph (AG). Area matching among the candidates is then formulated as graph energy minimization and solved by graphical models derived from the AG. To address the efficiency issue of MESA, we further propose DMESA as its dense counterpart, which applies a dense matching framework. After candidate areas are identified by the AG, DMESA establishes area matches by generating dense matching distributions. These distributions are produced from off-the-shelf patch matching using a Gaussian Mixture Model (GMM) and refined via Expectation Maximization (EM). By reducing repetitive computation, DMESA runs nearly five times faster than MESA while maintaining competitive accuracy. Our methods are extensively evaluated on five datasets covering indoor and outdoor scenes. The results show consistent performance improvements from our methods for five distinct point matching baselines across all datasets. Furthermore, our methods exhibit promising generalization and improved robustness against image resolution variations. The code is publicly available at https://github.com/Easonyesheng/A2PM-MESA.
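The abstract states that DMESA turns off-the-shelf patch matches into dense matching distributions with a GMM refined by EM. The sketch below illustrates only that general idea and is not the authors' implementation (the linked repository holds that). It assumes hypothetical inputs: patch-match centers in the target image as (x, y) coordinates, plus per-match confidences used as EM sample weights. The function names, the farthest-point initialization, the component count `k`, and the covariance regularizer `reg` are all illustrative choices.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a 2D Gaussian evaluated at every row of x (shape (N, 2))."""
    diff = x - mean
    inv = np.linalg.inv(cov)
    mahal = np.einsum("ni,ij,nj->n", diff, inv, diff)
    return np.exp(-0.5 * mahal) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))


def fit_gmm_em(points, weights, k=2, iters=50, reg=1e-3):
    """Weighted EM for a k-component 2D Gaussian mixture (illustrative sketch).

    points:  (N, 2) target-image coordinates of patch matches (hypothetical input)
    weights: (N,) non-negative patch-match confidences, used as sample weights
    """
    # Deterministic farthest-point initialization of the component means (an assumption).
    means = [points[np.argmax(weights)]]
    for _ in range(1, k):
        d2 = np.min([np.sum((points - m) ** 2, axis=1) for m in means], axis=0)
        means.append(points[np.argmax(d2)])
    means = np.array(means, dtype=float)
    covs = np.array([np.cov(points.T) + reg * np.eye(2) for _ in range(k)])
    pis = np.full(k, 1.0 / k)
    w = weights / weights.sum()

    for _ in range(iters):
        # E-step: responsibility of component j for point n, proportional to pi_j * N(x_n | mu_j, Sigma_j).
        dens = np.stack([pis[j] * gaussian_pdf(points, means[j], covs[j]) for j in range(k)], axis=1)
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: re-estimate mixing weights, means and covariances from weighted responsibilities.
        wr = resp * w[:, None]
        nk = np.maximum(wr.sum(axis=0), 1e-12)
        pis = nk / nk.sum()
        means = (wr.T @ points) / nk[:, None]
        for j in range(k):
            d = points - means[j]
            covs[j] = (wr[:, j, None] * d).T @ d / nk[j] + reg * np.eye(2)
    return pis, means, covs


def dense_match_distribution(pis, means, covs, height, width):
    """Evaluate the mixture at every pixel (x, y) and normalize it into a probability map."""
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    prob = sum(pis[j] * gaussian_pdf(grid, means[j], covs[j]) for j in range(len(pis)))
    prob = prob.reshape(height, width)
    return prob / prob.sum()


# Toy usage: synthetic patch matches clustered around two locations in a 64x48 target image.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal([12.0, 10.0], 2.0, size=(60, 2)),
                 rng.normal([48.0, 36.0], 2.0, size=(60, 2))])
conf = np.ones(len(pts))
pis, means, covs = fit_gmm_em(pts, conf, k=2)
prob = dense_match_distribution(pis, means, covs, height=48, width=64)
```

Each pixel of `prob` holds the mixture density at that location, so the map expresses where the source area is likely to land in the target image. The abstract does not describe how DMESA converts such a distribution into a final area match, so any thresholding or box-fitting step on top of this map would be a further assumption.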