IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching

Abstract

Stereo matching is a core component of many computer vision and robotics systems. Despite significant advances over the last decade, handling matching ambiguities in ill-posed regions and at large disparities remains an open challenge. In this paper, we propose a new deep network architecture for stereo matching, called IGEV++. IGEV++ builds Multi-range Geometry Encoding Volumes (MGEV) that encode coarse-grained geometry information for ill-posed regions and large disparities, and fine-grained geometry information for details and small disparities. To construct MGEV, we introduce an adaptive patch matching module that efficiently and effectively computes matching costs for large disparity ranges and/or ill-posed regions. We further propose a selective geometry feature fusion module that adaptively fuses the multi-range and multi-granularity geometry features in MGEV. We then index the fused geometry features and feed them to ConvGRUs, which iteratively update the disparity map. MGEV efficiently handles large disparities and ill-posed regions, such as occlusions and textureless regions, and converges rapidly over the iterations. IGEV++ achieves the best performance on the Scene Flow test set across all disparity ranges, up to 768px. It also achieves state-of-the-art accuracy on the Middlebury, ETH3D, KITTI 2012, and KITTI 2015 benchmarks. Specifically, on the large-disparity Middlebury benchmark, IGEV++ achieves a 2-pixel outlier rate (Bad 2.0) of 3.23%, an error reduction of 31.9% relative to RAFT-Stereo and 54.8% relative to GMStereo. We also present a real-time version of IGEV++ that achieves the best performance among all published real-time methods on the KITTI benchmarks. The code is publicly available at https://github.com/gangweiX/IGEV-plusplus
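
To make the Bad 2.0 figure and the relative error reductions concrete, the following is a minimal sketch of how such numbers are typically computed. It is not taken from the IGEV++ code base: the function names `bad_rate` and `implied_baseline`, the toy disparity values, and the convention that pixels without ground truth are marked with a non-finite value are all assumptions made for illustration.

```python
import math


def bad_rate(pred, gt, threshold=2.0):
    """Percentage of valid pixels whose absolute disparity error exceeds `threshold`.

    `pred` and `gt` are flat sequences of disparities in pixels. Pixels whose
    ground truth is non-finite are treated as invalid and skipped (an assumed
    convention for this sketch).
    """
    errors = [abs(p - g) for p, g in zip(pred, gt) if math.isfinite(g)]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * sum(e > threshold for e in errors) / len(errors)


def implied_baseline(rate, reduction_pct):
    """Baseline error rate implied by `rate` and a relative reduction in percent."""
    return rate / (1.0 - reduction_pct / 100.0)


# Toy example (hypothetical values): one of the three valid pixels is off by more than 2 px.
pred = [10.0, 52.5, 31.0, 7.0]
gt = [10.5, 50.0, 31.0, float("inf")]
print(f"Bad 2.0 on toy data: {bad_rate(pred, gt):.2f}%")

# Back-of-envelope check of the abstract's figures; the printed baselines are
# derived here by arithmetic, not reported in the abstract.
for name, reduction in [("RAFT-Stereo", 31.9), ("GMStereo", 54.8)]:
    print(f"{name}: implied Bad 2.0 of about {implied_baseline(3.23, reduction):.2f}%")
```

The relative reduction is (baseline - ours) / baseline, so a 31.9% reduction at 3.23% corresponds to a baseline of 3.23 / (1 - 0.319). Reading the numbers this way makes clear that the reductions are relative to each baseline's error rate, not differences in percentage points.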