arXiv

All-day Depth Completion via Thermal-LiDAR Fusion

Jinsun Park

Abstract

Depth completion, which estimates dense depth from sparse LiDAR and RGB images, has demonstrated outstanding performance in well-lit conditions. However, due to the limitations of RGB sensors, existing methods often struggle to achieve reliable performance in harsh environments, such as heavy rain and low-light conditions. Furthermore, we observe that ground truth depth maps often contain large regions of missing measurements in adverse weather conditions such as heavy rain, leading to insufficient supervision. In contrast, thermal cameras are known for providing clear and reliable visibility in such conditions, yet thermal-LiDAR depth completion remains underexplored. Moreover, the characteristics of thermal images, such as blurriness, low contrast, and noise, lead to unclear depth boundaries. To address these challenges, we first evaluate the feasibility and robustness of thermal-LiDAR depth completion across diverse lighting (e.g., well-lit, low-light), weather (e.g., clear-sky, rainy), and environment (e.g., indoor, outdoor) conditions by conducting extensive benchmarks on the MS$^2$ and ViViD datasets. In addition, we propose a framework that utilizes COntrastive learning and Pseudo-Supervision (COPS) to enhance depth boundary clarity and improve completion accuracy by leveraging a depth foundation model in two key ways. First, COPS sharpens depth boundaries by enforcing a depth-aware contrastive loss between different depth points, mining positive and negative samples with a monocular depth foundation model. Second, it mitigates the issue of incomplete supervision from ground truth depth maps by leveraging foundation model predictions as dense depth priors. We also provide in-depth analyses of the key challenges in thermal-LiDAR depth completion to aid in understanding the task and encourage future research.
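To make the two uses of the foundation model more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes an InfoNCE-style loss in which pixel pairs with similar pseudo-depth are positives, and an L1 loss that falls back to a scale-and-shift-aligned pseudo-depth wherever LiDAR ground truth is missing. The function names, the threshold `pos_thresh`, the `temperature`, the loss weight, and the least-squares alignment step are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def depth_aware_contrastive_loss(features, pseudo_depth, pos_thresh=0.05, temperature=0.1):
    """InfoNCE-style loss over N >= 2 sampled pixels (hypothetical formulation).

    features:     (N, C) per-pixel embeddings from the completion network.
    pseudo_depth: (N,) depth predicted by a monocular foundation model.
    Pairs whose pseudo-depth differs by less than pos_thresh are treated as
    positives; all remaining pairs act as negatives.
    """
    features = np.asarray(features, dtype=float)
    pseudo_depth = np.asarray(pseudo_depth, dtype=float)

    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature                      # scaled cosine similarity
    not_self = ~np.eye(len(f), dtype=bool)           # exclude each pixel itself
    depth_gap = np.abs(pseudo_depth[:, None] - pseudo_depth[None, :])
    pos = (depth_gap < pos_thresh) & not_self

    # Log-softmax over all other pixels, shifted for numerical stability.
    shifted = sim - sim.max(axis=1, keepdims=True)
    denom = (np.exp(shifted) * not_self).sum(axis=1, keepdims=True)
    log_prob = shifted - np.log(denom)

    has_pos = pos.any(axis=1)
    if not has_pos.any():
        return 0.0                                   # no positive pair was mined
    pos_log_prob = (log_prob * pos).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return float(-pos_log_prob.mean())


def pseudo_supervised_l1(pred, gt, pseudo, weight=0.5):
    """L1 on valid ground truth plus weighted L1 on an aligned pseudo-depth prior.

    Assumes gt == 0 marks missing LiDAR measurements and that pseudo is
    linearly related to metric depth, so a least-squares scale and shift
    fitted on the valid pixels can align it (an assumption of this sketch).
    """
    valid = gt > 0
    scale, shift = np.polyfit(pseudo[valid], gt[valid], deg=1)
    prior = scale * pseudo + shift

    loss_gt = np.abs(pred - gt)[valid].mean()
    missing = ~valid
    loss_prior = np.abs(pred - prior)[missing].mean() if missing.any() else 0.0
    return float(loss_gt + weight * loss_prior)
```

In this sketch, embeddings that cluster by pseudo-depth yield a lower contrastive loss than embeddings that mix near and far pixels, which is the behavior one would expect from a loss intended to sharpen depth boundaries. The pseudo-supervised term only affects pixels without a LiDAR measurement, so the sparse ground truth still takes priority wherever it exists.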