Information Gain Is Not All You Need

Abstract

Autonomous exploration in mobile robotics is driven by two competing objectives: coverage, to exhaustively observe the environment; and path length, to do so with the shortest path possible. Though it is difficult to evaluate the best course of action without knowing the unknown, the unknown can often be understood through models, maps, or common sense. However, previous work has shown that improving estimates of information gain through such prior knowledge leads to greedy behavior and ultimately causes backtracking, which degrades coverage performance. In fact, any approach that maximizes information gain will exhibit this behavior, even without prior knowledge. The information gained by the time the task is complete is constant and cannot be maximized. Information gain is therefore an unsuitable optimization objective. Instead, it is a decision criterion for determining which candidate states should still be considered for exploration. The task therefore becomes to reach completion with the shortest total path. Since determining the shortest path is typically intractable, it is necessary to rely on a heuristic or estimate to identify candidate states that minimize the total path length. To address this, we propose a heuristic that reduces backtracking by preferring candidate states that are close to the robot but far away from other candidate states. We evaluate the proposed heuristic in simulation against an information-gain-based approach and frontier exploration, and show that our method significantly decreases total path length, both with and without prior knowledge of the environment.
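
To make the idea of preferring candidates that are close to the robot but far from other candidates concrete, the following is a minimal sketch. The abstract does not give the exact scoring function, so the form used here (mean distance to the other candidates minus distance to the robot) is an assumption made purely for illustration, as are the function names `distance`, `score`, and `select_next` and the use of straight-line Euclidean distance in place of actual path length.

```python
import math


def distance(a, b):
    """Euclidean distance between two 2D points (a stand-in for path length)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def score(robot, candidate, others):
    """Illustrative score: higher is better.

    ASSUMPTION: the score rewards isolation (mean distance to the other
    candidates) and penalizes travel cost (distance to the robot). The
    source only states the qualitative preference, not this formula.
    """
    d_robot = distance(robot, candidate)
    if not others:
        return -d_robot
    d_others = sum(distance(candidate, o) for o in others) / len(others)
    return d_others - d_robot


def select_next(robot, candidates):
    """Return the candidate with the highest illustrative score, or None."""
    best, best_score = None, -math.inf
    for i, c in enumerate(candidates):
        others = candidates[:i] + candidates[i + 1:]
        s = score(robot, c, others)
        if s > best_score:
            best, best_score = c, s
    return best


def select_nearest(robot, candidates):
    """Baseline for comparison: pick the candidate nearest to the robot."""
    return min(candidates, key=lambda c: distance(robot, c), default=None)


if __name__ == "__main__":
    robot = (0.0, 0.0)
    candidates = [(1.0, 0.0), (5.0, 0.0), (5.0, 1.0), (-2.0, 0.0)]
    print("nearest-first:", select_nearest(robot, candidates))
    print("isolation-aware:", select_next(robot, candidates))
```

In this toy layout, a nearest-first rule picks `(1.0, 0.0)`, which lies on the way to the cluster around `(5, 0)` and leaves the isolated candidate at `(-2.0, 0.0)` for a later return trip. The sketch instead picks `(-2.0, 0.0)` first, because it is cheap to reach now and expensive to come back to, which is the kind of backtracking the heuristic is meant to avoid.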