arXiv

Reachable Polyhedral Marching (RPM): An Exact Analysis Tool for Deep-Learned Control Systems

Abstract

Neural networks are increasingly used in robotics as policies, state transition models, state estimation models, or all of the above. Because these components are learned from data, it is important to be able to analyze what behaviors they have learned and how those behaviors affect closed-loop performance. In this paper we take steps toward this goal by developing methods for computing control invariant sets and regions of attraction (ROAs) of dynamical systems represented as neural networks. We focus on feedforward neural networks with rectified linear unit (ReLU) activations, which are known to implement continuous piecewise-affine (PWA) functions. We describe the Reachable Polyhedral Marching (RPM) algorithm, which enumerates the affine pieces of a neural network through an incremental connected walk. We then use this algorithm to compute exact forward and backward reachable sets, from which we provide methods for computing control invariant sets and ROAs. Our approach is unique in that it finds these sets incrementally, without Lyapunov-based tools. In our examples we demonstrate that our approach finds non-convex control invariant sets and ROAs on tasks with learned van der Pol oscillator and pendulum models. Further, we provide an accelerated algorithm for computing ROAs that leverages the incremental and connected enumeration of affine regions that RPM provides; this acceleration yields a 15x speedup in our examples. Finally, we apply our methods to find a set of states that are stabilized by an image-based controller for an aircraft runway control problem.
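To make the piecewise-affine view concrete, here is a minimal sketch. It assumes a fully connected ReLU network given as plain lists of NumPy weight matrices and bias vectors; the function names `local_affine_piece` and `preimage_in_piece` and the random example network are hypothetical, not taken from the paper. For one input, the sketch fixes the ReLU activation pattern. It then returns the polyhedral region sharing that pattern as {x : Ax ≤ b}, together with the affine map Cx + d that the network equals on that region. It also intersects the region with the preimage of a target polytope {y : Hy ≤ h}. This is only the per-region building block that a connected enumeration such as RPM repeats. It does not implement RPM's walk between neighboring regions or the invariant-set and ROA computations built on top of it.

```python
import numpy as np

def local_affine_piece(weights, biases, x):
    """Hypothetical helper: return (C, d, A, b) such that on the polyhedron
    {x' : A x' <= b} containing x, the ReLU network equals x' -> C x' + d."""
    n = x.shape[0]
    C, d = np.eye(n), np.zeros(n)  # current layer input as an affine function of x
    A_rows, b_rows = [], []
    for W, bias in zip(weights[:-1], biases[:-1]):  # hidden ReLU layers
        Cz, dz = W @ C, W @ d + bias                 # pre-activation z = Cz x + dz
        active = (Cz @ x + dz) > 0                   # activation pattern at x
        # active neuron:   z_j >= 0  <=>  -Cz_j x <= dz_j
        # inactive neuron: z_j <= 0  <=>   Cz_j x <= -dz_j
        sign = np.where(active, -1.0, 1.0)
        A_rows.append(sign[:, None] * Cz)
        b_rows.append(-sign * dz)
        mask = active.astype(float)                  # ReLU acts as a 0/1 diagonal here
        C, d = mask[:, None] * Cz, mask * dz
    W_out, b_out = weights[-1], biases[-1]           # final affine (linear) layer
    A = np.vstack(A_rows) if A_rows else np.zeros((0, n))
    b = np.concatenate(b_rows) if b_rows else np.zeros(0)
    return W_out @ C, W_out @ d + b_out, A, b

def preimage_in_piece(C, d, A, b, H, h):
    """H-representation of {x : A x <= b and H (C x + d) <= h}, i.e. the
    one-step backward reachable set of {y : H y <= h} within this piece."""
    return np.vstack([A, H @ C]), np.concatenate([b, h - H @ d])

# Usage on a small random 2-input, 2-output network (illustrative only).
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 2)), rng.standard_normal((8, 8)), rng.standard_normal((2, 8))]
biases = [rng.standard_normal(8), rng.standard_normal(8), rng.standard_normal(2)]

def forward(x):
    a = x
    for W, bias in zip(weights[:-1], biases[:-1]):
        a = np.maximum(W @ a + bias, 0.0)
    return weights[-1] @ a + biases[-1]

x0 = np.array([0.3, -0.5])
C, d, A, b = local_affine_piece(weights, biases, x0)
print("network output:", forward(x0), "local affine output:", C @ x0 + d)
```

Because the network is affine on each region, the exact preimage of a polytope over the whole input domain is the union of these per-region polytopes. Enumerating the regions and discarding empty intersections (for example with a linear-program feasibility check) is what this sketch leaves out.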