

arXiv

The computation of approximate feedback Stackelberg equilibria in multi-player nonlinear constrained dynamic games

Jingqi Li

Abstract


Solving feedback Stackelberg games with nonlinear dynamics and coupled constraints, a common scenario in practice, presents significant challenges. This work introduces an efficient method for computing approximate local feedback Stackelberg equilibria in multi-player general-sum dynamic games with continuous state and action spaces. Unlike existing (approximate) dynamic programming solutions, which are primarily designed for unconstrained problems, our approach reformulates a feedback Stackelberg dynamic game as a sequence of nested optimization problems. This reformulation enables the derivation of Karush-Kuhn-Tucker (KKT) conditions and the establishment of a second-order sufficient condition for local feedback Stackelberg equilibria. We propose a Newton-style primal-dual interior point method for solving constrained linear quadratic (LQ) feedback Stackelberg games, with provable convergence guarantees. We further extend the method to compute local feedback Stackelberg equilibria for more general nonlinear games by iteratively approximating them with LQ games whose KKT conditions are locally aligned with those of the original nonlinear games. We prove the exponential convergence of our algorithm in constrained nonlinear games. In a feedback Stackelberg game with nonlinear dynamics and (nonconvex) coupled costs and constraints, our experiments show that the algorithm can handle infeasible initial conditions and converges exponentially to an approximate local feedback Stackelberg equilibrium.
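The nested-optimization view can be illustrated on the simplest possible case. The following is a minimal sketch, assuming a single stage, two players, no constraints, and purely hypothetical matrices; it is not the paper's algorithm, which addresses constrained multi-stage games with a primal-dual interior point method. The follower's quadratic problem is solved in closed form for any leader action, and that response is then substituted into the leader's problem.

```python
import numpy as np

# Hypothetical one-stage dynamics: x_next = A x + B1 u1 + B2 u2 (illustrative values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B1 = np.array([[0.0], [0.1]])  # leader (player 1) input matrix
B2 = np.array([[0.1], [0.0]])  # follower (player 2) input matrix

# Hypothetical costs: J_i = 0.5 x_next' Q_i x_next + 0.5 u_i' R_i u_i.
Q1, R1 = np.eye(2), np.array([[0.1]])
Q2, R2 = np.diag([2.0, 0.5]), np.array([[0.2]])

# Inner problem: the follower's first-order condition is linear in u2,
# giving the best response u2 = -K2 (A x + B1 u1).
K2 = np.linalg.solve(R2 + B2.T @ Q2 @ B2, B2.T @ Q2)


def follower_response(x, u1):
    """Follower's best response to the state x and the leader action u1."""
    return -K2 @ (A @ x + B1 @ u1)


def leader_action(x):
    """Leader's optimal action, anticipating the follower's response."""
    # With the follower's response substituted, x_next = M (A x + B1 u1).
    M = np.eye(2) - B2 @ K2
    H = R1 + B1.T @ M.T @ Q1 @ M @ B1  # Hessian of the leader's reduced cost
    return -np.linalg.solve(H, B1.T @ M.T @ Q1 @ M @ A @ x)


def leader_cost(x, u1):
    """Leader's cost when the follower best-responds to u1."""
    x_next = A @ x + B1 @ u1 + B2 @ follower_response(x, u1)
    return 0.5 * x_next @ Q1 @ x_next + 0.5 * u1 @ R1 @ u1


x0 = np.array([1.0, -0.5])
u1_star = leader_action(x0)
u2_star = follower_response(x0, u1_star)
print("leader action:", u1_star, "follower action:", u2_star)
```

In a multi-stage feedback game the same substitution would be repeated backward in time, and inequality constraints would turn each first-order condition into a KKT system; both are beyond this sketch.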