A nonlinear real time capable motion cueing algorithm based on deep reinforcement learning

arXiv

Abstract

In motion simulation, motion cueing algorithms (MCAs) plan the trajectory of the motion simulator platform (MSP), whose workspace limitations prevent direct reproduction of reference trajectories. Strategies such as motion washout, which returns the platform to its center, are crucial in these settings. For serial robotic MSPs with highly nonlinear workspaces, it is essential to use the MSP's kinematic and dynamic capabilities as efficiently as possible. Traditional approaches, including classical washout filtering and linear model predictive control, fail to consider platform-specific nonlinear properties, while nonlinear model predictive control, though comprehensive, imposes high computational demands that hinder real-time, pilot-in-the-loop application without further simplification. To overcome these limitations, we introduce a novel approach using deep reinforcement learning (DRL) for motion cueing, demonstrated here for the first time in a 6-degree-of-freedom (6-DOF) setting with full consideration of the MSP's kinematic nonlinearities. Previous work by the authors successfully demonstrated the application of DRL to a simplified 2-DOF setup, which did not consider kinematic or dynamic constraints. This approach has been extended to all 6 DOF by incorporating a complete kinematic model of the MSP into the algorithm, a crucial step toward its application on a real motion simulator. The training of the DRL-MCA is based on Proximal Policy Optimization in an actor-critic implementation, combined with automated hyperparameter optimization. After detailing the necessary training framework and the algorithm itself, we provide a comprehensive validation, demonstrating that the DRL-MCA achieves competitive performance against established algorithms. Moreover, it generates feasible trajectories by respecting all system constraints and meets all real-time requirements with low...
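
To make the washout idea concrete, the following is a minimal one-axis sketch of a classical second-order high-pass washout, the kind of baseline the abstract contrasts with. It is not the authors' DRL-MCA. The function names (`washout_step`, `simulate`, `integrate_unfiltered`), the parameters `omega` and `zeta`, and the test signal are all illustrative assumptions. The platform position `x` is modeled as a damped spring driven by the reference acceleration, so the initial cue is passed through while the platform is pulled back toward its center.

```python
def washout_step(x, v, a_ref, dt, omega=1.0, zeta=1.0):
    """One semi-implicit Euler step of a second-order high-pass washout.

    Assumed model (illustrative): x'' = a_ref - 2*zeta*omega*x' - omega^2*x
    """
    a_plat = a_ref - 2.0 * zeta * omega * v - omega ** 2 * x
    v_next = v + dt * a_plat
    x_next = x + dt * v_next
    return x_next, v_next, a_plat


def simulate(a_refs, dt, omega=1.0, zeta=1.0):
    """Run the washout over a reference acceleration sequence; return positions."""
    x, v = 0.0, 0.0
    xs = []
    for a in a_refs:
        x, v, _ = washout_step(x, v, a, dt, omega, zeta)
        xs.append(x)
    return xs


def integrate_unfiltered(a_refs, dt):
    """Double-integrate the reference directly, i.e. no washout at all."""
    x, v = 0.0, 0.0
    xs = []
    for a in a_refs:
        v += dt * a
        x += dt * v
        xs.append(x)
    return xs


dt = 0.01
# Hypothetical test signal: 2 s of 1 m/s^2, followed by 18 s of zero acceleration.
pulse = [1.0] * 200 + [0.0] * 1800

xs = simulate(pulse, dt)
naive = integrate_unfiltered(pulse, dt)

print("washout: peak", max(xs), "final", xs[-1])
print("unfiltered: final", naive[-1])
```

With these assumed settings the washed-out platform stays within a bounded excursion and drifts back to its center once the input ends, whereas the unfiltered trajectory keeps moving away. A linear filter like this knows nothing about joint limits or the nonlinear workspace of a serial robot, which is the gap the abstract says the DRL-based approach is meant to address.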