arXiv

Pretrained Bayesian Non-parametric Knowledge Prior in Robotic Long-Horizon Reinforcement Learning

Yuan Meng, Xiangtong Yao, Kejia Chen, Yansong Wu, …

Abstract

Reinforcement learning (RL) methods typically learn new tasks from scratch, often disregarding prior knowledge that could accelerate the learning process. While some methods incorporate previously learned skills, they usually rely on a fixed structure, such as a single Gaussian distribution, to define skill priors. This rigid assumption can restrict the diversity and flexibility of skills, particularly in complex, long-horizon tasks. In this work, we introduce a method that models potential primitive skill motions as having non-parametric properties with an unknown number of underlying features. We utilize a Bayesian non-parametric model, specifically Dirichlet Process Mixtures, enhanced with birth and merge heuristics, to pre-train a skill prior that effectively captures the diverse nature of skills. Additionally, the learned skills are explicitly trackable within the prior space, enhancing interpretability and control. By integrating this flexible skill prior into an RL framework, our approach surpasses existing methods in long-horizon manipulation tasks, enabling more efficient skill transfer and task success in complex environments. Our findings show that a richer, non-parametric representation of skill priors significantly improves both the learning and execution of challenging robotic tasks. All data, code, and videos are available at https://ghiara.github.io/HELIOS/.
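The abstract does not spell out the inference procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Skill embeddings are treated as hypothetical 2-D points. Instead of the paper's Dirichlet Process Mixture with birth and merge heuristics, it swaps in DP-means, the hard-clustering small-variance limit of a DP Gaussian mixture. It pairs DP-means with a simple birth rule: open a new cluster when a point is farther than `lam` from every centroid. It also uses a simple merge rule: fold together clusters whose centroids lie within `lam`. The clusters then define an isotropic Gaussian mixture that acts as a skill prior, and `responsible_component` shows how a skill latent can be traced back to one component. All function names, `lam`, `sigma` and the synthetic data are assumptions made for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def dp_means(data, lam, max_iter=50):
    """DP-means (hypothetical stand-in for the paper's DPM): K is not fixed.

    Birth: a point whose squared distance to every centroid exceeds `lam`
    opens a new cluster. Merge: clusters whose centroids end up within `lam`
    of each other are folded together. Empty clusters are dropped.
    """
    centroids = [mean(data)]  # start from a single global cluster
    for _ in range(max_iter):
        labels = []
        for x in data:  # assignment step with birth
            d = [sq_dist(x, c) for c in centroids]
            k = min(range(len(d)), key=d.__getitem__)
            if d[k] > lam:
                centroids.append(list(x))
                k = len(centroids) - 1
            labels.append(k)
        members = {}
        for i, k in enumerate(labels):  # group point indices by cluster
            members.setdefault(k, []).append(i)
        merged = []
        for k in sorted(members):  # merge step (greedy)
            c = mean([data[i] for i in members[k]])
            for m in merged:
                if sq_dist(c, mean([data[i] for i in m])) <= lam:
                    m.extend(members[k])
                    break
            else:
                merged.append(list(members[k]))
        new_centroids = [mean([data[i] for i in m]) for m in merged]
        converged = len(new_centroids) == len(centroids) and all(
            sq_dist(a, b) < 1e-12 for a, b in zip(new_centroids, centroids)
        )
        centroids = new_centroids
        if converged:
            break
    labels = [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k])) for x in data]
    return centroids, labels


def component_log_terms(z, centroids, weights, sigma):
    """log w_k + log N(z; mu_k, sigma^2 I) for every mixture component k."""
    norm = -0.5 * len(z) * math.log(2 * math.pi * sigma ** 2)
    return [math.log(w) + norm - sq_dist(z, c) / (2 * sigma ** 2) for c, w in zip(centroids, weights)]


def mixture_log_prior(z, centroids, weights, sigma):
    """log p(z) of a skill latent under the mixture prior (stable log-sum-exp)."""
    terms = component_log_terms(z, centroids, weights, sigma)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def responsible_component(z, centroids, weights, sigma):
    """Index of the component with the highest posterior responsibility for z."""
    terms = component_log_terms(z, centroids, weights, sigma)
    return max(range(len(terms)), key=terms.__getitem__)


# Synthetic 2-D "skill embeddings" drawn around three hypothetical primitives.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
data = [[cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)] for cx, cy in true_centers for _ in range(50)]
rng.shuffle(data)

centroids, labels = dp_means(data, lam=4.0)
weights = [labels.count(k) / len(data) for k in range(len(centroids))]
print(len(centroids), [round(w, 2) for w in weights])
print(mixture_log_prior([5.0, 5.0], centroids, weights, sigma=0.5))
print(responsible_component([5.1, 4.9], centroids, weights, sigma=0.5))
```

The number of components is inferred from the data rather than fixed in advance, which is the property the abstract contrasts with a single-Gaussian prior. In an RL loop, a term such as `mixture_log_prior(z, ...)` could keep the high-level policy's skill latents near the prior's modes. How the authors actually perform inference and couple the prior with the policy is not described in this abstract. A soft variational DP mixture, such as scikit-learn's `BayesianGaussianMixture` with `weight_concentration_prior_type="dirichlet_process"`, would give posterior responsibilities instead of hard assignments.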