Research

arXiv

Efficient and Sustainable Task Offloading in UAV-Assisted MEC Systems via Meta Deep Reinforcement Learning

Abstract

Integrating Unmanned Aerial Vehicles (UAVs) into existing Mobile Edge Computing (MEC) systems is a key step toward meeting the stringent requirements of future Internet of Things (IoT) networks. This paper studies an MEC system in which a computation-capable UAV, wirelessly linked to a cloud server, handles tasks offloaded by IoT devices over the uplink. We evaluate the system by formulating a resource allocation problem that maximizes the long-term computed task efficiency while ensuring the stability of the task buffers at the IoT devices, the UAV, and the cloud. The problem jointly optimizes the uplink transmit power and offloading decisions of the IoT devices, the trajectory of the UAV, and the computing power at all transceivers. Because the problem is non-convex and stochastic, we devise a multi-step solution approach. First, by invoking fractional programming and Lyapunov theory, we transform the long-term optimization problem into an equivalent per-time-slot form. Next, we recast the reformulated problem as a Markov Decision Process (MDP) that reflects the network dynamics. Finally, the MDP model is used to train a Meta Twin Delayed Deep Deterministic Policy Gradient (MTD3) agent, which adapts the resource allocation to MEC system variations caused by the mobility of the UAV and IoT devices. Simulations show that the proposed approach outperforms its Deep Reinforcement Learning (DRL)-based counterparts, increasing computed task efficiency and reducing task buffer lengths.
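
To make the first step of the approach more concrete, the following is a minimal Python sketch of how fractional programming and Lyapunov theory are commonly combined into a per-time-slot objective. It is an illustration under stated assumptions, not the formulation used in the paper: the abstract does not define "computed task efficiency", so the sketch assumes it is the ratio of computed task bits to consumed energy, handled with a Dinkelbach-style parameter. The names `update_queue`, `per_slot_score`, `eta`, and `V` are hypothetical.

```python
def update_queue(backlog: float, served: float, arrived: float) -> float:
    """Textbook task-buffer dynamics: Q(t+1) = max(Q(t) - served, 0) + arrived."""
    return max(backlog - served, 0.0) + arrived


def per_slot_score(computed_bits, energy, backlogs, arrivals, services, eta, V):
    """Drift-plus-penalty score for one time slot (higher is better).

    Assumption: efficiency is computed_bits / energy. The parameter eta
    (Dinkelbach-style) replaces that ratio with the difference
    computed_bits - eta * energy, and V weights efficiency against
    buffer stability.
    """
    # Upper-bound-style drift term: sum of Q_i(t) * (arrivals_i - services_i)
    drift = sum(q * (a - b) for q, a, b in zip(backlogs, arrivals, services))
    return V * (computed_bits - eta * energy) - drift


# Compare two hypothetical candidate actions for a single buffer.
backlogs = [4.0]
score_a = per_slot_score(10.0, 2.0, backlogs, [1.0], [3.0], eta=2.0, V=1.0)
score_b = per_slot_score(12.0, 5.0, backlogs, [1.0], [1.0], eta=2.0, V=1.0)
best = "a" if score_a >= score_b else "b"
next_backlog = update_queue(backlogs[0], served=3.0, arrived=1.0)
```

In a scheme of this kind, a larger `V` favors efficiency over short buffers, and the per-slot score would typically serve as the reward signal of the MDP; whether the paper defines its reward this way is not stated in the abstract.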