WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows
Abstract
Scientific workflows process large data sets on clusters of independent nodes. This requires a complex stack of infrastructure components, most notably a resource manager (RM) for task-to-node assignment, a distributed file system (DFS) for data exchange between tasks, and a workflow engine to control task dependencies. To allow these components to be developed and installed independently of one another, current architectures place intermediate data files during workflow execution without regard to the future workload. In data-intensive applications, this separation leads to suboptimal schedules: tasks are often assigned to nodes that lack their input data, which causes network traffic and bottlenecks. This paper presents WOW, a new scheduling approach for dynamic scientific workflow systems that steers both data movement and task scheduling to reduce network congestion and overall runtime. To this end, WOW creates speculative copies of intermediate files to prepare the execution of subsequently scheduled tasks. WOW supports modern workflow systems that gain flexibility through the dynamic construction of execution plans. We implemented a prototype of WOW for the popular workflow engine Nextflow, using Kubernetes as the resource manager. In experiments with 16 synthetic and real workflows, WOW reduced makespan in all cases, with improvements of up to 94.5% for workflow patterns and up to 53.2% for real workflows, at the cost of a moderate increase in temporary storage space. It also has favorable effects on CPU allocation and scales well with increasing cluster size.
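The idea of coupling task placement with data placement can be illustrated with a small toy model. The sketch below is not WOW's actual algorithm, which the abstract does not specify. It assumes a simple greedy heuristic: a task is placed on the node with enough free CPUs that is missing the fewest input bytes, and pending tasks have their missing inputs listed for copying ahead of time. All names (Task, Node, missing_bytes, pick_node, plan_speculative_copies) and the file names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Task:
    name: str
    cpus: int
    inputs: Dict[str, int]  # file name -> size in bytes (hypothetical model)


@dataclass
class Node:
    name: str
    free_cpus: int
    files: Set[str] = field(default_factory=set)  # files already stored locally


def missing_bytes(task: Task, node: Node) -> int:
    """Bytes that would have to be transferred to run `task` on `node`."""
    return sum(size for f, size in task.inputs.items() if f not in node.files)


def pick_node(task: Task, nodes: List[Node]) -> Optional[Node]:
    """Greedy choice: among nodes with enough free CPUs, take the one
    that needs the least data transferred. Returns None if no node fits."""
    candidates = [n for n in nodes if n.free_cpus >= task.cpus]
    if not candidates:
        return None
    return min(candidates, key=lambda n: missing_bytes(task, n))


def plan_speculative_copies(
    pending: List[Task], nodes: List[Node]
) -> List[Tuple[str, str]]:
    """For each pending task, pick the node holding most of its input
    (ignoring current CPU load) and list the (file, target node) copies
    that would make all inputs local before the task is started."""
    plan: List[Tuple[str, str]] = []
    for task in pending:
        target = min(nodes, key=lambda n: missing_bytes(task, n))
        for f in sorted(task.inputs):
            if f not in target.files:
                plan.append((f, target.name))
    return plan


# Toy cluster: n2 already holds most of the input of the "merge" task.
nodes = [
    Node("n1", free_cpus=4, files={"a.bam"}),
    Node("n2", free_cpus=4, files={"b.bam", "c.bam"}),
]
merge = Task("merge", cpus=2, inputs={"a.bam": 100, "b.bam": 300, "c.bam": 50})

chosen = pick_node(merge, nodes)                   # node n2 (100 bytes missing)
copies = plan_speculative_copies([merge], nodes)   # [("a.bam", "n2")]
```

In this toy model, a scheduler that ignores data placement might run the task on n1 and transfer 350 bytes, whereas the data-aware choice transfers 100 bytes, and copying a.bam to n2 in advance would remove the transfer from the task's critical path entirely.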