Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology-Inspired Computing Methods
Abstract
With the rise of short-video platforms, the demand for video production has increased substantially. However, creating high-quality video still relies heavily on professional editing skills and a deep understanding of visual language. To address this challenge, the Shot Sequence Ordering (SSO) task in AI-assisted video editing has emerged as a key approach to improving video storytelling and the overall viewing experience. Progress in this field, however, has been impeded by the lack of publicly available benchmark datasets. In response, this paper introduces two new benchmark datasets, AVE-Order and ActivityNet-Order. We also adopt the Kendall Tau distance as an evaluation metric for the SSO task and propose the Kendall Tau Distance-Cross Entropy Loss. We further introduce the concept of Cinematology Embedding, which incorporates movie metadata and shot labels into the SSO model as prior knowledge, and we construct the AVE-Meta dataset to validate the effectiveness of this method. Experimental results indicate that the proposed loss function and method significantly improve accuracy on the SSO task. All datasets are publicly available at https://github.com/litchiar/ShotSeqBench.
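To illustrate the evaluation metric named above, the following is a minimal sketch of the standard Kendall Tau distance between a predicted shot order and a reference order: the number of shot pairs whose relative order differs, optionally divided by the total number of pairs. The function name `kendall_tau_distance`, the `normalize` parameter, and the choice of normalization are assumptions made for illustration; the abstract does not state which variant the paper uses.

```python
from itertools import combinations


def kendall_tau_distance(predicted, reference, normalize=True):
    """Count shot pairs ordered differently in `predicted` and `reference`.

    Both arguments are sequences holding the same distinct, hashable shot IDs.
    With normalize=True (an illustrative assumption) the count is divided by
    n * (n - 1) / 2, giving 0.0 for identical orders and 1.0 for a full reversal.
    """
    n = len(reference)
    if len(predicted) != n or len(set(reference)) != n or set(predicted) != set(reference):
        raise ValueError("both orderings must contain the same distinct shots")

    # Position of each shot in the predicted ordering.
    position = {shot: index for index, shot in enumerate(predicted)}

    # combinations() yields pairs (a, b) with a before b in the reference;
    # a pair is discordant when the prediction places a after b.
    discordant = sum(
        1 for a, b in combinations(reference, 2) if position[a] > position[b]
    )

    if not normalize:
        return discordant
    total_pairs = n * (n - 1) // 2
    return discordant / total_pairs if total_pairs else 0.0
```

For example, `kendall_tau_distance([1, 0, 2], [0, 1, 2], normalize=False)` counts a single swapped pair, so lower values indicate a predicted sequence closer to the reference edit.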