arXiv

Towards Universal Soccer Video Understanding

Jiayuan Rao, Haoning Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie

Abstract

As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world. This paper aims to develop a comprehensive multi-modal framework for soccer video understanding. Specifically, we make the following contributions in this paper: (i) we introduce SoccerReplay-1988, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches, with an automated annotation pipeline; (ii) we present an advanced soccer-specific visual encoder, MatchVision, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks; (iii) we conduct extensive experiments and ablation studies on event classification, commentary generation, and multi-view foul recognition. MatchVision demonstrates state-of-the-art performance on all of them, substantially outperforming existing models, which highlights the superiority of our proposed data and model. We believe that this work will offer a standard paradigm for sports understanding research.