Multi-label Micro-Action Detection (MMAD) in Videos

arXiv

MMAD: Multi-label Micro-Action Detection in Videos

Kun Li, Pengyu Liu, Dan Guo, Fei Wang, Zhiliang Wu, Hehe Fan, Meng Wang

Abstract

Human body actions are an important form of non-verbal communication in social interactions. This paper focuses on a subset of body actions known as micro-actions: subtle, low-intensity body movements with promising applications in human emotion analysis. In real-world scenarios, micro-actions often co-occur, with multiple micro-actions overlapping in time, such as concurrent head and hand movements. However, current research primarily focuses on recognizing individual micro-actions while overlooking their co-occurring nature. To address this gap, we propose a new task named Multi-label Micro-Action Detection (MMAD), which involves identifying all micro-actions in a given short video, determining their start and end times, and categorizing them. Accomplishing this requires a model capable of accurately capturing both long-term and short-term action relationships to detect multiple overlapping micro-actions. To facilitate the MMAD task, we introduce a new dataset named Multi-label Micro-Action-52 (MMA-52) and propose a baseline method equipped with a dual-path spatial-temporal adapter to address the challenge of subtle visual changes in MMAD. We hope that MMA-52 can stimulate research on micro-action analysis in videos and promote the development of spatio-temporal modeling in human-centric video understanding. The MMA-52 dataset is available at https://github.com/VUT-HFUT/Micro-Action.
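
To make the task definition above concrete, the following is a minimal sketch of what an MMAD-style output could look like: a set of labelled time segments that are allowed to overlap. It is an illustration under stated assumptions, not the authors' method or the MMA-52 annotation format. The `MicroAction` class, the helper functions, and the label names are all hypothetical. Temporal IoU is included because it is a common overlap measure in temporal action detection; the abstract does not say which measure MMAD uses.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MicroAction:
    """One detected instance: a label plus its start/end time in seconds."""
    label: str    # hypothetical label name, not taken from MMA-52
    start: float
    end: float


def overlap_seconds(a: MicroAction, b: MicroAction) -> float:
    """Length of the time interval shared by two instances (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def temporal_iou(a: MicroAction, b: MicroAction) -> float:
    """Intersection-over-union of two time intervals."""
    inter = overlap_seconds(a, b)
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def co_occurring_pairs(instances):
    """Return all pairs of instances whose time intervals overlap."""
    return [(a, b) for a, b in combinations(instances, 2)
            if overlap_seconds(a, b) > 0]


# Made-up annotations for one short video (illustrative values only).
video = [
    MicroAction("nodding", 1.0, 3.0),
    MicroAction("touching face", 2.0, 5.0),
    MicroAction("crossing legs", 6.0, 8.0),
]

pairs = co_occurring_pairs(video)
print([(a.label, b.label) for a, b in pairs])
```

In this toy example the head and hand movements overlap between 2.0 s and 3.0 s, so a multi-label detector would need to report both segments, whereas a single-label recognizer would have to pick one.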