Abstract
To guide a learner in mastering action skills, a coach must 1) reason through the learner's action execution and technical points (TechPoints), and 2) provide detailed, comprehensible feedback on what is done well and what can be improved. However, existing score-based action assessment methods still fall short of this practical need. To bridge this gap, we investigate a new task termed Descriptive Action Coaching (DescCoach), which requires a model to go beyond a simple quality score for action execution and provide detailed commentary on what is done well and what can be improved. To this end, we first build a new dataset named EE4D-DescCoach. Built with an automatic annotation pipeline, our dataset goes beyond existing action assessment datasets by providing detailed TechPoint-level commentary. Furthermore, we propose TechCoach, a new framework that explicitly incorporates TechPoint-level reasoning into the DescCoach process. Central to our method is the Context-aware TechPoint Reasoner, which enables TechCoach to learn TechPoint-related quality representations by querying the visual context under the supervision of TechPoint-level coaching commentary. A unified TechPoint-aware Action Assessor then leverages the visual context and the TechPoint-related quality representations to provide the overall coaching commentary together with the quality score. With these components, we establish a new benchmark for DescCoach and evaluate the effectiveness of our method through extensive experiments. The data and code will be made publicly available.
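The abstract describes the Context-aware TechPoint Reasoner as learning TechPoint-related quality representations by querying the visual context, and the TechPoint-aware Action Assessor as combining those representations with the visual context to produce a quality score. As an illustration only, the sketch below shows one common way such querying can be realized: single-head scaled dot-product cross-attention. Everything in it is an assumption rather than the TechCoach implementation: the function names `query_visual_context` and `assess_score`, the toy 2-D features, the absence of learned query/key/value projections, and the mean-pool plus linear score head are all hypothetical. The commentary generator and the commentary-level supervision are omitted entirely.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_visual_context(techpoint_queries: List[Vector],
                         visual_tokens: List[Vector]) -> List[Vector]:
    """Hypothetical reasoner step: single-head scaled dot-product cross-attention.

    Each TechPoint query attends over the visual tokens (used directly as keys
    and values, with no learned projections) and returns their attention-weighted
    sum as a TechPoint-related representation.
    """
    d = len(visual_tokens[0])
    reprs = []
    for q in techpoint_queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in visual_tokens])
        reprs.append([sum(w * v[i] for w, v in zip(weights, visual_tokens))
                      for i in range(d)])
    return reprs


def assess_score(visual_tokens: List[Vector], techpoint_reprs: List[Vector],
                 score_weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical assessor step: mean-pool visual and TechPoint tokens jointly,
    then apply a linear score head. Commentary generation is not modeled here."""
    tokens = visual_tokens + techpoint_reprs
    d = len(tokens[0])
    pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(d)]
    return dot(pooled, score_weights) + bias


# Toy example: three 2-D visual tokens and two TechPoint queries.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[5.0, 0.0], [0.0, 5.0]]
techpoint_reprs = query_visual_context(queries, visual)
score = assess_score(visual, techpoint_reprs, score_weights=[0.5, 0.5])
```

In a trained model the queries, projections, and score head would be learned parameters; here they are fixed numbers so the attention behavior is easy to inspect. The first query, aligned with the first feature axis, places most of its weight on the visual tokens that are large along that axis, so its representation is dominated by that feature.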