MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

Abstract

Advanced driver assistance systems require a comprehensive understanding of the driver's mental and physical state and of the traffic context, but existing work often neglects the potential benefits of learning these tasks jointly. This paper proposes MMTL-UniAD, a unified multimodal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emotion (e.g., anxiety, happiness), vehicle behavior (e.g., parking, turning), and traffic context (e.g., traffic jam, smooth traffic). A key challenge is avoiding negative transfer between tasks, which can impair learning performance. To address this, we introduce two key components into the framework: a multi-axis region attention network that extracts global context-sensitive features, and a dual-branch multimodal embedding that learns multimodal embeddings from both task-shared and task-specific features. The former uses a multi-attention mechanism to extract task-relevant features, mitigating negative transfer caused by task-irrelevant features. The latter employs a dual-branch structure to adaptively adjust task-shared and task-specific parameters, enhancing cross-task knowledge transfer while reducing task conflicts. We evaluate MMTL-UniAD on the AIDE dataset and, through a series of ablation studies, show that it outperforms state-of-the-art methods on all four tasks. The code is available at https://github.com/Wenzhuo-Liu/MMTL-UniAD.
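
To make the dual-branch idea concrete, the following is a minimal pure-Python sketch, not the authors' implementation (see the repository above for that). It assumes a multimodal feature vector `x` has already been extracted, and it mixes a task-shared branch with a task-specific branch through a scalar gate per task. The class `DualBranchSketch`, the gate formulation, the layer sizes, and the per-task class counts are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical class counts per task; the real label sets come from the dataset.
TASKS = {
    "driver_behavior": 3,
    "driver_emotion": 3,
    "vehicle_behavior": 3,
    "traffic_context": 3,
}


def init_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Matrix-vector product (no bias, to keep the sketch short)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class DualBranchSketch:
    """Illustrative only: one shared branch plus one specific branch per task."""

    def __init__(self, in_dim, hid_dim, tasks, seed=0):
        rng = random.Random(seed)
        # Task-shared branch: a single projection used by every task.
        self.shared = init_matrix(hid_dim, in_dim, rng)
        # Task-specific branches: one projection per task.
        self.specific = {t: init_matrix(hid_dim, in_dim, rng) for t in tasks}
        # Gate logits (assumed learnable in a real model); 0.0 gives an equal mix.
        self.gate_logit = {t: 0.0 for t in tasks}
        # One classification head per task.
        self.heads = {t: init_matrix(n, hid_dim, rng) for t, n in tasks.items()}

    def forward(self, x):
        shared_feat = [math.tanh(v) for v in linear(self.shared, x)]
        outputs = {}
        for task, head in self.heads.items():
            spec_feat = [math.tanh(v) for v in linear(self.specific[task], x)]
            gate = 1.0 / (1.0 + math.exp(-self.gate_logit[task]))
            # Convex combination of shared and task-specific features.
            fused = [gate * s + (1.0 - gate) * p
                     for s, p in zip(shared_feat, spec_feat)]
            outputs[task] = softmax(linear(head, fused))
        return outputs


model = DualBranchSketch(in_dim=8, hid_dim=4, tasks=TASKS)
probs = model.forward([0.5] * 8)
```

Here `probs` maps each of the four task names to a probability distribution over that task's classes. In a trained model, the gate would let each task decide how much to rely on shared versus specific features, which is one plausible way to limit negative transfer.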