MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

arXiv

Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao, Qiannan Guo, Jiayin Zhu, Pengfei Li, Zilong Chen, Huiming Yang, Zhiwei Li, Lening Wang, Tiao Tan, Huaping Liu


Abstract


Advanced driver assistance systems require a comprehensive understanding of the driver's mental/physical state and the traffic context, but existing works often neglect the potential benefits of joint learning across these tasks. This paper proposes MMTL-UniAD, a unified multimodal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emotion (e.g., anxiety, happiness), vehicle behavior (e.g., parking, turning), and traffic context (e.g., traffic jam, smooth traffic). A key challenge is avoiding negative transfer between tasks, which can impair learning performance. To address this, we introduce two key components into the framework: a multi-axis region attention network that extracts global context-sensitive features, and a dual-branch multimodal embedding that learns multimodal embeddings from both task-shared and task-specific features. The former uses a multi-attention mechanism to extract task-relevant features, mitigating negative transfer caused by task-unrelated features. The latter employs a dual-branch structure to adaptively adjust task-shared and task-specific parameters, enhancing cross-task knowledge transfer while reducing task conflicts. We evaluate MMTL-UniAD on the AIDE dataset and, through a series of ablation studies, show that it outperforms state-of-the-art methods across all four tasks. The code is available at https://github.com/Wenzhuo-Liu/MMTL-UniAD.
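The dual-branch idea in the abstract (one task-shared embedding plus one task-specific embedding per task, combined before each of the four recognition heads) can be illustrated with a small sketch. This is not the authors' implementation: the class name `DualBranchSketch`, the sigmoid gate used to mix the two branches, the class counts, and all dimensions are assumptions made only for illustration, and random weights stand in for trained parameters.

```python
import math
import random

# Hypothetical task names and class counts, chosen only for illustration.
TASKS = {
    "driver_behavior": 3,
    "driver_emotion": 3,
    "vehicle_behavior": 2,
    "traffic_context": 2,
}


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DualBranchSketch:
    """Assumed structure: a shared branch, one specific branch per task,
    a per-task gate that mixes them, and a per-task classifier head."""

    def __init__(self, in_dim, embed_dim, tasks, seed=0):
        rng = random.Random(seed)
        self.shared = random_matrix(embed_dim, in_dim, rng)
        self.specific = {t: random_matrix(embed_dim, in_dim, rng) for t in tasks}
        # Gate logit 0.0 gives an even 0.5/0.5 mix; training would adjust it.
        self.gate_logit = {t: 0.0 for t in tasks}
        self.heads = {t: random_matrix(n, embed_dim, rng) for t, n in tasks.items()}

    def forward(self, features):
        # The shared embedding is computed once and reused by every task.
        shared_emb = linear(self.shared, features)
        outputs = {}
        for task, head in self.heads.items():
            specific_emb = linear(self.specific[task], features)
            gate = 1.0 / (1.0 + math.exp(-self.gate_logit[task]))
            mixed = [gate * s + (1.0 - gate) * p
                     for s, p in zip(shared_emb, specific_emb)]
            outputs[task] = softmax(linear(head, mixed))
        return outputs


model = DualBranchSketch(in_dim=8, embed_dim=4, tasks=TASKS)
probs = model.forward([0.5] * 8)  # one fused multimodal feature vector
```

Here `probs` maps each task name to a probability distribution over that task's classes. Letting each task weight the shared branch separately is one simple way to limit negative transfer; the paper's actual mechanism may differ.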