arXiv

Data-Driven Object Tracking: Integrating Modular Neural Networks into a Kalman Framework

Christian Alexander Holz

Abstract

This paper presents novel Machine Learning (ML) methodologies for Multi-Object Tracking (MOT), designed to meet the growing complexity and precision demands of Advanced Driver Assistance Systems (ADAS). We introduce three Neural Network (NN) models that address key challenges in MOT: (i) the Single-Prediction Network (SPENT) for trajectory prediction, (ii) the Single-Association Network (SANT) for mapping individual Sensor Objects (SOs) to existing tracks, and (iii) the Multi-Association Network (MANTa) for associating multiple SOs with multiple tracks. These models are integrated into a traditional Kalman Filter (KF) framework by replacing the relevant components, which preserves the system's modularity and leaves the overall architecture unchanged. Importantly, all three networks are designed to run in a real-time, embedded environment: each contains fewer than 50k trainable parameters. Our evaluation on the public KITTI tracking dataset demonstrates significant improvements in tracking performance. SPENT reduces the Root Mean Square Error (RMSE) by 50% compared to a standard KF, while SANT and MANTa achieve up to 95% accuracy in sensor object-to-track assignments. These results underscore the effectiveness of incorporating task-specific NNs into traditional tracking systems, improving performance and robustness while preserving modularity, maintainability, and interpretability.
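To make the modular idea concrete, the following is a minimal sketch of a tracking step in which the prediction and association stages are pluggable callables. It is not the authors' implementation: the 2D constant-velocity state `[x, y, vx, vy]`, the greedy nearest-neighbor association, the gating threshold `max_cost`, and all function names (`cv_predict`, `greedy_associate`, `track_step`, `rmse`) are assumptions chosen for illustration. A learned predictor or associator, such as the networks described in the abstract, would slot in wherever the baseline callables are passed. The `rmse` helper uses one common definition (root of the mean squared Euclidean position error), which may differ from the paper's exact metric.

```python
from typing import Callable, List, Tuple

import numpy as np

# Hypothetical interfaces (assumptions, not from the paper):
#   a predictor maps (state, dt) -> predicted state
#   an associator maps a track-by-detection cost matrix -> index pairs
Predictor = Callable[[np.ndarray, float], np.ndarray]
Associator = Callable[[np.ndarray], List[Tuple[int, int]]]


def cv_predict(state: np.ndarray, dt: float) -> np.ndarray:
    """Baseline constant-velocity prediction for a state [x, y, vx, vy]."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    return F @ state


def greedy_associate(cost: np.ndarray, max_cost: float = 5.0) -> List[Tuple[int, int]]:
    """Greedy nearest-neighbor assignment; max_cost is an assumed gate."""
    cost = cost.astype(float).copy()
    pairs = []
    while cost.size and np.isfinite(cost).any():
        t, d = np.unravel_index(np.argmin(cost), cost.shape)
        if cost[t, d] > max_cost:
            break  # every remaining candidate lies outside the gate
        pairs.append((int(t), int(d)))
        cost[t, :] = np.inf  # each track is matched at most once
        cost[:, d] = np.inf  # each detection is matched at most once
    return pairs


def track_step(tracks: np.ndarray, detections: np.ndarray, dt: float,
               predict: Predictor = cv_predict,
               associate: Associator = greedy_associate):
    """Predict all tracks, then associate detections with the predictions.

    Replacing `predict` or `associate` changes one stage without touching
    the rest of the pipeline.
    """
    predicted = np.array([predict(s, dt) for s in tracks]).reshape(-1, 4)
    # Euclidean distance between predicted positions and detected positions.
    diff = predicted[:, None, :2] - detections[None, :, :]
    cost = np.linalg.norm(diff, axis=2)
    return predicted, associate(cost)


def rmse(pred: np.ndarray, truth: np.ndarray) -> float:
    """Root of the mean squared Euclidean position error."""
    return float(np.sqrt(np.mean(np.sum((pred - truth) ** 2, axis=1))))


if __name__ == "__main__":
    tracks = np.array([[0.0, 0.0, 1.0, 0.0], [10.0, 10.0, 0.0, -1.0]])
    detections = np.array([[10.1, 9.0], [1.0, 0.1]])
    predicted, pairs = track_step(tracks, detections, dt=1.0)
    print(sorted(pairs))
    print(rmse(predicted[:, :2], detections[[1, 0]]))
```

Passing a different callable, for example `track_step(tracks, detections, 1.0, predict=my_model)` with a hypothetical `my_model(state, dt)`, swaps the prediction stage while the association stage and the surrounding loop stay unchanged.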