Design and Implementation of the Transparent, Interpretable, and Multimodal (TIM) AR Personal Assistant

Abstract

The concept of an AI assistant for task guidance is rapidly shifting from a science fiction staple to an impending reality. Such a system is inherently complex: it requires models for perceptual grounding, attention, and reasoning; an intuitive interface that adapts to the performer's needs; and the orchestration of data streams from many sensors. Moreover, all data acquired by the system must be readily available for post-hoc analysis so that developers can understand performer behavior and quickly detect failures. We introduce TIM, the first end-to-end AI-enabled task guidance system in augmented reality, which is capable of detecting both the user and the scene and of providing adaptable, just-in-time feedback. We discuss the system challenges and propose design solutions. We also demonstrate how TIM adapts to application domains with varying needs, highlighting how the system components can be customized for each scenario.

Design and Implementation of the Transparent, Interpretable, and Multimodal (TIM) AR Personal Assistant - arXiv