arXiv

Cognitive Science-Inspired Evaluation of Core Capabilities for Object Understanding in AI

Danaja Rutar, Alva Markelius, Konstantinos Voudouris, José Hernández-Orallo, Lucy Cheke

Abstract

One of the core components of our world models is 'intuitive physics': an understanding of objects, space, and causality. This capability enables us to predict events, plan actions, and navigate environments, all of which rely on a composite sense of objecthood. Despite its importance, there is no single, unified account of objecthood, though multiple theoretical frameworks offer insights. In the first part of this paper, we present a comprehensive overview of the main theoretical frameworks in objecthood research (Gestalt psychology, enactive cognition, and developmental psychology) and identify the core capabilities each framework attributes to object understanding, as well as the functional roles these capabilities play in shaping world models in biological agents. Given the foundational role of objecthood in world modelling, understanding objecthood is also essential in AI. In the second part of the paper, we evaluate how current AI paradigms approach and test objecthood capabilities, compared with how these capabilities are studied in cognitive science. We define an AI paradigm as the combination of how objecthood is conceptualised, the methods used to study it, the data utilised, and the evaluation techniques applied. We find that, whilst benchmarks can detect whether AI systems model isolated aspects of objecthood, they cannot detect when AI systems lack functional integration across these capabilities, and so do not fully address the objecthood challenge. Finally, we explore novel evaluation approaches that align with the integrated view of objecthood outlined in this paper. These methods are promising candidates for advancing from isolated object capabilities toward general-purpose AI with genuine object understanding in real-world contexts.