Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions
Abstract
Large language models (LLMs) such as ChatGPT and GPT-4 have recently demonstrated a remarkable ability to communicate with human users. In this technical report, we take the initiative to investigate their capacity to play text games, in which a player has to understand the environment and respond to situations by holding a dialogue with the game world. Our experiments show that ChatGPT performs competitively compared to all existing systems but still exhibits a low level of intelligence. Specifically, ChatGPT cannot construct a world model by playing the game or even by reading the game manual; it may fail to leverage the world knowledge that it already has; and it cannot infer the goal of each step as the game progresses. Our results open up new research questions at the intersection of artificial intelligence, machine learning, and natural language processing.