debug-gym: A Text-Based Environment for Interactive Debugging

arXiv

debug-gym: A Text-Based Environment for Interactive Debugging

Xingdi Yuan,

Minseon Kim,

Abstract

Large Language Models (LLMs) are increasingly relied upon for coding tasks, yet most scenarios assume that all relevant information either can be accessed in context or matches their training data. We posit that LLMs can benefit from the ability to interactively explore a codebase to gather the information relevant to their task. To achieve this, we present debug-gym, a textual environment for developing LLM-based agents in an interactive coding setting. Our environment is lightweight and provides a preset collection of useful tools, such as the Python debugger (pdb), designed to facilitate an LLM-based agent's interactive debugging. Beyond coding and debugging tasks, this approach can be generalized to other tasks that would benefit from information-seeking behavior by an LLM agent.