AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

arXiv

Haoyu Wang, Christopher M. Poskitt, Jun Sun

Abstract

Agents built on LLMs are increasingly deployed across diverse domains, automating complex decision-making and task execution. However, their autonomy introduces safety risks, including security vulnerabilities, legal violations, and unintended harmful actions. Existing mitigation methods, such as model-based safeguards and early enforcement strategies, fall short in robustness, interpretability, and adaptability. To address these challenges, we propose AgentSpec, a lightweight domain-specific language for specifying and enforcing runtime constraints on LLM agents. With AgentSpec, users define structured rules that incorporate triggers, predicates, and enforcement mechanisms, ensuring agents operate within predefined safety boundaries. We implement AgentSpec across multiple domains, including code execution, embodied agents, and autonomous driving, demonstrating its adaptability and effectiveness. Our evaluation shows that AgentSpec successfully prevents unsafe executions in over 90% of code agent cases, eliminates all hazardous actions in embodied agent tasks, and enforces 100% compliance by autonomous vehicles (AVs). Despite its strong safety guarantees, AgentSpec remains computationally lightweight, with overheads in milliseconds. By combining interpretability, modularity, and efficiency, AgentSpec provides a practical and scalable solution for enforcing LLM agent safety across diverse applications. We also automate the generation of rules using LLMs and assess their effectiveness. Our evaluation shows that the rules generated by OpenAI o1 achieve a precision of 95.56% and recall of 70.96% for embodied agents, successfully identifying 87.26% of the risky code, and prevent AVs from breaking laws in 5 out of 8 scenarios.
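The rule structure the abstract describes (a trigger, one or more predicates, and an enforcement mechanism) can be illustrated with a minimal Python sketch. This is not AgentSpec's actual DSL syntax or implementation, which the abstract does not show. The names `Rule`, `check`, and `is_destructive`, the action format, and the decision strings "stop" and "allow" are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical action record proposed by an agent: {"tool": ..., "input": ...}
Action = Dict[str, str]


@dataclass
class Rule:
    name: str
    trigger: str                                # tool name that activates the rule
    predicates: List[Callable[[Action], bool]]  # all must hold for the rule to fire
    enforce: Callable[[Action], str]            # returns an enforcement decision


def check(action: Action, rules: List[Rule]) -> str:
    """Return the decision of the first rule that fires, else "allow"."""
    for rule in rules:
        if rule.trigger != action["tool"]:
            continue  # rule is not triggered by this kind of action
        if all(pred(action) for pred in rule.predicates):
            return rule.enforce(action)
    return "allow"


def is_destructive(action: Action) -> bool:
    # Illustrative predicate only: flags a recursive forced delete.
    return "rm -rf" in action["input"]


RULES = [
    Rule(
        name="block_destructive_shell",
        trigger="shell",
        predicates=[is_destructive],
        enforce=lambda action: "stop",
    )
]

print(check({"tool": "shell", "input": "rm -rf /tmp/data"}, RULES))  # stop
print(check({"tool": "shell", "input": "ls /tmp"}, RULES))           # allow
```

In this sketch the check runs before the agent's proposed action is executed, which is one plausible reading of "runtime enforcement". Keeping each rule as a small declarative record is meant to mirror the interpretability and modularity the abstract emphasizes.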