

arXiv

MANTRA: Enhancing Automated Method-Level Refactoring with Contextual RAG and Multi-Agent LLM Collaboration

Yisen Xu, Feng Lin, Jinqiu Yang, Tse-Hsun Chen, Nikolaos Tsantalis


Abstract


Maintaining and scaling software systems relies heavily on effective code refactoring, yet this process remains labor-intensive, requiring developers to carefully analyze existing codebases and avoid introducing new defects. Although recent advances have leveraged Large Language Models (LLMs) to automate refactoring tasks, current solutions are limited in scope and lack mechanisms to guarantee code compilability and successful test execution. In this work, we introduce MANTRA, a comprehensive LLM agent-based framework that automates method-level refactoring. MANTRA integrates Context-Aware Retrieval-Augmented Generation, coordinated Multi-Agent Collaboration, and Verbal Reinforcement Learning to emulate human decision-making during refactoring while preserving code correctness and readability. Our empirical study, conducted on 703 instances of "pure refactorings" (i.e., code changes exclusively involving structural improvements) drawn from 10 representative Java projects, covers the six most prevalent refactoring operations. Experimental results demonstrate that MANTRA substantially surpasses a baseline LLM (RawGPT), achieving an 82.8% success rate (582/703) in producing code that compiles and passes all tests, compared to just 8.7% (61/703) with RawGPT. Moreover, compared with IntelliJ's LLM-powered refactoring tool (EM-Assist), MANTRA exhibits a 50% improvement in generating Extract Method transformations. A usability study involving 37 professional developers further shows that refactorings performed by MANTRA are perceived to be as readable and reusable as human-written code, and in certain cases even more favorable. These results highlight the practical advantages of MANTRA and the growing potential of LLM-based systems for automating software refactoring tasks.
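The abstract states MANTRA's success criterion (the refactored code must compile and pass all tests) and names Verbal Reinforcement Learning, but it does not describe the agents' interfaces. The following is a minimal sketch, assuming a Reflexion-style generate, verify, and reflect loop, of how textual feedback on a failed attempt could be fed back into the next generation round. Every name here (`refactor_with_reflection`, `developer`, `verifier`, `reflector`, `max_rounds`, and the `toy_*` stubs) is hypothetical and is not MANTRA's actual API. Retrieved refactoring examples are simply passed in as strings. The helper `success_rate` recomputes the two percentages reported above.

```python
# Hypothetical sketch of a generate-verify-reflect loop; not MANTRA's actual code.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Outcome of building and testing one candidate refactoring."""
    compiles: bool
    tests_pass: bool
    message: str = ""

    @property
    def ok(self) -> bool:
        # Success criterion from the abstract: compiles AND passes all tests.
        return self.compiles and self.tests_pass


@dataclass
class Attempt:
    code: str
    verdict: Verdict


def refactor_with_reflection(
    source: str,
    examples: List[str],  # assumed: refactoring examples returned by retrieval
    developer: Callable[[str, List[str], List[str]], str],
    verifier: Callable[[str], Verdict],
    reflector: Callable[[str, Verdict], str],
    max_rounds: int = 3,  # assumed retry budget
) -> Tuple[Optional[str], List[Attempt]]:
    """Retry generation, feeding verbal reflections on failures back to the generator."""
    reflections: List[str] = []
    history: List[Attempt] = []
    for _ in range(max_rounds):
        candidate = developer(source, examples, reflections)
        verdict = verifier(candidate)
        history.append(Attempt(candidate, verdict))
        if verdict.ok:
            return candidate, history
        # Verbal feedback instead of a numeric reward: a textual note on what failed.
        reflections.append(reflector(candidate, verdict))
    return None, history


def success_rate(successes: int, total: int) -> float:
    """Percentage of instances that compile and pass all tests, to one decimal."""
    return round(100 * successes / total, 1)


# Toy stand-ins for LLM agents, a compiler, and a test suite (all hypothetical).
def toy_developer(src: str, examples: List[str], reflections: List[str]) -> str:
    # The first attempt omits the extracted helper; retries add it.
    return src + ("\n// helper extracted" if reflections else "")


def toy_verifier(code: str) -> Verdict:
    ok = "helper extracted" in code
    return Verdict(compiles=ok, tests_pass=ok,
                   message="" if ok else "cannot find symbol: helper")


def toy_reflector(code: str, verdict: Verdict) -> str:
    return f"Previous attempt failed: {verdict.message}"


result, history = refactor_with_reflection(
    "void f() { helper(); }", [], toy_developer, toy_verifier, toy_reflector
)
print(len(history), result is not None)  # 2 True
print(success_rate(582, 703), success_rate(61, 703))  # 82.8 8.7
```

Routing the verifier's error message through a separate reflector, rather than retrying blindly, lets each later attempt use the earlier failure as context. In a real setting the verifier would build the project and run its test suite, which this sketch replaces with a string check.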