arXiv

Enhancing the Robustness of LLM-Generated Code: Empirical Study and Framework

Zike Li, Mingwei Liu, Anji Li, Kaifeng He, Yanlin Wang, Xin Peng, Zibin Zheng

Abstract

Ensuring the robustness of code generated by large language models (LLMs) is crucial for real-world reliability. However, existing evaluations predominantly focus on correctness, often neglecting key robustness concerns such as missing input validation and insufficient error handling. In this paper, we present the first empirical study on the robustness of LLM-generated code. We introduce novel robustness metrics and analyze four state-of-the-art code LLMs, revealing that, on average, 43.1% of their generated code is less robust than human-written counterparts. Notably, over 90% of robustness deficiencies stem from missing conditional checks, with 70% of these omissions occurring in the first line of code. Additionally, in 69% of cases where a conditional statement is necessary but absent, the "if" token still ranks third or higher in the model's predicted token probabilities, indicating an implicit recognition of control structures. Building on these findings, we propose RobGen, a framework designed to enhance code robustness without requiring model retraining. RobGen leverages two model-agnostic techniques: RobGen-Adj, which dynamically adjusts token probabilities during decoding to encourage the inclusion of control structures, and RobGen-Ins, which improves generated code by inserting missing conditionals after generation. Experimental results demonstrate that RobGen reduces the proportion of less robust model-generated code by 20.0%, significantly enhancing code reliability across diverse tasks. As a lightweight and adaptable solution, RobGen effectively mitigates robustness challenges in LLM-generated code. All code and data are available at https://github.com/SYSUSELab/RobGen.
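
To make the decoding-time idea behind RobGen-Adj more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that next-token scores are available as a plain dictionary of logits, and it adds a fixed bonus to the "if" token whenever that token already ranks within the top three candidates, echoing the observation above that "if" often ranks third or higher when a needed conditional is missing. The function names, the `top_k` threshold, and the `bonus` value are all hypothetical choices made for illustration; the actual adjustment rule is in the RobGen repository.

```python
def boost_control_token(logits, token="if", top_k=3, bonus=2.0):
    """Return a copy of `logits` with `bonus` added to `token`,
    but only when `token` is already among the `top_k` candidates.

    `logits` maps candidate token strings to raw scores.
    `top_k` and `bonus` are illustrative assumptions, not values from the paper.
    """
    ranked = sorted(logits, key=logits.get, reverse=True)
    adjusted = dict(logits)
    if token in ranked[:top_k]:
        adjusted[token] += bonus
    return adjusted


def greedy_pick(logits):
    """Greedy decoding step: choose the highest-scoring token."""
    return max(logits, key=logits.get)


# Toy next-token scores at the start of a function body (made-up numbers).
near_miss = {"return": 2.0, "if": 1.5, "for": 0.3, "x": -1.0}
# A case where "if" is far down the ranking and should be left alone.
unlikely = {"return": 2.0, "x": 1.8, "for": 1.0, "if": -3.0}

print(greedy_pick(near_miss), greedy_pick(boost_control_token(near_miss)))
print(greedy_pick(unlikely), greedy_pick(boost_control_token(unlikely)))
```

Gating the bonus on the token's existing rank is the design point of this sketch: it nudges the model only where it already assigns meaningful probability to a conditional, rather than forcing a control structure everywhere.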