Abstract
Automating software vulnerability detection (SVD) remains a critical challenge in an era of increasingly complex and interdependent software systems. Despite significant advances in Large Language Models (LLMs) for code analysis, prevailing evaluation methodologies often lack the \textbf{context-aware robustness} needed to capture real-world intricacies and cross-component interactions. To address these limitations, we present \textbf{VulnSage}, a comprehensive evaluation framework together with a dataset curated from diverse, large-scale open-source system software projects written in C/C++. Unlike prior datasets, it combines heuristic noise pre-filtering with LLM-based reasoning to yield a representative, minimally noisy spectrum of vulnerabilities. The framework supports multi-granular analysis at the function, file, and inter-function levels and employs four zero-shot prompting strategies: Baseline, Chain-of-Thought, Think, and Think & Verify. Our evaluation shows that structured reasoning prompts substantially improve LLM performance, with Think & Verify reducing ambiguous responses from 20.3% to 9.1% while increasing accuracy. We further show that code-specialized models consistently outperform general-purpose alternatives and that performance varies significantly across vulnerability types, indicating that no single approach excels across all security contexts. The dataset and code are available at https://github.com/Erroristotle/VulnSage.git