LLM-Assisted Proactive Threat Intelligence for Automated Reasoning

Abstract

Successful defense against dynamically evolving cyber threats requires sophisticated techniques. This research presents a novel approach that improves real-time cybersecurity threat detection and response by integrating large language models (LLMs) and Retrieval-Augmented Generation (RAG) with continuous threat intelligence feeds. Building on recent advances in LLMs, specifically GPT-4o, and on an innovative application of RAG, our approach overcomes the limitations of traditional static threat analysis by incorporating dynamic, real-time data sources. RAG supplies the latest threat intelligence at query time, which the GPT-4o model alone cannot access. We employ the Patrowl framework to automate the retrieval of diverse cybersecurity threat intelligence feeds, including the Common Vulnerabilities and Exposures (CVE), Common Weakness Enumeration (CWE), Exploit Prediction Scoring System (EPSS), and Known Exploited Vulnerabilities (KEV) databases. We embed these records as high-dimensional vectors with the all-mpnet-base-v2 model and store and query them in Milvus. Through a series of case studies, we demonstrate the system's efficacy: compared with the baseline GPT-4o, it shows significant improvements in handling recently disclosed vulnerabilities, KEVs, and CVEs with high EPSS scores. This work advances the role of LLMs in cybersecurity and lays a robust foundation for automated, intelligent cyberthreat information management systems, addressing crucial gaps in current cybersecurity practice.
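The retrieve-then-prompt flow described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. A toy hashed bag-of-words function stands in for all-mpnet-base-v2, an in-memory list stands in for Milvus, and no LLM is called. The record fields, the CVE identifiers, and the names `ThreatRecord`, `InMemoryIndex`, `embed`, and `build_prompt` are all hypothetical and exist only for illustration.

```python
import hashlib
import math
import re
from dataclasses import dataclass

DIM = 512  # toy dimension (assumption); all-mpnet-base-v2 yields 768-dim vectors


@dataclass
class ThreatRecord:
    cve_id: str        # hypothetical identifier, not a real CVE
    description: str
    epss: float        # exploit probability in [0, 1]
    in_kev: bool       # whether the entry appears in a KEV catalog


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding, L2-normalized.

    Stand-in for a real sentence-embedding model such as all-mpnet-base-v2.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


class InMemoryIndex:
    """Brute-force vector index; a stand-in for a vector database such as Milvus."""

    def __init__(self) -> None:
        self._rows: list[tuple[list[float], ThreatRecord]] = []

    def insert(self, record: ThreatRecord) -> None:
        self._rows.append((embed(record.description), record))

    def search(self, query: str, top_k: int = 2) -> list[ThreatRecord]:
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [record for _, record in ranked[:top_k]]


def build_prompt(question: str, records: list[ThreatRecord]) -> str:
    """Assemble the retrieved records into context for an LLM prompt."""
    context = "\n".join(
        f"- {r.cve_id} (EPSS {r.epss:.2f}, KEV: {'yes' if r.in_kev else 'no'}): "
        f"{r.description}"
        for r in records
    )
    return (
        "Answer using only the threat intelligence below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Illustrative records; in the described system these would come from live feeds.
index = InMemoryIndex()
index.insert(ThreatRecord("CVE-0000-0001",
                          "Remote code execution in a web server via crafted HTTP request headers",
                          0.91, True))
index.insert(ThreatRecord("CVE-0000-0002",
                          "SQL injection in the login form of an inventory application",
                          0.47, False))
index.insert(ThreatRecord("CVE-0000-0003",
                          "Privilege escalation in a kernel driver through a race condition",
                          0.12, False))

question = "SQL injection login form"
hits = index.search(question, top_k=2)
prompt = build_prompt(question, hits)
print(prompt)
```

In a fuller version, `embed` would call the real embedding model, `InMemoryIndex` would be replaced by a Milvus collection, and `prompt` would be sent to the LLM. Keeping those three pieces behind small interfaces makes each one swappable without touching the others.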