Research

arXiv

EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability

Bryan Lim ,

Abstract

Sophisticated phishing attacks have emerged as a major cybersecurity threat, becoming increasingly common and difficult to prevent. Although machine learning techniques have shown promise in detecting phishing attacks, they function mainly as "black boxes" that do not reveal their decision-making rationale. This lack of transparency erodes user trust and weakens users' ability to respond effectively to threats. We present EXPLICATE, a framework that enhances phishing detection through a three-component architecture: an ML-based classifier using domain-specific features, a dual-explanation layer combining LIME and SHAP for complementary feature-level insights, and an LLM enhancement module using DeepSeek v3 that translates technical explanations into accessible natural language. Our experiments show that EXPLICATE attains 98.4% across all evaluation metrics, including accuracy, which is on par with existing deep learning techniques while offering better explainability. The framework generates high-quality explanations, achieving 94.2% accuracy and 96.8% consistency between the LLM output and the model prediction. We implement EXPLICATE as both a fully functional GUI application and a lightweight Chrome extension, demonstrating its applicability across diverse deployment scenarios. The research shows that high detection performance can go hand in hand with meaningful explainability in security applications. Most importantly, it bridges the critical gap between automated AI and user trust in phishing detection systems.
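The three-component flow (classifier, feature-level explanation, LLM-facing translation) can be illustrated with a minimal, dependency-free Python sketch. Everything here is an assumption for illustration rather than the paper's implementation: the URL features, weights, and background means are hypothetical; a linear classifier is used so that SHAP values can be computed exactly as w_i * (x_i - E[x_i]) instead of calling the SHAP library, and LIME is omitted; the LLM stage only builds a prompt and does not call DeepSeek v3; and "consistency" is assumed to mean the agreement rate between the LLM's stated verdict and the classifier's label.

```python
import math
import re

# Hypothetical URL features; NOT the paper's actual domain-specific feature set.
FEATURES = ("uses_ip_host", "has_at_symbol", "no_https", "subdomain_count", "url_length")

# Hypothetical weights/bias standing in for a trained linear classifier.
WEIGHTS = {"uses_ip_host": 2.5, "has_at_symbol": 1.8, "no_https": 1.2,
           "subdomain_count": 0.6, "url_length": 0.9}
BIAS = -2.0
# Hypothetical background means used as the SHAP reference point.
BACKGROUND_MEAN = {"uses_ip_host": 0.02, "has_at_symbol": 0.01, "no_https": 0.3,
                   "subdomain_count": 0.5, "url_length": 0.45}


def extract_features(url: str) -> dict:
    """Map a URL to a small numeric feature vector (illustrative only)."""
    lowered = url.lower()
    host = re.sub(r"^[a-z][a-z0-9+.-]*://", "", lowered).split("/")[0].split(":")[0]
    is_ip = re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None
    return {
        "uses_ip_host": 1.0 if is_ip else 0.0,
        "has_at_symbol": 1.0 if "@" in url else 0.0,
        "no_https": 0.0 if lowered.startswith("https://") else 1.0,
        "subdomain_count": 0.0 if is_ip else float(max(host.count(".") - 1, 0)),
        "url_length": len(url) / 100.0,  # scaled so weights stay comparable
    }


def logit(x: dict) -> float:
    """Classifier stage: linear log-odds that the URL is phishing."""
    return BIAS + sum(WEIGHTS[f] * x[f] for f in FEATURES)


def predict_proba(x: dict) -> float:
    return 1.0 / (1.0 + math.exp(-logit(x)))


def linear_shap(x: dict) -> dict:
    """Explanation stage: exact SHAP values (log-odds space) for a linear
    model with independent features, phi_i = w_i * (x_i - E[x_i])."""
    return {f: WEIGHTS[f] * (x[f] - BACKGROUND_MEAN[f]) for f in FEATURES}


def build_llm_prompt(url: str, x: dict, top_k: int = 3) -> str:
    """LLM stage input: hypothetical prompt text; the actual LLM call is out of scope."""
    p = predict_proba(x)
    ranked = sorted(linear_shap(x).items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [f"- {name}: {value:+.2f}" for name, value in ranked[:top_k]]
    verdict = "phishing" if p >= 0.5 else "legitimate"
    return (f"The classifier labels {url} as {verdict} (p={p:.2f}).\n"
            "Top feature contributions (log-odds):\n" + "\n".join(lines) +
            "\nExplain this decision to a non-expert in plain language.")


def consistency(llm_verdicts: list, model_labels: list) -> float:
    """Assumed definition: share of cases where the LLM's verdict matches the model's label."""
    if len(llm_verdicts) != len(model_labels) or not model_labels:
        raise ValueError("need two non-empty lists of equal length")
    return sum(a == b for a, b in zip(llm_verdicts, model_labels)) / len(model_labels)


if __name__ == "__main__":
    url = "http://192.168.0.1/login"
    print(build_llm_prompt(url, extract_features(url)))
```

A linear model is chosen only because its SHAP values have a closed form whose sum equals logit(x) minus the baseline logit, which makes the "additive feature attribution" idea easy to check by hand; a real deployment would use a trained model with the LIME and SHAP libraries.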