SE#PCFG: Semantically Enhanced PCFG for Password Analysis and Cracking

Abstract

Much research has been done on user-generated textual passwords. Surprisingly, semantic information in such passwords remains under-investigated: existing work focuses mostly on passwords created by English- and/or Chinese-speaking users, and even there it exploits only limited semantics. This paper fills this gap by proposing SE#PCFG, a general framework based on semantically enhanced probabilistic context-free grammars (PCFG). It allows us to consider 43 types of semantic information, the richest set considered so far, for password analysis. Applying SE#PCFG to 17 large leaked password databases of users speaking four languages (English, Chinese, German and French), we demonstrate its usefulness and report a wide range of new insights about password semantics at different levels, such as cross-website password correlations. Furthermore, based on SE#PCFG and a new systematic smoothing method, we propose the Semantically Enhanced Password Cracking Architecture (SEPCA) and compare its performance, in terms of password coverage rate, against three state-of-the-art (SOTA) benchmarks: two other PCFG variants and a neural network. Our experimental results show that SEPCA outperforms all three benchmarks consistently and significantly across 52 test cases, by up to 21.53%, 52.55% and 7.86%, respectively, at the user level (with duplicate passwords). At the level of unique passwords, SEPCA also beats the three benchmarks by up to 43.83%, 94.11% and 11.16%, respectively.