CEFW: A Comprehensive Evaluation Framework for Watermark in Large Language Models

Abstract

Text watermarking offers an effective way to identify synthetic text generated by large language models. However, existing techniques tend to optimize for specific criteria while neglecting other key aspects, and the field lacks a unified evaluation method. To fill this gap, we propose the Comprehensive Evaluation Framework for Watermark (CEFW), a unified framework that evaluates watermarking methods along five key dimensions: ease of detection, fidelity of text quality, minimal embedding cost, robustness to adversarial attacks, and imperceptibility, which prevents imitation or forgery. By assessing watermarks against all of these criteria, CEFW gives a thorough picture of their practical feasibility and effectiveness. In addition, we introduce a simple yet effective watermarking method, Balanced Watermark (BW), which ensures robustness and imperceptibility by balancing the way watermark information is added. Extensive experiments show that BW outperforms existing methods in overall performance across all evaluation dimensions. We release our code to the community for future research: https://github.com/DrankXs/BalancedWatermark