LLMs Lost in Translation: M-ALERT Uncovers Cross-Linguistic Safety Gaps
Abstract
Building safe Large Language Models (LLMs) across multiple languages is essential for ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, 75k in total, and follows the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis: models often behave inconsistently across languages and categories. For instance, Llama3.2 is highly unsafe in the category crime_tax for Italian but remains safe in the other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
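The kind of per-language, per-category comparison described above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the record layout (language, category, is_safe), and the min_gap threshold are assumptions made for illustration, and the labels in the toy data are invented rather than taken from the benchmark results.

```python
from collections import defaultdict


def safety_scores(records):
    """Return the fraction of safe responses per (language, category).

    `records` is an iterable of (language, category, is_safe) tuples;
    this layout is an assumption, not the M-ALERT data format.
    """
    counts = defaultdict(lambda: [0, 0])  # (language, category) -> [safe, total]
    for language, category, is_safe in records:
        cell = counts[(language, category)]
        cell[0] += int(is_safe)
        cell[1] += 1
    return {key: safe / total for key, (safe, total) in counts.items()}


def inconsistent_categories(scores, min_gap=0.2):
    """Flag categories whose safety score differs across languages.

    `min_gap` is a hypothetical threshold on (max score - min score).
    """
    by_category = defaultdict(dict)
    for (language, category), score in scores.items():
        by_category[category][language] = score
    flagged = {}
    for category, per_language in by_category.items():
        gap = max(per_language.values()) - min(per_language.values())
        if gap >= min_gap:
            flagged[category] = per_language
    return flagged


# Invented toy labels, for illustration only (not results from the paper).
records = [
    ("en", "crime_tax", True),
    ("en", "crime_tax", True),
    ("it", "crime_tax", False),
    ("it", "crime_tax", True),
    ("en", "substance_cannabis", False),
    ("it", "substance_cannabis", False),
]
scores = safety_scores(records)
flagged = inconsistent_categories(scores)
```

On this toy input, crime_tax is flagged because its score differs between the two languages, while substance_cannabis is not flagged: it is uniformly unsafe, which is a different failure mode from cross-language inconsistency and would need a separate check on the absolute score.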