LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

Abstract

Building safe Large Language Models (LLMs) across multiple languages is essential for ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, 75k in total, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 is highly unsafe in the category crime_tax for Italian but remains safe in other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
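
The kind of language-specific analysis described in the abstract can be pictured with a minimal sketch. It assumes each model response has already been labeled safe or unsafe and is stored as a (language, category, is_safe) tuple. The function names, the max_gap threshold, and the toy records below are hypothetical illustrations, not the authors' evaluation code or results from the paper.

```python
from collections import defaultdict


def safety_rates(records):
    """Return the fraction of safe responses per (category, language).

    `records` is an iterable of (language, category, is_safe) tuples.
    This record layout is an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # (category, language) -> [safe, total]
    for language, category, is_safe in records:
        cell = counts[(category, language)]
        cell[0] += int(is_safe)
        cell[1] += 1
    return {key: safe / total for key, (safe, total) in counts.items()}


def inconsistent_categories(rates, max_gap=0.2):
    """Flag categories whose safety rate differs across languages by more than max_gap.

    The 0.2 default is an arbitrary illustrative threshold.
    """
    by_category = defaultdict(dict)
    for (category, language), rate in rates.items():
        by_category[category][language] = rate
    flagged = {}
    for category, per_language in by_category.items():
        gap = max(per_language.values()) - min(per_language.values())
        if gap > max_gap:
            flagged[category] = gap
    return flagged


# Made-up toy labels, chosen only to exercise the functions.
toy_records = (
    [("en", "crime_tax", True)] * 4
    + [("it", "crime_tax", True)] * 1
    + [("it", "crime_tax", False)] * 3
    + [("en", "substance_cannabis", True)] * 2
    + [("en", "substance_cannabis", False)] * 2
    + [("it", "substance_cannabis", True)] * 2
    + [("it", "substance_cannabis", False)] * 2
)

rates = safety_rates(toy_records)
flagged = inconsistent_categories(rates)
print(flagged)
```

In this toy setup, a category that is equally unsafe in every language is not flagged, because the gap measures only cross-language disagreement. A category that is uniformly unsafe would have to be caught by looking at the rates themselves.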

LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps - arXiv