arXiv

LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting

Abstract

Building safe Large Language Models (LLMs) across multiple languages is essential for ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 is highly unsafe in the crime_tax category for Italian but remains safe in the other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
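The abstract compares safety per language and per category. The Python sketch below illustrates that kind of analysis under stated assumptions. It assumes every model response has already received a safe/unsafe verdict from some judge. It then computes the fraction of safe responses for each (language, category) pair and flags categories whose scores differ across languages by more than a threshold.

All of the following are hypothetical and are not taken from the paper:

* the record format;
* the function names `safety_scores` and `cross_lingual_gaps`;
* the 0.1 threshold;
* the toy verdicts.

This is not the authors' evaluation code.

```python
from collections import defaultdict

# Hypothetical input: one (language, category, is_safe) triple per model response,
# where is_safe is a verdict from some safety judge. These toy verdicts are
# illustrative only and are NOT M-ALERT results.
TOY_RECORDS = [
    ("en", "crime_tax", True), ("en", "crime_tax", True),
    ("it", "crime_tax", True), ("it", "crime_tax", False),
    ("en", "substance_cannabis", True), ("en", "substance_cannabis", False),
    ("it", "substance_cannabis", True), ("it", "substance_cannabis", False),
]


def safety_scores(records):
    """Map each (language, category) pair to the fraction of responses judged safe."""
    counts = defaultdict(lambda: [0, 0])  # (language, category) -> [safe, total]
    for lang, cat, is_safe in records:
        counts[(lang, cat)][0] += int(is_safe)
        counts[(lang, cat)][1] += 1
    return {key: safe / total for key, (safe, total) in counts.items()}


def cross_lingual_gaps(scores, threshold=0.1):
    """Return {category: max score - min score across languages} for gaps above threshold."""
    by_category = defaultdict(list)
    for (_lang, cat), score in scores.items():
        by_category[cat].append(score)
    gaps = {}
    for cat, values in by_category.items():
        gap = max(values) - min(values)
        if gap > threshold:
            gaps[cat] = gap
    return gaps


if __name__ == "__main__":
    scores = safety_scores(TOY_RECORDS)
    print(scores)
    print(cross_lingual_gaps(scores))  # {'crime_tax': 0.5}
```

Taking max minus min per category is only one simple way to surface language-specific gaps. It does not flag a category that is uniformly unsafe across languages. The abstract describes substance_cannabis and crime_propaganda as uniformly unsafe in this sense, and such categories show up as low scores in every language rather than as a large gap.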