Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish

Abstract

Large Language Models (LLMs) have become an increasingly important tool in research and in society at large. Although LLMs are used all over the world by experts and laypeople alike, they are developed predominantly with English-speaking users in mind. They perform well in English and other widely spoken languages, while less-resourced languages such as Luxembourgish are treated as a lower priority. This lack of attention is also reflected in the scarcity of available evaluation tools and datasets. In this study, we investigate the viability of language proficiency exams as evaluation tools for Luxembourgish. We find that large models such as ChatGPT, Claude and DeepSeek-R1 typically achieve high scores, while smaller models perform poorly. We also find that performance on such language exams can be used to predict performance on other NLP tasks.