arXiv

State-of-the-Art Translation of Text-to-Gloss using mBART: A Case Study of Bangla

Sharif Md. Abdullah

Abstract

Despite a deaf and mute population of 1.7 million, Bangla Sign Language (BdSL) remains an understudied domain. In particular, there is no prior work on the Bangla text-to-gloss translation task. To address this gap, we begin with the dataset problem. We take inspiration from the grammatical rule-based gloss generation used for German and American Sign Language (ASL) and adapt it for BdSL. We also use an LLM to generate synthetic data, and apply back-translation and text generation for data augmentation. With the dataset prepared, we began experimentation. We fine-tuned the pretrained mBART-50 and mBERT-multiclass-uncased models on our dataset. We also trained GRU, RNN, and a novel seq-to-seq model with multi-head attention. We observed notably high performance (SacreBLEU = 79.53) when fine-tuning the pretrained multilingual mBART-50 model from Facebook. We then explored why mBART performs so well. We noticed an interesting property of mBART: it was trained on shuffled and masked text data. Gloss sequences likewise exhibit a shuffling property, so we hypothesize that mBART is inherently good at text-to-gloss tasks. To test this hypothesis, we trained mBART-50 on the PHOENIX-14T benchmark and compared it against existing literature. Our fine-tuned mBART-50 demonstrated state-of-the-art performance on the PHOENIX-14T benchmark, far outperforming existing models on all six metrics (SacreBLEU = 63.89, BLEU-1 = 55.14, BLEU-2 = 38.07, BLEU-3 = 27.13, BLEU-4 = 20.68, COMET = 0.624). Based on these results, this study proposes a new paradigm for the text-to-gloss task using mBART models. Additionally, our results show that the BdSL text-to-gloss task can benefit greatly from a rule-based synthetic dataset.
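
The "shuffled and masked" property that motivates the hypothesis can be illustrated with a toy sketch. This is not the authors' code and not mBART's actual preprocessing: the function names, the fixed span length, and the whitespace tokenization are assumptions made for illustration only. mBART's published noising works on subword tokens and samples span lengths from a Poisson distribution; the sketch below only shows the general idea of permuting sentence order and replacing spans of words with a mask token.

```python
import random

MASK = "<mask>"  # placeholder mask symbol used by this sketch


def permute_sentences(sentences, rng):
    """Return a copy of the sentences in random order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def mask_spans(tokens, mask_ratio=0.35, span_len=2, rng=None):
    """Replace random spans of tokens with a single mask token each.

    Assumption: spans have a fixed maximum length (span_len) instead of
    the Poisson-sampled lengths used in the original mBART noising.
    """
    rng = rng or random.Random()
    tokens = list(tokens)
    budget = int(len(tokens) * mask_ratio)  # how many tokens to hide
    attempts = 0
    while budget > 0 and attempts < 100:
        attempts += 1
        length = min(span_len, budget)
        if length > len(tokens):
            break
        start = rng.randrange(len(tokens) - length + 1)
        if MASK in tokens[start:start + length]:
            continue  # do not mask over an existing mask; try again
        tokens[start:start + length] = [MASK]
        budget -= length
    return tokens


def noise(document, rng):
    """Apply sentence permutation, then span masking, to a list of sentences."""
    return [mask_spans(s.split(), rng=rng) for s in permute_sentences(document, rng)]


if __name__ == "__main__":
    rng = random.Random(0)
    doc = ["the weather is cold today", "tomorrow it will rain in the north"]
    for noisy in noise(doc, rng):
        print(" ".join(noisy))
```

A model pretrained to reconstruct the original text from input noised in this way has already practiced reordering and filling in missing words, which is the intuition behind the hypothesis that such pretraining transfers well to gloss sequences.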