arXiv

HateGPT: Unleashing GPT-3.5 Turbo to Combat Hate Speech on X

Abstract

The widespread use of social media platforms such as Twitter and Facebook has enabled people of all ages to share their thoughts and experiences, producing an immense volume of user-generated content. Alongside these benefits, the platforms face the challenge of managing hate speech and offensive content, which can undermine rational discourse and threaten democratic values. There is therefore a growing need for automated methods to detect and mitigate such content, especially since conversations may require contextual analysis across multiple languages, including code-mixed languages such as Hinglish, German-English, and Bangla. We participated in the English task, which requires classifying English tweets into two categories: Hate and Offensive, and Non Hate-Offensive. In this work, we prompt state-of-the-art large language models such as GPT-3.5 Turbo to assign each tweet to one of these two categories. We evaluate the approach over three distinct runs using the Macro-F1 score, the unweighted average of the per-class F1 scores, each of which is the harmonic mean of precision and recall for that class. The scores are 0.756 for run 1, 0.751 for run 2, and 0.754 for run 3, a spread of only 0.005 between the best and worst runs. Run 1 achieved the highest score. These results indicate that the model balances precision and recall consistently and that its performance is stable across runs.
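The pipeline described in the abstract can be illustrated with a minimal sketch. The prompt wording, the short label names `HOF` and `NOT`, and the helpers `build_prompt` and `parse_label` are all assumptions made for illustration; the abstract does not give the authors' actual prompt or label strings, and the call to the model itself is omitted. The `macro_f1` function follows the standard definition of Macro-F1, and the last lines summarize the three run scores reported above.

```python
from statistics import mean

# Hypothetical short label names; the abstract only names the two categories.
LABELS = ("HOF", "NOT")


def build_prompt(tweet: str) -> str:
    """Hypothetical zero-shot prompt (the paper's real wording is not given)."""
    return (
        "Classify the following English tweet as HOF (hate or offensive) "
        "or NOT (neither hateful nor offensive). Answer with one label only.\n\n"
        f"Tweet: {tweet}\nLabel:"
    )


def parse_label(reply: str) -> str:
    """Map a free-text model reply to one of LABELS, defaulting to NOT."""
    return "HOF" if reply.strip().upper().startswith("HOF") else "NOT"


def macro_f1(y_true, y_pred, labels=LABELS) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


# Toy gold labels and made-up model replies, for illustration only.
y_true = ["HOF", "HOF", "NOT", "NOT", "NOT"]
replies = ["HOF", "not", "NOT", "HOF.", "Not"]
y_pred = [parse_label(r) for r in replies]
toy_score = macro_f1(y_true, y_pred)

# Macro-F1 scores reported in the abstract for runs 1 to 3.
run_scores = [0.756, 0.751, 0.754]
run_mean = mean(run_scores)
run_spread = max(run_scores) - min(run_scores)
print(f"toy={toy_score:.3f} mean={run_mean:.3f} spread={run_spread:.3f}")
```

Because Macro-F1 weights both classes equally, a classifier cannot score well by favoring the majority class alone, which is presumably why it serves as the primary metric here.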