Improving Zero-Shot Cross-Lingual Hate Speech Detection with Pseudo-Label Fine-Tuning of Transformer Language Models

Zia, Haris Bin; Castro, Ignacio; Zubiaga, Arkaitz; Tyson, Gareth

dsnmod

Abstract

Hate speech has proliferated on social media platforms in recent years. While this has been the focus of many studies, most works have exclusively focused on a single language, generally English. Low-resourced languages have been neglected due to the dearth of labeled resources. These languages, however, represent an important portion of the data due to the multilingual nature of social media. This work presents a novel zero-shot, cross-lingual transfer learning pipeline based on pseudo-label fine-tuning of Transformer Language Models for automatic hate speech detection. We employ our pipeline on benchmark datasets covering English (source) and 6 different non-English (target) languages written in 3 different scripts. Our pipeline achieves an average improvement of 7.6% (in terms of macro-F1) over previous zero-shot, cross-lingual models. This demonstrates the feasibility of high-accuracy automatic hate speech detection for low-resource languages.
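The abstract names two ideas that a short example may help make concrete: pseudo-labeling of unlabeled target-language text and the macro-F1 metric. The sketch below is a generic illustration only, not the authors' pipeline. The function names, the confidence threshold, and the toy inputs are all assumptions; the abstract does not say whether predictions are filtered by confidence before fine-tuning.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (example_index, predicted_label) pairs whose top class
    probability is at least `threshold`.

    `probs` stands in for the class probabilities that a source-language
    (English) model assigns to unlabeled target-language posts. The
    threshold is a hypothetical filter, not a value from the paper.
    """
    selected = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=lambda k: p[k])
        if p[label] >= threshold:
            selected.append((i, label))
    return selected


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy probabilities for three unlabeled posts (class 0 = not hate, 1 = hate).
toy_probs = [[0.95, 0.05], [0.60, 0.40], [0.02, 0.98]]
pseudo_labeled = select_pseudo_labels(toy_probs, threshold=0.9)

# Toy gold labels and predictions for the metric.
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

In a pseudo-label setup of this kind, the selected pairs would then serve as training targets for a further round of fine-tuning on the target language. Macro-F1 weights every class equally, which matters when hateful posts are a small minority of the data.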