Abstract
Hate speech has proliferated on social media platforms in recent years. While this has been the focus of many studies, most works have exclusively focused on a single language, generally English. Low-resourced languages have been neglected due to the dearth of labeled resources. These languages, however, represent an important portion of the data due to the multilingual nature of social media. This work presents a novel zero-shot, cross-lingual transfer learning pipeline based on pseudo-label fine-tuning of Transformer Language Models for automatic hate speech detection. We employ our pipeline on benchmark datasets covering English (source) and 6 different non-English (target) languages written in 3 different scripts. Our pipeline achieves an average improvement of 7.6% (in terms of macro-F1) over previous zero-shot, cross-lingual models. This demonstrates the feasibility of high accuracy automatic hate speech detection for low-resource languages.
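For readers unfamiliar with the reported metric, macro-F1 is the unweighted mean of the per-class F1 scores, so the minority (hateful) class counts as much as the majority class. The following is a minimal sketch of that standard definition, not the evaluation code used in this work; the function name `macro_f1` and the toy labels are illustrative assumptions.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (illustrative helper, not from the paper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Per-class counts of true positives, false positives, false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    # Score every class that appears in either the gold or predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (assumed labels): 1 = hateful, 0 = not hateful.
gold = [1, 1, 0, 0, 0, 0]
pred = [1, 0, 0, 0, 0, 1]
print(macro_f1(gold, pred))  # class 1 F1 = 0.5, class 0 F1 = 0.75
```

Because each class contributes equally to the average, a model that ignores the rarer class is penalized even when its overall accuracy is high, which is why macro-F1 is a common choice for imbalanced hate speech datasets.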