Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach (MDPI)


Apr 21, 2024 #Algorithms, #Assorted

Glimpse: This research presents a multilingual semi-supervised model that combines Generative Adversarial Networks (GANs) with Pretrained Language Models (PLMs), specifically mBERT and XLM-RoBERTa. Using only 20% of the annotated data from the HASOC2019 dataset, the method proves effective at detecting hate speech and offensive language in three Indo-European languages (English, German, and Hindi). It performs strongly in multilingual, zero-shot cross-lingual, and monolingual training settings. The stronger of the two variants, the mBERT-based semi-supervised GAN model (SS-GAN-mBERT), outperformed the XLM-RoBERTa-based model (SS-GAN-XLM) and achieved an accuracy gain of 5.75% and an average F1 score gain of 9.23% over the baseline semi-supervised mBERT model.
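To make the semi-supervised GAN idea concrete, here is a minimal, framework-free sketch of the training losses. It assumes a GAN-BERT-style setup: a generator produces fake sentence representations, and a discriminator placed on top of the PLM encoder (mBERT or XLM-RoBERTa) predicts k real classes plus one extra "fake" class. The helper names (`split_labeled`, `discriminator_loss`, `generator_loss`), the 20% default labeled fraction, and the exact loss terms are illustrative assumptions, not the paper's code. The encoder, the generator network, and the optimizer are omitted.

```python
import math
import random

EPS = 1e-8  # guards against log(0)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean(values):
    return sum(values) / len(values)


def split_labeled(n_examples, labeled_fraction=0.2, seed=0):
    """Hypothetical helper: mark a random fraction of examples as labeled."""
    rng = random.Random(seed)
    order = list(range(n_examples))
    rng.shuffle(order)
    mask = [False] * n_examples
    for i in order[:round(n_examples * labeled_fraction)]:
        mask[i] = True
    return mask


def discriminator_loss(real_logits, labels, labeled_mask, fake_logits):
    """Each logit vector holds k real-class scores followed by one 'fake' score."""
    # Supervised term: cross-entropy over the k real classes, labeled examples only.
    sup_terms = [
        -math.log(softmax(logits[:-1])[y] + EPS)
        for logits, y, is_labeled in zip(real_logits, labels, labeled_mask)
        if is_labeled
    ]
    supervised = mean(sup_terms) if sup_terms else 0.0
    # Unsupervised terms: real inputs (labeled or not) should get a low 'fake'
    # probability, generated inputs a high one.
    real_as_real = mean([-math.log(1.0 - softmax(lg)[-1] + EPS) for lg in real_logits])
    fake_as_fake = mean([-math.log(softmax(lg)[-1] + EPS) for lg in fake_logits])
    return supervised + real_as_real + fake_as_fake


def generator_loss(fake_logits, real_features, fake_features):
    """Fool the discriminator and match its mean hidden features on real data."""
    # Adversarial term: generated inputs should get a low 'fake' probability.
    adversarial = mean([-math.log(1.0 - softmax(lg)[-1] + EPS) for lg in fake_logits])
    # Feature matching: mean squared gap between batch-mean feature vectors.
    dim = len(real_features[0])
    real_mean = [mean([f[d] for f in real_features]) for d in range(dim)]
    fake_mean = [mean([f[d] for f in fake_features]) for d in range(dim)]
    feature_matching = mean([(r - f) ** 2 for r, f in zip(real_mean, fake_mean)])
    return adversarial + feature_matching


if __name__ == "__main__":
    # Toy batch with k = 2 classes (e.g. offensive vs. not) plus the 'fake' class.
    real = [[2.0, -1.0, -3.0], [-1.0, 2.0, -3.0], [0.5, 0.5, -2.0]]
    labels = [0, 1, 0]  # the third example is unlabeled, so its label is ignored
    mask = [True, True, False]
    fake = [[-2.0, -2.0, 1.5]]
    print("D loss:", discriminator_loss(real, labels, mask, fake))
    print("G loss:", generator_loss(fake, [[0.1, 0.3], [0.2, 0.1]], [[0.0, 0.5]]))
```

In this formulation, unlabeled examples still contribute to training through the real-vs-fake terms, which is what lets a model learn from a dataset where only a small fraction (here assumed to be 20%) carries labels.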

https://lnkd.in/eccmwQX9