This research presents a novel multilingual semisupervised method that combines Generative Adversarial Networks (GANs) with Pretrained Language Models (PLMs), specifically XLM-RoBERTa and mBERT. Using only 20% of the annotated data from the HASOC2019 dataset, the method effectively detects hate speech and offensive language in Indo-European languages (English, German, and Hindi), achieving notably high performance in multilingual, zero-shot crosslingual, and monolingual training settings. Our mBERT-based semisupervised GAN model (SS-GAN-mBERT) outperformed the XLM-RoBERTa-based model (SS-GAN-XLM), achieving an average F1 score boost of 9.23% and an accuracy gain of 5.75% over the baseline semisupervised mBERT model.
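To make the GAN + PLM combination concrete, the sketch below illustrates the usual semisupervised-GAN classification setup (in the style of GAN-BERT): a discriminator classifies sentence embeddings into K real classes plus one extra "fake" class, with a supervised loss on the small labeled portion and an adversarial loss separating real PLM embeddings from generated ones. This is a minimal NumPy illustration under assumed shapes and names, not the authors' implementation; real embeddings would come from mBERT or XLM-RoBERTa.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2        # illustrative real classes: hate/offensive vs. neither
DIM = 8      # stand-in for the PLM sentence-embedding dimension

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Discriminator: one linear layer over embeddings -> K + 1 logits,
# where index K is the "generated/fake" class.
W = rng.normal(scale=0.1, size=(DIM, K + 1))

def discriminate(emb):
    """Probabilities over K real classes plus the fake class."""
    return softmax(emb @ W)

# Stand-ins: real embeddings would be PLM outputs; fake ones come
# from a generator network (here just random noise for illustration).
real_emb = rng.normal(size=(4, DIM))   # labeled batch (the "20%")
fake_emb = rng.normal(size=(4, DIM))   # generator output

p_real = discriminate(real_emb)
p_fake = discriminate(fake_emb)

# Supervised loss on labeled data: cross-entropy over real classes.
labels = np.array([0, 1, 0, 1])
sup_loss = -np.log(p_real[np.arange(4), labels]).mean()

# Unsupervised adversarial loss: real samples should avoid the fake
# class; generated samples should be assigned to it.
unsup_loss = (-np.log(1.0 - p_real[:, K]).mean()
              - np.log(p_fake[:, K]).mean())

print(round(float(sup_loss), 4), round(float(unsup_loss), 4))
```

In training, the discriminator minimizes `sup_loss + unsup_loss` while the generator is updated to fool the fake-class head, which is what lets the unlabeled 80% of the data contribute to the classifier.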
