Hate speech on social media threatens public debate, yet most research on detecting it has focused on high-resource languages. Albanian remains largely overlooked because of its dialectal variation, scarce NLP tools, and limited data. To help close that gap, this work advances deep learning-based automatic hate speech detection in Albanian. A collection of 20,860 manually annotated Facebook comments was used to evaluate several families of methods: classical machine learning, deep neural networks, and transformers. Character-level 4-gram TF-IDF features improved the classical ML classifiers on informal language, and XLM-RoBERTa set a new benchmark with an F1-score of 86%. Error analysis with SHAP and LIME exposed recurring problems with dialects, short texts, and implicit insults. For low-resource settings, the study highlights the need for context-sensitive modeling, richer annotations, and domain-specific embeddings.
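
To make the character-level 4-gram TF-IDF idea concrete, here is a minimal pure-Python sketch. It is not the paper's pipeline: the function names, the padding and lowercasing choices, the smoothed IDF formula, and the English placeholder comments are all assumptions made for illustration. In practice a library vectorizer (for example scikit-learn's `TfidfVectorizer` with `analyzer="char"` and `ngram_range=(4, 4)`) would normally be used instead.

```python
import math
from collections import Counter


def char_ngrams(text, n=4):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(docs, n=4):
    """Smoothed inverse document frequency for every n-gram seen in docs."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n)))  # count each n-gram once per document
    total = len(docs)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf_vector(text, idf, n=4):
    """L2-normalised TF-IDF vector as a dict; n-grams unseen at fit time are ignored."""
    counts = Counter(g for g in char_ngrams(text, n) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(v * b.get(g, 0.0) for g, v in a.items())


# Hypothetical placeholder comments, not taken from the dataset.
docs = ["you are so stupid", "have a nice day"]
idf = fit_idf(docs)

original = tfidf_vector(docs[0], idf)
unrelated = tfidf_vector(docs[1], idf)
variant = tfidf_vector("you are soooo stupiddd", idf)  # informal elongated spelling

print(cosine(variant, original), cosine(variant, unrelated))
```

Because the elongated spelling still shares most of its 4-grams with the standard form, its vector stays much closer to the original comment than to the unrelated one. That property is one plausible reason character n-grams help on informal text, where word-level features would treat each spelling variant as a separate token.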

https://link.springer.com/article/10.1007/s13278-025-01497-w
