Gathering labeled hate speech data is expensive, particularly for low-resource languages. Previous research indicates that data augmentation and cross-lingual transfer learning help in low-data settings. The authors propose a scalable method that uses nearest-neighbor retrieval to improve detection when little labeled data is available in the target language: a small labeled set serves as the query for extracting relevant instances from a large multilingual pool. Tested on eight languages, the approach consistently beats models trained only on target-language data and frequently surpasses state-of-the-art results. It is data-efficient, often using only 200 samples, and scales to new languages and tasks. In some cases, performance improves further when maximum marginal relevance is used to reduce redundancy among the retrieved instances.
https://arxiv.org/abs/2505.14272
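
The retrieval idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`retrieve_neighbors`, `mmr_select`), the two-dimensional toy vectors, the use of cosine similarity, and the weight `lam` are all assumptions made for illustration. A real system would presumably embed texts with a multilingual encoder and search a much larger pool. The second function applies the standard maximum marginal relevance score, which trades relevance to the query against similarity to items already selected.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_neighbors(query, pool, k):
    """Return indices of the k pool vectors most similar to the query."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: cosine(query, pool[i]),
                    reverse=True)
    return ranked[:k]


def mmr_select(query, pool, candidates, n, lam=0.5):
    """Greedily pick n candidates by maximum marginal relevance.

    Each step scores a candidate as
        lam * sim(candidate, query) - (1 - lam) * max sim(candidate, already selected)
    so near-duplicates of earlier picks are penalized.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < n:
        def score(i):
            relevance = cosine(query, pool[i])
            redundancy = max((cosine(pool[i], pool[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): one embedding of a labeled target-language example
# and a small "multilingual pool" of embeddings.
query = [1.0, 0.0]
pool = [
    [0.9, 0.1],    # 0: close to the query
    [0.9, 0.1],    # 1: exact duplicate of 0
    [0.9, -0.1],   # 2: equally close to the query, but distinct from 0
    [0.0, 1.0],    # 3: unrelated to the query
]

candidates = retrieve_neighbors(query, pool, k=3)   # plain nearest neighbors
diverse = mmr_select(query, pool, candidates, n=2)  # redundancy-aware subset
print(candidates, diverse)
```

On this toy pool, plain top-2 retrieval would return the duplicate pair (indices 0 and 1), whereas the MMR step keeps index 0 and replaces the duplicate with index 2. Setting `lam` closer to 1 weights relevance more heavily and makes the selection behave like plain nearest-neighbor ranking.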