Gathering labeled hate speech data is expensive, particularly for low-resource languages. Previous research indicates that data augmentation and cross-lingual transfer learning help in low-data settings. Using nearest-neighbor retrieval, the authors propose a scalable method to improve detection when little labeled data is available in the target language: a small labeled set in the target language is used to retrieve relevant instances from a large multilingual pool. Tested on eight languages, the approach consistently beats models trained only on target-language data and frequently surpasses state-of-the-art results. It is data-efficient, often using only 200 samples, and scales to new languages and tasks. In some cases, performance improves further when redundancy among the retrieved instances is reduced with maximal marginal relevance (MMR).

https://arxiv.org/abs/2505.14272
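
The retrieval step summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`cosine`, `mmr_retrieve`, `build_augmented_set`), the use of cosine similarity over precomputed sentence embeddings, the per-seed retrieval loop, and the trade-off weight `lam` are all assumptions made for illustration. The scoring rule is the standard maximal marginal relevance formulation, where each pick balances similarity to the query against similarity to instances already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_retrieve(query, pool, k, lam=0.7):
    """Pick up to k pool indices by maximal marginal relevance.

    lam=1.0 reduces to plain nearest-neighbor retrieval; lower values
    penalize candidates that resemble already-selected instances.
    """
    relevance = [cosine(query, p) for p in pool]
    selected = []
    candidates = set(range(len(pool)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy: highest similarity to anything already chosen.
            redundancy = max(
                (cosine(pool[i], pool[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def build_augmented_set(seed_vectors, pool, k, lam=0.7):
    """Hypothetical helper: retrieve k pool instances per labeled seed, deduplicated."""
    chosen, seen = [], set()
    for query in seed_vectors:
        for i in mmr_retrieve(query, pool, k, lam):
            if i not in seen:
                seen.add(i)
                chosen.append(i)
    return chosen


# Toy 2-D "embeddings": pool[0] and pool[1] are near-duplicates.
pool = [[1.0, 0.0], [1.0, 0.01], [0.8, 0.6], [0.0, 1.0]]
seed = [1.0, 0.3]

plain_top2 = mmr_retrieve(seed, pool, k=2, lam=1.0)  # pure similarity ranking
mmr_top2 = mmr_retrieve(seed, pool, k=2, lam=0.7)    # skips the near-duplicate
print(plain_top2, mmr_top2)
```

In this toy pool, plain nearest-neighbor retrieval returns both near-duplicates, while the MMR variant replaces the second one with a more distinct neighbor. In a real pipeline the vectors would come from a multilingual sentence encoder, and the retrieved instances would be added to the target-language training set before fine-tuning a classifier.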