Hate speech detection is central to content moderation, but existing models often fail to generalize because of dataset biases and sentence-level labels that ignore the internal structure of hate speech. Even with finer-grained span-level annotations (e.g., labeling “artists” as a “target” and “are parasites” as dehumanizing), models struggle to disentangle the meaning of these labels from the surrounding context. As a result, combinations of expressions that differ from those seen during training remain hard to detect. The authors investigate whether training on data in which utterances are uniformly distributed across contexts improves generalization. To this end, they present U-PLEAD, a dataset of around 364,000 synthetic posts, together with a benchmark of approximately 8,000 hand-verified posts. Used in combination with real data, U-PLEAD improves compositional generalization and achieves state-of-the-art results on PLEAD. https://arxiv.org/abs/2506.03916
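The core idea, that a model generalizes better when each expression appears evenly across contexts instead of being tied to particular targets, can be illustrated with a toy sketch. This is not the U-PLEAD generation pipeline: the target and expression lists, the one-line post template, and the helpers `uniform_pairs`, `render`, and `compositional_split` are all hypothetical. The sketch pairs every target with every expression equally often, then holds out specific combinations so that a test split contains only unseen pairings of spans that are each individually familiar from training.

```python
from collections import Counter
from itertools import product

# Hypothetical span inventories for illustration only; these are NOT
# the actual U-PLEAD templates, targets, or label set.
TARGETS = ["artists", "immigrants", "teachers"]
EXPRESSIONS = ["are parasites", "are vermin", "should be silenced"]


def uniform_pairs(reps_per_pair):
    """Pair every target with every expression the same number of times,
    so no expression is statistically tied to one target (context)."""
    return [
        (target, expr)
        for target, expr in product(TARGETS, EXPRESSIONS)
        for _ in range(reps_per_pair)
    ]


def render(pairs):
    """Turn (target, expression) pairs into toy posts with span labels."""
    return [
        {"text": f"{t.capitalize()} {e}.", "target": t, "expression": e}
        for t, e in pairs
    ]


def compositional_split(pairs, held_out):
    """Hold out specific combinations for testing while each target and
    each expression still appears somewhere in the training split."""
    train = [p for p in pairs if p not in held_out]
    test = [p for p in pairs if p in held_out]
    return train, test


if __name__ == "__main__":
    pairs = uniform_pairs(reps_per_pair=2)
    counts = Counter(pairs)
    print(len(pairs), set(counts.values()))  # 18 {2}

    held_out = {("artists", "are vermin"), ("teachers", "are parasites")}
    train, test = compositional_split(pairs, held_out)
    print(len(train), len(test))  # 14 4
    print(render(test[:1])[0]["text"])  # Artists are vermin.
```

A uniform cross product is the simplest way to remove target-expression co-occurrence bias; a real synthetic dataset would presumably vary wording and sample combinations rather than enumerate a fixed template.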