Biases in evaluation datasets limit the real-world usefulness of current hate speech detection models. HateDay is a globally representative hate speech dataset built from a random sample of all tweets posted on September 21, 2022, covering eight languages and four English-speaking countries. The analysis shows that the prevalence and composition of hate speech vary substantially across languages and countries.

Evaluations on academic datasets significantly overestimate detection performance, which is especially poor for non-English languages. Two major failure modes stand out: distinguishing hate speech from merely offensive speech, and a mismatch between the target groups emphasized in academic datasets and the groups actually attacked in the wild. The authors conclude that publicly available models are insufficient for automated moderation and that effective detection still requires substantial human supervision. This underscores the importance of evaluating systems on data that reflects the complexity of global social media conversation.

https://aclanthology.org/2025.acl-long.115
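One reason academic benchmarks overestimate real-world performance is class imbalance: hate speech is rare in a random sample of tweets, so even a classifier with strong sensitivity and specificity yields low precision in deployment. A minimal sketch of this effect via Bayes' rule, using illustrative (assumed) operating-point and prevalence numbers rather than figures from the paper:

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Precision of a fixed classifier applied to a population
    with the given positive-class prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence                   # true-positive mass
    fp = (1 - specificity) * (1 - prevalence)       # false-positive mass
    return tp / (tp + fp)

# Assumed operating point: 90% sensitivity, 95% specificity.
# On a roughly balanced academic benchmark (prevalence ~50%):
balanced = precision_at_prevalence(0.90, 0.95, 0.50)
# On a random day of tweets where hate is rare (assumed ~1%):
in_the_wild = precision_at_prevalence(0.90, 0.95, 0.01)

print(f"{balanced:.2f}")      # ~0.95
print(f"{in_the_wild:.2f}")   # ~0.15
```

The same model drops from roughly 95% to roughly 15% precision purely because of the prevalence shift, which is why evaluation on representative samples like HateDay matters.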