Researchers at the Annenberg School for Communication found that seven leading large language models, including systems from OpenAI, DeepSeek, Google, and Mistral, differ significantly in how they classify hate speech. The study examined 1.3 million synthetic sentences referencing 125 demographic groups and found wide variation in how the models moderated them, especially for statements about groups that are less conventionally protected, such as those defined by economic status or education. Some models weighed context and semantic nuance, producing divergent judgments, while others flagged slurs consistently regardless of context. The results highlight the absence of uniform moderation standards and raise concerns about algorithmic bias and unequal protection across communities.

https://penntoday.upenn.edu/news/annenberg-artificial-intelligence-models-vary-widely-identifying-hate-speech
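
Purely as an illustration of the kind of cross-model comparison the study describes, and not the authors' actual pipeline, the sketch below fills sentence templates with group names, asks each model for a hate-speech label, and measures pairwise disagreement. The `classify` function, model names, templates, and group lists are all hypothetical placeholders; a real harness would call each provider's moderation or chat endpoint.

```python
# Hypothetical sketch of comparing hate-speech labels across several models.
# Nothing here reproduces the study's data or methods; it only shows the shape
# of such a comparison.
from itertools import combinations

MODELS = ["model-a", "model-b", "model-c"]               # stand-ins for the seven systems
GROUPS = ["group-1", "group-2", "group-3"]               # stand-ins for the 125 demographic groups
TEMPLATES = ["All {g} are terrible.", "I admire {g}."]   # illustrative templates only


def classify(model: str, sentence: str) -> bool:
    # Toy stand-in: a real implementation would query the named model's API
    # and parse whether it flags the sentence as hate speech.
    return "terrible" in sentence.lower()


def disagreement_rates() -> dict:
    """Fraction of sentences on which each pair of models returns different labels."""
    sentences = [t.format(g=g) for t in TEMPLATES for g in GROUPS]
    labels = {m: [classify(m, s) for s in sentences] for m in MODELS}
    rates = {}
    for a, b in combinations(MODELS, 2):
        diffs = sum(x != y for x, y in zip(labels[a], labels[b]))
        rates[(a, b)] = diffs / len(sentences)
    return rates


if __name__ == "__main__":
    print(disagreement_rates())
```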