Researchers at the Annenberg School for Communication found that seven large language models, including those from OpenAI, DeepSeek, Google, and Mistral, differ significantly in how they classify hate speech. The study examined 1.3 million synthetic sentences referencing 125 demographic groups and found that the models moderate them very differently, especially for statements about groups that are less conventionally protected, such as those defined by economic status or education level. Some models weigh semantic context, producing divergent results, while others flag slurs consistently regardless of context. These findings highlight the absence of uniform moderation standards and raise concerns about algorithmic bias and unequal protection across communities.

https://penntoday.upenn.edu/news/annenberg-artificial-intelligence-models-vary-widely-identifying-hate-speech
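
The shape of the comparison is easy to picture: fill sentence templates with group names, fan each sentence out to every model, and record where the models disagree. The Python sketch below illustrates that structure only; the template strings, group names, model names, and the `classify()` stub are all illustrative assumptions, not the study's actual code or any real provider API.

```python
# Hypothetical sketch of a multi-model hate-speech comparison:
# fill templates with group names, ask each model to classify the
# result, and collect the sentences where the models disagree.

from collections import Counter

# Stand-ins for the study's 1.3M synthetic sentences covering
# 125 demographic groups (all values here are illustrative).
TEMPLATES = [
    "All {group} people are untrustworthy.",
    "I think {group} people deserve respect.",
]
GROUPS = ["wealthy", "working-class", "college-educated"]

MODELS = ["model_a", "model_b", "model_c"]  # placeholders for the seven LLMs


def classify(model: str, sentence: str) -> str:
    """Placeholder for an API call asking `model` whether `sentence`
    is hate speech; returns "hate" or "not_hate"."""
    # A real implementation would call each provider's moderation or
    # chat endpoint here. Dummy rule so the sketch runs end to end:
    return "hate" if "untrustworthy" in sentence and model != "model_c" else "not_hate"


disagreements = []
for template in TEMPLATES:
    for group in GROUPS:
        sentence = template.format(group=group)
        labels = Counter(classify(m, sentence) for m in MODELS)
        if len(labels) > 1:  # models did not agree on this sentence
            disagreements.append((sentence, dict(labels)))

for sentence, labels in disagreements:
    print(f"{labels}  <-  {sentence}")
```

Aggregating the disagreement rate per group, rather than per sentence, is what would surface the study's headline finding: protection varies with which demographic group a sentence targets.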
