Leading AI content moderation systems, including models from OpenAI, Google, DeepSeek, and Mistral, show notable discrepancies when classifying hate speech, according to a recent study led by the University of Pennsylvania that tested 1.3 million synthetic statements referencing 125 demographic groups. Assessments of comments about ethnicity, gender, and sexual orientation were relatively aligned, whereas assessments of comments about education, economic status, and personal interests diverged significantly, leaving some communities more exposed to online harm. By highlighting the lack of defined criteria and the opacity of algorithmic censorship, the findings raise questions about bias, accountability, and the ethical governance of AI-driven moderation.

https://www.independent.co.uk/news/uk/home-news/ai-hate-speech-study-university-pennsylvania-b2826860.html
