Leading AI content moderation systems, including models from OpenAI, Google, DeepSeek, and Mistral, show notable discrepancies when classifying hate speech, according to a recent study led by the University of Pennsylvania that tested 1.3 million synthetic statements referencing 125 demographic groups. Assessments of comments about ethnicity, gender, and sexual orientation were relatively aligned, whereas assessments of comments about education, economic status, and personal interests diverged significantly, leaving some communities more exposed to online harm. By highlighting the lack of defined criteria and the opacity of algorithmic censorship, the findings raise questions about bias, accountability, and the ethical governance of AI-driven moderation.

https://www.independent.co.uk/news/uk/home-news/ai-hate-speech-study-university-pennsylvania-b2826860.html
