This work investigates the application of Large Language Models (LLMs) to combat hate speech. The researchers carried out the first real-world A/B test evaluating the efficacy of LLM-generated counter-speech: during the trial, they posted 753 AI-generated comments under tweets containing hate speech targeting Ukrainian immigrants in Poland, with the goal of reducing user engagement with those tweets. The results show that interventions with LLM-generated replies considerably reduce user engagement; for original tweets with at least ten views, engagement fell by 20%. The article describes the architecture of the authors' automatic moderation system, a simple metric for measuring user engagement, and the methodology for conducting such an experiment. The authors also discuss the challenges and ethical issues of using generative AI for conversation moderation.
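The summary does not specify the paper's engagement metric, so the sketch below is purely illustrative: it assumes engagement is counted as interactions (replies, likes, reposts) per view, and shows how a relative reduction between a control and a treatment arm of an A/B test could be computed. All names, the formula, and the numbers are assumptions, not the authors' method.

```python
# Hypothetical sketch only: the paper's actual metric is not given here.

def engagement_rate(replies: int, likes: int, reposts: int, views: int) -> float:
    """Interactions per view; 0.0 when the tweet has no views (assumed metric)."""
    return (replies + likes + reposts) / views if views else 0.0

def relative_reduction(control: list[float], treatment: list[float]) -> float:
    """Percent drop in mean engagement from the control arm to the treatment arm."""
    mean = lambda xs: sum(xs) / len(xs)
    return 100.0 * (mean(control) - mean(treatment)) / mean(control)

# Toy numbers: two tweets per arm, each with at least 10 views.
control = [engagement_rate(3, 10, 2, 100), engagement_rate(1, 5, 1, 50)]
treatment = [engagement_rate(2, 8, 2, 100), engagement_rate(1, 4, 0, 50)]
print(f"{relative_reduction(control, treatment):.1f}% reduction")
```

A real analysis would of course also need a significance test across the two arms; this only illustrates the shape of the comparison.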

https://aclanthology.org/2024.findings-emnlp.931

