WATCHED: A Web AI Agent Tool for Combating Hate Speech by Expanding Data (arXiv)

Sep 16, 2025 #Algorithms

In this paper, the authors present WATCHED, an AI-powered chatbot that combines large language models with specialized tools to improve hate speech moderation. WATCHED uses precedent-based comparison, BERT-based classification, slang interpretation, chain-of-thought reasoning, and policy alignment to both detect hateful content and justify its moderation decisions, addressing the shortcomings of automated systems and the need for interpretability. In the authors' empirical evaluation, the system reaches a macro F1 score of 0.91 and outperforms existing approaches, which positions it as a collaborative tool for academics, safety teams, and moderators seeking to reduce online harms and build confidence in digital governance.

https://arxiv.org/abs/2509.01379
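For readers unfamiliar with the reported metric: macro F1 is the unweighted mean of the per-class F1 scores, so a minority class such as "hate" counts as much as the majority class. The following is a minimal sketch of that calculation in plain Python. The function name `macro_f1` and the toy labels are assumptions made for illustration; they are not the paper's code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels invented for illustration only (not data from the paper).
y_true = ["hate", "hate", "neutral", "neutral", "neutral", "hate"]
y_pred = ["hate", "neutral", "neutral", "neutral", "hate", "hate"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a classifier cannot reach a high macro F1 simply by labeling everything as the more common class.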



preventhate.org | Policyinstitute.net
