In the current research, the authors present WATCHED, an AI-powered chatbot that combines massive language models with specialized techniques to improve hate speech moderation. WATCHED uses precedent-based comparison, BERT-based classification, slang interpretation, chain-of-thought reasoning, and policy alignment to both detect and justify moderating judgments, addressing the shortcomings of automated systems and the requirement for interpretability. With a macro F1 score of 0.91, empirical evaluation outperforms current approaches, establishing the system as a cooperative tool for academics, safety teams, and moderators to reduce online harms and promote confidence in digital governance.

https://arxiv.org/abs/2509.01379

By author

Leave a Reply

Your email address will not be published. Required fields are marked *