In this research, the authors present WATCHED, an AI-powered chatbot that combines large language models with specialized techniques to improve hate speech moderation. WATCHED uses precedent-based comparison, BERT-based classification, slang interpretation, chain-of-thought reasoning, and policy alignment to both detect hate speech and justify its moderation decisions, addressing the shortcomings of automated systems and the need for interpretability. In empirical evaluation, the system achieves a macro F1 score of 0.91, outperforming current approaches. The authors position it as a collaborative tool for academics, safety teams, and moderators to reduce online harms and promote confidence in digital governance.

https://arxiv.org/abs/2509.01379