The researchers present SMARTER, a data-efficient, two-stage framework for explainable content moderation built on Large Language Models (LLMs). In Stage 1, the LLM's own outputs are used to generate synthetic explanations for both correct and incorrect labels, which enables alignment through preference optimization with minimal human supervision. In Stage 2, cross-model training refines explanation quality, allowing weaker models to align stylistically and semantically with stronger ones. Experiments on three benchmark tasks (HateXplain, Latent Hate, and Implicit Hate) show that, using only a fraction of the full training set, SMARTER enables LLMs to achieve up to a 13.5% macro-F1 improvement over conventional few-shot baselines. By leveraging the self-improving classification and explanation capabilities of LLMs, the framework offers a scalable approach for low-resource settings. https://arxiv.org/abs/2509.15174
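To make the Stage 1 idea more concrete, here is a minimal sketch of how self-generated explanations could be turned into preference data. It assumes two things the summary does not specify. First, it assumes the explanation for the gold label is treated as "chosen" and explanations for each incorrect label as "rejected". Second, it assumes a DPO-style objective; the summary says only "preference optimization" and does not name an algorithm. The `explain` callable, the `build_preference_pairs` and `dpo_loss` helpers, and the three-way label set are all hypothetical stand-ins for illustration, not the authors' code.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # explanation justifying the gold label (assumed "preferred")
    rejected: str  # explanation justifying an incorrect label (assumed "dispreferred")


def build_preference_pairs(examples, explain, labels):
    """Pair each gold-label explanation with one explanation per wrong label.

    `explain(text, label)` is a HYPOTHETICAL callable standing in for an LLM
    prompted to justify why `text` should receive `label`.
    """
    pairs = []
    for text, gold in examples:
        prompt = f"Classify the post and explain your decision: {text}"
        chosen = explain(text, gold)
        for wrong in labels:
            if wrong == gold:
                continue
            pairs.append(PreferencePair(prompt, chosen, explain(text, wrong)))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one pair, given summed sequence log-probabilities.

    ASSUMPTION: DPO is used here only as a common example of preference
    optimization. The loss is -log(sigmoid(margin)), computed stably.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Usage with a toy stand-in for the LLM and an illustrative label set.
labels = ["hateful", "offensive", "normal"]
toy_explain = lambda text, label: f"The post is {label} because of its wording."
pairs = build_preference_pairs([("example post", "normal")], toy_explain, labels)
print(len(pairs))  # one pair per incorrect label
print(dpo_loss(-1.0, -10.0, -5.0, -5.0))  # chosen favored vs. reference, so loss < log(2)
```

With three labels, each example yields two pairs. If the policy assigns the same log-probabilities as the reference model, the margin is zero and the loss equals log 2. Raising the chosen explanation's likelihood relative to the rejected one lowers the loss.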