The paper presents CoARL, a novel framework that models the pragmatic implications of social biases in hateful statements in order to improve counterspeech generation. In its first two phases, CoARL learns to understand the intents, reactions, and harms of offensive statements through sequential multi-instruction tuning; it then learns task-specific low-rank adapter weights for generating intent-conditioned counterspeech. The final phase uses reinforcement learning to optimize outputs for effectiveness and non-toxicity. CoARL surpasses existing benchmarks in intent-conditioned counterspeech generation, with average improvements of roughly 3 points on intent-conformity and roughly 4 points on argument-quality metrics. Extensive human evaluation further supports CoARL's ability to produce superior and more context-appropriate responses than other systems, including well-known LLMs such as ChatGPT. https://aclanthology.org/2024.naacl-long.374
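The "low-rank adapter weights" mentioned above refer to the LoRA family of parameter-efficient tuning methods: instead of updating a full weight matrix W, only two small matrices B (d_out × r) and A (r × d_in) are trained, and the effective weight becomes W + (α/r)·B·A. The sketch below illustrates this update in plain Python; the shapes, scaling, and variable names are illustrative assumptions, not CoARL's actual configuration.

```python
# Minimal illustration of the low-rank adapter (LoRA) update:
# effective weight = W + (alpha / r) * (B @ A).
# All matrices are lists of rows; shapes here are tiny for clarity.

def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha=1, r=1):
    """Return W + (alpha / r) * (B @ A), the adapted weight matrix.

    W: d_out x d_in frozen base weight
    B: d_out x r, A: r x d_in -- the only trained parameters.
    """
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * ba for w, ba in zip(w_row, ba_row)]
            for w_row, ba_row in zip(W, BA)]

# Tiny example: d_out = d_in = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
B = [[0.5], [0.0]]             # d_out x r
A = [[0.0, 1.0]]               # r x d_in
W_adapted = lora_effective_weight(W, A, B, alpha=1, r=1)
# W_adapted is [[1.0, 0.5], [0.0, 1.0]]: the rank-1 update B @ A
# perturbs the base weight without touching W's original parameters.
```

Because only B and A are trained, a separate adapter can be stored per task (here, intent-conditioned counterspeech generation) at a small fraction of the full model's parameter count.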