The paper presents CoARL, a novel framework that models the pragmatic implications of social biases in hateful statements in order to improve counterspeech generation. In its first two phases, CoARL learns to understand the intents, reactions, and harms of offensive statements through sequential multi-instruction tuning; it then learns task-specific low-rank adapter weights for generating intent-conditioned counterspeech. The final phase uses reinforcement learning to optimize outputs for effectiveness and non-toxicity. CoARL surpasses existing benchmarks in intent-conditioned counterspeech generation, with average improvements of roughly 3 points on intent-conformity and roughly 4 points on argument-quality metrics. Extensive human evaluation further supports CoARL's ability to produce superior and more context-appropriate responses than other systems, including well-known LLMs such as ChatGPT. https://aclanthology.org/2024.naacl-long.374
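The "low-rank adapter weights" mentioned above refer to the LoRA family of parameter-efficient tuning methods: instead of updating a full weight matrix W, only two small matrices B (d_out × r) and A (r × d_in) are trained, and the effective weight becomes W + (α/r)·B·A. The sketch below illustrates this update in plain Python; the shapes, scaling, and variable names are illustrative assumptions, not CoARL's actual configuration.

```python
# Minimal illustration of the low-rank adapter (LoRA) update:
# effective weight = W + (alpha / r) * (B @ A).
# All matrices are lists of rows; shapes here are tiny for clarity.

def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha=1, r=1):
    """Return W + (alpha / r) * (B @ A), the adapted weight matrix.

    W: d_out x d_in frozen base weight
    B: d_out x r, A: r x d_in -- the only trained parameters.
    """
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * ba for w, ba in zip(w_row, ba_row)]
            for w_row, ba_row in zip(W, BA)]

# Tiny example: d_out = d_in = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
B = [[0.5], [0.0]]             # d_out x r
A = [[0.0, 1.0]]               # r x d_in
W_adapted = lora_effective_weight(W, A, B, alpha=1, r=1)
# W_adapted is [[1.0, 0.5], [0.0, 1.0]]: the rank-1 update B @ A
# perturbs the base weight without touching W's original parameters.
```

Because only B and A are trained, a separate adapter can be stored per task (here, intent-conditioned counterspeech generation) at a small fraction of the full model's parameter count.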