Those approaches cater more to semantic information than to matching the desired intent, because LLMs ignore intent-specific information during decoding. Furthermore, it remains difficult to measure how effective counterspeech with different intents is at mitigating hate speech. In this research, we present DART, an LLMs-based DuAl-discRiminaTor guided framework for counterspeech generation, to address these problems. We use an intent-aware discriminator and a hate-mitigating discriminator to jointly guide the decoding preferences of the LLM. This enables the model to produce counterspeech that both matches a specific intent and mitigates hate. We train the discriminators with a maximum-margin relative objective. This objective uses the distance between counterspeech that is aligned with the desired target (such as a specific intent or effectiveness in mitigating hate) and counterspeech that is not as the learning signal. Extensive experiments show that DART performs well both at matching the desired intent and at mitigating hate.
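To make the two mechanisms above more concrete, here is a minimal sketch in plain Python. The summary gives no formulas, so everything specific here is an assumption made for illustration: the hinge form of the margin loss, the additive way the two discriminator scores are combined with the language model's log-probabilities, the weights `alpha` and `beta`, and the function names `margin_loss` and `guided_next_token`. It should not be read as DART's actual implementation.

```python
import math


def margin_loss(score_desired, score_undesired, margin=1.0):
    """Hinge-style max-margin loss (assumed form, not taken from the paper).

    The loss is zero once the discriminator scores the desired counterspeech
    at least `margin` higher than the undesired one, and grows linearly
    as that gap shrinks.
    """
    return max(0.0, margin - (score_desired - score_undesired))


def guided_next_token(lm_logprobs, intent_scores, mitigation_scores,
                      alpha=1.0, beta=1.0):
    """Re-rank candidate tokens with two discriminators (hypothetical scheme).

    Each candidate's LM log-probability is shifted by a weighted sum of an
    intent-aware score and a hate-mitigation score; the best candidate wins.
    """
    combined = {
        tok: lp + alpha * intent_scores[tok] + beta * mitigation_scores[tok]
        for tok, lp in lm_logprobs.items()
    }
    best = max(combined, key=combined.get)
    return best, combined


# Toy candidates with made-up numbers, for illustration only.
lm_logprobs = {
    "agree": math.log(0.5),
    "question": math.log(0.3),
    "insult": math.log(0.2),
}
intent_scores = {"agree": 0.1, "question": 0.9, "insult": 0.0}
mitigation_scores = {"agree": 0.4, "question": 0.8, "insult": -1.0}

best, combined = guided_next_token(lm_logprobs, intent_scores, mitigation_scores)
print(best)
print(margin_loss(score_desired=2.0, score_undesired=0.5))
```

With these toy numbers the language model alone would prefer "agree", but the two discriminator scores lift "question" to the top. Setting `alpha` or `beta` to zero switches off the corresponding discriminator, which is one simple way to see how much each contributes to the choice.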

https://aclanthology.org/2024.lrec-main.800

