This work investigates the application of Large Language Models (LLMs) to combat hate speech. The researchers carried out the first real-world A/B test evaluating the efficacy of LLM-generated counter-speech: during the trial, they posted 753 AI-generated comments under tweets containing hate speech targeting Ukrainian immigrants in Poland, with the goal of reducing user engagement with those tweets. The results show that interventions with LLM-generated replies considerably reduce user engagement; for original tweets with at least ten views, engagement fell by 20%. The article describes the architecture of the authors' automatic moderation system, a simple metric for measuring user engagement, and the methodology for conducting such an experiment. The authors also discuss the challenges and ethical issues of using generative AI for conversation moderation.
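The summary does not specify the paper's engagement metric, so the sketch below is purely illustrative: it assumes engagement is counted as interactions (replies, likes, reposts) per view, and shows how a relative reduction between a control and a treatment arm of an A/B test could be computed. All names, the formula, and the numbers are assumptions, not the authors' method.

```python
# Hypothetical sketch only: the paper's actual metric is not given here.

def engagement_rate(replies: int, likes: int, reposts: int, views: int) -> float:
    """Interactions per view; 0.0 when the tweet has no views (assumed metric)."""
    return (replies + likes + reposts) / views if views else 0.0

def relative_reduction(control: list[float], treatment: list[float]) -> float:
    """Percent drop in mean engagement from the control arm to the treatment arm."""
    mean = lambda xs: sum(xs) / len(xs)
    return 100.0 * (mean(control) - mean(treatment)) / mean(control)

# Toy numbers: two tweets per arm, each with at least 10 views.
control = [engagement_rate(3, 10, 2, 100), engagement_rate(1, 5, 1, 50)]
treatment = [engagement_rate(2, 8, 2, 100), engagement_rate(1, 4, 0, 50)]
print(f"{relative_reduction(control, treatment):.1f}% reduction")
```

A real analysis would of course also need a significance test across the two arms; this only illustrates the shape of the comparison.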

https://aclanthology.org/2024.findings-emnlp.931

