Personalisation or Prejudice? Addressing Geographic Bias in Hate Speech Detection using Debias Tuning in Large Language Models (arXiv)

Commercial large language models (LLMs) have recently gained memory features that let them tailor their replies to individual users. Because this memory stores details such as user demographics and personal traits, an LLM can adjust its behavior based on personal information. However, the effects of injecting such personalized data into the context have not been fully evaluated, and personalization can be problematic, especially on sensitive topics. The researchers analyze how several state-of-the-art LLMs behave under different personalization scenarios, with a particular focus on hate speech. To do so, they prompt the models to adopt national identities and to use different languages when detecting hate speech. The results show that context personalization strongly affects LLMs' responses in this sensitive domain. To reduce these undesired biases, the researchers tune the models with a penalty on inconsistent hate speech classifications produced with and without country- or language-specific context. The tuned models perform better both when no context is given and in personalized settings.
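The abstract does not spell out the exact training objective, so the following is only a rough illustration of the general idea of penalizing inconsistent classifications. It assumes a binary classifier that outputs P(hateful) for a post, once with no personal context and once under each nation or language persona. It combines a standard cross-entropy term with a squared-difference penalty between the no-context prediction and each persona-conditioned prediction. The function names, the squared-gap penalty, and the weight `lam` are hypothetical choices for illustration, not the authors' method.

```python
import math
from typing import Sequence


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one example; prob is P(hateful)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def consistency_penalty(p_plain: float, p_persona: float) -> float:
    """Hypothetical penalty: squared gap between predictions without and with persona context."""
    return (p_plain - p_persona) ** 2


def debias_loss(
    p_plain: float,
    p_personas: Sequence[float],
    label: int,
    lam: float = 1.0,
) -> float:
    """Illustrative loss for a single post.

    p_plain:    P(hateful) with no personal context.
    p_personas: P(hateful) under each assumed nation/language persona.
    label:      gold label (1 = hate speech, 0 = not hate speech).
    lam:        weight of the consistency term (hypothetical hyperparameter).
    """
    if not p_personas:
        raise ValueError("need at least one persona-conditioned prediction")
    # Task term: average cross-entropy over the plain and persona predictions.
    task = bce(p_plain, label) + sum(bce(p, label) for p in p_personas)
    task /= 1 + len(p_personas)
    # Consistency term: average squared gap to the no-context prediction.
    consistency = sum(consistency_penalty(p_plain, p) for p in p_personas)
    consistency /= len(p_personas)
    return task + lam * consistency


if __name__ == "__main__":
    consistent = debias_loss(0.9, [0.9, 0.9], label=1)
    inconsistent = debias_loss(0.9, [0.9, 0.2], label=1)
    print(consistent < inconsistent)  # True
```

With `lam=0` the objective reduces to ordinary cross-entropy; larger values push the model toward giving the same verdict on a post regardless of the assumed nationality or language.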

https://arxiv.org/abs/2505.02252v1

preventhate.org | Policyinstitute.net