Evaluating Simple Debiasing Techniques in RoBERTa-based Hate Speech Detection Models (arXiv)

Jan 30, 2025 #Algorithms

The hate speech detection task is known to suffer from bias against African American English (AAE) dialect text, caused by annotation bias in the hate speech datasets used to train detection models. As a result, ordinary AAE text is more likely than non-AAE text to be misclassified as abusive or hateful. Simple debiasing techniques have been developed to counter this disparity; in this work, we apply and evaluate them in the context of RoBERTa-based encoders. Experimental results suggest that the success of these techniques depends heavily on how the training datasets are constructed, but that, when representation bias is properly taken into account, they can reduce the disparity between dialect subgroups on the hate speech detection task.
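The disparity described above, where non-hateful AAE text is flagged more often than non-hateful non-AAE text, can be expressed as a gap in false positive rates between dialect subgroups. The sketch below is one common way to quantify it; it is not necessarily the metric used in the paper. The `Example` record, the `"AAE"`/`"non-AAE"` group labels, and the toy data are hypothetical and purely illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record: dialect group, gold label, and model prediction.
    dialect: str          # "AAE" or "non-AAE" (illustrative labels)
    gold_hateful: bool    # annotated label
    pred_hateful: bool    # classifier output


def false_positive_rate(examples, dialect):
    """Share of non-hateful examples in `dialect` that were predicted hateful."""
    negatives = [e for e in examples if e.dialect == dialect and not e.gold_hateful]
    if not negatives:
        return math.nan  # undefined when the group has no non-hateful examples
    return sum(e.pred_hateful for e in negatives) / len(negatives)


def fpr_gap(examples):
    """Positive values mean AAE text is wrongly flagged more often."""
    return false_positive_rate(examples, "AAE") - false_positive_rate(examples, "non-AAE")


# Toy data (invented for illustration, not from the paper).
toy = [
    Example("AAE", False, True),
    Example("AAE", False, True),
    Example("AAE", False, False),
    Example("AAE", False, False),
    Example("AAE", True, True),
    Example("non-AAE", False, True),
    Example("non-AAE", False, False),
    Example("non-AAE", False, False),
    Example("non-AAE", False, False),
    Example("non-AAE", True, True),
]

print(false_positive_rate(toy, "AAE"))      # 0.5
print(false_positive_rate(toy, "non-AAE"))  # 0.25
print(fpr_gap(toy))                         # 0.25
```

A debiasing technique that works as intended would shrink this gap on held-out data without a large drop in overall detection accuracy.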

https://arxiv.org/abs/2501.15430
