Annotation bias in the datasets used to train hate speech detection models is known to cause bias against text written in African American English (AAE). As a result, ordinary AAE writing is more likely than non-AAE content to be incorrectly labeled as abusive or offensive, creating a disparity between dialect groups. Basic debiasing methods have been developed to counter this disparity; in this work, we implement and evaluate them with RoBERTa-based encoders. Experimental results indicate that the effectiveness of these methods depends largely on how the training datasets are constructed, but that they can reduce the gap between dialect subgroups on the hate speech detection task when representation bias is properly taken into account.

https://arxiv.org/abs/2501.15430
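
The disparity described in the abstract, where non-hateful AAE text is flagged more often than non-hateful non-AAE text, can be made concrete as a gap in per-group false positive rates. The sketch below is only an illustration under stated assumptions: the function name `false_positive_rates`, the group names, and the toy labels and predictions are all hypothetical, and the false positive rate gap is one common way to measure such a disparity, not necessarily the metric used in the paper.

```python
from collections import defaultdict


def false_positive_rates(labels, preds, groups):
    """Return, per dialect group, the share of non-hateful texts flagged as hateful.

    labels and preds use 1 for "hateful" and 0 for "not hateful".
    (Hypothetical helper, not taken from the paper.)
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        if y == 0:  # only non-hateful texts can be false positives
            negatives[g] += 1
            if p == 1:
                false_pos[g] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


# Toy data, invented purely for illustration.
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 1, 1, 0, 0, 0, 1]
groups = ["aae"] * 5 + ["non_aae"] * 5

rates = false_positive_rates(labels, preds, groups)
gap = rates["aae"] - rates["non_aae"]
print(rates, gap)
```

A debiasing method that works as intended would be expected to shrink `gap` toward zero without a large loss in overall accuracy.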