A Target-Aware Analysis of Data Augmentation for Hate Speech Detection (arXiv)

Oct 18, 2024 #Algorithms, #Assorted

The authors investigate whether augmenting existing data with generative language models can reduce target imbalance in hate speech datasets, given the ability of LLMs to produce high-quality data. They augment 1,000 posts from the Measuring Hate Speech corpus, an English dataset annotated with target identity information, adding approximately 30,000 synthetic examples. The augmentation uses both simple data augmentation (DA) techniques and several types of generative models, comparing autoregressive and sequence-to-sequence approaches. The researchers find that classic DA approaches are often preferable to generative models, although combining the two tends to yield the best results. For some hate categories, including origin, religion, and disability, hate speech classification trained on augmented data improves by more than 10% F1 over the no-augmentation baseline.
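To make the idea of target-aware augmentation concrete, here is a minimal Python sketch of how simple DA could be used to even out the number of posts per target group. It is an illustration only, not the authors' pipeline: the summary does not say which simple DA operations the paper uses, so the random-swap and random-deletion perturbations are assumptions, and the function names (`random_swap`, `random_delete`, `balance_by_target`) are hypothetical.

```python
import random
from collections import Counter


def random_swap(tokens, rng):
    """Swap two randomly chosen token positions (no-op for fewer than 2 tokens)."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_delete(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one token."""
    tokens = list(tokens)
    if not tokens:
        return tokens
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def balance_by_target(posts, seed=0):
    """Add perturbed copies until every target has as many posts as the largest one.

    posts: list of (text, target) pairs. The choice of perturbations is an
    assumption made for illustration, not the method from the paper.
    """
    if not posts:
        return []
    rng = random.Random(seed)
    counts = Counter(target for _, target in posts)
    goal = max(counts.values())
    augmented = list(posts)  # originals come first, synthetic posts are appended
    for target, n in counts.items():
        pool = [text for text, t in posts if t == target]
        for _ in range(goal - n):
            tokens = rng.choice(pool).split()
            op = rng.choice([random_swap, random_delete])
            augmented.append((" ".join(op(tokens, rng)), target))
    return augmented


if __name__ == "__main__":
    # Neutral placeholder texts stand in for real posts.
    sample = [("post one about topic", "religion")] * 3 + [("post two about topic", "origin")]
    for text, target in balance_by_target(sample):
        print(target, "|", text)
```

Token-level perturbations like these are cheap but can distort meaning, so in practice the synthetic posts would need a label-preservation check before being used for training.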

https://arxiv.org/abs/2410.08053
