Delving into Qualitative Implications of Synthetic Data for Hate Speech Detection (ACL Anthology)

By author

Nov 12, 2024 #Algorithms, #Assorted

Training models on synthetic data is now common practice across a range of NLP applications. For highly subjective tasks such as hate speech detection, however, prior research has reported conflicting findings about its effectiveness. Using 3,500 carefully annotated samples, this study presents an in-depth qualitative analysis of the potential and the specific pitfalls of synthetic data for hate speech detection in English. It shows that, across several models, synthetic data produced by paraphrasing gold texts can improve out-of-distribution robustness from a computational perspective. This gain comes at a cost: synthetic data fails to accurately reproduce the characteristics of real-world data on a number of linguistic dimensions, yields radically different class distributions, and significantly reduces the representation of both specific identity groups and intersectional hate.

https://aclanthology.org/2024.emnlp-main.1099
