A Bilingual Malay-English Social Media Dataset for Binary Hate Speech Detection (Data in Brief)

Oct 16, 2025 #Algorithms, #Society

Although online hate speech poses a growing threat to safety and unity, Southeast Asian languages such as Malay remain underrepresented in NLP research. This dataset addresses that gap with 26,985 bilingual Malay-English social media texts labelled for binary hate speech classification. It provides high-confidence, quality-controlled entries curated from five public sources and filtered through human annotation and pseudo-labelling. Designed for multilingual machine learning applications, the dataset supports cross-lingual benchmarking, transformer-based classifiers, and educational resources for Malay- and English-speaking communities.

https://www.sciencedirect.com/science/article/pii/S2352340925008741

LATEST NEWS

Audio: preventhate.org, 16 October 2025

Ideology and polarization set the agenda on social media (Scientific Reports)

Hate Speech on Social Media: A Systemic Narrative Review of Political Science Contributions (MDPI)

Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision–Language Models (ACL Anthology)

A Bilingual Malay-English Social Media Dataset for Binary Hate Speech Detection (Data in Brief)

preventhate.org | Policyinstitute.net
