Online hate speech detection has become important with the growth of digital devices, but resources in languages other than English are extremely limited. We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments and provides multi-label classification with 1 to 4 labels per utterance, handling subjectivity and intersectionality. We evaluate strong baselines on K-MHaS. KR-BERT with a sub-character tokenizer outperforms the other baselines, recognising decomposed characters in each hate speech class.
K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment (arXiv)
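To make "decomposed characters" concrete: a precomposed Hangul syllable can be split into its component letters (jamo), which is the kind of unit a sub-character tokenizer works with. The sketch below is only an illustration using Unicode canonical decomposition from Python's standard library; it is not KR-BERT's actual tokenizer, and the helper name `decompose_hangul` is hypothetical.

```python
import unicodedata


def decompose_hangul(text: str) -> list[str]:
    """Split precomposed Hangul syllables into conjoining jamo.

    Assumption: Unicode NFD decomposition is used here as a stand-in
    for sub-character splitting; a real tokenizer may differ.
    """
    # NFD maps each syllable to a lead consonant, a vowel,
    # and an optional tail consonant.
    return list(unicodedata.normalize("NFD", text))


jamo = decompose_hangul("한국")
# Show the code point of each jamo produced by the decomposition.
print([f"U+{ord(ch):04X}" for ch in jamo])

# Recomposing with NFC restores the original syllables.
print(unicodedata.normalize("NFC", "".join(jamo)) == "한국")
```

Working at this level lets a model see shared letters across syllables that would otherwise look like unrelated characters.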
Related Post
Empathy-based counterspeech can reduce racist hate speech in a social media field experiment (scite_)
Despite their growing popularity, there is scant experimental evidence on the effectiveness and design of counterspeech strategies (in the public domain). Modeling our interventions on current I/NGO practice, we randomly
Addressing religious hate online: from taxonomy creation to automated detection (PeerJ)
“Motivated by the lack of annotated data specifically tailoring religion and the poor interoperability of current datasets, in this article we propose a fine-grained labeling scheme for religious hate speech
TACKLING ONLINE HATE SPEECH THROUGH CONTENT MODERATION (University of Oxford)
This paper builds on the existing work of United Nations bodies to provide more granular and well-calibrated guidance on the application of these provisions to online hate speech, particularly in