Towards Efficient and Explainable Hate Speech Detection via Model Distillation (arXiv)

Dec 25, 2024 #Algorithms

Automatic detection is essential to curbing the spread of hateful and abusive language online, and identifying and explaining hate speech also helps raise awareness of its harmful effects. Most detection models, however, operate as opaque black boxes that are difficult to interpret. Large language models (LLMs) have shown promise both in detecting hate speech and in improving interpretability, but they are computationally expensive to run. The authors propose distilling large language models into smaller ones, using Chain-of-Thought prompting to extract explanations that support the classification task. Compact language models would make these tasks easier to deploy in operational settings. The researchers report that the distilled models surpass the larger models in classification performance while providing explanations of the same quality, making hate speech detection more accessible, understandable, and useful.
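As a rough illustration of the idea, the sketch below shows one way explanations produced by a large "teacher" model could be turned into training targets for a smaller "student" model, so that the student learns to output both an explanation and a label. This is not the authors' code: the prompt wording, the "Explanation:" / "Label:" target format, and the names `TeacherOutput`, `build_student_example`, and `parse_student_output` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TeacherOutput:
    """One post plus the teacher model's explanation and label (hypothetical schema)."""
    text: str
    rationale: str
    label: str


def build_student_example(t: TeacherOutput) -> dict:
    """Turn a teacher output into a prompt/target pair for fine-tuning a student.

    The target places the explanation before the label, so the student is
    trained to explain first and classify second (an assumed format).
    """
    prompt = (
        "Classify the post as 'hate' or 'not hate' and explain why.\n"
        f"Post: {t.text}\nAnswer:"
    )
    target = f"Explanation: {t.rationale.strip()}\nLabel: {t.label.strip().lower()}"
    return {"prompt": prompt, "target": target}


def parse_student_output(output: str) -> tuple[str, str]:
    """Split a student generation back into (explanation, label).

    Returns the label "unknown" when no "Label:" marker is present.
    """
    head, sep, tail = output.rpartition("Label:")
    if not sep:
        return output.strip(), "unknown"
    explanation = head.strip()
    if explanation.startswith("Explanation:"):
        explanation = explanation[len("Explanation:"):].strip()
    return explanation, tail.strip().lower()
```

In a setup like this, the prompt/target pairs would be fed to any standard fine-tuning pipeline for a small language model, and `parse_student_output` would recover the predicted label for evaluation while keeping the explanation for human review.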

https://arxiv.org/abs/2412.13698
