To help curb toxic speech online, this study investigates the potential of artificial intelligence (AI), specifically large language models (LLMs). It focuses on hate speech classification and counterspeech generation, highlighting advances in supervised, unsupervised, and generative approaches while recognizing important limitations, such as the amplification of biases in training data, the difficulty of detecting implicit hate, and the challenge of capturing nuance. Reviewing efforts to build AI-powered counterspeech tools, the study examines the challenges of generating human-like, constructive responses to offensive content. It concludes that LLMs hold promise for counterspeech applications and offers guidelines for developers and policymakers to ensure ethical and effective deployment against online harms.
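To illustrate the supervised classification approaches the study surveys, the sketch below shows a minimal supervised toxic-speech classifier. It is an assumption-laden toy example (invented placeholder texts and labels, a TF-IDF plus logistic-regression baseline via scikit-learn), not the pipeline used in the study:

```python
# Minimal sketch of a supervised toxic-speech classifier.
# All example texts and labels below are invented placeholders,
# not data from the study; the model is a common baseline, not
# the study's own method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled data: 1 = toxic, 0 = non-toxic (placeholders)
texts = [
    "You people are worthless",
    "Get out of here, nobody wants you",
    "Everyone like you should disappear",
    "Thanks for sharing, great point",
    "I really enjoyed this article",
    "Have a wonderful day everyone",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features feeding a logistic-regression classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

def is_toxic(text: str) -> bool:
    """Return True if the model predicts the text is toxic."""
    return bool(clf.predict([text])[0])
```

A real system would train on a large annotated corpus and, as the study notes, would still risk inheriting annotator and data biases and missing implicit hate that carries no overtly toxic vocabulary.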

https://toda.org/policy-briefs-and-resources/policy-briefs/report-253-full-text.html
