A major problem on social media is online hate, which harms individuals, degrades platforms, and fuels intolerance in society. To combat it, content moderation typically relies on binary classifiers that indicate only how likely a message is to be hateful. These categorical classifiers, however, have limited social research into the factors that shape degrees of hatefulness. Using ChatGPT (GPT-4), this study pursued two goals: (1) developing a continuous, six-interval scale for rating hate messages that scales to large datasets, and (2) comparing GPT-4 against reliable human raters on a small but diverse set of hate messages. Although GPT-4's ratings were substantially harsher than the human raters', the results show parallelism between the two sets of ratings, as well as convergent and discriminant validity. Further analyses comparing the GPT-administered measure and the human ratings against several well-known hate classifiers found that GPT-4's scores correlated more strongly with the human ratings than the other classifiers' scores did. The paper concludes by discussing limitations and implications for future research.
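The measure described above amounts to prompting the model for a numeric hatefulness rating, parsing the reply, and correlating the scores with human ratings. A minimal sketch of that pipeline follows; the prompt wording, the 0–5 anchors, the sample replies, and the Pearson comparison are illustrative assumptions, not the authors' exact protocol:

```python
import re

# Hypothetical rating prompt (the paper's actual instructions may differ).
RATING_PROMPT = (
    "Rate the following message's hatefulness on a 0-5 scale, "
    "where 0 = not hateful at all and 5 = extremely hateful. "
    "Reply with a single integer.\n\nMessage: {msg}"
)

def parse_rating(reply: str) -> int:
    """Extract the first integer 0-5 from a model reply; raise if absent."""
    m = re.search(r"\b([0-5])\b", reply)
    if m is None:
        raise ValueError(f"no 0-5 rating found in: {reply!r}")
    return int(m.group(1))

def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Made-up example: model replies vs. mean human ratings for five messages.
model_scores = [parse_rating(r) for r in ["Rating: 4", "0", "I'd say 2.", "5", "3"]]
human_scores = [3.5, 0.2, 2.1, 4.8, 2.9]
print(round(pearson(model_scores, human_scores), 3))
```

In practice each reply would come from a GPT-4 API call with `RATING_PROMPT`; parsing a single constrained integer keeps the measure continuous-in-aggregate while remaining easy to validate against human raters.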

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5215167
