While previous studies have assessed the effectiveness of large language models (LLMs) as annotators in great detail, this work explores the biases inherent in LLMs, specifically GPT 3.5 and GPT 4o, when they annotate hate speech data. The study contributes to the understanding of bias in four key categories: gender, race, religion, and disability. The researchers evaluate annotator biases with a focus on particularly vulnerable groups within these categories. They also analyze the annotated data to investigate possible causes of these biases. To carry out this research, the authors introduce HateSpeechCorpus, their own hate speech detection dataset. For comparative analysis, they run the same experiments on the ETHOS dataset.

https://arxiv.org/abs/2406.11109

