The study focuses on identifying hate speech in English and German YouTube comments and on evaluating how adding training data from other platforms affects the classification model's performance. To gauge the effect of each additional dataset, the researchers also take into account factors such as definition similarity, content similarity, and the prevalence of hate words. The results show that classification performance improves when the added datasets are more similar to the target data in terms of hate words, definitions, and content. Combining Twitter, Gab, and YouTube comment datasets produced the best results, with F1-scores of 0.68 and 0.74 for English and German YouTube comments, respectively.
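To make the quantities in the summary concrete, here is a minimal Python sketch of an F1-score and two simple dataset-comparison measures. It is an illustration only, not the paper's implementation: the function names, the whitespace tokenization, the lexicon-based prevalence measure, the Jaccard vocabulary overlap used as a stand-in for content similarity, and the toy data are all assumptions, and the summary does not say whether the reported F1-scores are binary or averaged across classes.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive (hate) class.

    Assumption: the paper may report a differently averaged F1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def hate_word_prevalence(texts, lexicon):
    """Fraction of texts containing at least one term from a lexicon.

    Hypothetical measure: the lexicon is a set of lowercase terms.
    """
    if not texts:
        return 0.0
    hits = sum(1 for text in texts if set(text.lower().split()) & lexicon)
    return hits / len(texts)


def vocabulary_jaccard(texts_a, texts_b):
    """Jaccard overlap of two datasets' vocabularies.

    A crude, assumed proxy for content similarity between datasets.
    """
    vocab_a = {tok for text in texts_a for tok in text.lower().split()}
    vocab_b = {tok for text in texts_b for tok in text.lower().split()}
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy placeholder data; "badword1" stands in for a lexicon entry.
    youtube = ["you are a badword1", "nice video thanks"]
    twitter = ["what a badword1", "thanks for sharing"]
    lexicon = {"badword1"}

    print(hate_word_prevalence(youtube, lexicon))
    print(vocabulary_jaccard(youtube, twitter))
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

Measures like these could be computed for each candidate dataset against the target YouTube data before deciding whether to add it to the training set, which is the kind of comparison the summary describes.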

https://arxiv.org/abs/2410.05287

