“To address this research gap, we collect a total of 197,566 comments from four platforms: YouTube, Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as non-hateful and the remaining 20% labeled as hateful. We then experiment with several classification algorithms (Logistic Regression, Naïve Bayes, Support Vector Machines, XGBoost, and Neural Networks) and feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their combination). While all the models significantly outperform the keyword-based baseline classifier, XGBoost using all features performs the best (F1=0.92). Feature importance analysis indicates that BERT features are the most impactful for the predictions.”
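To make the comparison in the abstract more concrete, here is a minimal sketch of what a keyword-based baseline and an F1 calculation might look like. Everything in it is an assumption for illustration: the keyword list, the toy comments, and the labels are invented, and the abstract does not say which lexicon the authors used or how they averaged F1. The function below computes F1 for the hateful class only.

```python
import string

# Hypothetical keyword list for illustration only;
# this is NOT the lexicon used in the quoted study.
KEYWORDS = {"idiot", "scum"}


def keyword_baseline(comment, keywords=KEYWORDS):
    """Return 1 (hateful) if any keyword appears as a token, else 0."""
    tokens = {tok.strip(string.punctuation).lower() for tok in comment.split()}
    return int(bool(tokens & set(keywords)))


def f1_score(y_true, y_pred):
    """F1 for the positive (hateful) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for this sketch (1 = hateful, 0 = non-hateful).
comments = [
    "You are an idiot!",
    "What a lovely video",
    "People like you are scum.",
    "Go back where you came from",       # hateful, but contains no keyword
    "Don't be an idiot, wear a helmet",  # keyword present, but not hateful
]
labels = [1, 0, 1, 1, 0]

preds = [keyword_baseline(c) for c in comments]
f1 = f1_score(labels, preds)
print(preds, round(f1, 3))
```

The last two toy comments show why a keyword list is a weak baseline: it misses hateful comments that use no listed word and flags harmless comments that do, which is the kind of gap learned features such as TF-IDF or BERT embeddings are meant to close.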


