This study presents a comprehensive approach to detecting hate speech in Roman Urdu, a widely used but under-resourced language variant. Motivated by the rise of online hate speech, which anonymity and unrestricted expression on social media facilitate, the research expands the Roman Urdu Hate Speech and Offensive Language Detection dataset to 30,955 instances and introduces a new “Racism” category. The system combines supervised and unsupervised machine learning, deep learning, and natural language processing techniques to identify abusive language, religious hate, sexism, and racism. Among the models evaluated, mBERT achieved 92% accuracy. The work contributes to scalable hate speech mitigation in linguistically diverse digital environments.

https://dl.acm.org/doi/10.1145/3768571

