The dataset is curated from Twitter and annotated manually. It consists of over 25,000 distinct tweets labeled into four major classes, i.e., hate, offensive, profane, and not. We present the approaches used for collecting and annotating the data and the challenges faced during the process. Finally, we present baseline classification results using deep learning models based on CNN, LSTM, and Transformers. We explore monolingual and multilingual variants of BERT, namely MahaBERT, IndicBERT, mBERT, and XLM-RoBERTa, and show that monolingual models perform better than their multilingual counterparts. The MahaBERT model provides the best results on the L3Cube-MahaHate corpus.

https://arxiv.org/pdf/2203.13778.pdf