Despite the growing threats to safety and unity posed by online hate speech, Southeast Asian languages, such as Malay, are still underrepresented in NLP studies. By offering 26,985 multilingual Malay-English social media texts for binary hate speech categorization, the current dataset fills the gap. It provides high-confidence, quality-controlled entries that have been curated from five public sources and filtered using human annotation and pseudo-labelling. The dataset, which was created for multilingual machine learning applications, supports cross-lingual benchmarking, transformer-based classifiers, and instructional resources for populations that speak Malay and English. https://www.sciencedirect.com/science/article/pii/S2352340925008741