Hate speech in memes is difficult to detect because memes are multimodal, combining text, imagery, and cultural cues. To tackle this problem, we present MemHateCaptioning, a framework that generates interpretable, human-like explanations of a meme's hateful content. It combines Chain-of-Thought prompting with vision-language and large language models (ClipCap, BLIP, and T5) to improve interpretability. Evaluated on the HatReD dataset, MemHateCaptioning outperforms existing models on BLEU and ROUGE-L scores while reducing hallucinations and contextual errors.

https://dl.acm.org/doi/10.1145/3701716.3718385
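The pipeline described above, captioning the image with a vision-language model and then prompting a language model with Chain-of-Thought instructions, can be sketched as follows. The prompt template, function name, and example inputs are illustrative assumptions, not the paper's actual prompts:

```python
def build_cot_prompt(image_caption: str, meme_text: str) -> str:
    """Assemble a Chain-of-Thought prompt asking a language model
    (e.g. T5) to explain step by step why a meme may be hateful,
    given a caption from a vision-language model such as ClipCap
    or BLIP.  The wording here is a hypothetical sketch, not the
    template used in MemHateCaptioning."""
    return (
        f"A meme shows the following scene: {image_caption}\n"
        f'The overlaid text reads: "{meme_text}"\n'
        "Let's think step by step:\n"
        "1. Who or what group is depicted or referenced?\n"
        "2. How do the image and the overlaid text interact?\n"
        "3. Does this combination demean or attack that group?\n"
        "Explanation:"
    )

# Hypothetical caption and overlaid text, for illustration only.
prompt = build_cot_prompt(
    "a crowded lifeboat on rough seas",
    "first ones thrown overboard...",
)
print(prompt)
```

The two-stage design keeps the multimodal step (captioning) separate from the reasoning step, so the language model receives a purely textual description it can reason over with standard Chain-of-Thought prompting.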