Because memes are multimodal, combining text, imagery, and cultural cues, detecting and explaining the hate speech they carry is difficult. To tackle this problem, we introduce MemHateCaptioning, a framework that generates interpretable, human-like explanations of why a meme is hateful. It combines Chain-of-Thought prompting with vision-language and large language models (ClipCap, BLIP, and T5) to improve interpretability. Evaluated on the HatReD dataset, MemHateCaptioning outperforms existing models on BLEU and ROUGE-L while reducing hallucinations and contextual errors.

https://dl.acm.org/doi/10.1145/3701716.3718385
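As a rough illustration of the pipeline described above, here is a minimal Python sketch that captions a meme with BLIP and then prompts T5 with a chain-of-thought-style instruction. The model checkpoints, the prompt wording, and the `explain_meme` helper are illustrative assumptions, not the authors' implementation; the ClipCap branch is omitted for brevity.

```python
# A minimal sketch of a BLIP -> T5 explanation pipeline in the spirit of
# MemHateCaptioning. Model choices, prompt wording, and explain_meme() are
# illustrative assumptions, not the paper's released code.
from PIL import Image
from transformers import (
    BlipProcessor,
    BlipForConditionalGeneration,
    T5Tokenizer,
    T5ForConditionalGeneration,
)

# Vision-language model: produces a caption describing the meme image.
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Language model: turns caption + overlaid text into a reasoned explanation.
t5_tokenizer = T5Tokenizer.from_pretrained("t5-base")
t5_model = T5ForConditionalGeneration.from_pretrained("t5-base")


def explain_meme(image_path: str, overlaid_text: str) -> str:
    # Step 1: describe the visual content with BLIP.
    image = Image.open(image_path).convert("RGB")
    inputs = blip_processor(images=image, return_tensors="pt")
    caption_ids = blip_model.generate(**inputs, max_new_tokens=40)
    caption = blip_processor.decode(caption_ids[0], skip_special_tokens=True)

    # Step 2: chain-of-thought-style prompt that asks the model to reason
    # about the image-text interaction before explaining the hatefulness
    # (prompt wording is a guess at the paper's approach).
    prompt = (
        f"Meme image: {caption}\n"
        f"Meme text: {overlaid_text}\n"
        "Think step by step about how the image and text interact, "
        "then explain why this meme is hateful:"
    )
    t5_inputs = t5_tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = t5_model.generate(**t5_inputs, max_new_tokens=120)
    return t5_tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage with a hypothetical meme file:
# print(explain_meme("meme.png", "text overlaid on the meme"))
```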