Hate speech in memes is hard to detect and explain because memes are multimodal, combining text, images, and cultural cues. To address this, we present MemHateCaptioning, a framework that generates interpretable, human-like explanations of why a meme is hateful. It combines Chain-of-Thought prompting with vision-language and large language models (ClipCap, BLIP, and T5) to improve interpretability. Evaluated on the HatReD dataset, MemHateCaptioning outperforms existing models on BLEU and ROUGE-L while reducing hallucinations and contextual errors.

https://dl.acm.org/doi/10.1145/3701716.3718385
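To make the pipeline concrete, here is a minimal, hypothetical sketch of the caption-then-explain flow the abstract describes, built from off-the-shelf Hugging Face checkpoints (BLIP for image captioning, T5 for explanation generation). The model names, the Chain-of-Thought-style prompt template, and the `explain_meme` helper are assumptions for illustration, not the authors' implementation; the paper also uses ClipCap, which is omitted here for brevity.

```python
# Illustrative sketch only: not the MemHateCaptioning codebase.
from PIL import Image
from transformers import (
    BlipProcessor,
    BlipForConditionalGeneration,
    T5Tokenizer,
    T5ForConditionalGeneration,
)

# Vision-language model: turns the meme image into a literal caption.
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip_model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

# Language model: turns caption + overlaid text into an explanation.
t5_tokenizer = T5Tokenizer.from_pretrained("t5-base")
t5_model = T5ForConditionalGeneration.from_pretrained("t5-base")


def explain_meme(image_path: str, overlaid_text: str) -> str:
    """Generate a natural-language explanation of why a meme may be hateful."""
    # Step 1: visual grounding via BLIP image captioning.
    image = Image.open(image_path).convert("RGB")
    inputs = blip_processor(images=image, return_tensors="pt")
    caption_ids = blip_model.generate(**inputs, max_new_tokens=40)
    caption = blip_processor.decode(caption_ids[0], skip_special_tokens=True)

    # Step 2: a Chain-of-Thought-style prompt (an assumed template) asking the
    # language model to reason step by step before producing the explanation.
    prompt = (
        f"Meme image caption: {caption}\n"
        f"Meme text: {overlaid_text}\n"
        "Reason step by step about the image, the text, and any cultural "
        "references, then explain why this meme may be hateful."
    )
    t5_inputs = t5_tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = t5_model.generate(**t5_inputs, max_new_tokens=120)
    return t5_tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Grounding the language model in an explicit caption, rather than raw pixels, is one plausible reason the framework can reduce hallucinations: the generator only sees visual content that the captioner actually verbalized.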
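The BLEU and ROUGE-L comparison against HatReD reference explanations could be reproduced along these lines with the Hugging Face `evaluate` library; the prediction and reference strings below are placeholders, and the paper's exact evaluation protocol may differ.

```python
# Assumed evaluation sketch using the `evaluate` library, with toy data.
import evaluate

predictions = ["the meme mocks group X by comparing them to ..."]
references = [["the meme attacks group X by likening them to ..."]]

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# BLEU takes one list of reference strings per prediction.
print(bleu.compute(predictions=predictions, references=references)["bleu"])
# ROUGE-L measures longest-common-subsequence overlap with the references.
print(rouge.compute(predictions=predictions, references=references)["rougeL"])
```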
