MMVAE at SemEval-2022 Task 5: A Multi-modal Multi-task VAE on Misogynous Meme Detection

Gu, Yimeng; Castro, Ignacio; Tyson, Gareth

Abstract

Memes have become common in day-to-day communication on social media platforms. They are often amusing, evocative and attractive to audiences. However, memes containing malicious content can be harmful to the groups they target. In this paper, we study misogynous meme detection, a shared task in SemEval 2022 - Multimedia Automatic Misogyny Identification (MAMI). The challenge of misogynous meme detection is to co-represent multi-modal features. To tackle this challenge, we propose a Multi-modal Multi-task Variational AutoEncoder (MMVAE) that learns an effective co-representation of visual and textual features in the latent space. Our goal is to automatically determine whether a meme contains misogynous information and then identify its fine-grained category. Our model achieves F1 scores of 0.723 on MAMI sub-task A and 0.634 on sub-task B. We carry out comprehensive experiments on our model's architecture and show that our approach significantly outperforms several strong uni-modal and multi-modal approaches.
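
As a rough illustration of the idea described in the abstract, the sketch below shows how a multi-modal, multi-task VAE objective could be assembled: image and text features are encoded into one shared latent vector, which is then used both to reconstruct the input and to feed two classification heads. This is not the authors' architecture. The single linear layers, the feature sizes, the four fine-grained labels, the unweighted sum of loss terms, and every name in the code (`mmvae_loss`, `head_a`, `head_b`, and so on) are assumptions made for illustration only.

```python
import math
import random

def linear(x, w, b):
    """Dense layer: x is a list of floats, w holds one weight row per output."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def init(out_dim, in_dim, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability and one 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mmvae_loss(img, txt, y_bin, y_fine, params, rng):
    # Joint input: concatenated visual and textual feature vectors.
    x = img + txt
    # Encoder produces the mean and log-variance of a diagonal Gaussian.
    mu = linear(x, *params["mu"])
    logvar = linear(x, *params["logvar"])
    # Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    # Decoder reconstructs the joint input from the shared latent vector.
    recon = linear(z, *params["dec"])
    recon_loss = sum((r - xi) ** 2 for r, xi in zip(recon, x)) / len(x)
    # KL divergence between the posterior and a standard normal prior.
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    # Head A: binary misogynous / not misogynous (sub-task A style).
    task_a = bce(sigmoid(linear(z, *params["head_a"])[0]), y_bin)
    # Head B: multi-label fine-grained categories (sub-task B style).
    p_fine = [sigmoid(v) for v in linear(z, *params["head_b"])]
    task_b = sum(bce(p, y) for p, y in zip(p_fine, y_fine)) / len(y_fine)
    total = recon_loss + kl + task_a + task_b  # unweighted sum, an assumption
    return {"recon": recon_loss, "kl": kl, "task_a": task_a,
            "task_b": task_b, "total": total}

# Toy dimensions and random stand-ins for pre-extracted features.
rng = random.Random(0)
IMG_DIM, TXT_DIM, LATENT_DIM, N_FINE = 8, 6, 4, 4
params = {
    "mu": init(LATENT_DIM, IMG_DIM + TXT_DIM, rng),
    "logvar": init(LATENT_DIM, IMG_DIM + TXT_DIM, rng),
    "dec": init(IMG_DIM + TXT_DIM, LATENT_DIM, rng),
    "head_a": init(1, LATENT_DIM, rng),
    "head_b": init(N_FINE, LATENT_DIM, rng),
}
img = [rng.random() for _ in range(IMG_DIM)]
txt = [rng.random() for _ in range(TXT_DIM)]
losses = mmvae_loss(img, txt, 1, [1, 0, 0, 1], params, rng)
print(sorted(losses))
```

In a real system the random vectors would be replaced by features from pre-trained image and text encoders, and the parameters would be trained by gradient descent on the total loss. The sketch only computes the objective once, to show how the reconstruction, KL, and two task terms share a single latent representation.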