ArXiv Preprint
Facial expression recognition is important for various purposes, such as
emotion detection, mental health analysis, and human-machine interaction. In
facial expression recognition, incorporating audio information along with still
images can provide a more comprehensive understanding of an expression state.
This paper presents our multi-modal facial expression recognition method for the
Affective Behavior Analysis in-the-wild (ABAW) challenge at CVPR 2023. We propose a
Modal Fusion Module (MFM) to fuse audio-visual information. The modalities used
are image and audio, and features are extracted based on Swin Transformer and
forwarded to the MFM. Our approach also addresses imbalances in the dataset by
resampling the training data and leverages the rich modal information in a single
frame using dynamic data sampling, leading to improved performance.
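
As an illustration of what resampling an imbalanced training set can look like, the following is a minimal sketch that assumes simple random oversampling of minority classes up to the size of the largest class. The abstract does not specify the resampling scheme actually used, so the function name `oversample_balanced`, the `seed` parameter, and the example labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample_balanced(labels, seed=0):
    """Return sample indices in which every class appears equally often.

    Assumption (not from the paper): minority classes are oversampled
    with replacement until they match the largest class.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is brought up to the size of the largest class.
    target = max(len(idxs) for idxs in by_class.values())

    resampled = []
    for idxs in by_class.values():
        resampled.extend(idxs)  # keep every original sample once
        # Draw the missing samples with replacement from the same class.
        resampled.extend(rng.choices(idxs, k=target - len(idxs)))

    rng.shuffle(resampled)
    return resampled


# Hypothetical expression labels with a strong class imbalance.
labels = ["neutral"] * 6 + ["happy"] * 3 + ["fear"] * 1
indices = oversample_balanced(labels)
counts = Counter(labels[i] for i in indices)
```

In this sketch each class ends up with as many samples as the largest one, at the cost of repeating minority-class samples; undersampling or class-weighted losses are alternative choices.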
Jun-Hwa Kim, Namho Kim, Chee Sun Won
2023-03-15