Preprint Article

This version is not peer-reviewed.

Mutual Refinement Distillation for Multimodal Emotion Recognition: Interactive Learning and Reverse Curriculum for Complex Sample Classification

Submitted: 29 January 2026

Posted: 29 January 2026


Abstract
With the rapid advancement of speech emotion recognition, the transition from unimodal to multimodal approaches has become inevitable. However, multimodal methods introduce new challenges of their own, particularly classification ambiguity on complex samples. To address this, we propose a Mutual Refinement Distillation (MRD) method, which incorporates three key components: (1) Modal Interaction Calibration, which enhances classification accuracy on complex samples; (2) Interactive Learning Constraints, which mitigate overfitting; and (3) Reverse Curriculum Learning, which further improves model robustness. Experiments on the MELD and IEMOCAP datasets demonstrate that our approach outperforms state-of-the-art methods in emotion recognition, achieving a notable 6.07% improvement over the baseline on IEMOCAP.
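The abstract names Reverse Curriculum Learning as one of MRD's components. A minimal sketch of one plausible reading of that idea, assuming it means presenting harder samples earlier in training (the opposite of classic easy-to-hard curriculum learning); the difficulty scores and helper below are illustrative assumptions, not the paper's exact formulation:

```python
def reverse_curriculum_order(samples, difficulty):
    """Order samples hardest-first using a per-sample difficulty score
    (e.g., per-sample loss from a warm-up pass). Illustrative only:
    the paper's actual scheduling criterion is not given here."""
    return sorted(samples, key=difficulty, reverse=True)


# Toy usage: hypothetical per-sample difficulty scores keyed by sample id.
scores = {"s1": 0.2, "s2": 0.9, "s3": 0.5}
order = reverse_curriculum_order(list(scores), scores.get)
print(order)  # hardest (s2) first, easiest (s1) last
```

Under a standard curriculum the same scores would be sorted ascending; reversing that order exposes the model to ambiguous, complex samples from the start.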
Keywords: 
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
