Preprint
Article

This version is not peer-reviewed.

Adaptive Contextualized Multi-feature Fusion Network for Robust Cross-Linguistic Speech Emotion Recognition

Submitted: 30 December 2025

Posted: 30 December 2025


Abstract
Speech Emotion Recognition (SER) faces significant generalization challenges, particularly in Cross-Linguistic SER (CLSER), because of linguistic and cultural variability. Existing approaches struggle to fuse diverse features robustly and to adapt to cross-linguistic discrepancies. To address this, we propose the Adaptive Contextualized Multi-feature Fusion Network (ACMF-Net), an architecture built on a "contextualize first, then adaptively fuse" paradigm. ACMF-Net combines HuBERT embeddings with Mel-frequency Cepstral Coefficients (MFCCs) and prosodic features; each feature stream is processed by a dedicated Transformer encoder that captures its temporal dependencies. The core innovation is a Dynamic Gating mechanism that learns to weight the contribution of each heterogeneous feature stream according to the characteristics of the input speech, which improves robustness to cross-linguistic variation. Evaluated on the IEMOCAP dataset for source-language performance, ACMF-Net achieved a higher Unweighted Average Recall (UAR) than strong baselines and existing multi-feature fusion models. With few-shot fine-tuning on diverse target languages, ACMF-Net also consistently showed better cross-linguistic generalization. An ablation study confirmed that each proposed component contributes to performance, the Dynamic Gating mechanism most of all. These results indicate that ACMF-Net can advance robust, generalizable emotion recognition across linguistic boundaries.
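The abstract describes the gating step only at a high level, so the following is a minimal sketch of one common way such a mechanism can be realized, not the authors' implementation. It assumes each stream (HuBERT, MFCC, prosody) has already been contextualized and pooled to a fixed-length vector of the same size, and that the gate is a single linear layer followed by a softmax over the streams. The names `dynamic_gate_fuse`, `gate_weights`, and `gate_bias` are hypothetical, and the numeric values are toy data.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_gate_fuse(streams, gate_weights, gate_bias):
    """Fuse K pooled feature vectors with input-dependent weights.

    streams:      list of K vectors, each of length d (assumed already
                  contextualized and pooled; this is an assumption).
    gate_weights: K rows of length K*d (hypothetical linear gate layer).
    gate_bias:    K floats.
    Returns (fused vector of length d, list of K gate values).
    """
    # The gate sees all streams at once, so the weights depend on the input.
    concat = [x for vec in streams for x in vec]
    scores = [
        sum(w * x for w, x in zip(row, concat)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    gates = softmax(scores)  # non-negative, sums to 1 across streams
    dim = len(streams[0])
    fused = [sum(g * vec[i] for g, vec in zip(gates, streams)) for i in range(dim)]
    return fused, gates


# Toy pooled embeddings standing in for the three streams (illustrative only).
hubert = [0.2, -0.1, 0.4, 0.0]
mfcc = [0.5, 0.3, -0.2, 0.1]
prosody = [-0.3, 0.6, 0.1, 0.2]
streams = [hubert, mfcc, prosody]

# Random gate parameters; in a trained model these would be learned.
rng = random.Random(0)
gate_weights = [[rng.uniform(-0.5, 0.5) for _ in range(12)] for _ in range(3)]
gate_bias = [0.0, 0.0, 0.0]

fused, gates = dynamic_gate_fuse(streams, gate_weights, gate_bias)
print("gates:", gates)
print("fused:", fused)
```

Because the gate values are computed from the input itself, two utterances with different acoustic characteristics can receive different stream weightings, which is the behavior the abstract attributes to Dynamic Gating. With all gate parameters set to zero, the sketch reduces to a plain average of the three streams.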
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2025 MDPI (Basel, Switzerland) unless otherwise stated