Preprint
Article

This version is not peer-reviewed.

Symmetry-Aware Structured Representation Learning for Unified Multi-Modal Physiological Modeling in Affective State and Preference Inference

Submitted:

25 February 2026

Posted:

27 February 2026

Abstract
Decoding affective states and personal preferences from physiological responses remains a fundamental challenge in affective computing due to strong heterogeneity across neural, autonomic, and attentional signals, as well as the coupling between transient emotions and long-term preferences. Most existing methods address these factors independently and lack explicit mechanisms to preserve the intrinsic structural regularities and invariances of physiological affective responses, limiting their applicability in real-world scenarios such as music therapy. In this paper, we propose a symmetry-aware and structured multi-modal physiological modeling framework for joint affective state and preference inference. The framework integrates electroencephalography (EEG), peripheral physiological signals (GSR, BVP, EMG, respiration, and temperature), and eye-movement data (EOG) within a unified temporal modeling paradigm. At its core, a Dynamic Token Feature Extractor (DTFE) converts raw physiological time series into compact token representations without handcrafted features, and explicitly decomposes representation learning into cross-series symmetry and intra-series symmetry. These two complementary symmetry dimensions are realized through Cross-Series Intersection (CSI) and Intra-Series Intersection (ISI) mechanisms, enabling structured and interpretable physiological representations. A hierarchical cross-modal fusion strategy further integrates modality-level tokens in a symmetry-consistent manner, capturing dependencies among neural, autonomic, and attentional modalities. Extensive experiments on the DEAP dataset demonstrate consistent improvements over state-of-the-art methods under both single-task and multi-task settings. 
The proposed model achieves 98.32% and 98.45% accuracy for valence and arousal prediction, respectively, and 97.96% accuracy for quadrant-based emotion classification in single-task evaluation, while attaining 92.8%, 91.8%, and 93.6% accuracy for valence, arousal, and liking prediction in joint multi-task settings. Additional robustness analyses under reduced training data confirm that symmetry-aware structured decomposition improves data efficiency and generalization. Overall, this work establishes a principled symmetry-preserving representation learning framework for robust affective decoding and intelligent, feedback-driven music therapy systems.
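The abstract's two-axis decomposition — Cross-Series Intersection (mixing information across channels at the same time step) and Intra-Series Intersection (mixing information across time within one channel) — can be illustrated with a toy sketch. This is not the paper's implementation; the function names, the non-overlapping windowing, and the softmax-affinity mixing are all assumptions introduced here purely to make the two symmetry dimensions concrete.

```python
import numpy as np

def tokenize(signal, win):
    """Slice a (channels, time) signal into non-overlapping windows -> (channels, tokens, win).
    A stand-in for the paper's Dynamic Token Feature Extractor (assumption)."""
    c, t = signal.shape
    n = t // win
    return signal[:, :n * win].reshape(c, n, win)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_series_mix(tokens):
    """Cross-Series Intersection (sketch): at each time index, mix a token with
    the same-time tokens of the other channels via a softmax affinity."""
    c, n, d = tokens.shape
    out = np.empty_like(tokens)
    for i in range(n):
        x = tokens[:, i, :]                  # (channels, d)
        att = softmax(x @ x.T / np.sqrt(d))  # channel-to-channel affinity
        out[:, i, :] = att @ x
    return out

def intra_series_mix(tokens):
    """Intra-Series Intersection (sketch): within each channel, mix tokens
    along the time axis via a softmax affinity."""
    c, n, d = tokens.shape
    out = np.empty_like(tokens)
    for ch in range(c):
        x = tokens[ch]                       # (tokens, d)
        att = softmax(x @ x.T / np.sqrt(d))
        out[ch] = att @ x
    return out

# Toy multimodal input: 4 channels (e.g., EEG, GSR, BVP, EOG), 256 samples.
rng = np.random.default_rng(0)
sig = rng.standard_normal((4, 256))
tok = tokenize(sig, win=32)                  # (4, 8, 32)
fused = intra_series_mix(cross_series_mix(tok))
print(fused.shape)                           # -> (4, 8, 32)
```

Note the composability: because each mixing step preserves the token grid's shape, the two operations can be applied in either order or stacked, which is what makes a hierarchical, modality-level fusion on top of them straightforward.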
Keywords: 
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated