Preprint
Article

This version is not peer-reviewed.

Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management

Submitted:

07 May 2026

Posted:

12 May 2026

You are already at the latest version

Abstract
Multimodal reinforcement learning agents must fuse signals with vastly different noise profiles—yet existing architectures, whether monolithic (π0, DreamerV3) or modular (MSDP, VTDexManip), allow noise from unreliable modalities to contaminate reliable ones at the point of fusion. We propose filter-before-mixing: each modality’s representation is independently refined by a per-modality Flow Matching module before spectral-domain fusion via a Fourier Neural Operator (FNO), with a residual gate ensuring that refinement is never harmful. The resulting architecture, FreamerV1 (Filter-before-mixing dreamer), has 93M parameters (0.4M trainable). On MiniGrid, FreamerV1 reaches 100% success at 5000 episodes, surpassing the 94% encoder-only baseline which degrades to 78% due to catastrophic forgetting. On Crafter (no language modality), it scores 16.0%, exceeding DreamerV3 (14.5%). On PAMAP2 wearable sensors—where no pre-trained encoder exists—the foundation encoder achieves 2.4× higher reward and 16× lower variance than a vanilla MLP, confirming that the filter-before-mixing advantage grows with encoder noise.
Keywords: 
;  ;  ;  ;  ;  ;  ;  ;  ;  
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2026 MDPI (Basel, Switzerland) unless otherwise stated