Can artificial systems form internal representations that resemble conscious perception? Here we introduce eXCube1, a brain-inspired spiking neural network (BI-SNN) framework that learns evolving spatio-temporal associative memories (ESTAMs) from fMRI data. ESTAMs provide an interpretable, causal account of how neural activity propagates across space and time, bridging statistical neuroimaging analysis and mechanistic modelling. We show that eXCube1 learns discriminative ESTAMs that separate meaningful from meaningless visual and auditory stimuli without access to semantic content. Across two fMRI case studies, the learned models achieve high classification accuracy while revealing structured, modality-specific spatio-temporal dynamics consistent with known neurobiological pathways. Notably, ESTAMs can be robustly recalled from partial temporal input, demonstrating emergent associative memory properties. By explicitly modelling directed spike-based dynamics, eXCube1 moves beyond correlation-based analysis toward causal, interpretable representations of neural computation. These results position ESTAMs as a candidate computational substrate for aspects of machine consciousness grounded in spatio-temporal neural dynamics.