Submitted: 06 September 2025
Posted: 09 September 2025
Abstract
Keywords:
1. Introduction
- We propose FMD-GAN, an innovative generative framework that combines spectral clustering, Markov-guided latent modeling, and frequency-aware diffusion to generate realistic and class-consistent time series.
- We evaluate FMD-GAN on four UCR datasets and show that it matches or outperforms six leading generative baselines across multiple evaluation metrics.
- We conduct an interpretability analysis using t-SNE visualization, residual plots, and latent state overlays, showing that FMD-GAN preserves semantic structure and exposes meaningful generative dynamics.
2. Related Work
2.1. GAN-Based Models for Time Series Generation
2.2. Diffusion Models for Temporal Generation
2.3. Class-Conditional and Structured Sequence Models
2.4. Hybrid Models with Semantic and Structural Constraints
3. The Proposed Model
3.1. Sliding-Window Segmentation
3.2. Class-Aware State Assignment via Spectral Features
3.3. Fourier–Markov Diffusion with State-Conditioned Noise
3.4. Reverse Generation and Segment Aggregation
3.5. Adversarial Training with Class-Aware Dual-Branch Discriminator
3.6. Pseudocode of FMD-GAN Training
Algorithm 1: Training Procedure of FMD-GAN
3.7. Computational Complexity Analysis
4. Experiment
4.1. Datasets
4.2. Baselines
- TimeGAN [7] (Adversarial + Supervised): A hybrid model integrating RNN-based autoencoding, temporal supervision, and adversarial learning. It is a widely used baseline for sequential generation.
- RCGAN-UCR [9] (Conditional GAN): A recurrent conditional GAN originally designed for medical signal synthesis. We adapt it to UCR datasets by conditioning on one-hot class labels.
- TTS-CGAN [18] (Prototype-guided GAN): A GAN model that generates time series by conditioning on class prototypes, thereby improving semantic fidelity and temporal coherence.
- CSDI [10] (Score-based Diffusion): A conditional score-based diffusion model for time series imputation. We adapt it for unconditional generation by class-aware reverse sampling.
- DiffWave [22] (Denoising Diffusion): An audio synthesis diffusion model, adapted for unconditional time series generation with Gaussian noise schedules.
- Diffusion-TS [15] (Denoising Diffusion): An interpretable time series generator based on autoregressive denoising diffusion, providing high fidelity across many tasks.
4.3. Evaluation Metrics
- Fréchet Inception Distance (FID): Measures the distributional similarity between real and generated samples in a learned embedding space. We use a pretrained LSTM encoder to obtain fixed-length representations and compute the Fréchet distance between the Gaussians fitted to the two sets of embeddings. A lower FID indicates closer distributional alignment and greater realism.
- Dynamic Time Warping (DTW): Measures structural alignment as the optimal alignment cost between generated and real sequences. DTW tolerates local temporal shifts and distortions, which makes it a robust metric for temporal fidelity. Lower DTW values indicate better structural preservation.
- Class Consistency Accuracy (CCA): Measures semantic consistency by checking that generated sequences match their assigned class labels. A one-dimensional convolutional neural network classifier is trained on real data and used to predict class labels for generated samples. A higher CCA indicates better semantic fidelity and class-conditional generation quality.
- Spectral Distance (SD): Measures the preservation of frequency-domain structure as the average Euclidean distance between the normalized power spectra of real and generated sequences. We use the Fast Fourier Transform (FFT) to obtain the magnitude spectrum of each sequence. Lower SD values indicate closer global structural alignment in the frequency domain, in line with the spectral modeling rationale behind FMD-GAN.
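The three distance-based metrics above can be illustrated with a minimal NumPy sketch. It is not the evaluation code used in the experiments: the function names are hypothetical, the embeddings passed to `frechet_distance` are assumed to come from some encoder (the pretrained LSTM is not reproduced here), the DTW local cost is assumed to be the absolute difference, and `spectral_distance` assumes that real and generated sequences are paired row by row.

```python
import numpy as np


def frechet_distance(emb_real, emb_gen):
    """Frechet distance between Gaussians fitted to two embedding sets of shape (n, d)."""
    mu_r, mu_g = emb_real.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(emb_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(emb_gen, rowvar=False))
    # tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


def dtw_distance(a, b):
    """Classic dynamic-programming DTW cost for two 1-D sequences.

    Assumption: absolute difference as the local cost, no window constraint.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def spectral_distance(real, gen):
    """Mean Euclidean distance between normalized power spectra.

    Assumption: real[i] is compared with gen[i] (row-wise pairing).
    """
    def norm_power(x):
        power = np.abs(np.fft.rfft(x, axis=-1)) ** 2
        return power / (power.sum(axis=-1, keepdims=True) + 1e-12)

    return float(np.linalg.norm(norm_power(real) - norm_power(gen), axis=-1).mean())
```

All three functions return 0 for identical inputs, which matches the "lower is better" reading of FID, DTW, and SD in the results tables. CCA is omitted because it depends on a trained classifier rather than a closed-form distance.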
4.4. Implementation Details
4.5. Quantitative Results
4.6. Ablation Study
- NoMask: Removes the state-conditioned spectral mask and replaces it with isotropic Gaussian noise, thereby disabling frequency-aware modulation.
- NoMarkov: Replaces the learned Markov transition matrix with uniform random sampling, thereby breaking temporal state continuity.
- NoDiff: Disables the forward diffusion entirely, reducing the model to a conventional GAN trained on latent vectors.
- FMD-GAN (Full): The complete model, combining spectral masking and Markov-guided denoising diffusion.
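To make the difference between the variants concrete, the sketch below contrasts Markov-guided state sampling with the uniform sampling of NoMarkov, and spectrally masked noise with the isotropic noise of NoMask. It is an illustration under stated assumptions, not the FMD-GAN implementation: the function names, the per-bin multiplicative mask over rFFT coefficients, and the uniform initial state are all hypothetical choices.

```python
import numpy as np


def sample_states(n_steps, n_states, rng, transition=None):
    """Sample a latent state sequence.

    transition=None mimics NoMarkov: every state is drawn uniformly and
    independently. Otherwise row i of `transition` gives P(next | current=i).
    """
    if transition is None:
        return rng.integers(0, n_states, size=n_steps)
    states = np.empty(n_steps, dtype=int)
    states[0] = rng.integers(0, n_states)  # assumed uniform initial state
    for t in range(1, n_steps):
        states[t] = rng.choice(n_states, p=transition[states[t - 1]])
    return states


def segment_noise(length, rng, mask=None):
    """Gaussian noise for one segment.

    mask=None mimics NoMask (isotropic noise). Otherwise `mask` has
    length // 2 + 1 entries and scales each rFFT bin of the noise.
    """
    eps = rng.standard_normal(length)
    if mask is None:
        return eps
    return np.fft.irfft(np.fft.rfft(eps) * mask, n=length)
```

With a mask of all ones, `segment_noise` reduces to the isotropic case, so the NoMask variant can be read as a special case of the masked noise. In the same way, a transition matrix with identical uniform rows reproduces NoMarkov.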
4.7. Qualitative Analysis
4.7.1. Structure Visualization and Residual Analysis
- Real: The original ground-truth sequence from the test set.
- Generated: The sequence synthesized by each method (TimeGAN, Diffusion-TS, FMD-GAN).
- Residual (FMD-GAN only): The pointwise difference between the real sequence and the FMD-GAN output, highlighting reconstruction accuracy.
- Latent State (FMD-GAN only): A color-coded bar showing the Markov state assigned to each timestep during generation.
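The residual and latent-state rows of such a figure can be derived as in the following sketch. It assumes that states are assigned per sliding window and that later windows overwrite earlier ones where they overlap; the function names, the overlap rule, and the `-1` marker for uncovered timesteps are hypothetical.

```python
import numpy as np


def pointwise_residual(real, generated):
    """Residual row: real sequence minus generated sequence, per timestep."""
    return np.asarray(real, dtype=float) - np.asarray(generated, dtype=float)


def states_per_timestep(window_states, window, stride, length):
    """Expand window-level state labels to one label per timestep.

    Assumption: window k covers [k * stride, k * stride + window) and
    later windows overwrite earlier ones on overlapping timesteps.
    """
    labels = np.full(length, -1, dtype=int)  # -1 marks uncovered timesteps
    for k, state in enumerate(window_states):
        start = k * stride
        labels[start:start + window] = state
    return labels
```

For example, three windows of length 4 with stride 2 over a sequence of length 8 yield the labels `[0, 0, 1, 1, 2, 2, 2, 2]` for states `[0, 1, 2]`, which can then be drawn as a color-coded bar under the sequence plot.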
4.7.2. Latent Space Alignment via t-SNE
4.8. Sensitivity to Markov States and KL Regularization
4.9. Training Dynamics and Convergence Stability
5. Discussion
6. Conclusions
7. Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, Improved Training of Wasserstein GANs. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
- J. B. Allen, Short term spectral analysis, synthesis, and modification by discrete Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing 1977, 25, 235–238.
- S. Lloyd, Least squares quantization in PCM. IEEE Transactions on Information Theory 1982, 28, 129–137.
- L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 1989, 77, 257–286.
- E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. Courville, FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- H. A. Dau, E. Keogh, K. Kamgar, C.-C. M. Yeh, Y. Zhu, S. Gharghabi, C. A. Ratanamahatana, Yanping, B. Hu, N. Begum, A. Bagnall, A. Mueen, G. Batista, and Hexagon-ML, The UCR Time Series Classification Archive. 2018. [Online]. Available: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.
- J. Yoon, D. Jarrett, and M. van der Schaar, Time-series Generative Adversarial Networks. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
- O. Mogren, C-RNN-GAN: Continuous recurrent neural networks with adversarial training. In NIPS Workshop on Constructive Machine Learning, 2016.
- C. Esteban, S. L. Hyland, and G. Rätsch, Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs. In NeurIPS Workshop on Machine Learning for Health, 2017.
- Y. Tashiro, J. Song, and S. Ermon, CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Q. Wen, J. Gao, L. Sun, X. Xu, et al., Time-series data augmentation for deep learning: A survey. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2020.
- K. Rasul, et al., Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting. In International Conference on Machine Learning (ICML), 2021.
- E. Adib, F. Afghah, and J. J. Prevost, Synthetic ECG Signal Generation Using Generative Neural Networks. arXiv 2021, arXiv:2112.03268.
- E. Adib, A. Fernandez, F. Afghah, and J. J. Prevost, Synthetic ECG Signal Generation using Probabilistic Diffusion Models. arXiv 2023, arXiv:2303.02475.
- X. Yuan and Y. Qiao, Diffusion-TS: Interpretable Diffusion for General Time Series Generation. In International Conference on Learning Representations (ICLR). arXiv:2403.01742.
- B. Barancikova, Z. Huang, and C. Salvi, SigDiffusions: Score-Based Diffusion Models for Long Time Series via Log-Signature Embeddings. arXiv:2406.10354.
- K. Yi, Q. Zhang, S. Wang, and H. He, Neural Time Series Analysis with Fourier Transform: A Survey. arXiv 2023, arXiv:2302.02173.
- X. Li, A. H. H. Ngu, and V. Metsis, TTS-CGAN: A Transformer Time-Series Conditional GAN for Biosignal Data Augmentation. arXiv 2022, arXiv:2206.13676.
- H. Zhou, et al., Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021.
- I. Ismail, et al., DAWN: An End-to-End Differentially Private Time-series Classifier. In International Conference on Learning Representations (ICLR), 2020.
- J. Han, J. Pei, and H. Tong, Data Mining: Concepts and Techniques, 4th ed. San Francisco, CA: Morgan Kaufmann, 2022.
- Z. Kong, W. Ping, J. Huang, K. Zhao, and B. Catanzaro, DiffWave: A versatile diffusion model for audio synthesis. arXiv 2020, arXiv:2009.09761.
- L. van der Maaten and G. Hinton, Visualizing data using t-SNE. Journal of Machine Learning Research 2008, 9, 2579–2605.
- Y. Liang, H. Wen, Y. Nie, Y. Jiang, M. Jin, D. Song, S. Pan, and Q. Wen, Foundation Models for Time Series Analysis: A Tutorial and Survey. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24), pp. 3376–3396, 2024.
- Q. Wen, T. Zhou, C. Zhang, W. Chen, Z. Ma, J. Yan, and L. Sun, Transformers in Time Series: A Survey. arXiv 2022, arXiv:2202.07125.
- N. Neifar, A. Ben-Hamadou, A. Mdhaffar, and M. Jmaiel, DiffECG: A Versatile Probabilistic Diffusion Model for ECG Signals Synthesis. arXiv 2023, arXiv:2306.01875. [Online]. Available: https://arxiv.org/abs/2306.01875.
- Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, Score-Based Generative Modeling through Stochastic Differential Equations. In International Conference on Learning Representations (ICLR), 2021.
- Y. Song and S. Ermon, Improved Techniques for Training Score-Based Generative Models. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, WaveNet: A Generative Model for Raw Audio. arXiv 2016, arXiv:1609.03499.
- J. Ho, A. Jain, and P. Abbeel, Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- D. Li, D. Chen, Y. Jin, L. Shi, R. S. M. Goh, and S. K. Ng, Mad-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In International Conference on Artificial Neural Networks (ICANN), pp. 703–716, 2019. Springer.
- K. Sohn, X. Yan, and H. Lee, Learning Structured Output Representation using Deep Conditional Generative Models. In Advances in Neural Information Processing Systems (NeurIPS), 2015.
- K. Fragkiadaki, S. Levine, P. Felsen, and J. Malik, Recurrent network models for human dynamics. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4346–4354, 2015.
- T. A. Lasko, J. C. Denny, and M. A. Levy, Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PLOS ONE 2013, 8, e66341.
- H. Wu, Y. Xu, J. Wang, G. Long, C. Jiang, and T. Zhang, Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
- L. Lin, Z. Li, R. Li, X. Li, and J. Gao, Diffusion models for time-series applications: a survey. Frontiers of Information Technology & Electronic Engineering 2024, 25, 19–41.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks. Communications of the ACM 2017, 60, 84–90.
- S. Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing 1979, 27, 113–120.

| Dataset | #Classes | Length | #Instances | Domain |
|---|---|---|---|---|
| ECG200 | 2 | 96 | 200 | Biomedical |
| GunPoint | 2 | 150 | 200 | Human motion |
| UWaveGestureLibrary_X | 8 | 315 | 896 | Multivariate gesture |
| FordA | 2 | 500 | 1320 | Industrial sensor |

| Model | ECG200 FID ↓ | ECG200 DTW ↓ | ECG200 CCA ↑ | GunPoint FID ↓ | GunPoint DTW ↓ | GunPoint CCA ↑ | FordA FID ↓ | FordA DTW ↓ | FordA CCA ↑ | ChlorineConc FID ↓ | ChlorineConc DTW ↓ | ChlorineConc CCA ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TimeGAN | 50.9 | 11.6 | 0.90 | 47.9 | 6.4 | 0.87 | 50.4 | 16.8 | 0.77 | 38.0 | 10.6 | 0.89 |
| RCGAN-UCR | 45.8 | 17.3 | 0.84 | 29.1 | 13.3 | 0.76 | 53.1 | 14.5 | 0.88 | 34.2 | 19.6 | 0.88 |
| TTS-CGAN | 51.1 | 7.9 | 0.84 | 21.8 | 7.3 | 0.84 | 49.8 | 19.5 | 0.81 | 34.8 | 12.0 | 0.79 |
| CSDI | 48.5 | 9.8 | 0.88 | 25.3 | 6.7 | 0.85 | 47.6 | 13.2 | 0.86 | 33.6 | 9.7 | 0.87 |
| DiffWave | 42.7 | 6.7 | 0.90 | 22.4 | 5.9 | 0.88 | 45.3 | 10.2 | 0.88 | 31.2 | 8.3 | 0.90 |
| Diffusion-TS | 38.2 | 7.2 | 0.91 | 20.7 | 5.2 | 0.89 | 43.9 | 9.9 | 0.89 | 28.9 | 7.3 | 0.89 |
| FMD-GAN (Ours) | 38.4 | 6.7 | 0.91 | 20.1 | 5.1 | 0.89 | 41.8 | 9.6 | 0.89 | 28.5 | 7.1 | 0.88 |

| Model | ECG200 | GunPoint | Coffee | Beef |
|---|---|---|---|---|
| TTS-CGAN | 0.092 | 0.084 | 0.113 | 0.105 |
| CSDI | 0.081 | 0.075 | 0.109 | 0.093 |
| DiffWave | 0.064 | 0.058 | 0.091 | 0.078 |
| Diffusion-TS | 0.062 | 0.055 | 0.087 | 0.072 |
| FMD-GAN (Ours) | 0.053 | 0.046 | 0.079 | 0.065 |

| Variant | ECG200 FID ↓ | ECG200 DTW ↓ | ECG200 CCA ↑ | ECG200 SD ↓ | GunPoint FID ↓ | GunPoint DTW ↓ | GunPoint CCA ↑ | GunPoint SD ↓ |
|---|---|---|---|---|---|---|---|---|
| NoMask | 43.7 | 8.5 | 0.86 | 0.074 | 23.9 | 6.1 | 0.85 | 0.067 |
| NoMarkov | 41.5 | 8.2 | 0.88 | 0.069 | 22.4 | 5.8 | 0.86 | 0.063 |
| NoDiff | 46.2 | 9.1 | 0.83 | 0.089 | 26.1 | 6.6 | 0.82 | 0.079 |
| FMD-GAN (Full) | 38.4 | 6.7 | 0.91 | 0.053 | 20.1 | 5.1 | 0.89 | 0.046 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).