Submitted: 03 February 2026
Posted: 10 February 2026
Abstract
Keywords:
1. Introduction
1.1. Background
1.2. Motivation and Contributions
1. Existing Transformer-based forecasting frameworks process time series exclusively in the temporal domain, failing to exploit the complementary strengths of frequency-domain representations for capturing periodic components and harmonic structures. Moreover, standard frequency transformation approaches suffer from spectral misalignment between input sequences and complete time series, limiting their effectiveness for multi-step-ahead prediction tasks.
2. Current mixture-of-experts architectures in time series modeling rely on static gating mechanisms that route inputs based on learned but fixed assignment patterns, lacking the capability to dynamically adapt expert selection to varying temporal patterns, forecasting horizons, and evolving system dynamics. The absence of reinforcement learning-based routing prevents simultaneous optimization of prediction accuracy, routing stability, and expert diversity.
3. Existing research predominantly treats model architecture design, expert specialization, and forecasting optimization as separate problems, failing to exploit the synergies and interdependencies among dual-domain processing, sparse computation, and dynamic routing. The lack of unified frameworks that jointly optimize these components through end-to-end training limits the potential benefits achievable through integrated system design.
- We establish a dual-domain Transformer architecture that processes time series simultaneously in the time and frequency domains through parallel encoder pathways. Our framework introduces an Extended DFT that aligns the input spectrum with the frequency grid of the complete series when computing the spectrum F[k], resolving the spectral misalignment problem in frequency-domain forecasting.
- We design independent mixture-of-experts modules for the frequency domain (F-MoE) and the time domain (T-MoE), with four frequency experts and four time experts per Transformer layer. Each expert is a two-layer network specialized for domain-specific pattern recognition, enabling sparse parameter activation while maintaining model capacity through conditional computation.
- We develop a reinforcement learning-based routing framework that formulates expert selection as independent Markov Decision Processes in both domains. Our multi-objective reward function balances prediction accuracy, routing stability, and expert diversity through weighted terms (prediction weight 1.0, switching penalty 0.1, diversity bonus 0.05), with MAPPO policies learning domain-specific routing strategies through coupled training dynamics.
- We conduct comprehensive experiments across five long-term forecasting benchmarks (ETTh1, ETTm1, Weather, Electricity, Traffic) and four prediction horizons (96, 192, 336, 720 steps), demonstrating that MoE-Transformer achieves 50.9-56.9% MSE reduction over state-of-the-art baselines while delivering 60% faster inference and 40% memory reduction through sparse expert activation. Ablation studies validate each component’s contribution, with RL routing providing 39.5-47.2% improvement over static gating methods.
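The spectral alignment idea in the first contribution can be illustrated with a short sketch. It assumes that the Extended DFT evaluates the L look-back samples on the frequency grid of the full series of length L + T, i.e. $F[k] = \sum_{n=0}^{L-1} x[n]\, e^{-j 2\pi k n / (L+T)}$. This form is an assumption made for illustration, not a definition taken from the text, and the function names are hypothetical.

```python
import cmath
import math


def dft_on_grid(x, grid_size):
    """Evaluate the spectrum of x at the frequencies k / grid_size."""
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / grid_size) for n in range(len(x)))
        for k in range(grid_size)
    ]


def standard_dft(x):
    """Ordinary DFT: the grid is tied to the input length L."""
    return dft_on_grid(x, len(x))


def extended_dft(x, horizon):
    """Assumed Extended DFT: L input samples on the (L + T)-point grid."""
    return dft_on_grid(x, len(x) + horizon)


# Cosine with period 12: L = 18 observed steps, T = 6 steps to forecast.
L, T, period = 18, 6, 12
x = [math.cos(2 * math.pi * n / period) for n in range(L)]

spectrum = extended_dft(x, T)
# Strongest positive-frequency bin on the full-series grid.
ext_peak = max(range(1, (L + T) // 2 + 1), key=lambda k: abs(spectrum[k]))
print(ext_peak, ext_peak / (L + T))  # bin index and its frequency in cycles per step
```

In this toy setting the true frequency 1/12 falls between bins 1 and 2 of the 18-point input grid, whereas the 24-point grid of the complete series has a bin exactly at that frequency. This is the kind of mismatch the contribution describes.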
2. Related Work
2.1. Transformer-Based Time Series Forecasting
| Ref | [18] | [19] | [20] | [21] | [22] | [23] | [24] | [25] | [26] | [27] | [28] | Proposed work | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Feature | |||||||||||||
| Temporal modeling capability | √ | √ | √ | √ | √ | √ | √ | √ | |||||
| Frequency-domain processing | √ | √ | √ | √ | √ | ||||||||
| Sparse expert networks | √ | √ | √ | √ | √ | ||||||||
| Dynamic routing mechanism | √ | √ | |||||||||||
| Multi-horizon forecasting | √ | √ | √ | √ | √ | √ | √ | √ | |||||
2.2. Frequency-Domain Time Series Modeling
2.3. Mixture-of-Experts and Dynamic Routing Mechanisms
3. Method

3.1. Problem Formulation
3.2. Dual-Domain Transformer Architecture
3.3. Transformer Layer with Dual-Domain MoE
3.4. F-MoE: Frequency Domain Mixture-of-Experts
3.5. T-MoE: Time Domain Mixture-of-Experts
3.6. Reinforcement Learning Framework for F-MoE Routing
3.7. Reinforcement Learning Framework for T-MoE Routing
3.8. Ensemble Weighting
3.9. Coupled Training Dynamics
4. Experiments
4.1. Datasets and Experimental Setup
4.1.1. Dataset Description
4.1.2. Problem Formulation Alignment
4.2. Model Configuration
4.2.1. Component-Specific Parameters
4.2.2. Coupled Training Configuration
4.3. Evaluation Metrics and Analysis Framework
4.3.1. Primary Evaluation Metrics
5. Experimental Results
5.1. Reinforcement Learning Training Dynamics
5.2. Overall Performance Comparison
5.3. Ablation Studies
5.4. Expert Specialization and Routing Behavior
6. Conclusion
References
- Wu, B.; Cai, Z.; Wu, W.; Yin, X. AoI-Aware Resource Management for Smart Health via Deep Reinforcement Learning. IEEE Access 2023. [CrossRef]
- Fang, Z.; Liu, Z.; Wang, J.; Hu, S.; Guo, Y.; Deng, Y.; Fang, Y. Task-Oriented Communications for Visual Navigation With Edge-Aerial Collaboration in Low Altitude Economy. In Proceedings of the IEEE Global Communications Conference (GLOBECOM). IEEE, 2026.
- Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). AAAI, 2023, Vol. 37, pp. 11121–11128.
- Liu, Z.; Yang, J.; Cheng, M.; Luo, Y.; Li, Z. Generative Pretrained Hierarchical Transformer for Time Series Forecasting. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). ACM, 2024, pp. 2003–2013.
- Wu, B.; Huang, J.; Yu, S. "X of Information" Continuum: A Survey on AI-Driven Multi-Dimensional Metrics for Next-Generation Networked Systems. arXiv preprint arXiv:2507.19657 2025.
- Pan, D.; Wu, B.N.; Sun, Y.L.; Xu, Y.P. A Fault-Tolerant and Energy-Efficient Design of a Network Switch Based on a Quantum-Based Nano-Communication Technique. Sustainable Computing: Informatics and Systems 2023, 37, 100827. [CrossRef]
- Ding, Z.; Huang, J.; Qi, J. Learning to Defend: A Multi-Agent Reinforcement Learning Framework for Stackelberg Security Game in Mobile Edge Computing. In Proceedings of the International Conference on Computing, Networking and Communications (ICNC), IEEE, Honolulu, Hawaii, USA, February 2026.
- Piao, X.; Chen, Z.; Murayama, T.; Matsubara, Y.; Sakurai, Y. Fredformer: Frequency Debiased Transformer for Time Series Forecasting. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). ACM, 2024, pp. 2400–2410.
- Li, R.; Jiang, M.; Liu, Q.; Wang, K.; Feng, K.; Sun, Y.; Zhou, X. FAITH: Frequency-Domain Attention in Two Horizons for Time Series Forecasting. Knowledge-Based Systems 2025, 309, 112790. [CrossRef]
- Chen, Y.; Liu, S.; Yang, J.; Jing, H.; Zhao, W.; Yang, G. A Joint Time-Frequency Domain Transformer for Multivariate Time Series Forecasting. Neural Networks 2024, 176, 106334. [CrossRef]
- Zhang, Y.; Cai, J.; Wu, Z.; Wang, P.; Ng, S.K. Mixture of Experts as Representation Learner for Deep Multi-View Clustering. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). AAAI, 2025, Vol. 39, pp. 22704–22713.
- Oldfield, J.; Georgopoulos, M.; Chrysos, G.; Tzelepis, C.; Panagakis, Y.; Nicolaou, M.; Deng, J.; Patras, I. Multilinear Mixture of Experts: Scalable Expert Specialization Through Factorization. Advances in Neural Information Processing Systems (NeurIPS) 2024, 37, 53022–53063.
- Huang, J.; Wu, B.; Duan, Q.; Dong, L.; Yu, S. A Fast UAV Trajectory Planning Framework in RIS-Assisted Communication Systems With Accelerated Learning via Multithreading and Federating. IEEE Transactions on Mobile Computing 2025, pp. 1–16. [CrossRef]
- Cai, W.; Jiang, J.; Wang, F.; Tang, J.; Kim, S.; Huang, J. A Survey on Mixture of Experts in Large Language Models. IEEE Transactions on Knowledge and Data Engineering 2025, 37, 3896–3915. [CrossRef]
- Wu, B.; Huang, J.; Duan, Q.; Dong, L.; Cai, Z. Enhancing Vehicular Platooning With Wireless Federated Learning: A Resource-Aware Control Framework. IEEE/ACM Transactions on Networking 2025, pp. 1–1. [CrossRef]
- Wu, B.; Ding, Z.; Huang, J. A Review of Continual Learning in Edge AI. IEEE Transactions on Network Science and Engineering 2025. [CrossRef]
- Xing, C.C.; Ding, Z.; Huang, J. A Stochastic Geometry-Based Analysis of SWIPT-Assisted Underlaid Device-to-Device Energy Harvesting. ACM SIGAPP Applied Computing Review 2025, 25, 18–34. [CrossRef]
- Zhao, T.; Fang, L.; Ma, X.; Li, X.; Zhang, C. TFformer: A Time–Frequency Domain Bidirectional Sequence-Level Attention Based Transformer for Interpretable Long-Term Sequence Forecasting. Pattern Recognition 2025, 158, 110994. [CrossRef]
- Han, W.; Zhu, T.; Chen, L.; Ning, H.; Luo, Y.; Wan, Y. MCformer: Multivariate Time Series Forecasting With Mixed-Channels Transformer. IEEE Internet of Things Journal 2024, 11, 28320–28329. [CrossRef]
- Li, P.; Zheng, X.; Xiang, S.; Hou, J.; Qin, Y.; Kurboniyon, M.S.; Ren, W. Channel Independence Bidirectional Gated Mamba With Interactive Recurrent Mechanism for Time Series Forecasting. IEEE Transactions on Industrial Electronics 2025, pp. 1–10. [CrossRef]
- Kumar, R.; Mendes-Moreira, J.; Chandra, J. Spatio-Temporal Parallel Transformer Based Model for Traffic Prediction. ACM Transactions on Knowledge Discovery from Data 2024, 18, 1–25. [CrossRef]
- Xu, D.; Wang, H.; Zhang, F. Frequency Decomposition and Patch Modeling Framework for Time-Series Forecasting. Applied Soft Computing 2025, p. 113890. [CrossRef]
- Zhang, Z.; Chen, Y.; Zhang, D.; Qian, Y.; Wang, H. CTFNet: Long-Sequence Time-Series Forecasting Based on Convolution and Time–Frequency Analysis. IEEE Transactions on Neural Networks and Learning Systems 2024, 35, 16368–16382. [CrossRef]
- Yang, Z.; Yan, W.; Huang, X.; Mei, L. Adaptive Temporal-Frequency Network for Time-Series Forecasting. IEEE Transactions on Knowledge and Data Engineering 2022, 34, 1576–1587. [CrossRef]
- Zhang, D.; Song, J.; Bi, Z.; Yuan, Y.; Wang, T.; Yeong, J.; Hao, J. Mixture of Experts in Large Language Models. arXiv preprint arXiv:2507.11181 2025. [CrossRef]
- Csordás, R.; Piękos, P.; Irie, K.; Schmidhuber, J. SwitchHead: Accelerating Transformers With Mixture-of-Experts Attention. Advances in Neural Information Processing Systems (NeurIPS) 2024, 37, 74411–74438.
- Yue, T.; Guo, L.; Cheng, J.; Gao, X.; Huang, H.; Liu, J. Ada-K Routing: Boosting the Efficiency of MoE-Based LLMs. In Proceedings of the International Conference on Learning Representations (ICLR). ICLR, 2024.
- Ma, Y.; Yu, Z.; Lin, X.; Xie, W.; Shen, L. Big-MoE: Bypassing Isolated Gating for Generalized Multimodal Face Anti-Spoofing. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025, pp. 1–5.
- Ding, Z.; Huang, J.; Duan, Q.; Zhang, C.; Zhao, Y.; Gu, S. A Dual-Level Game-Theoretic Approach for Collaborative Learning in UAV-Assisted Heterogeneous Vehicle Networks. In Proceedings of the IEEE International Performance, Computing, and Communications Conference (IPCCC). IEEE, 2025, pp. 1–8.
- Fang, Z.; Guo, Y.; Wang, J.; Zhang, Y.; An, H.; Wang, Y.; Fang, Y. Shared Spatial Memory Through Predictive Coding. arXiv preprint arXiv:2511.04235 2025. [CrossRef]
- Chen, Y.; Ren, K.; Wang, Y.; Fang, Y.; Sun, W.; Li, D. Contiformer: Continuous-Time Transformer for Irregular Time Series Modeling. Advances in Neural Information Processing Systems (NeurIPS) 2023, 36, 47143–47175.
- Fan, W.; Fu, Y.; Zheng, S.; Bian, J.; Zhou, Y.; Xiong, H. DEWP: Deep Expansion Learning for Wind Power Forecasting. ACM Transactions on Knowledge Discovery from Data 2024, 18, 1–21. [CrossRef]
- Fang, Z.; Hu, S.; Wang, J.; Deng, Y.; Chen, X.; Fang, Y. Prioritized Information Bottleneck Theoretic Framework With Distributed Online Learning for Edge Video Analytics. IEEE/ACM Transactions on Networking 2025, pp. 1–17. [CrossRef]
- Zhou, J.; Wang, S.; Ou, Y. Fourier Graph Convolution Transformer for Financial Multivariate Time Series Forecasting. In Proceedings of the International Joint Conference on Neural Networks (IJCNN). IEEE, 2024, pp. 1–8.
- Wu, B.; Huang, J.; Duan, Q. Real-Time Intelligent Healthcare Enabled by Federated Digital Twins With AoI Optimization. IEEE Network 2025, pp. 1–1. [CrossRef]
- Fang, Z.; Wang, J.; Ma, Y.; Tao, Y.; Deng, Y.; Chen, X.; Fang, Y. R-ACP: Real-Time Adaptive Collaborative Perception Leveraging Robust Task-Oriented Communications. IEEE Journal on Selected Areas in Communications 2025. [CrossRef]
- Wu, B.; Huang, J.; Duan, Q. FedTD3: An Accelerated Learning Approach for UAV Trajectory Planning. In Proceedings of the International Conference on Wireless Artificial Intelligent Computing Systems and Applications (WASA). Springer, 2025, pp. 13–24.
- Kumari, J.; Mondal, A.; Mathew, J. Fourier-Driven Lightweight Token Mixing Model for Efficient Time Series Forecasting. IEEE Transactions on Artificial Intelligence 2025, pp. 1–14. [CrossRef]
- Zhang, C.; Zhou, T.; Wen, Q.; Sun, L. TFAD: A Decomposition Time Series Anomaly Detection Architecture With Time–Frequency Analysis. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM). ACM, 2022, pp. 2497–2507.
- Wu, B.; Wu, W. Model-Free Cooperative Optimal Output Regulation for Linear Discrete-Time Multi-Agent Systems Using Reinforcement Learning. Mathematical Problems in Engineering 2023, 2023, 6350647. [CrossRef]
- Fang, Z.; Wang, J.; Ren, Y.; Han, Z.; Poor, H.V.; Hanzo, L. Age of Information in Energy Harvesting Aided Massive Multiple Access Networks. IEEE Journal on Selected Areas in Communications 2022, 40, 1441–1456. [CrossRef]
- Wu, B.; Ding, Z.; Ostigaard, L.; Huang, J. Reinforcement Learning-Based Energy-Aware Coverage Path Planning for Precision Agriculture. In Proceedings of the 2025 International Conference on Research in Adaptive and Convergent Systems (RACS). ACM, 2025, pp. 1–6.
- Wang, K.; Tan, C.W. Reverse Engineering Segment Routing Policies and Link Costs With Inverse Reinforcement Learning and EM. IEEE Transactions on Machine Learning in Communications and Networking 2025, 3, 1014–1029. [CrossRef]
- Chen, Y.R.; Rezapour, A.; Tzeng, W.G.; Tsai, S.C. RL-Routing: An SDN Routing Algorithm Based on Deep Reinforcement Learning. IEEE Transactions on Network Science and Engineering 2020, 7, 3185–3199. [CrossRef]
- Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In Proceedings of the International Conference on Learning Representations (ICLR), 2023.
- Wang, Y.; Wu, H.; Dong, J.; Liu, Y.; Long, M.; Wang, J. Deep Time Series Models: A Comprehensive Survey and Benchmark 2024. [CrossRef]

| Symbol | Description | Symbol | Description |
|---|---|---|---|
| Time Series Elements | |||
| Input series | Prediction | ||
| L | Look-back window | T | Forecast horizon |
| Frequency spectrum | N | Number of variates | |
| Spectrum length | Dataset | ||
| Network Architecture | |||
| M | Transformer layers | d | Model dimension |
| FFN dimension | H | Attention heads | |
| Weight matrix | Activation | ||
| F-Block MoE Components | |||
| F-expert set | Number of F-experts | ||
| F-expert k | F-expert params | ||
| F-hidden state | F-expert output | ||
| T-Block MoE Components | |||
| T-expert set | Number of T-experts | ||
| T-expert k | T-expert params | ||
| T-hidden state | T-expert output | ||
| F-Block RL Components | |||
| F-state space | F-action space | ||
| F-state | F-action | ||
| F-policy | F-policy params | ||
| F-reward | F-value function | ||
| T-Block RL Components | |||
| T-state space | T-action space | ||
| T-state | T-action | ||
| T-policy | T-policy params | ||
| T-reward | T-value function | ||

| Parameter | Notation | Value |
|---|---|---|
| Transformer layers | M | 3 |
| Model dimension | d | 512 |
| FFN dimension | | 2048 |
| Attention heads | H | 8 |
| Patch length | P | 16 |
| Number of T-experts | | 4 |
| Number of F-experts | | 4 |
| Total experts per layer | - | 8 |
| Total experts (all layers) | - | 24 |
| Extended DFT length | - | |
| Policy hidden dimension | | 256 |
| Discount factor | | 0.99 |
| GAE parameter | | 0.95 |
| RL learning rate | - | 0.0001 |
| Prediction weight | | 1.0 |
| Switching penalty | | 0.1 |
| Diversity bonus | | 0.05 |
| Batch size | - | 256 |
| Model learning rate | | 0.0001 |
| Dropout rate | - | 0.1 |
| Training epochs | - | 100 |
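
To make the role of the three reward weights concrete, the sketch below combines them into a scalar routing reward. Only the weight values (1.0, 0.1, 0.05) and the expert count of four come from the configuration table. The functional form is an assumption for illustration: negative squared error for accuracy, a unit penalty when the selected expert changes, and a normalized-entropy bonus over recent expert usage. All function and variable names are hypothetical.

```python
import math
from collections import Counter

W_PRED, W_SWITCH, W_DIV = 1.0, 0.1, 0.05  # values from the configuration table
NUM_EXPERTS = 4                            # experts per domain and layer


def usage_entropy(actions, num_experts=NUM_EXPERTS):
    """Normalized entropy of expert usage in [0, 1]; 1 means uniform usage."""
    counts = Counter(actions)
    total = len(actions)
    probs = [counts[k] / total for k in range(num_experts) if counts[k] > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(num_experts)


def routing_reward(pred, target, action, prev_action, recent_actions):
    """Assumed reward: accuracy term minus switching penalty plus diversity bonus."""
    accuracy = -(pred - target) ** 2                  # higher is better
    switched = 1.0 if action != prev_action else 0.0  # routing stability
    diversity = usage_entropy(recent_actions)         # expert diversity
    return W_PRED * accuracy - W_SWITCH * switched + W_DIV * diversity


# Perfect prediction, no switch, all four experts used equally often.
print(routing_reward(1.0, 1.0, 0, 0, [0, 1, 2, 3]))
```

With these weights the prediction term dominates, while the switching and diversity terms act as small regularizers on the routing policy.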

| Models | ETTh1-96 | ETTh1-192 | ETTh1-336 | ETTh1-720 | ETTm1-96 | ETTm1-192 | ETTm1-336 | ETTm1-720 | Weather-96 | Weather-192 | Weather-336 | Weather-720 | Electricity-96 | Electricity-192 | Electricity-336 | Electricity-720 | Traffic-96 | Traffic-192 | Traffic-336 | Traffic-720 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Transformer | 0.725±.031 | 0.789±.038 | 0.841±.042 | 0.912±.048 | 0.698±.035 | 0.756±.039 | 0.808±.041 | 0.873±.045 | 0.612±.028 | 0.671±.033 | 0.724±.036 | 0.789±.041 | 0.438±.022 | 0.487±.025 | 0.531±.028 | 0.589±.032 | 0.834±.038 | 0.897±.042 | 0.956±.045 | 1.021±.049 |
| | 0.578±.027 | 0.623±.031 | 0.664±.034 | 0.719±.038 | 0.561±.029 | 0.604±.032 | 0.645±.035 | 0.697±.039 | 0.513±.026 | 0.558±.030 | 0.599±.033 | 0.651±.037 | 0.401±.021 | 0.438±.024 | 0.473±.027 | 0.521±.031 | 0.612±.031 | 0.658±.035 | 0.701±.038 | 0.754±.042 |
| Informer | 0.698±.029 | 0.761±.036 | 0.812±.039 | 0.881±.045 | 0.671±.033 | 0.729±.037 | 0.779±.040 | 0.842±.043 | 0.587±.027 | 0.644±.031 | 0.695±.035 | 0.758±.039 | 0.421±.021 | 0.468±.024 | 0.509±.027 | 0.564±.030 | 0.801±.036 | 0.863±.040 | 0.921±.043 | 0.985±.047 |
| | 0.556±.026 | 0.601±.030 | 0.641±.033 | 0.694±.037 | 0.539±.028 | 0.582±.031 | 0.622±.034 | 0.672±.037 | 0.491±.025 | 0.535±.029 | 0.575±.032 | 0.625±.036 | 0.384±.020 | 0.421±.023 | 0.455±.026 | 0.502±.029 | 0.587±.030 | 0.632±.034 | 0.674±.037 | 0.725±.040 |
| Autoformer | 0.671±.028 | 0.733±.035 | 0.783±.038 | 0.849±.043 | 0.645±.032 | 0.702±.036 | 0.751±.039 | 0.813±.042 | 0.561±.026 | 0.617±.030 | 0.667±.034 | 0.728±.038 | 0.403±.020 | 0.449±.023 | 0.489±.026 | 0.542±.029 | 0.773±.035 | 0.834±.039 | 0.891±.042 | 0.954±.046 |
| | 0.534±.025 | 0.578±.029 | 0.617±.032 | 0.668±.036 | 0.517±.027 | 0.560±.030 | 0.599±.033 | 0.648±.036 | 0.469±.024 | 0.512±.028 | 0.551±.031 | 0.599±.035 | 0.367±.019 | 0.403±.022 | 0.436±.025 | 0.481±.028 | 0.561±.029 | 0.605±.033 | 0.646±.036 | 0.695±.039 |
| FEDformer | 0.644±.027 | 0.706±.034 | 0.756±.037 | 0.821±.041 | 0.618±.031 | 0.675±.035 | 0.724±.038 | 0.785±.041 | 0.534±.025 | 0.590±.029 | 0.639±.033 | 0.698±.037 | 0.385±.019 | 0.431±.022 | 0.471±.025 | 0.523±.028 | 0.745±.034 | 0.806±.038 | 0.863±.041 | 0.925±.045 |
| | 0.511±.024 | 0.555±.028 | 0.593±.031 | 0.642±.035 | 0.495±.026 | 0.538±.029 | 0.577±.032 | 0.625±.035 | 0.447±.023 | 0.489±.027 | 0.528±.030 | 0.574±.034 | 0.349±.018 | 0.385±.021 | 0.418±.024 | 0.462±.027 | 0.535±.028 | 0.579±.032 | 0.619±.035 | 0.667±.038 |
| DLinear | 0.617±.026 | 0.679±.033 | 0.729±.036 | 0.794±.040 | 0.591±.030 | 0.648±.034 | 0.697±.037 | 0.758±.040 | 0.507±.024 | 0.563±.028 | 0.612±.032 | 0.671±.036 | 0.367±.018 | 0.413±.021 | 0.453±.024 | 0.505±.027 | 0.718±.033 | 0.779±.037 | 0.836±.040 | 0.898±.044 |
| | 0.488±.023 | 0.532±.027 | 0.569±.030 | 0.616±.034 | 0.473±.025 | 0.516±.028 | 0.555±.031 | 0.602±.034 | 0.425±.022 | 0.466±.026 | 0.504±.029 | 0.549±.033 | 0.331±.017 | 0.367±.020 | 0.400±.023 | 0.443±.026 | 0.509±.027 | 0.553±.031 | 0.593±.034 | 0.641±.037 |
| PatchTST | 0.589±.025 | 0.651±.032 | 0.701±.035 | 0.766±.039 | 0.564±.029 | 0.621±.033 | 0.670±.036 | 0.731±.039 | 0.481±.023 | 0.537±.027 | 0.586±.031 | 0.645±.035 | 0.349±.017 | 0.395±.020 | 0.435±.023 | 0.487±.026 | 0.691±.032 | 0.752±.036 | 0.809±.039 | 0.871±.043 |
| | 0.461±.022 | 0.505±.026 | 0.542±.029 | 0.589±.033 | 0.446±.024 | 0.489±.027 | 0.528±.030 | 0.575±.033 | 0.399±.021 | 0.440±.025 | 0.478±.028 | 0.523±.032 | 0.313±.016 | 0.349±.019 | 0.382±.022 | 0.425±.025 | 0.483±.026 | 0.527±.030 | 0.567±.033 | 0.615±.036 |
| TimesNet | 0.605±.026 | 0.667±.033 | 0.717±.036 | 0.782±.040 | 0.578±.030 | 0.635±.034 | 0.684±.037 | 0.745±.040 | 0.495±.024 | 0.551±.028 | 0.600±.032 | 0.659±.036 | 0.361±.018 | 0.407±.021 | 0.447±.024 | 0.499±.027 | 0.707±.033 | 0.768±.037 | 0.825±.040 | 0.887±.044 |
| | 0.477±.023 | 0.521±.027 | 0.558±.030 | 0.605±.034 | 0.461±.025 | 0.504±.028 | 0.543±.031 | 0.590±.034 | 0.413±.022 | 0.454±.026 | 0.492±.029 | 0.537±.033 | 0.325±.017 | 0.361±.020 | 0.394±.023 | 0.437±.026 | 0.497±.027 | 0.541±.031 | 0.581±.034 | 0.629±.037 |
| MoE-Trans | 0.631±.027 | 0.693±.034 | 0.743±.037 | 0.808±.041 | 0.604±.031 | 0.661±.035 | 0.710±.038 | 0.771±.041 | 0.521±.025 | 0.577±.029 | 0.626±.033 | 0.685±.037 | 0.379±.019 | 0.425±.022 | 0.465±.025 | 0.517±.028 | 0.732±.034 | 0.793±.038 | 0.850±.041 | 0.912±.045 |
| | 0.503±.024 | 0.547±.028 | 0.585±.031 | 0.632±.035 | 0.487±.026 | 0.530±.029 | 0.569±.032 | 0.616±.035 | 0.437±.023 | 0.478±.027 | 0.516±.030 | 0.562±.034 | 0.343±.018 | 0.379±.021 | 0.412±.024 | 0.455±.027 | 0.523±.028 | 0.567±.032 | 0.607±.035 | 0.655±.038 |
| Ours (N=10) | 0.298±.013 | 0.335±.015 | 0.371±.017 | 0.416±.019 | 0.281±.014 | 0.319±.016 | 0.357±.018 | 0.405±.020 | 0.241±.012 | 0.279±.014 | 0.317±.016 | 0.365±.018 | 0.172±.009 | 0.201±.011 | 0.232±.012 | 0.271±.014 | 0.334±.016 | 0.378±.018 | 0.423±.021 | 0.479±.023 |
| | 0.334±.016 | 0.372±.018 | 0.410±.020 | 0.458±.023 | 0.317±.016 | 0.356±.018 | 0.395±.020 | 0.443±.022 | 0.295±.015 | 0.333±.017 | 0.371±.019 | 0.418±.021 | 0.247±.013 | 0.281±.015 | 0.315±.016 | 0.358±.018 | 0.373±.019 | 0.416±.021 | 0.460±.023 | 0.515±.026 |
| Improv. | 49.4% | 48.5% | 47.1% | 45.7% | 50.2% | 48.6% | 46.7% | 44.6% | 49.9% | 48.0% | 45.9% | 43.4% | 50.7% | 49.1% | 46.7% | 44.4% | 51.7% | 49.7% | 47.7% | 45.0% |
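
The "Improv." row appears to be the relative reduction of the first-row metric against the strongest baseline in each column. This reading is inferred from the numbers rather than stated in the text, so the sketch below only checks it for the ETTh1 column at horizon 96, using the mean values from the table above. The helper name is hypothetical.

```python
def relative_improvement(best_baseline, ours):
    """Relative error reduction in percent."""
    return 100.0 * (best_baseline - ours) / best_baseline


# First-row mean values for ETTh1, horizon 96, copied from the table above.
baselines = {
    "Transformer": 0.725, "Informer": 0.698, "Autoformer": 0.671,
    "FEDformer": 0.644, "DLinear": 0.617, "PatchTST": 0.589,
    "TimesNet": 0.605, "MoE-Trans": 0.631,
}
ours = 0.298  # "Ours (N=10)"

best_name = min(baselines, key=baselines.get)
improvement = relative_improvement(baselines[best_name], ours)
print(f"{best_name}: {improvement:.1f}%")  # PatchTST: 49.4%
```

The result matches the 49.4% reported in that column, which supports reading the row as an improvement over the best baseline rather than over the average baseline.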

| Models | ETTh1-96 | ETTh1-192 | ETTh1-336 | ETTh1-720 | ETTm1-96 | ETTm1-192 | ETTm1-336 | ETTm1-720 | Weather-96 | Weather-192 | Weather-336 | Weather-720 | Electricity-96 | Electricity-192 | Electricity-336 | Electricity-720 | Traffic-96 | Traffic-192 | Traffic-336 | Traffic-720 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Transformer | 0.725±.031 | 0.789±.038 | 0.841±.042 | 0.912±.048 | 0.698±.035 | 0.756±.039 | 0.808±.041 | 0.873±.045 | 0.612±.028 | 0.671±.033 | 0.724±.036 | 0.789±.041 | 0.438±.022 | 0.487±.025 | 0.531±.028 | 0.589±.032 | 0.834±.038 | 0.897±.042 | 0.956±.045 | 1.021±.049 |
| | 0.578±.027 | 0.623±.031 | 0.664±.034 | 0.719±.038 | 0.561±.029 | 0.604±.032 | 0.645±.035 | 0.697±.039 | 0.513±.026 | 0.558±.030 | 0.599±.033 | 0.651±.037 | 0.401±.021 | 0.438±.024 | 0.473±.027 | 0.521±.031 | 0.612±.031 | 0.658±.035 | 0.701±.038 | 0.754±.042 |
| Informer | 0.698±.029 | 0.761±.036 | 0.812±.039 | 0.881±.045 | 0.671±.033 | 0.729±.037 | 0.779±.040 | 0.842±.043 | 0.587±.027 | 0.644±.031 | 0.695±.035 | 0.758±.039 | 0.421±.021 | 0.468±.024 | 0.509±.027 | 0.564±.030 | 0.801±.036 | 0.863±.040 | 0.921±.043 | 0.985±.047 |
| | 0.556±.026 | 0.601±.030 | 0.641±.033 | 0.694±.037 | 0.539±.028 | 0.582±.031 | 0.622±.034 | 0.672±.037 | 0.491±.025 | 0.535±.029 | 0.575±.032 | 0.625±.036 | 0.384±.020 | 0.421±.023 | 0.455±.026 | 0.502±.029 | 0.587±.030 | 0.632±.034 | 0.674±.037 | 0.725±.040 |
| Autoformer | 0.671±.028 | 0.733±.035 | 0.783±.038 | 0.849±.043 | 0.645±.032 | 0.702±.036 | 0.751±.039 | 0.813±.042 | 0.561±.026 | 0.617±.030 | 0.667±.034 | 0.728±.038 | 0.403±.020 | 0.449±.023 | 0.489±.026 | 0.542±.029 | 0.773±.035 | 0.834±.039 | 0.891±.042 | 0.954±.046 |
| | 0.534±.025 | 0.578±.029 | 0.617±.032 | 0.668±.036 | 0.517±.027 | 0.560±.030 | 0.599±.033 | 0.648±.036 | 0.469±.024 | 0.512±.028 | 0.551±.031 | 0.599±.035 | 0.367±.019 | 0.403±.022 | 0.436±.025 | 0.481±.028 | 0.561±.029 | 0.605±.033 | 0.646±.036 | 0.695±.039 |
| FEDformer | 0.644±.027 | 0.706±.034 | 0.756±.037 | 0.821±.041 | 0.618±.031 | 0.675±.035 | 0.724±.038 | 0.785±.041 | 0.534±.025 | 0.590±.029 | 0.639±.033 | 0.698±.037 | 0.385±.019 | 0.431±.022 | 0.471±.025 | 0.523±.028 | 0.745±.034 | 0.806±.038 | 0.863±.041 | 0.925±.045 |
| | 0.511±.024 | 0.555±.028 | 0.593±.031 | 0.642±.035 | 0.495±.026 | 0.538±.029 | 0.577±.032 | 0.625±.035 | 0.447±.023 | 0.489±.027 | 0.528±.030 | 0.574±.034 | 0.349±.018 | 0.385±.021 | 0.418±.024 | 0.462±.027 | 0.535±.028 | 0.579±.032 | 0.619±.035 | 0.667±.038 |
| DLinear | 0.617±.026 | 0.679±.033 | 0.729±.036 | 0.794±.040 | 0.591±.030 | 0.648±.034 | 0.697±.037 | 0.758±.040 | 0.507±.024 | 0.563±.028 | 0.612±.032 | 0.671±.036 | 0.367±.018 | 0.413±.021 | 0.453±.024 | 0.505±.027 | 0.718±.033 | 0.779±.037 | 0.836±.040 | 0.898±.044 |
| | 0.488±.023 | 0.532±.027 | 0.569±.030 | 0.616±.034 | 0.473±.025 | 0.516±.028 | 0.555±.031 | 0.602±.034 | 0.425±.022 | 0.466±.026 | 0.504±.029 | 0.549±.033 | 0.331±.017 | 0.367±.020 | 0.400±.023 | 0.443±.026 | 0.509±.027 | 0.553±.031 | 0.593±.034 | 0.641±.037 |
| PatchTST | 0.589±.025 | 0.651±.032 | 0.701±.035 | 0.766±.039 | 0.564±.029 | 0.621±.033 | 0.670±.036 | 0.731±.039 | 0.481±.023 | 0.537±.027 | 0.586±.031 | 0.645±.035 | 0.349±.017 | 0.395±.020 | 0.435±.023 | 0.487±.026 | 0.691±.032 | 0.752±.036 | 0.809±.039 | 0.871±.043 |
| | 0.461±.022 | 0.505±.026 | 0.542±.029 | 0.589±.033 | 0.446±.024 | 0.489±.027 | 0.528±.030 | 0.575±.033 | 0.399±.021 | 0.440±.025 | 0.478±.028 | 0.523±.032 | 0.313±.016 | 0.349±.019 | 0.382±.022 | 0.425±.025 | 0.483±.026 | 0.527±.030 | 0.567±.033 | 0.615±.036 |
| TimesNet | 0.605±.026 | 0.667±.033 | 0.717±.036 | 0.782±.040 | 0.578±.030 | 0.635±.034 | 0.684±.037 | 0.745±.040 | 0.495±.024 | 0.551±.028 | 0.600±.032 | 0.659±.036 | 0.361±.018 | 0.407±.021 | 0.447±.024 | 0.499±.027 | 0.707±.033 | 0.768±.037 | 0.825±.040 | 0.887±.044 |
| | 0.477±.023 | 0.521±.027 | 0.558±.030 | 0.605±.034 | 0.461±.025 | 0.504±.028 | 0.543±.031 | 0.590±.034 | 0.413±.022 | 0.454±.026 | 0.492±.029 | 0.537±.033 | 0.325±.017 | 0.361±.020 | 0.394±.023 | 0.437±.026 | 0.497±.027 | 0.541±.031 | 0.581±.034 | 0.629±.037 |
| MoE-Trans | 0.631±.027 | 0.693±.034 | 0.743±.037 | 0.808±.041 | 0.604±.031 | 0.661±.035 | 0.710±.038 | 0.771±.041 | 0.521±.025 | 0.577±.029 | 0.626±.033 | 0.685±.037 | 0.379±.019 | 0.425±.022 | 0.465±.025 | 0.517±.028 | 0.732±.034 | 0.793±.038 | 0.850±.041 | 0.912±.045 |
| | 0.503±.024 | 0.547±.028 | 0.585±.031 | 0.632±.035 | 0.487±.026 | 0.530±.029 | 0.569±.032 | 0.616±.035 | 0.437±.023 | 0.478±.027 | 0.516±.030 | 0.562±.034 | 0.343±.018 | 0.379±.021 | 0.412±.024 | 0.455±.027 | 0.523±.028 | 0.567±.032 | 0.607±.035 | 0.655±.038 |
| Ours (N=20) | 0.278±.012 | 0.313±.014 | 0.348±.016 | 0.391±.018 | 0.262±.013 | 0.298±.015 | 0.335±.017 | 0.380±.019 | 0.224±.011 | 0.260±.013 | 0.297±.015 | 0.343±.017 | 0.160±.008 | 0.187±.010 | 0.217±.011 | 0.254±.013 | 0.312±.015 | 0.354±.017 | 0.397±.019 | 0.450±.022 |
| | 0.318±.015 | 0.355±.017 | 0.392±.019 | 0.438±.022 | 0.300±.015 | 0.337±.017 | 0.375±.019 | 0.421±.021 | 0.279±.014 | 0.315±.016 | 0.352±.018 | 0.398±.020 | 0.232±.012 | 0.265±.014 | 0.298±.015 | 0.340±.017 | 0.355±.018 | 0.396±.020 | 0.439±.022 | 0.492±.025 |
| Improv. | 52.8% | 51.9% | 50.4% | 49.0% | 53.5% | 52.0% | 50.0% | 48.0% | 53.4% | 51.6% | 49.3% | 46.8% | 54.2% | 52.7% | 50.1% | 47.8% | 54.8% | 52.9% | 50.9% | 48.3% |

| Models | ETTh1-96 | ETTh1-192 | ETTh1-336 | ETTh1-720 | ETTm1-96 | ETTm1-192 | ETTm1-336 | ETTm1-720 | Weather-96 | Weather-192 | Weather-336 | Weather-720 | Electricity-96 | Electricity-192 | Electricity-336 | Electricity-720 | Traffic-96 | Traffic-192 | Traffic-336 | Traffic-720 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Transformer | 0.725±.031 | 0.789±.038 | 0.841±.042 | 0.912±.048 | 0.698±.035 | 0.756±.039 | 0.808±.041 | 0.873±.045 | 0.612±.028 | 0.671±.033 | 0.724±.036 | 0.789±.041 | 0.438±.022 | 0.487±.025 | 0.531±.028 | 0.589±.032 | 0.834±.038 | 0.897±.042 | 0.956±.045 | 1.021±.049 |
| | 0.578±.027 | 0.623±.031 | 0.664±.034 | 0.719±.038 | 0.561±.029 | 0.604±.032 | 0.645±.035 | 0.697±.039 | 0.513±.026 | 0.558±.030 | 0.599±.033 | 0.651±.037 | 0.401±.021 | 0.438±.024 | 0.473±.027 | 0.521±.031 | 0.612±.031 | 0.658±.035 | 0.701±.038 | 0.754±.042 |
| Informer | 0.698±.029 | 0.761±.036 | 0.812±.039 | 0.881±.045 | 0.671±.033 | 0.729±.037 | 0.779±.040 | 0.842±.043 | 0.587±.027 | 0.644±.031 | 0.695±.035 | 0.758±.039 | 0.421±.021 | 0.468±.024 | 0.509±.027 | 0.564±.030 | 0.801±.036 | 0.863±.040 | 0.921±.043 | 0.985±.047 |
| | 0.556±.026 | 0.601±.030 | 0.641±.033 | 0.694±.037 | 0.539±.028 | 0.582±.031 | 0.622±.034 | 0.672±.037 | 0.491±.025 | 0.535±.029 | 0.575±.032 | 0.625±.036 | 0.384±.020 | 0.421±.023 | 0.455±.026 | 0.502±.029 | 0.587±.030 | 0.632±.034 | 0.674±.037 | 0.725±.040 |
| Autoformer | 0.671±.028 | 0.733±.035 | 0.783±.038 | 0.849±.043 | 0.645±.032 | 0.702±.036 | 0.751±.039 | 0.813±.042 | 0.561±.026 | 0.617±.030 | 0.667±.034 | 0.728±.038 | 0.403±.020 | 0.449±.023 | 0.489±.026 | 0.542±.029 | 0.773±.035 | 0.834±.039 | 0.891±.042 | 0.954±.046 |
| | 0.534±.025 | 0.578±.029 | 0.617±.032 | 0.668±.036 | 0.517±.027 | 0.560±.030 | 0.599±.033 | 0.648±.036 | 0.469±.024 | 0.512±.028 | 0.551±.031 | 0.599±.035 | 0.367±.019 | 0.403±.022 | 0.436±.025 | 0.481±.028 | 0.561±.029 | 0.605±.033 | 0.646±.036 | 0.695±.039 |
| FEDformer | 0.644±.027 | 0.706±.034 | 0.756±.037 | 0.821±.041 | 0.618±.031 | 0.675±.035 | 0.724±.038 | 0.785±.041 | 0.534±.025 | 0.590±.029 | 0.639±.033 | 0.698±.037 | 0.385±.019 | 0.431±.022 | 0.471±.025 | 0.523±.028 | 0.745±.034 | 0.806±.038 | 0.863±.041 | 0.925±.045 |
| | 0.511±.024 | 0.555±.028 | 0.593±.031 | 0.642±.035 | 0.495±.026 | 0.538±.029 | 0.577±.032 | 0.625±.035 | 0.447±.023 | 0.489±.027 | 0.528±.030 | 0.574±.034 | 0.349±.018 | 0.385±.021 | 0.418±.024 | 0.462±.027 | 0.535±.028 | 0.579±.032 | 0.619±.035 | 0.667±.038 |
| DLinear | 0.617±.026 | 0.679±.033 | 0.729±.036 | 0.794±.040 | 0.591±.030 | 0.648±.034 | 0.697±.037 | 0.758±.040 | 0.507±.024 | 0.563±.028 | 0.612±.032 | 0.671±.036 | 0.367±.018 | 0.413±.021 | 0.453±.024 | 0.505±.027 | 0.718±.033 | 0.779±.037 | 0.836±.040 | 0.898±.044 |
| | 0.488±.023 | 0.532±.027 | 0.569±.030 | 0.616±.034 | 0.473±.025 | 0.516±.028 | 0.555±.031 | 0.602±.034 | 0.425±.022 | 0.466±.026 | 0.504±.029 | 0.549±.033 | 0.331±.017 | 0.367±.020 | 0.400±.023 | 0.443±.026 | 0.509±.027 | 0.553±.031 | 0.593±.034 | 0.641±.037 |
| PatchTST | 0.589±.025 | 0.651±.032 | 0.701±.035 | 0.766±.039 | 0.564±.029 | 0.621±.033 | 0.670±.036 | 0.731±.039 | 0.481±.023 | 0.537±.027 | 0.586±.031 | 0.645±.035 | 0.349±.017 | 0.395±.020 | 0.435±.023 | 0.487±.026 | 0.691±.032 | 0.752±.036 | 0.809±.039 | 0.871±.043 |
| | 0.461±.022 | 0.505±.026 | 0.542±.029 | 0.589±.033 | 0.446±.024 | 0.489±.027 | 0.528±.030 | 0.575±.033 | 0.399±.021 | 0.440±.025 | 0.478±.028 | 0.523±.032 | 0.313±.016 | 0.349±.019 | 0.382±.022 | 0.425±.025 | 0.483±.026 | 0.527±.030 | 0.567±.033 | 0.615±.036 |
| TimesNet | 0.605±.026 | 0.667±.033 | 0.717±.036 | 0.782±.040 | 0.578±.030 | 0.635±.034 | 0.684±.037 | 0.745±.040 | 0.495±.024 | 0.551±.028 | 0.600±.032 | 0.659±.036 | 0.361±.018 | 0.407±.021 | 0.447±.024 | 0.499±.027 | 0.707±.033 | 0.768±.037 | 0.825±.040 | 0.887±.044 |
| | 0.477±.023 | 0.521±.027 | 0.558±.030 | 0.605±.034 | 0.461±.025 | 0.504±.028 | 0.543±.031 | 0.590±.034 | 0.413±.022 | 0.454±.026 | 0.492±.029 | 0.537±.033 | 0.325±.017 | 0.361±.020 | 0.394±.023 | 0.437±.026 | 0.497±.027 | 0.541±.031 | 0.581±.034 | 0.629±.037 |
| MoE-Trans | 0.631±.027 | 0.693±.034 | 0.743±.037 | 0.808±.041 | 0.604±.031 | 0.661±.035 | 0.710±.038 | 0.771±.041 | 0.521±.025 | 0.577±.029 | 0.626±.033 | 0.685±.037 | 0.379±.019 | 0.425±.022 | 0.465±.025 | 0.517±.028 | 0.732±.034 | 0.793±.038 | 0.850±.041 | 0.912±.045 |
| | 0.503±.024 | 0.547±.028 | 0.585±.031 | 0.632±.035 | 0.487±.026 | 0.530±.029 | 0.569±.032 | 0.616±.035 | 0.437±.023 | 0.478±.027 | 0.516±.030 | 0.562±.034 | 0.343±.018 | 0.379±.021 | 0.412±.024 | 0.455±.027 | 0.523±.028 | 0.567±.032 | 0.607±.035 | 0.655±.038 |
| Ours (N=30) | 0.271±.012 | 0.305±.014 | 0.339±.016 | 0.381±.018 | 0.255±.013 | 0.290±.015 | 0.326±.017 | 0.370±.019 | 0.218±.011 | 0.253±.013 | 0.289±.015 | 0.334±.017 | 0.155±.008 | 0.182±.010 | 0.211±.011 | 0.247±.013 | 0.305±.015 | 0.346±.017 | 0.388±.019 | 0.440±.022 |
| | 0.312±.015 | 0.348±.017 | 0.384±.019 | 0.430±.021 | 0.294±.015 | 0.330±.017 | 0.367±.019 | 0.413±.021 | 0.273±.014 | 0.308±.016 | 0.344±.018 | 0.389±.020 | 0.227±.012 | 0.260±.014 | 0.292±.015 | 0.333±.017 | 0.348±.018 | 0.389±.020 | 0.432±.022 | 0.484±.024 |
| Improv. | 54.0% | 53.1% | 51.6% | 50.3% | 54.8% | 53.3% | 51.3% | 49.4% | 54.7% | 52.9% | 50.7% | 48.2% | 55.6% | 53.9% | 51.5% | 49.3% | 55.9% | 54.0% | 52.0% | 49.5% |
| Models | ETTh1 (96) | ETTh1 (192) | ETTh1 (336) | ETTh1 (720) | ETTm1 (96) | ETTm1 (192) | ETTm1 (336) | ETTm1 (720) | Weather (96) | Weather (192) | Weather (336) | Weather (720) | Electricity (96) | Electricity (192) | Electricity (336) | Electricity (720) | Traffic (96) | Traffic (192) | Traffic (336) | Traffic (720) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Transformer | 0.725±.031 | 0.789±.038 | 0.841±.042 | 0.912±.048 | 0.698±.035 | 0.756±.039 | 0.808±.041 | 0.873±.045 | 0.612±.028 | 0.671±.033 | 0.724±.036 | 0.789±.041 | 0.438±.022 | 0.487±.025 | 0.531±.028 | 0.589±.032 | 0.834±.038 | 0.897±.042 | 0.956±.045 | 1.021±.049 |
| | 0.578±.027 | 0.623±.031 | 0.664±.034 | 0.719±.038 | 0.561±.029 | 0.604±.032 | 0.645±.035 | 0.697±.039 | 0.513±.026 | 0.558±.030 | 0.599±.033 | 0.651±.037 | 0.401±.021 | 0.438±.024 | 0.473±.027 | 0.521±.031 | 0.612±.031 | 0.658±.035 | 0.701±.038 | 0.754±.042 |
| Informer | 0.698±.029 | 0.761±.036 | 0.812±.039 | 0.881±.045 | 0.671±.033 | 0.729±.037 | 0.779±.040 | 0.842±.043 | 0.587±.027 | 0.644±.031 | 0.695±.035 | 0.758±.039 | 0.421±.021 | 0.468±.024 | 0.509±.027 | 0.564±.030 | 0.801±.036 | 0.863±.040 | 0.921±.043 | 0.985±.047 |
| | 0.556±.026 | 0.601±.030 | 0.641±.033 | 0.694±.037 | 0.539±.028 | 0.582±.031 | 0.622±.034 | 0.672±.037 | 0.491±.025 | 0.535±.029 | 0.575±.032 | 0.625±.036 | 0.384±.020 | 0.421±.023 | 0.455±.026 | 0.502±.029 | 0.587±.030 | 0.632±.034 | 0.674±.037 | 0.725±.040 |
| Autoformer | 0.671±.028 | 0.733±.035 | 0.783±.038 | 0.849±.043 | 0.645±.032 | 0.702±.036 | 0.751±.039 | 0.813±.042 | 0.561±.026 | 0.617±.030 | 0.667±.034 | 0.728±.038 | 0.403±.020 | 0.449±.023 | 0.489±.026 | 0.542±.029 | 0.773±.035 | 0.834±.039 | 0.891±.042 | 0.954±.046 |
| | 0.534±.025 | 0.578±.029 | 0.617±.032 | 0.668±.036 | 0.517±.027 | 0.560±.030 | 0.599±.033 | 0.648±.036 | 0.469±.024 | 0.512±.028 | 0.551±.031 | 0.599±.035 | 0.367±.019 | 0.403±.022 | 0.436±.025 | 0.481±.028 | 0.561±.029 | 0.605±.033 | 0.646±.036 | 0.695±.039 |
| FEDformer | 0.644±.027 | 0.706±.034 | 0.756±.037 | 0.821±.041 | 0.618±.031 | 0.675±.035 | 0.724±.038 | 0.785±.041 | 0.534±.025 | 0.590±.029 | 0.639±.033 | 0.698±.037 | 0.385±.019 | 0.431±.022 | 0.471±.025 | 0.523±.028 | 0.745±.034 | 0.806±.038 | 0.863±.041 | 0.925±.045 |
| | 0.511±.024 | 0.555±.028 | 0.593±.031 | 0.642±.035 | 0.495±.026 | 0.538±.029 | 0.577±.032 | 0.625±.035 | 0.447±.023 | 0.489±.027 | 0.528±.030 | 0.574±.034 | 0.349±.018 | 0.385±.021 | 0.418±.024 | 0.462±.027 | 0.535±.028 | 0.579±.032 | 0.619±.035 | 0.667±.038 |
| DLinear | 0.617±.026 | 0.679±.033 | 0.729±.036 | 0.794±.040 | 0.591±.030 | 0.648±.034 | 0.697±.037 | 0.758±.040 | 0.507±.024 | 0.563±.028 | 0.612±.032 | 0.671±.036 | 0.367±.018 | 0.413±.021 | 0.453±.024 | 0.505±.027 | 0.718±.033 | 0.779±.037 | 0.836±.040 | 0.898±.044 |
| | 0.488±.023 | 0.532±.027 | 0.569±.030 | 0.616±.034 | 0.473±.025 | 0.516±.028 | 0.555±.031 | 0.602±.034 | 0.425±.022 | 0.466±.026 | 0.504±.029 | 0.549±.033 | 0.331±.017 | 0.367±.020 | 0.400±.023 | 0.443±.026 | 0.509±.027 | 0.553±.031 | 0.593±.034 | 0.641±.037 |
| PatchTST | 0.589±.025 | 0.651±.032 | 0.701±.035 | 0.766±.039 | 0.564±.029 | 0.621±.033 | 0.670±.036 | 0.731±.039 | 0.481±.023 | 0.537±.027 | 0.586±.031 | 0.645±.035 | 0.349±.017 | 0.395±.020 | 0.435±.023 | 0.487±.026 | 0.691±.032 | 0.752±.036 | 0.809±.039 | 0.871±.043 |
| | 0.461±.022 | 0.505±.026 | 0.542±.029 | 0.589±.033 | 0.446±.024 | 0.489±.027 | 0.528±.030 | 0.575±.033 | 0.399±.021 | 0.440±.025 | 0.478±.028 | 0.523±.032 | 0.313±.016 | 0.349±.019 | 0.382±.022 | 0.425±.025 | 0.483±.026 | 0.527±.030 | 0.567±.033 | 0.615±.036 |
| TimesNet | 0.605±.026 | 0.667±.033 | 0.717±.036 | 0.782±.040 | 0.578±.030 | 0.635±.034 | 0.684±.037 | 0.745±.040 | 0.495±.024 | 0.551±.028 | 0.600±.032 | 0.659±.036 | 0.361±.018 | 0.407±.021 | 0.447±.024 | 0.499±.027 | 0.707±.033 | 0.768±.037 | 0.825±.040 | 0.887±.044 |
| | 0.477±.023 | 0.521±.027 | 0.558±.030 | 0.605±.034 | 0.461±.025 | 0.504±.028 | 0.543±.031 | 0.590±.034 | 0.413±.022 | 0.454±.026 | 0.492±.029 | 0.537±.033 | 0.325±.017 | 0.361±.020 | 0.394±.023 | 0.437±.026 | 0.497±.027 | 0.541±.031 | 0.581±.034 | 0.629±.037 |
| MoE-Trans | 0.631±.027 | 0.693±.034 | 0.743±.037 | 0.808±.041 | 0.604±.031 | 0.661±.035 | 0.710±.038 | 0.771±.041 | 0.521±.025 | 0.577±.029 | 0.626±.033 | 0.685±.037 | 0.379±.019 | 0.425±.022 | 0.465±.025 | 0.517±.028 | 0.732±.034 | 0.793±.038 | 0.850±.041 | 0.912±.045 |
| | 0.503±.024 | 0.547±.028 | 0.585±.031 | 0.632±.035 | 0.487±.026 | 0.530±.029 | 0.569±.032 | 0.616±.035 | 0.437±.023 | 0.478±.027 | 0.516±.030 | 0.562±.034 | 0.343±.018 | 0.379±.021 | 0.412±.024 | 0.455±.027 | 0.523±.028 | 0.567±.032 | 0.607±.035 | 0.655±.038 |
| Ours (N=40) | 0.267±.011 | 0.301±.013 | 0.334±.015 | 0.376±.017 | 0.251±.012 | 0.286±.014 | 0.321±.016 | 0.365±.018 | 0.213±.010 | 0.248±.012 | 0.284±.014 | 0.329±.016 | 0.152±.008 | 0.179±.009 | 0.207±.011 | 0.243±.013 | 0.298±.014 | 0.339±.016 | 0.381±.018 | 0.432±.021 |
| | 0.307±.014 | 0.343±.016 | 0.379±.018 | 0.425±.021 | 0.289±.014 | 0.325±.016 | 0.362±.018 | 0.408±.020 | 0.268±.013 | 0.303±.015 | 0.339±.017 | 0.384±.019 | 0.223±.011 | 0.255±.013 | 0.287±.015 | 0.328±.017 | 0.341±.017 | 0.382±.019 | 0.424±.021 | 0.476±.024 |
| Improv. | 54.7% | 53.8% | 52.4% | 50.9% | 55.5% | 53.9% | 52.1% | 50.1% | 55.7% | 53.8% | 51.5% | 49.0% | 56.4% | 54.7% | 52.4% | 50.1% | 56.9% | 54.9% | 52.9% | 50.4% |
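
The Improv. row appears to be the relative reduction of the first-row metric with respect to the strongest baseline, which in this table is PatchTST. The sketch below checks that reading against the horizon-96 columns of the N=40 table. The formula and the names `relative_improvement`, `best_baseline`, and `ours` are assumptions introduced here for illustration; they are not taken from the authors' code.

```python
# Assumed definition: percentage reduction of the error metric
# relative to the best baseline (lower error is better).
def relative_improvement(baseline: float, ours: float) -> float:
    return 100.0 * (baseline - ours) / baseline

# Horizon-96, first-row values copied from the table above.
best_baseline = {  # PatchTST
    "ETTh1": 0.589, "ETTm1": 0.564, "Weather": 0.481,
    "Electricity": 0.349, "Traffic": 0.691,
}
ours = {  # Ours (N=40)
    "ETTh1": 0.267, "ETTm1": 0.251, "Weather": 0.213,
    "Electricity": 0.152, "Traffic": 0.298,
}

improv = {
    name: round(relative_improvement(best_baseline[name], ours[name]), 1)
    for name in ours
}
for name, pct in improv.items():
    print(f"{name}: {pct}%")
```

Under this assumption the computed values agree with the reported horizon-96 entries (54.7%, 55.5%, 55.7%, 56.4%, 56.9%), which suggests the improvement is measured on the first metric row only.
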
| F-Block | T-Block | ExtDFT | RL | Ens. | ETTh1 | ETTm1 | Weather | Electricity | Traffic |
|---|---|---|---|---|---|---|---|---|---|
| √ | 0.521±.024 | 0.498±.025 | 0.437±.021 | 0.312±.016 | 0.628±.030 | ||||
| √ | √ | 0.549±.026 | 0.527±.027 | 0.461±.023 | 0.335±.017 | 0.604±.029 | |||
| √ | 0.573±.028 | 0.551±.029 | 0.479±.024 | 0.348±.018 | 0.617±.030 | ||||
| √ | √ | 0.497±.023 | 0.475±.024 | 0.421±.020 | 0.301±.015 | 0.611±.029 | |||
| √ | √ | √ | 0.468±.022 | 0.447±.023 | 0.398±.019 | 0.285±.014 | 0.587±.028 | ||
| √ | √ | √ | 0.483±.023 | 0.461±.024 | 0.409±.020 | 0.293±.015 | 0.598±.029 | ||
| √ | √ | √ | 0.489±.023 | 0.467±.024 | 0.413±.020 | 0.296±.015 | 0.603±.029 | ||
| √ | √ | √ | √ | 0.453±.021 | 0.432±.022 | 0.385±.018 | 0.276±.014 | 0.573±.027 | |
| √ | √ | √ | √ | 0.387±.018 | 0.367±.019 | 0.331±.016 | 0.239±.012 | 0.501±.024 | |
| Fixed FFN | √ | √ | √ | 0.453±.021 | 0.432±.022 | 0.385±.018 | 0.276±.014 | 0.573±.027 | |
| Softmax Gating | √ | √ | √ | √ | 0.441±.021 | 0.421±.021 | 0.376±.018 | 0.270±.013 | 0.564±.027 |
| Top-1 Gating | √ | √ | √ | √ | 0.449±.021 | 0.429±.022 | 0.382±.018 | 0.273±.014 | 0.569±.027 |
| Noisy Top-k Gating | √ | √ | √ | √ | 0.445±.021 | 0.425±.022 | 0.379±.018 | 0.271±.013 | 0.566±.027 |
| Full Model (RL Routing) | √ | √ | √ | √ | 0.267±.011 | 0.251±.012 | 0.213±.010 | 0.152±.008 | 0.298±.014 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).