Submitted: 07 October 2025
Posted: 08 October 2025
Abstract
Keywords:
1. Introduction
- (1) TAGAT-LSTM-trans mitigates the shortcomings of GAT in capturing road spatial features by introducing a transfer probability matrix, which represents the ability of road nodes to transmit and receive traffic flow features based on their current state. Additionally, a distance decay matrix captures the diminishing spatial dependencies between nodes as distance increases. By integrating these matrices into GAT, the model more effectively extracts spatial features in road networks.
- (2) TAGAT-LSTM-trans incorporates a gating network layer to bridge spatial and temporal feature extraction. This structure enables the model to dynamically integrate spatial features from different time intervals, capturing temporal dynamics and enhancing its capability to model complex traffic patterns.
- (3) TAGAT-LSTM-trans combines LSTM with a Transformer Encoder module, leveraging the Transformer's strong global feature extraction capabilities and LSTM's proficiency in handling extended temporal sequences. This fusion allows the model to detect local and global temporal variations, resulting in more detailed and comprehensive traffic flow features.
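
As a rough illustration of contribution (1), the sketch below shows one way a distance decay matrix and a transfer probability matrix could re-weight GAT attention scores. The Gaussian decay kernel, the row-normalised flow counts used as transfer probabilities, the multiplicative fusion, and all function names are assumptions made for illustration only; they are not taken from the model definition.

```python
import math


def distance_decay(dist, sigma):
    """Assumed Gaussian kernel: the weight shrinks toward 0 as distance grows."""
    n = len(dist)
    return [[math.exp(-((dist[i][j] / sigma) ** 2)) for j in range(n)] for i in range(n)]


def transfer_probability(flow):
    """Assumed form: row-normalise pairwise flow counts into probabilities."""
    probs = []
    for row in flow:
        total = sum(row)
        probs.append([v / total if total > 0 else 0.0 for v in row])
    return probs


def biased_attention(scores, adj, decay, trans):
    """Softmax over graph neighbours, re-weighted by decay * transfer probability."""
    n = len(scores)
    attention = []
    for i in range(n):
        neighbours = [j for j in range(n) if adj[i][j]]
        # Subtract the row maximum before exponentiating for numerical stability.
        top = max((scores[i][j] for j in neighbours), default=0.0)
        weights = [0.0] * n
        for j in neighbours:
            weights[j] = math.exp(scores[i][j] - top) * decay[i][j] * trans[i][j]
        total = sum(weights)
        attention.append([w / total if total > 0 else 0.0 for w in weights])
    return attention


# Hypothetical 3-node road segment: node 0 and node 2 are not directly connected.
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
flow = [[5, 5, 0], [2, 6, 2], [0, 4, 4]]
scores = [[0.0] * 3 for _ in range(3)]  # equal raw GAT scores, to isolate the bias

decay = distance_decay(dist, sigma=1.0)
trans = transfer_probability(flow)
attention = biased_attention(scores, adj, decay, trans)
```

With equal raw scores, the resulting weights differ only through the two matrices: nearer neighbours and neighbours with a higher transfer probability receive more attention, while non-adjacent nodes receive none.
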
2. Related work
3. Methods
3.1. TAGAT-LSTM-Trans Deep Learning Framework
- (1) Inputs module: This module smooths and normalizes the raw input data so that it is properly preconditioned for the subsequent modelling stages.
- (2) Spatial feature extraction module (TA-GAT): This module integrates a GAT, a transfer probability matrix, and a distance decay matrix. It models the spatial dependencies within the road network, accounting for interactions between road nodes.
- (3) Gating Network module: Comprising multiple gating networks, this module performs preliminary temporal aggregation of the spatial features extracted by the TA-GAT module. It enhances the model's ability to capture temporal variations in spatial dependencies.
- (4) Temporal feature extraction module (LSTM-Trans): This module combines a Long Short-Term Memory (LSTM) network with a Transformer Encoder layer, capturing temporal features in historical traffic sequences by modelling both local and global temporal dynamics.
- (5) Training and output module: This module uses a fully connected (FC) layer to map the extracted spatio-temporal feature vectors to the predictions. The predictions are then denormalized to return them to their original scale. The model is trained by minimizing a loss function on the prediction error.
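
The preprocessing in module (1) and the denormalization in module (5) form a round trip, which can be sketched as follows. The trailing moving average and the z-score scaling are assumptions chosen for illustration (the specific smoothing and normalization methods are not stated here), and the function names are hypothetical.

```python
from statistics import mean, pstdev


def moving_average(series, window=3):
    """Trailing moving average; early points average over the samples seen so far."""
    smoothed = []
    for t in range(len(series)):
        start = max(0, t - window + 1)
        smoothed.append(mean(series[start:t + 1]))
    return smoothed


def fit_scaler(series):
    """Return (mean, std) of the training data; guard against a zero std."""
    sigma = pstdev(series)
    return mean(series), sigma if sigma > 0 else 1.0


def normalize(series, mu, sigma):
    """Assumed z-score scaling applied before the model."""
    return [(x - mu) / sigma for x in series]


def denormalize(series, mu, sigma):
    """Inverse of normalize: map model outputs back to the original flow scale."""
    return [x * sigma + mu for x in series]


# Hypothetical flow readings from one sensor (vehicles per interval).
raw_flow = [100, 120, 90, 150, 130]
smoothed = moving_average(raw_flow, window=3)
mu, sigma = fit_scaler(smoothed)
scaled = normalize(smoothed, mu, sigma)
restored = denormalize(scaled, mu, sigma)
```

The scaler statistics are fitted once and reused for denormalization, so that predictions are reported in the same units as the raw sensor data.
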
3.2. Inputs Module
3.2.1. Smooth
3.2.2. Normalization
3.3. Spatial Feature Extraction Module
3.3.1. GAT
3.3.2. Transfer Probability Matrix
3.3.3. Distance Decay Matrix
3.3.4. Fusion
3.4. Gating Network Module
3.5. Temporal Feature Extraction Module
3.6. Training and Output Module
4. Experiments and Analysis
4.1. Datasets
4.2. Settings
4.3. Baselines
- (1) ARIMA [29]: The autoregressive integrated moving average (ARIMA) model is a traditional time series analysis method widely used for short-term forecasting. It is particularly effective for data with linear trends and seasonal variations.
- (2) VAR [30]: The vector autoregressive (VAR) model is a multivariate time series model that analyzes several interdependent time series and the relationships among their components.
- (3) LSTM [31]: LSTM is a specialized type of RNN that handles long-term dependencies by introducing memory cells and gating mechanisms, including a forget gate.
- (4) DCRNN [32]: The diffusion convolutional recurrent neural network (DCRNN) uses bidirectional random walks on a graph together with recurrent neural networks to learn the spatio-temporal features of traffic flow.
- (5) ASTGCN(r) [33]: Attention-based spatial-temporal graph convolutional networks (ASTGCN) combine a spatio-temporal attention mechanism with convolution. Three components capture recent, daily, and weekly dependencies, and their outputs are combined to generate the final predictions. For fairness, only the temporal block of the recent cycle is used to model periodicity.
- (6) STDSGNN [34]: The spatial-temporal dynamic semantic graph neural network (STDSGNN) constructs two types of semantic adjacency matrices using dynamic time warping and Pearson correlation, incorporates a dynamic aggregation method for feature weighting, and employs an injection-stacked structure to reduce over-smoothing and improve forecasting accuracy.
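
To make the "bidirectional random walks" used by the DCRNN baseline concrete, the sketch below builds the forward and backward transition matrices from a weighted, directed adjacency matrix: the forward walk row-normalises the adjacency matrix by out-degree, and the backward walk row-normalises its transpose by in-degree. The toy graph and the function names are hypothetical.

```python
def row_normalise(matrix):
    """Divide each row by its sum so that it becomes a probability distribution."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total if total > 0 else 0.0 for v in row])
    return result


def transpose(matrix):
    """Swap rows and columns of a square or rectangular matrix."""
    return [list(col) for col in zip(*matrix)]


def random_walk_matrices(weights):
    """Return (forward, backward) transition matrices of a directed graph."""
    forward = row_normalise(weights)              # D_out^{-1} W
    backward = row_normalise(transpose(weights))  # D_in^{-1} W^T
    return forward, backward


# Hypothetical 3-node directed road graph; weights[i][j] is the edge weight i -> j.
weights = [
    [0, 1, 1],
    [0, 0, 2],
    [4, 0, 0],
]
forward, backward = random_walk_matrices(weights)
```

The forward matrix describes where traffic at a node tends to go downstream, while the backward matrix describes where it tends to come from upstream.
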
4.4. Results Analysis
4.5. Ablation Experiments
- (1) Basic: This variant removes the transfer-aware (TA), gating network (GN), and Transformer Encoder (trans) modules, relying solely on GAT and LSTM to learn the spatio-temporal dependencies. It represents the most basic model configuration.
- (2) TA+GN: This variant omits the trans module to evaluate the necessity of capturing global temporal dependencies in traffic flow.
- (3) GN+trans: This variant excludes the TA module, assessing the importance of considering the transmission capacity of traffic features between nodes when aggregating spatial features from neighbouring nodes.
- (4) TA+trans: This variant eliminates the GN module to evaluate the impact of integrating spatial features of traffic flow from adjacent time steps.
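
The four variants above, plus the full model, differ only in which of the three modules are switched on. One way to encode this is a small configuration object, sketched below; the class name, field names, and dictionary are hypothetical and do not come from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    """Module switches for one ablation variant (hypothetical naming)."""
    use_ta: bool     # transfer-aware matrices in the GAT
    use_gn: bool     # gating network between spatial and temporal stages
    use_trans: bool  # Transformer Encoder after the LSTM

    def removed(self):
        """List the modules this variant leaves out."""
        flags = {"TA": self.use_ta, "GN": self.use_gn, "trans": self.use_trans}
        return [name for name, enabled in flags.items() if not enabled]


VARIANTS = {
    "Basic": AblationConfig(use_ta=False, use_gn=False, use_trans=False),
    "TA+GN": AblationConfig(use_ta=True, use_gn=True, use_trans=False),
    "GN+trans": AblationConfig(use_ta=False, use_gn=True, use_trans=True),
    "TA+trans": AblationConfig(use_ta=True, use_gn=False, use_trans=True),
    "TAGAT-LSTM-trans": AblationConfig(use_ta=True, use_gn=True, use_trans=True),
}

for name, config in VARIANTS.items():
    print(f"{name}: removes {config.removed() or 'nothing'}")
```

Each two-module variant removes exactly one component, so comparing it with the full model isolates the contribution of that component.
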
5. Conclusion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Cao, S., et al. 2022. A spatio-temporal sequence-to-sequence network for traffic flow prediction. Information Sciences, 610, 185-203.
- Zhang, J., et al. 2018. Predicting citywide crowd flows using deep spatio-temporal residual networks. Artificial Intelligence, 259, 147-166.
- Lv, M., et al. 2020. Temporal multi-graph convolutional network for traffic flow prediction. IEEE Transactions on Intelligent Transportation Systems, 22(6), 3337-3348.
- Boukerche, A. and Wang, J. 2020. Machine learning-based traffic prediction models for intelligent transportation systems. Computer Networks, 181, 107530.
- Zhuang, W. and Cao, Y. 2022. Short-term traffic flow prediction based on CNN-BiLSTM with multicomponent information. Applied Sciences, 12(17), 8714.
- LeCun, Y., Bengio, Y. and Hinton, G. 2015. Deep learning. Nature, 521(7553), 436-444.
- Ren, C., et al. 2021. Short-Term Traffic Flow Prediction: A Method of Combined Deep Learnings. Journal of Advanced Transportation, 2021(1), 9928073.
- Zhang, W., et al., 2019b. Short-term traffic flow prediction based on spatio-temporal analysis and CNN deep learning [online]. [Accessed 2 15].
- Fu, R., Zhang, Z. and Li, L. 2016. Using LSTM and GRU neural network methods for traffic flow prediction. In: 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), 324-328.
- Zheng, C., et al. 2019. DeepSTD: Mining spatio-temporal disturbances of multiple context factors for citywide traffic flow prediction. IEEE Transactions on Intelligent Transportation Systems, 21(9), 3744-3755.
- Narmadha, S. and Vijayakumar, V. 2023. Spatio-temporal vehicle traffic flow prediction using multivariate CNN and LSTM model. Materials Today: Proceedings, 81, 826-833.
- Peng, H., et al. 2021. Dynamic graph convolutional network for long-term traffic flow prediction with reinforcement learning. Information Sciences, 578, 401-416.
- Luan, S., et al. 2022. Traffic congestion propagation inference using dynamic Bayesian graph convolution network. Transportation Research Part C: Emerging Technologies, 135, 103526.
- Qu, Z., Liu, X. and Zheng, M. 2022. Temporal-spatial quantum graph convolutional neural network based on Schrödinger approach for traffic congestion prediction. IEEE Transactions on Intelligent Transportation Systems, 24(8), 8677-8686.
- Veličković, P., et al. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.
- Zhang, C., Yu, J. J. Q. and Liu, Y. 2019a. Spatial-temporal graph attention networks: A deep learning approach for traffic forecasting. IEEE Access, 7, 166246-166256.
- Li, D. and Lasenby, J. 2021. Spatiotemporal attention-based graph convolution network for segment-level traffic prediction. IEEE Transactions on Intelligent Transportation Systems, 23(7), 8337-8345.
- Reza, S., et al. 2022. A multi-head attention-based transformer model for traffic flow forecasting with a comparative analysis to recurrent neural networks. Expert Systems with Applications, 202, 117275.
- Chen, C., et al. 2022. Bidirectional spatial-temporal adaptive transformer for urban traffic flow forecasting. IEEE Transactions on Neural Networks and Learning Systems, 34(10), 6913-6925.
- Ji, J., et al. 2023. Spatio-temporal self-supervised learning for traffic flow prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, 4356-4364.
- Jiang, J., et al. 2023. PDFormer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, 4365-4373.
- Bao, Y., et al. 2023. Spatial–temporal complex graph convolution network for traffic flow prediction. Engineering Applications of Artificial Intelligence, 121, 106044.
- Chen, X., et al. 2020. Sensing data supported traffic flow prediction via denoising schemes and ANN: A comparison. IEEE Sensors Journal, 20(23), 14317-14328.
- Vaswani, A., et al. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Gers, F. A., Schmidhuber, J. and Cummins, F. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10), 2451-2471.
- An, Q. and Dong, M. 2024. Design and Case Study of Long Short Term Modeling for Next POI Recommendation. International Journal of Engineering Research And Management, 11(06), 18-21.
- Li, S., et al. 2024. Unifying Lane-Level Traffic Prediction from a Graph Structural Perspective: Benchmark and Baseline. arXiv preprint arXiv:2403.14941.
- Kingma, D. P. and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Williams, B. M. and Hoel, L. A. 2003. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of Transportation Engineering, 129(6), 664-672.
- Zivot, E. and Wang, J. 2006. Vector autoregressive models for multivariate time series. Modeling financial time series with S-PLUS®, 385-429.
- Hochreiter, S. and Schmidhuber, J. 1997. Long short-term memory. Neural Computation, 9(8), 1735-1780.
- Li, Y., et al. 2017. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.
- Guo, S., et al. 2019. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, 922-929.
- Zhang, R., et al. 2022. Spatial-temporal dynamic semantic graph neural network. Neural Computing and Applications, 34(19), 16655-16668.










| Datasets | Sensors | Edges | Time |
|---|---|---|---|
| PeMS03 | 358 | 547 | 09/01/2018 - 11/30/2018 |
| PeMS04 | 307 | 340 | 01/01/2018 - 02/28/2018 |

| Module | Hyperparameter | Value |
|---|---|---|
| TAGAT | Hidden units | 32 |
| TAGAT | Attention heads | 2 |
| LSTM | Hidden units | 256 |
| LSTM | Layers | 2 |
| Transformer | Hidden units | 256 |
| Transformer | Attention heads | 4 |
| Other | Batch size | 50 |
| Other | Learning rate | 5e-4 |
| Other | Dropout | 0.1 |

| Model | PeMS03 MAE | PeMS03 RMSE | PeMS04 MAE | PeMS04 RMSE |
|---|---|---|---|---|
| ARIMA | 23.07 | 40.62 | 37.84 | 59.03 |
| VAR | 23.65 | 38.26 | 33.76 | 51.73 |
| LSTM | 21.33 | 35.11 | 27.14 | 41.59 |
| DCRNN | 18.18 | 30.31 | 24.70 | 38.12 |
| ASTGCN(r) | 17.69 | 29.66 | 22.93 | 35.22 |
| STDSGNN | 16.12 | 25.59 | 20.67 | 32.40 |
| TAGAT-LSTM-trans | 14.38 | 25.09 | 19.29 | 31.58 |
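
For reference, the two metrics reported in the results table can be computed as in the sketch below, which also derives the relative error reduction of TAGAT-LSTM-trans over the strongest baseline (STDSGNN) directly from the tabulated values. The toy prediction vectors and the helper names are hypothetical; only the numbers in `RESULTS` are copied from the table.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalises large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_improvement(baseline, proposed):
    """Percentage reduction in error relative to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Toy example (hypothetical values) to show the metric definitions.
toy_mae = mae([10, 20, 30], [12, 18, 33])
toy_rmse = rmse([10, 20, 30], [12, 18, 33])

# Values copied from the results table: (STDSGNN, TAGAT-LSTM-trans).
RESULTS = {
    ("PeMS03", "MAE"): (16.12, 14.38),
    ("PeMS03", "RMSE"): (25.59, 25.09),
    ("PeMS04", "MAE"): (20.67, 19.29),
    ("PeMS04", "RMSE"): (32.40, 31.58),
}
improvements = {
    key: relative_improvement(baseline, proposed)
    for key, (baseline, proposed) in RESULTS.items()
}
for (dataset, metric), gain in improvements.items():
    print(f"{dataset} {metric}: {gain:.2f}% lower than STDSGNN")
```

On the tabulated values this gives reductions of about 10.79% (MAE) and 1.95% (RMSE) on PeMS03, and about 6.68% (MAE) and 2.53% (RMSE) on PeMS04, so the gain over STDSGNN is larger in MAE than in RMSE on both datasets.
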

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).