Submitted: 15 July 2025
Posted: 22 July 2025
Abstract
Keywords:
MSC: 68T07; 68R10; 91G45
1. Introduction
- Causal Temporal Modeling Mechanism: A gated causal convolution network is adopted that strictly respects the causal ordering of financial time series, so that no future information leaks into the prediction.
- Dynamic Industry Association Graph Attention Model: An adaptive graph attention architecture that integrates industry rotation factors is constructed. Through the edge-attribute attention mechanism of GATv2, the association weights between financial entities are learned dynamically. The model takes industry classification information and market correlation matrices as edge-attribute inputs, and the attention mechanism is used to distinguish genuine associations driven by industry fundamentals from spurious correlations caused by market sentiment.
- Multi-scale Industry Sector MIMO Framework: A multi-input multi-output prediction architecture based on industry sectors is designed, overcoming the limitations of modeling each asset independently. Through the industry grouping mechanism, homogeneous financial entities are clustered into sub-graphs, and the graph attention layer captures intra-group synergy and inter-group transmission relationships.
2. Literature Review
3. Theory Fundamentals
3.1. Graph Theory and Graph Convolutional Networks
3.2. Convolutional Neural Network
3.3. The Fusion of Spatio-Temporal Graph Convolutional Network
4. STGAT: Model Structure and Innovation
4.1. Temporal Convolutional Layer
- Causal Conv: Causal convolution [33] is the core operation of the temporal convolutional layer. It strictly adheres to temporal causality, ensuring that the output at each time step depends solely on historical and current inputs, excluding future data. This design supports accurate forecasting in tasks such as time-series prediction and action segmentation.
- Gating Mechanism: The Gated Linear Unit (GLU) [34] controls information flow through a gating mechanism, defined as:

  $$\mathrm{GLU}(X) = (XW + b) \odot \sigma(XV + c)$$

  where $W$, $V$, $b$ and $c$ are learnable parameters, $\sigma$ is the Sigmoid function, and $\odot$ is the element-wise (Hadamard) product. The input undergoes two linear transformations: one produces a feature vector, and the other generates a gating signal (0 to 1) via the Sigmoid function. The gating signal is multiplied element-wise by the feature vector to selectively filter relevant features. For efficiency, the input can be split along the feature dimension. GLU's flexibility and variants enable its application to diverse tasks, enhancing the model's ability to focus on critical information.

  This model uses the GLU to optimize temporal feature processing. The temporal convolution layer employs a dual-branch structure. One branch applies Sigmoid activation to the causal convolution output, generating a gating coefficient between 0 and 1 that regulates information flow. The other branch processes the causal convolution output to produce the main feature, which is gated by the coefficient via the Hadamard product and then combined with the residual component.

  During feature fusion, the residual component is optimized to enhance effective temporal features. Residual connections stabilize training and preserve essential features, making this approach a spatio-temporal variant of the GLU mechanism.
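The dual-branch gated causal convolution described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the kernel layout `(K, C_in, C_out)`, and the use of two separate kernels (rather than one convolution whose output is split in half) are all hypothetical choices made for readability.

```python
import numpy as np

def causal_conv1d(x, weight, bias):
    """Causal 1-D convolution (illustrative helper, not from the paper).

    x:      (T, C_in) input sequence
    weight: (K, C_in, C_out); weight[k] multiplies the input k steps back
    bias:   (C_out,)
    Returns (T, C_out); output[t] depends only on x[t-K+1 .. t].
    """
    K, T = weight.shape[0], x.shape[0]
    # Left-pad with zeros so that no future time step is ever read.
    padded = np.concatenate([np.zeros((K - 1, x.shape[1])), x], axis=0)
    out = np.tile(bias, (T, 1)).astype(float)
    for k in range(K):
        # padded[K-1-k + t] equals x[t-k] (or zero when t-k < 0)
        out += padded[K - 1 - k : K - 1 - k + T] @ weight[k]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_temporal_block(x, w_feat, b_feat, w_gate, b_gate):
    """Feature branch gated by a sigmoid branch, plus a residual connection."""
    feat = causal_conv1d(x, w_feat, b_feat)           # main feature
    gate = sigmoid(causal_conv1d(x, w_gate, b_gate))  # coefficient in (0, 1)
    # Hadamard product, then residual; assumes C_in == C_out.
    return feat * gate + x
```

Because of the left padding, perturbing the input at step `t` leaves every output before `t` unchanged, which is the no-leakage property the layer relies on.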
4.2. Spatial Convolution Layer
- GATv2: GATv2 (Graph Attention Network v2) [24] is an improved version of the original Graph Attention Network (GAT). Its core innovation is to address the "static attention" limitation of the original GAT. In GAT, the learnable weight matrix and the attention vector are applied consecutively, so the ranking of attention scores over neighbours is the same for every query node. GATv2 reorders these operations, applying the nonlinearity between them, so that the attention ranking can depend on the query node (dynamic attention). This improves the model's expressive power and generalisation ability. This optimization builds on advanced graph attention techniques, enhancing spatial relationship modeling in STGAT [35]. The core operation of GATv2 is represented as follows:

  $$e(h_i, h_j) = a^{\top}\,\mathrm{LeakyReLU}\left(W\,[h_i \,\|\, h_j]\right)$$

  where $h_i$ and $h_j$ represent the feature vectors of nodes $i$ and $j$, $W$ is a learnable weight matrix, and $a$ is the learnable vector that parameterizes the attention mechanism.
- Explicit Edge Attributes: Traditional GATv2 [24] relies solely on node features, limiting its use of edge semantic information, graph structure, and domain knowledge. By incorporating edge attributes, such as industry correlation weights and stock relevance coefficients, into the GATv2 layer’s attention calculation, our model enhances domain knowledge integration and graph structure modeling. This approach dynamically adjusts attention weights, shifting from node-centered to edge-node synergistic modeling. It effectively addresses complex applications, such as financial modeling, where edge attributes carry rich semantic information.
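One way edge attributes can enter a GATv2-style score is to project them and add them inside the nonlinearity, i.e. $e_{ij} = a^{\top}\mathrm{LeakyReLU}(W_{dst} h_i + W_{src} h_j + W_{edge}\,\epsilon_{ij})$. The NumPy sketch below illustrates that form for a single attention head. The function name, the parameter names (`W_src`, `W_dst`, `W_edge`), and the exact way the edge term is combined are assumptions for illustration; the source does not specify them.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gatv2_edge_attention(h, edges, edge_attr, W_src, W_dst, W_edge, a):
    """Single-head GATv2-style attention with edge attributes (illustrative).

    h:         (N, F) node features
    edges:     list of (j, i) pairs meaning "node j sends a message to node i"
    edge_attr: (E, D), one attribute vector per edge
    W_src, W_dst: (F, H); W_edge: (D, H); a: (H,)
    Returns per-edge attention weights (E,) and updated node features (N, H).
    """
    src = np.array([j for j, _ in edges])
    dst = np.array([i for _, i in edges])
    # The nonlinearity sits between the projections and the vector a,
    # which is what distinguishes GATv2 from the original GAT.
    z = h[dst] @ W_dst + h[src] @ W_src + edge_attr @ W_edge
    scores = leaky_relu(z) @ a
    alpha = np.zeros(len(edges))
    out = np.zeros((h.shape[0], W_src.shape[1]))
    for i in np.unique(dst):
        mask = dst == i
        s = scores[mask]
        w = np.exp(s - s.max())      # numerically stable softmax
        w /= w.sum()                 # normalize over the neighbours of i
        alpha[mask] = w
        out[i] = w @ (h[src[mask]] @ W_src)
    return alpha, out
```

In this form, changing an edge attribute (for example a correlation coefficient) changes the attention weight on that edge even when the node features are held fixed.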
4.3. Output Layer
5. Experimental Design and Process
5.1. Data Description
- Basic trading indicators: Open, High, Low, Close, Volume
- Technical indicators: EMA (Exponential Moving Average), RSI_14 (Relative Strength Index with 14-day period)
- Daily return: Return
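The derived features listed above can be computed from the closing price alone. The sketch below is one common formulation and rests on assumptions: the EMA span (default 12 here) is not stated in the source, RSI is computed with Wilder's smoothing (other variants exist), and the daily return is taken as the simple return. All function names are illustrative.

```python
import numpy as np

def ema(close, span=12):
    """Exponential moving average with smoothing factor 2 / (span + 1).
    The span is a hypothetical default; the source does not state it."""
    close = np.asarray(close, dtype=float)
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(close))
    out[0] = close[0]
    for t in range(1, len(close)):
        out[t] = alpha * close[t] + (1.0 - alpha) * out[t - 1]
    return out

def rsi(close, period=14):
    """RSI with Wilder's smoothing; the first `period` entries are NaN."""
    close = np.asarray(close, dtype=float)
    delta = np.diff(close)
    gain = np.clip(delta, 0.0, None)
    loss = np.clip(-delta, 0.0, None)
    out = np.full(len(close), np.nan)
    if len(close) <= period:
        return out
    avg_gain = gain[:period].mean()
    avg_loss = loss[:period].mean()
    for t in range(period, len(close)):
        if t > period:
            # Wilder's recursive average over the latest price change
            avg_gain = (avg_gain * (period - 1) + gain[t - 1]) / period
            avg_loss = (avg_loss * (period - 1) + loss[t - 1]) / period
        if avg_loss == 0.0:
            out[t] = 100.0
        else:
            out[t] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out

def daily_return(close):
    """Simple return close_t / close_{t-1} - 1; the first entry is NaN."""
    close = np.asarray(close, dtype=float)
    out = np.full(len(close), np.nan)
    out[1:] = close[1:] / close[:-1] - 1.0
    return out
```

Each function uses only prices up to and including day `t` to produce the value for day `t`, so these features do not introduce look-ahead.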
5.2. Data Processes
- Each window contains features from to t
- The label for each window is the target value at time
- If future data is unavailable, the label is set to 0
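The windowing rules above can be sketched as follows. The window length and the prediction horizon are not recoverable from the text, so `window` and `horizon` are hypothetical parameters, and the returned `has_label` mask is an addition of this sketch rather than something the source describes.

```python
import numpy as np

def make_windows(features, target, window, horizon=1):
    """Build sliding windows over a multi-stock feature tensor (illustrative).

    features: (T, N, F) array, T days, N stocks, F features
    target:   (T, N) array of target values
    Returns X (T - window + 1, window, N, F), y (T - window + 1, N), and a
    boolean mask that is False where the label was filled with 0.
    """
    T = features.shape[0]
    X, y, has_label = [], [], []
    for t in range(window - 1, T):
        # Window ends at t and never reaches past it.
        X.append(features[t - window + 1 : t + 1])
        if t + horizon < T:
            y.append(target[t + horizon])
            has_label.append(True)
        else:
            # No future observation is available: label set to 0.
            y.append(np.zeros_like(target[0]))
            has_label.append(False)
    return np.stack(X), np.stack(y), np.array(has_label)
```

Keeping the mask makes it possible to exclude the zero-filled labels from the loss or the evaluation metrics, should that be desired.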
5.3. Graph Structure Construction
5.3.1. Node Feature Construction
5.3.2. Spatial Edge Construction
- Industry-based Edges: Encode domain prior knowledge. Stocks in the same sector are connected with a fixed weight of :
- Correlation-based Edges: Data-driven edges computed from Pearson correlation coefficients of daily closing prices:where is the Pearson correlation coefficient between the closing prices of and .
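A rough sketch of how the two edge types could be assembled into an edge list with two-dimensional edge attributes is given below. Several details are assumptions, since the corresponding values and formulas are missing from the text: the fixed industry weight (`industry_weight=1.0`), the use of an absolute-correlation threshold (`corr_threshold=0.5`) to decide which correlation edges to keep, and the choice to store the two signals as separate attribute columns rather than a single combined weight.

```python
import numpy as np

def build_spatial_edges(sectors, close, industry_weight=1.0, corr_threshold=0.5):
    """Build directed spatial edges between stocks (illustrative).

    sectors: length-N sequence of sector labels
    close:   (T, N) array of daily closing prices
    Returns (edges, attrs): a list of (i, j) pairs and an (E, 2) array whose
    columns are [industry weight, Pearson correlation].
    """
    n = len(sectors)
    corr = np.corrcoef(close, rowvar=False)  # (N, N) Pearson correlations
    edges, attrs = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same_sector = sectors[i] == sectors[j]
            strong = abs(corr[i, j]) >= corr_threshold
            if same_sector or strong:
                edges.append((i, j))
                attrs.append([
                    industry_weight if same_sector else 0.0,
                    corr[i, j] if strong else 0.0,
                ])
    return edges, np.array(attrs, dtype=float).reshape(-1, 2)
```

A stock whose closing price is constant over the period yields an undefined (NaN) correlation and would need to be handled separately. To avoid look-ahead, the correlation matrix would normally be estimated on the training period only.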
5.3.3. Temporal Extension
Algorithm 1: Spatio-temporal Graph Construction
Input: Industry information I, price correlation matrix C, window size w, number of stocks N
Output: Spatio-temporal edge set E with combined weights
5.4. Experimental Setup and Evaluation
5.4.1. Data Partition and Optimization Strategy
- Learning rate:
- Weight decay:
- AMSGrad variant enabled for training stability.
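To make the role of the AMSGrad option concrete, the sketch below writes out one AMSGrad update in NumPy. It is a simplified textbook form under stated assumptions: weight decay is applied as an L2 term added to the gradient, bias correction is omitted, and `lr` and `weight_decay` are left as required arguments because their values are not recoverable from the text. Framework implementations may differ in these details.

```python
import numpy as np

def amsgrad_step(theta, grad, state, lr, weight_decay,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update (illustrative, no bias correction).

    state is a dict with arrays "m", "v" and "v_max", all shaped like theta.
    """
    g = grad + weight_decay * theta                 # L2-style weight decay
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * g
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * g * g
    # AMSGrad keeps the running maximum of the second-moment estimate,
    # so the per-coordinate step size can never grow back.
    state["v_max"] = np.maximum(state["v_max"], state["v"])
    return theta - lr * state["m"] / (np.sqrt(state["v_max"]) + eps)
```

The only difference from plain Adam is the `np.maximum` line: dividing by a non-decreasing denominator is what gives the variant its stabilizing effect on training.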
5.4.2. Loss Function and Evaluation Metrics
5.5. Comparison Models in the Experiment
- SARIMA: A classic statistical time series method that handles data with seasonality and trend by combining seasonal and non-seasonal autoregressive, differencing, and moving-average terms.
- LSTM: A recurrent neural network that uses a gating mechanism to mitigate the vanishing-gradient problem of traditional recurrent neural networks, allowing it to better capture dependencies in long-sequence data.
6. Experimental Results and Analysis
6.1. Performance on Commercial Banking Sector
6.2. Performance on Metal Sector
6.3. Ablation Study
6.4. Time Complexity
7. Conclusion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Vuong, P.H.; Phu, L.H.; Van Nguyen, T.H.; Duy, L.N.; Bao, P.T.; Trinh, T.D. A bibliometric literature review of stock price forecasting: from statistical model to deep learning approach. Science Progress 2024, 107, 00368504241236557.
- Sonkavde, G.; Dharrao, D.S.; Bongale, A.M.; Deokate, S.T.; Doreswamy, D.; Bhat, S.K. Forecasting stock market prices using machine learning and deep learning models: A systematic review, performance analysis and discussion of implications. International Journal of Financial Studies 2023, 11, 94.
- Bhattacharjee, I.; Bhattacharja, P. Stock price prediction: a comparative study between traditional statistical approach and machine learning approach. In Proceedings of the 2019 4th International Conference on Electrical Information and Communication Technology (EICT). IEEE; 2019; pp. 1–6.
- Dargan, S.; Kumar, M.; Ayyagari, M.R.; Kumar, G. A survey of deep learning and its applications: a new paradigm to machine learning. Archives of Computational Methods in Engineering 2020, 27, 1071–1092.
- Yu, P.; Yan, X. Stock price prediction based on deep neural networks. Neural Computing and Applications 2020, 32, 1609–1628.
- Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena 2020, 404, 132306.
- Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation 2019, 31, 1235–1270.
- Patel, M.; Jariwala, K.; Chattopadhyay, C. A Systematic Review on Graph Neural Network-based Methods for Stock Market Forecasting. ACM Computing Surveys 2024, 57, 1–38.
- Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875.
- Makridakis, S.; Hibon, M. ARMA models and the Box–Jenkins methodology. Journal of Forecasting 1997, 16, 147–163.
- Ariyo, A.A.; Adewumi, A.O.; Ayo, C.K. Stock price prediction using the ARIMA model. In Proceedings of the 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation. IEEE; 2014; pp. 106–112.
- Engle, R.F. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society 1982, 987–1007.
- Engle, R.F.; Bollerslev, T. Modelling the persistence of conditional variances. Econometric Reviews 1986, 5, 1–50.
- Tay, F.E.; Cao, L. Application of support vector machines in financial time series forecasting. Omega 2001, 29, 309–317.
- Breiman, L. Random forests. Machine Learning 2001, 45, 5–32.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Computation 1997, 9, 1735–1780.
- Fischer, T.; Krauss, C. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research 2018, 270, 654–669.
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018.
- Van Den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K.; et al. Wavenet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
- Duan, H.; Li, Q.; He, L.; Zhang, J.; An, H.; Ali, R.; Vazifedoust, M. Climate classification for major cities in China using cluster analysis. Atmosphere 2024, 15, 741.
- An, H.; Li, Q.; Lv, X.; Li, G.; Qian, Q.; Zhou, G.; Nie, G.; Zhang, L.; Zhu, L. Forecasting daily extreme temperatures in Chinese representative cities using artificial intelligence models. Weather and Climate Extremes 2023, 42, 100621.
- Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y.; et al. Graph attention networks. arXiv 2017, arXiv:1710.10903.
- Brody, S.; Alon, U.; Yahav, E. How attentive are graph attention networks? arXiv 2021, arXiv:2105.14491.
- Wu, Z.; Shen, C.; Van Den Hengel, A. Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognition 2019, 90, 119–133.
- Sawhney, R.; Agarwal, S.; Wadhwa, A.; Derr, T.; Shah, R.R. Stock selection via spatiotemporal hypergraph attention network: A learning to rank approach. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021, Vol.
- Kanwal, A.; Lau, M.F.; Ng, S.P.; Sim, K.Y.; Chandrasekaran, S. BiCuDNNLSTM-1dCNN—A hybrid deep learning-based predictive model for stock price prediction. Expert Systems with Applications 2022, 202, 117123.
- Jin, Y. GraphCNNpred: A stock market indices prediction using a Graph based deep learning system. In Proceedings of the 2024 2nd International Conference on Artificial Intelligence, Systems and Network Security, 2024, pp.
- Liu, C.; Paterlini, S. Stock price prediction using temporal graph model with value chain data. arXiv 2023.
- Yan, W.; Tan, Y. TCGPN: Temporal-Correlation Graph Pre-trained Network for Stock Forecasting. arXiv 2024, arXiv:2407.18519.
- West, D.B.; et al. Introduction to graph theory; Vol. 2, Prentice Hall: Upper Saddle River, 2001.
- Wu, J. Introduction to convolutional neural networks. National Key Lab for Novel Software Technology. Nanjing University. China 2017, 5, 495.
- Nauta, M.; Bucur, D.; Seifert, C. Causal discovery with attention-based convolutional neural networks. Machine Learning and Knowledge Extraction 2019, 1, 19.
- Zhou, G.B.; Wu, J.; Zhang, C.L.; Zhou, Z.H. Minimal gated unit for recurrent neural networks. International Journal of Automation and Computing 2016, 13, 226–234.
- Pan, C.H.; Qu, Y.; Yao, Y.; Wang, M.J.S.; et al. HybridGNN: A Self-Supervised Graph Neural Network for Efficient Maximum Matching in Bipartite Graphs. Symmetry 2024, 16, 1631.
- Wang, S.; Wang, Y.; Wang, M. Connectivity and matching preclusion for leaf-sort graphs. Journal of Interconnection Networks 2019, 19, 1940007.
- Wang, M.; Lin, Y.; Wang, S.; Wang, M. Sufficient conditions for graphs to be maximally 4-restricted edge connected. Australas. J Comb. 2018, 70, 123–136.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp.
- Vuong, P.H.; Dat, T.T.; Mai, T.K.; Uyen, P.H.; et al. Stock-price forecasting based on XGBoost and LSTM. Computer Systems Science & Engineering 2022, 40.
- Wang, M.; Wang, S. Connectivity and diagnosability of center k-ary n-cubes. Discrete Applied Mathematics 2021, 294, 98–107.
- Wang, M.; Xiang, D.; Qu, Y.; Li, G. The diagnosability of interconnection networks. Discrete Applied Mathematics 2024, 357, 413–428.












| Model | Total time (Commercial Bank Sector) | Total time (Metals Sector) | Unit |
|---|---|---|---|
| STGAT | 1344 | 530 | seconds |
| XGBoost | 1503 | 213 | seconds |
| LSTM | 397 | 180 | seconds |
| SARIMA | 813 | 231 | seconds |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
