A Multi-Scale Prediction Model for Stock Volatility Based on a Hybrid Attention Mechanism

Daniel K. Morgan; Yifan Zhou; Laura M. Bennett; Arjun Patel

doi:10.20944/preprints202511.0849.v1

Submitted:

11 November 2025

Posted:

12 November 2025

You are already at the latest version

Abstract

This study proposes a multi-scale attention CNN–BiLSTM model to predict stock market volatility. The model combines CNN layers to capture short-term movements, BiLSTM layers to learn long-term patterns, and a multi-scale attention unit to adjust feature importance across different time spans. Daily data from the Chinese A-share and NASDAQ 100 markets from 2015 to 2024 were used for testing. The results show that the model reduced RMSE by 9.4% compared with the LightGBM baseline and performed better during high-volatility periods. These results suggest that using information from multiple time scales improves forecasting accuracy and model stability. The method can support risk management and investment decisions. Future work will apply the model to higher-frequency data and improve attention interpretation for real-time market use.

Keywords:

stock volatility

;

CNN–BiLSTM

;

multi-scale attention

;

time series

;

financial prediction

;

market risk

Subject:

Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Volatility forecasting is essential to financial risk management, portfolio allocation, and option pricing because abrupt price fluctuations can trigger severe losses and increase systemic uncertainty in capital markets [1]. Classical econometric approaches such as GARCH and EGARCH can portray time-varying volatility behavior; however, they rely on strong assumptions regarding return distributions and struggle to capture nonlinear structures or abrupt market transitions, particularly during crisis periods [2]. Recent advances in machine learning have shown that deep learning models can extract complex relationships directly from historical data without restrictive parametric specifications, making them suitable for financial time-series modeling [3]. Among these models, convolutional neural networks (CNNs) excel at mining short-term local fluctuations, while recurrent architectures such as BiLSTM effectively capture long-range temporal dependencies related to macroeconomic cycles and investor sentiment [4]. Prior studies demonstrate that hybrid architectures combining CNN and BiLSTM outperform single-structure baselines by capturing both local and contextual dynamics [5]. A recent study further demonstrated that a LightGBM-based volatility prediction algorithm achieved notable improvements over traditional econometric baselines, confirming the benefit of feature-driven learning in financial time-series modeling [6]. This highlights the importance of advanced representation learning mechanisms for improved volatility prediction. Despite these promising developments, several limitations remain. First, most existing models operate on a single temporal scale, neglecting the fact that volatility evolves through interacting short-, medium-, and long-term cycles induced by high-frequency trading, macroeconomic events, and investor behavior [7]. Such scale interactions are critical, as ignoring them often leads to inadequate responses to regime changes. Second, mainstream CNN–LSTM frameworks lack targeted mechanisms for emphasizing informative segments. Without attention-based selection, these models typically dilute the impact of influential observations, leading to degraded performance during rapid market shocks and overfitting in calm periods [8]. Although attention mechanisms have been introduced to strengthen key signals, most studies adopt a single-level attention, limiting their ability to capture heterogeneous temporal dynamics or sudden volatility bursts [9]. Third, many existing forecasting methods are validated only on a single market index, making it difficult to evaluate their generalization capability under different trading conditions or regulatory environments [10]. Performance inconsistency across markets further indicates insufficient adaptability of current methods, especially during high-volatility intervals.

The study proposes a multi-scale attention-enhanced CNN–BiLSTM model for stock volatility forecasting. The CNN module first extracts short-term local patterns from price series and technical indicators, while the BiLSTM module captures bidirectional temporal dependencies across longer horizons. Building on these representations, a multi-scale attention module is introduced to assign dynamic weights to features across multiple temporal resolutions, enabling the model to emphasize key patterns embedded in both mild and turbulent market phases. This design allows the model to better characterize volatility-relevant structures and mitigate performance degradation during extreme events. To evaluate the model’s robustness, empirical experiments are conducted on the Chinese A-share market and the NASDAQ-100 index, representing different market structures and regulatory environments. The proposed model achieves a 9.4% reduction in RMSE relative to LightGBM and produces more stable forecasts during high-volatility windows. These findings demonstrate that integrating scale-aware attention into CNN–BiLSTM architectures enhances volatility representation quality and contributes practical value to financial risk management, asset allocation, and dynamic hedging strategies. Collectively, this work contributes to the literature by (1) presenting a hybrid deep architecture capable of capturing multi-scale temporal patterns, (2) introducing a novel multi-scale attention mechanism that highlights high-impact time points under varying regimes, and (3) verifying cross-market applicability through extensive empirical evaluation. The results provide a promising direction for improving volatility forecasting models in increasingly complex financial environments.

2. Materials and Methods

2.1. Data Sources and Sample Description

This study used daily trading data from two major markets: the Chinese A-share market and the NASDAQ 100 index. The data covered the period from January 2015 to December 2024, which included both calm and volatile stages. The dataset contained open, high, low, close prices, and trading volumes, all adjusted for stock splits and dividends. Technical indicators such as moving averages (MA5, MA10), relative strength index (RSI), and average true range (ATR) were calculated. Volatility was measured as the standard deviation of log returns over a 10-day rolling window. Missing values were filled by linear interpolation, and all variables were normalized between 0 and 1 before being used for modeling.

2.2. Experimental Design and Control Setup

The proposed multi-scale attention CNN–BiLSTM model was tested against several benchmark models, including GARCH(1,1), LSTM, BiLSTM, and LightGBM. The data was divided chronologically, with 80% used for training and 20% for testing, to prevent look-ahead bias. The CNN extracted short-term patterns from 30-day sequences, and the BiLSTM captured both forward and backward temporal relationships. The multi-scale attention module adjusted weights across different time scales, giving more focus to important volatility periods. All models were trained using the Adam optimizer (learning rate = 0.001) with early stopping after 10 non-improving epochs. Using the same training setup for all models ensured that performance differences came from architecture design, not training bias.

2.3. Measurement Methods and Quality Control

The model was evaluated with three standard metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R². To test robustness, separate evaluations were performed during high-volatility periods (top 15% by realized variance). Each model was trained and tested five times with different random seeds to reduce random effects. Data preprocessing was conducted in Python using NumPy and pandas, and model training used TensorFlow. Random seeds were fixed, and parameter settings were recorded to ensure reproducibility.

2.4. Data Processing and Model Equations

The predicted volatility

{\hat{y}}_{t}

was estimated from the input features

X_{t}

by a nonlinear mapping function

f (.)

[10]:

{\hat{y}}_{t} = f (X_{t}; θ)

where

θ

represents model parameters. The loss function minimized during training was the Mean Squared Error (MSE) [11]:

MSE = \frac{1}{n} \sum_{t = 1}^{n} (y_{t} - {\hat{y}}_{t})^{2}

The attention mechanism assigned a normalized weight

w_{i}

to each time scale iii according to its hidden activation

h_{i}

[12]:

w_{i} = \frac{\exp (h_{i})}{\sum_{j = 1}^{k} \exp (h_{j})}

This allowed the model to emphasize more relevant temporal scales when volatility patterns changed across markets.

2.5. Validation and Robustness Analysis

To test model transferability, cross-market validation was performed by training on A-share data and testing on NASDAQ 100, and vice versa. Rolling window validation was used to evaluate stability over different time periods. Statistical tests (paired t-test, p < 0.05) confirmed that the observed improvements were not due to random variation. The model consistently outperformed the baselines across both datasets, showing reliable generalization and robustness for real-world volatility forecasting.

3. Results and Discussion

3.1. Overall Forecasting Performance

The proposed multi-scale attention CNN–BiLSTM model showed lower prediction errors than other methods on both datasets. For the Chinese A-share market, RMSE decreased by 9.4% compared with LightGBM, and MAE also improved. On the NASDAQ 100 index, the model maintained similar accuracy, showing that it performs well across different market conditions. These findings are consistent with recent work where CNN–LSTM models captured short-term changes and long-term dependencies effectively [13].

Figure 1. Comparison of forecasting accuracy among CNN–LSTM models on different stock market datasets.

3.2. Contribution of the Multi-Scale Attention

When the attention module was removed, RMSE increased by 4–6%, especially during volatile periods. This suggests that the attention layer plays a key role in identifying time steps with stronger influence on volatility. Unlike single-head attention, the multi-scale version assigns weights to different time intervals, which helps capture both short-term jumps and slower changes. Similar improvements were reported in studies using attention-based CNN–BiLSTM models for non-stationary financial data, confirming the importance of multi-level weighting [14,15].

3.3. Robustness in High-Volatility Phases

During high-volatility phases (top 15% of realized variance), the model kept its advantage over GARCH, LSTM, and LightGBM. Traditional models tended to smooth peaks, while the proposed structure preserved short bursts and transmitted them across time steps. A similar observation was made, which a CNN–BiLSTM–Attention model improved volatility forecasting under sudden market movements [16].

Figure 2. Volatility prediction performance during high-fluctuation periods using CNN–BiLSTM–Attention architecture.

3.4. Comparison with Transformer-Based and Hybrid Models

Compared with more complex hybrid models such as CNN–LSTM–GNN or transformer-based systems, this model achieved comparable accuracy with lower computation time. Heavy transformer layers can slow down training and require large datasets, which limits their use in real-time forecasting [17,18]. In contrast, the present model provides a simpler and faster solution while keeping strong generalization. This supports the findings from other MDPI studies showing that compact CNN–RNN architectures with attention can balance accuracy and efficiency in financial forecasting applications.

4. Conclusion

This study built a multi-scale attention CNN–BiLSTM model to improve stock volatility forecasting at different time scales. The model used CNN layers to capture short-term patterns, BiLSTM layers to learn long-term trends, and a multi-scale attention block to highlight key time points under changing market conditions. Tests on both the Chinese A-share and NASDAQ 100 datasets showed that the model lowered RMSE by 9.4% compared with the LightGBM baseline and performed better during volatile periods. These findings show that combining information from multiple time scales can improve the stability and accuracy of financial forecasts. The method offers a practical tool for market risk control and trading strategy support. Still, the study used only daily data from two markets; future work should test higher-frequency data, include more markets, and add interpretable attention analysis for real-time use.

References

Nafiu, A., Balogun, S. O., Oko-Odion, C., & Odumuwagun, O. O. (2025). Risk management strategies: Navigating volatility in complex financial market environments. World Journal of Advanced Research and Reviews, 25(1), 236-250.
James, N., & Menzies, M. (2024). Nonlinear shifts and dislocations in financial market structure and composition. Chaos: An Interdisciplinary Journal of Nonlinear Science, 34(7).
Hu, Q., Li, X., Li, Z., & Zhang, Y. (2025). Generative AI of Pinecone Vector Retrieval and Retrieval-Augmented Generation Architecture: Financial Data-Driven Intelligent Customer Recommendation System.
Nsengiyumva, E., Mung’atu, J. K., & Ruranga, C. (2025). A comparative study of multivariate CNN, BiLSTM and hybrid CNN–BiLSTM models for forecasting foreign exchange rate using deep learning. Cogent Economics & Finance, 13(1), 2526148.
Whitmore, J., Mehra, P., Yang, J., & Linford, E. (2025). Privacy Preserving Risk Modeling Across Financial Institutions via Federated Learning with Adaptive Optimization. Frontiers in Artificial Intelligence Research, 2(1), 35-43.
Liu, Z. (2022, January). Stock volatility prediction using LightGBM based algorithm. In 2022 International Conference on Big Data, Information and Computer Network (BDICN) (pp. 283-286). IEEE.
Bulut, E. (2024). Market Volatility and Models for Forecasting Volatility. In Business Continuity Management and Resilience: Theories, Models, and Processes (pp. 220-248). IGI Global Scientific Publishing.
Li, S. (2025). Momentum, volume and investor sentiment study for us technology sector stocks—A hidden markov model based principal component analysis. PLoS One, 20(9), e0331658.
Benzi, K. M. (2017). From recommender systems to spatio-temporal dynamics with network science (Doctoral dissertation, Ecole Polytechnique Fédérale de Lausanne).
Zhu, W., & Yang, J. (2025). Causal Assessment of Cross-Border Project Risk Governance and Financial Compliance: A Hierarchical Panel and Survival Analysis Approach Based on H Company’s Overseas Projects.
Khattak, B. H. A., Shafi, I., Khan, A. S., Flores, E. S., Lara, R. G., Samad, M. A., & Ashraf, I. (2023). A systematic survey of AI models in financial market forecasting for profitability analysis. Ieee Access, 11, 125359-125380.
Wang, J., & Xiao, Y. (2025). Assessing the Spillover Effects of Marketing Promotions on Credit Risk in Consumer Finance: An Empirical Study Based on AB Testing and Causal Inference.
Li, T., Liu, S., Hong, E., & Xia, J. (2025). Human Resource Optimization in the Hospitality Industry Big Data Forecasting and Cross-Cultural Engagement.
Stuart-Smith, R., Studebaker, R., Yuan, M., Houser, N., & Liao, J. (2022). Viscera/L: Speculations on an Embodied, Additive and Subtractive Manufactured Architecture. Traits of Postdigital Neobaroque: Pre-Proceedings (PDNB), edited by Marjan Colletti and Laura Winterberg. Innsbruck: Universitat Innsbruck.
Sheng, Z., Liu, Q., Hu, Y., & Liu, H. (2025). A Multi-Feature Stock Index Forecasting Approach Based on LASSO Feature Selection and Non-Stationary Autoformer. Electronics, 14(10), 2059.
Pawitra, M. T., Fakhrurroja, H., & Abdurrahman, L. (2024, October). Predicting Stock Market using CNN and BiLSTM Model. In 2024 International Conference on Computer, Control, Informatics and its Applications (IC3INA) (pp. 267-272). IEEE.
Yang, J., Li, Y., Harper, D., Clarke, I., & Li, J. (2025). Macro Financial Prediction of Cross Border Real Estate Returns Using XGBoost LSTM Models. Journal of Artificial Intelligence and Information, 2, 113-118.
Casolaro, A., Capone, V., Iannuzzo, G., & Camastra, F. (2023). Deep learning for time series forecasting: Advances and open problems. Information, 14(11), 598.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.