Submitted: 26 October 2025
Posted: 28 October 2025
Abstract
Keywords:
1. Introduction
- We introduce FIN-MIND, a novel multi-task TCN framework with decoupled attention for joint price and risk forecasting, mitigating cross-task temporal interference.
- We propose a causal, per-query normalized attention mechanism that isolates temporal relevance for each task and feeds the resulting task-specific context into dedicated MLP heads.
- We establish a leakage-aware evaluation protocol using strict temporal causality and walk-forward refitting for reliable out-of-sample validation.
- We demonstrate that FIN-MIND achieves significant and consistent gains over state-of-the-art baselines in MAE, RMSE, and $R^2$, validated across multiple seeds and equities.
2. Related Works
2.1. Classical and Machine Learning Forecasting
2.2. Deep and Attention-Based Time-Series Models
2.3. Deep Volatility and Risk Modeling
2.4. Multi-Task and Decoupled Attention Learning
3. Methods
3.1. Overview
3.2. Temporal Alignment and Data Preprocessing
3.3. TCN Backbone Network
3.4. Decoupled Attention (FIN-MIND)
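The contributions describe a causal, per-query normalized attention that isolates temporal relevance for each task. The following is a minimal sketch of that idea, not the authors' implementation. It assumes scaled dot-product scoring and a softmax as the per-query normalization, neither of which is specified in the text shown here. The names `causal_attention`, `decoupled_attention`, `project`, the toy `hidden` features (standing in for TCN outputs), and the projection matrices are all hypothetical.

```python
import math


def causal_attention(queries, keys, values):
    """Causal attention: step t attends only to steps 0..t.

    Weights are normalized separately for each query (softmax over the
    visible positions), so every row of the weight matrix sums to 1.
    Scaled dot-product scoring is an assumption of this sketch.
    """
    d = len(queries[0])
    n = len(queries)
    outputs, weights = [], []
    for t, q in enumerate(queries):
        # Score past and present positions only (the causal mask).
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        ctx = [
            sum(w[s] * values[s][j] for s in range(t + 1))
            for j in range(len(values[0]))
        ]
        outputs.append(ctx)
        weights.append(w + [0.0] * (n - t - 1))  # future positions get 0
    return outputs, weights


def project(seq, matrix):
    """Apply a linear map (given as a list of rows) to each vector in seq."""
    return [[sum(r[i] * x[i] for i in range(len(x))) for r in matrix] for x in seq]


def decoupled_attention(hidden, task_params):
    """Run one independent causal attention per task over shared features."""
    result = {}
    for task, (wq, wk) in task_params.items():
        result[task] = causal_attention(project(hidden, wq), project(hidden, wk), hidden)
    return result


# Toy shared features and hypothetical per-task query/key projections.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
params = {"price": (identity, identity), "risk": (swap, identity)}
out = decoupled_attention(hidden, params)
```

Because each task has its own query and key projections, the price and risk branches can weight the same history differently, and each branch's context would then go to its own output head.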
3.5. Multi-Dimensional Output Heads
3.6. Loss Function
4. Experiments
4.1. Setup
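The leakage-aware protocol named in the contributions (strict temporal causality with walk-forward refitting) can be illustrated with a short sketch. It assumes an expanding training window and a z-score scaler refit on each fold's training data. The source shown here does not state the window type, fold sizes, or scaler, so `walk_forward_splits`, `fit_scaler`, the parameter values, and the toy `prices` series are hypothetical.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_range, test_range) pairs for expanding-window evaluation.

    Every test block lies strictly after its training block, so no future
    observation can influence fitting.
    """
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size


def fit_scaler(xs):
    """Return (mean, std) computed from the given samples only."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    std = var ** 0.5 or 1.0  # guard against a constant series
    return mean, std


# Hypothetical toy series, used only to show the mechanics.
prices = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0, 106.0, 110.0]
folds = list(walk_forward_splits(len(prices), initial_train=4, test_size=2))

scaled_tests = []
for train_idx, test_idx in folds:
    train = [prices[i] for i in train_idx]
    mean, std = fit_scaler(train)  # refit on past data only, once per fold
    scaled_tests.append([(prices[i] - mean) / std for i in test_idx])
```

The point of the design is that every statistic applied to a test block (here, the scaler's mean and standard deviation) is estimated from data that precedes it. A model would be refit at the same place in the loop.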
4.2. Comparative Experiments
4.2.1. Price forecasting
4.2.2. Risk forecasting
4.2.3. Attention analysis
4.3. Ablation Study
| Model | MAE |
|---|---|
| TCN (no attention) | 1.89 |
| TCN (unified attention) | 1.63 |
| Single-task TCN (price attention) | 1.62 |
| FIN-MIND (ours) | 1.60 |
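To put the ablation numbers on a common scale, the snippet below computes each variant's relative MAE reduction against the no-attention TCN. The MAE values are copied from the table above. The helper `relative_reduction` is a hypothetical name introduced only for this illustration.

```python
# MAE values copied from the ablation table.
ablation_mae = {
    "TCN (no attention)": 1.89,
    "TCN (unified attention)": 1.63,
    "Single-task TCN (price attention)": 1.62,
    "FIN-MIND (ours)": 1.60,
}


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


baseline = ablation_mae["TCN (no attention)"]
for name, mae in ablation_mae.items():
    print(f"{name}: {relative_reduction(baseline, mae):.1f}% lower MAE")
```

On these figures, most of the reduction comes from adding attention at all. The step from unified attention (1.63) to FIN-MIND (1.60) is a further 0.03 in absolute MAE.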
5. Limitation
6. Conclusions



| Experimental Methods | MAE | RMSE | $R^2$ |
|---|---|---|---|
| AAPL | | | |
| ARIMA | 2.41 | 3.12 | 0.80 |
| LSTM | 1.98 | 2.71 | 0.84 |
| TCN (no attention) | 1.89 | 2.58 | 0.86 |
| TCN-Attn | 1.63 | 2.21 | 0.88 |
| FIN-MIND (Ours) | 1.60 | 2.18 | 0.90 |
| TSLA | | | |
| ARIMA | 4.18 | 5.67 | 0.78 |
| LSTM | 3.65 | 4.89 | 0.81 |
| TCN (no attention) | 3.41 | 4.62 | 0.83 |
| TCN-Attn | 3.01 | 4.02 | 0.86 |
| FIN-MIND (Ours) | 2.91 | 3.51 | 0.89 |

| Experimental Methods | Volatility Scale MAE (↓) | Sharpe Ratio MAE (↓) |
|---|---|---|
| AAPL | | |
| TCN-Attn | 0.088 | 0.097 |
| FIN-MIND (Ours) | 0.079 | 0.088 |
| TSLA | | |
| TCN-Attn | 0.092 | 0.101 |
| FIN-MIND (Ours) | 0.083 | 0.091 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).