Submitted: 03 October 2024
Posted: 04 October 2024
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Stock Recommendation Systems
2.2. Graph Neural Network Learning in Financial Markets
2.3. Self-Learning in Stock Recommendations
3. Problem Definition
4. Method
4.1. Construction of Stock-Stock Graph
4.2. Initializing Stock Embeddings with FastText
4.3. Generating Graph Embeddings with Graph Attention Networks
4.4. Generating Temporal Embeddings with Gated Recurrent Units
- Open price: The price at which a stock first trades upon the opening of an exchange on a given trading day.
- Close price: The last price at which a stock trades during a regular trading session.
- High price: The highest price at which a stock traded during the course of the trading day.
- Low price: The lowest price at which a stock traded during the course of the trading day.
- Additional derived technical indicators, such as moving averages, relative strength index (RSI), and volume-weighted average price (VWAP), which provide further insights into price trends and momentum.
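
As an illustration of the derived indicators listed above, the following sketch computes a simple moving average, RSI, and VWAP from daily price and volume series. The source does not state window lengths or which indicator variants are used, so the function names, the 14-day RSI default, the simple-average RSI variant (rather than Wilder's smoothed version), the value of 50 for a flat window, and the cumulative typical-price VWAP over daily bars are all assumptions made for this example.

```python
from typing import List, Optional, Sequence


def simple_moving_average(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing moving average; None until `window` observations are available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def rsi(closes: Sequence[float], window: int = 14) -> List[Optional[float]]:
    """RSI from simple sums of gains and losses over the last `window` changes.

    Assumption: simple-average variant; a window with no price change maps to 50.
    """
    out: List[Optional[float]] = [None] * len(closes)
    for i in range(window, len(closes)):
        gains = 0.0
        losses = 0.0
        for j in range(i - window + 1, i + 1):
            change = closes[j] - closes[j - 1]
            if change > 0:
                gains += change
            else:
                losses -= change
        total = gains + losses
        # 100 - 100 / (1 + gains / losses) simplifies to 100 * gains / (gains + losses)
        out[i] = 50.0 if total == 0 else 100.0 * gains / total
    return out


def vwap(
    highs: Sequence[float],
    lows: Sequence[float],
    closes: Sequence[float],
    volumes: Sequence[float],
) -> List[Optional[float]]:
    """Cumulative VWAP using the typical price (high + low + close) / 3 per bar."""
    out: List[Optional[float]] = []
    cum_price_volume = 0.0
    cum_volume = 0.0
    for h, l, c, v in zip(highs, lows, closes, volumes):
        cum_price_volume += (h + l + c) / 3.0 * v
        cum_volume += v
        out.append(cum_price_volume / cum_volume if cum_volume > 0 else None)
    return out
```

Each function returns one value per trading day, so the outputs can be stacked with the open, close, high, and low prices into a per-day feature vector. The `None` entries mark days where the look-back window is not yet full and would need to be dropped or imputed before training.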
4.5. Integrating Graph and Temporal Embeddings for Comprehensive Stock Representation
4.6. Self-Learning Strategy for Stock and Concept Embeddings
5. Experiments
5.1. Stock Candidate Data
5.2. Finance Documents Data
5.3. Technical Pricing Data
5.4. Concept Recommendation Data
5.5. Data Splitting
5.6. Baselines and Parameter Settings
- BM25 [38]: A simple information retrieval baseline that ranks stocks by BM25 score. For each stock s and concept c, it queries an inverted index for the top-k documents most relevant to the stock-concept pair, then ranks stocks by the average BM25 score of those documents. It serves as a straightforward, non-contextual point of comparison.
- Rank [39]: This baseline extends simple retrieval with a two-step process. For each concept c, the model first retrieves the top-5 documents from the finance documents data. It then combines these documents with the concept to rank stocks across the two major exchanges, giving a basic framework for assessing stock relevance from recent financial literature.
- Word2Vec [40]: Uses the Word2Vec embedding model to rank stocks directly by the semantic relatedness between concepts and stocks, measured by how closely their embedded representations align.
- Word2Vec+: Extends the basic Word2Vec approach by adding the eight words most semantically similar to the original concept. The expansion aims to capture the broader semantic field around each concept and so improve the model's ability to identify relevant stocks.
- Word2Vec++: Extends Word2Vec+ by incorporating additional words whose similarity score exceeds 0.65. On average, this method considers 6.3 concepts for expansion. The wider expansion is intended to cover the semantic space around each concept more thoroughly and thereby improve the precision of stock recommendations.
- MineEvidence [7]: The previous state of the art. This model uses reinforcement learning to dynamically expand the concept representation before ranking stocks. By iteratively refining the concept expansion based on reinforcement feedback, MineEvidence aims to select the stocks that best match the refined concepts, making it a strong comparison point for our proposed method.
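
The three Word2Vec baselines differ only in how the concept query is expanded before ranking, which the following sketch makes concrete. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`cosine`, `expand_concept`, `rank_stocks`), the toy vectors, and the choice to score a stock by its mean cosine similarity to the query terms are all hypothetical. The Word2Vec++ description can be read in more than one way; here it is modeled as a pure similarity threshold.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_concept(
    concept: str,
    vocab: Dict[str, Vector],
    top_n: Optional[int] = None,
    min_sim: Optional[float] = None,
) -> List[str]:
    """Return the concept followed by its expansion words, most similar first.

    top_n=0       -> no expansion (plain Word2Vec baseline)
    top_n=8       -> eight nearest words (Word2Vec+ as described)
    min_sim=0.65  -> every word above the threshold (one reading of Word2Vec++)
    """
    scored = sorted(
        ((cosine(vocab[concept], vec), word) for word, vec in vocab.items() if word != concept),
        reverse=True,
    )
    if min_sim is not None:
        scored = [(sim, word) for sim, word in scored if sim > min_sim]
    if top_n is not None:
        scored = scored[:top_n]
    return [concept] + [word for _, word in scored]


def rank_stocks(
    query_terms: Sequence[str],
    vocab: Dict[str, Vector],
    stocks: Dict[str, Vector],
) -> List[str]:
    """Rank stock ids by mean cosine similarity to the query terms (assumed scoring rule)."""

    def score(stock_vec: Vector) -> float:
        return sum(cosine(vocab[t], stock_vec) for t in query_terms) / len(query_terms)

    return sorted(stocks, key=lambda stock_id: score(stocks[stock_id]), reverse=True)


# Toy 2-d embeddings, purely illustrative.
toy_vocab = {
    "5G": (1.0, 0.0),
    "telecom": (0.9, 0.1),
    "antenna": (0.8, 0.6),
    "dairy": (0.0, 1.0),
}
toy_stocks = {"stock_A": (1.0, 0.1), "stock_B": (0.1, 1.0)}

query = expand_concept("5G", toy_vocab, min_sim=0.65)
ranking = rank_stocks(query, toy_vocab, toy_stocks)
```

In practice the vocabulary vectors would come from a trained embedding model rather than a hand-written dictionary, and the stock vectors would be derived from stock names or descriptions.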
5.7. Training Parameters
5.8. Metrics
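
The result tables report P@5, P@10, R@30, and MAP. The sketch below shows how these metrics are conventionally computed for one ranked list of stocks against a set of ground-truth concept members. It assumes the standard information retrieval definitions; the function names and the example stock labels are hypothetical, and the exact evaluation protocol used in the experiments may differ.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant item occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over several (ranked list, relevant set) pairs, one pair per concept."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical ranking for one concept: two of the four stocks are true members.
example_ranked = ["s1", "s2", "s3", "s4"]
example_relevant = {"s1", "s3"}
example_p_at_2 = precision_at_k(example_ranked, example_relevant, 2)
example_ap = average_precision(example_ranked, example_relevant)
```

Here the relevant items sit at ranks 1 and 3, so the average precision is the mean of 1/1 and 2/3.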
5.9. Recommendation Accuracy
5.10. Ablation
5.11. Case Study
5.12. Visualization of Embedding
5.13. Influence of Training Data
5.14. Training Efficiency
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Leippold, M.; Wang, Q.; Zhou, W. Machine learning in the Chinese stock market. Journal of Financial Economics 2022, 145, 64–82.
2. Chong, T.T.L.; Lam, T.H.; Yan, I.K.M. Is the Chinese stock market really inefficient? China Economic Review 2012, 23, 122–137.
3. Carpenter, J.N.; Lu, F.; Whitelaw, R.F. The real value of China’s stock market. Journal of Financial Economics 2021, 139, 679–696.
4. Seddighi, H.; Nian, W. The Chinese stock exchange market: operations and efficiency. Applied Financial Economics 2004, 14, 785–797.
5. Stoll, H.R.; Whaley, R.E. Stock market structure and volatility. The Review of Financial Studies 1990, 3, 37–71.
6. Dizon, A.E.; Lockyer, C.; Perrin, W.F.; Demaster, D.P.; Sisson, J. Rethinking the stock concept: a phylogeographic approach. Conservation Biology 1992, 24–36.
7. Liu, Q.; Zhang, Y. Mining evidences for concept stock recommendation. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018, pp. 2103–2112.
8. Waldman, J.R. Definition of stocks: an evolving concept. In Stock identification methods; Elsevier, 2005; pp. 7–16.
9. Li, Q.; Yang, J.; Hsiao, C.; Chang, Y.J. The relationship between stock returns and volatility in international stock markets. Journal of Empirical Finance 2005, 12, 650–665.
10. Matsunaga, S.R. The effects of financial reporting costs on the use of employee stock options. Accounting Review 1995, 1–26.
11. Pervan, I. Voluntary financial reporting on the internet: analysis of the practice of stock-market listed Croatian and Slovene joint stock companies. Financial theory and practice 2006, 30, 1–27.
12. Fama, E.F. Session topic: stock market price behavior. The Journal of Finance 1970, 25, 383–417.
13. Tang, J.; Chen, X. Stock market prediction based on historic prices and news titles. Proceedings of the 2018 international conference on machine learning technologies, 2018, pp. 29–34.
14. Fang, Y.; Wang, H. Fund manager characteristics and performance. Investment Analysts Journal 2015, 44, 102–116.
15. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems 2020, 32, 4–24.
16. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI open 2020, 1, 57–81.
17. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE transactions on neural networks 2008, 20, 61–80.
18. Grossberg, S. Recurrent neural networks. Scholarpedia 2013, 8, 1888.
19. Salehinejad, H.; Sankar, S.; Barfett, J.; Colak, E.; Valaee, S. Recent advances in recurrent neural networks. arXiv preprint arXiv:1801.01078 2017.
20. Medsker, L.; Jain, L.C. Recurrent neural networks: design and applications; CRC press, 1999.
21. Geva, T.; Zahavi, J. Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news. Decision support systems 2014, 57, 212–223.
22. Hernández-Nieves, E.; Bartolomé del Canto, Á.; Chamoso-Santos, P.; de la Prieta-Pintado, F.; Corchado-Rodríguez, J.M. A machine learning platform for stock investment recommendation systems. Distributed Computing and Artificial Intelligence, 17th International Conference. Springer, 2021, pp. 303–313.
23. Barber, B.M.; Lehavy, R.; Trueman, B. Comparing the stock recommendation performance of investment banks and independent research firms. Journal of financial economics 2007, 85, 490–517.
24. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert systems with applications 2015, 42, 259–268.
25. Galí, J.; Gambetti, L. The effects of monetary policy on stock market bubbles: Some evidence. American Economic Journal: Macroeconomics 2015, 7, 233–257.
26. Wang, J.; Zhang, S.; Xiao, Y.; Song, R. A review on graph neural network methods in financial applications. arXiv preprint arXiv:2111.15367 2021.
27. Chen, W.; Jiang, M.; Zhang, W.G.; Chen, Z. A novel graph convolutional feature based convolutional neural network for stock trend prediction. Information Sciences 2021, 556, 67–94.
28. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv preprint arXiv:1710.10903 2017.
29. Raina, R.; Battle, A.; Lee, H.; Packer, B.; Ng, A.Y. Self-taught learning: transfer learning from unlabeled data. Proceedings of the 24th international conference on Machine learning, 2007, pp. 759–766.
30. Ying, Z.; Cheng, D.; Chen, C.; Li, X.; Zhu, P.; Luo, Y.; Liang, Y. Predicting stock market trends with self-supervised learning. Neurocomputing 2024, 568, 127033.
31. Chen, F.; Wang, Y.C.; Wang, B.; Kuo, C.C.J. Graph representation learning: a survey. APSIPA Transactions on Signal and Information Processing 2020, 9, e15.
32. Saha, S.; Gao, J.; Gerlach, R. A survey of the application of graph-based approaches in stock market analysis and prediction. International Journal of Data Science and Analytics 2022, 14, 1–15.
33. Xian, R.; Wang, X.; Kothandaraman, D.; Manocha, D. PMI sampler: Patch similarity guided frame selection for aerial action recognition. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 6982–6991.
34. Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching word vectors with subword information. Transactions of the association for computational linguistics 2017, 5, 135–146.
35. Wang, B.; Wang, A.; Chen, F.; Wang, Y.; Kuo, C.C.J. Evaluating word embedding models: Methods and experimental results. APSIPA transactions on signal and information processing 2019, 8, e19.
36. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Advances in neural information processing systems 2017, 30.
37. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 2014.
38. Robertson, S.; Zaragoza, H.; et al. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 2009, 3, 333–389.
39. Singhal, A.; et al. Modern information retrieval: A brief overview. IEEE Data Eng. Bull. 2001, 24, 35–43.
40. Mikolov, T. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 2013.




| Concept | Company Code | Company Description |
|---|---|---|
| 3D Glass | 300083 | As of June 1, 2016, the company stated in an interaction that its wholly-owned subsidiary Genesis Glass Machines is mainly used for processing 2D and 2.5D glass products, with the 3D glass prototype still under development. |
| Security Monitoring | 300150 | The company’s main products belong to the "post-station project," holding a monopolistic position in certain sub-markets such as railway security monitoring systems. |
| 5G | 002364 | In March 2014, the company used its funds to increase the capital of its wholly-owned subsidiary Dinglian Science Communication. The purpose of the capital increase was to meet the qualification review requirements for bidding business participants and to align with Dinglian’s current production and operational needs. |
| 5G Messaging/RCS | 300608 | The company has been actively laying out industries related to integrated communications, based on Yixin, in addition to traditional 5G text messaging business opening and billing. |
| 4D Printing | 002473 | The company invested 79.83 million yuan in a high-precision titanium-nickel alloy memory temperature control device automation production line for technological expansion and production increase. |
| Heavy Stock Holding | 000423 | As of March 31, 2019, China Pacific Life Insurance Co., Ltd. - Dividend - Individual Dividend holds a 1.2423% share, and China Pacific Life Insurance Co., Ltd. - Traditional - Ordinary Insurance Product holds a 0.9582% share. |
| CAR-T Therapy | 300109 | In January 2018, the company signed a strategic cooperation agreement with Yongtai Biology, which is at a leading level in domestic cell immunology research. EAL has been successfully applied in clinical settings and is researching CAR-T. |

| Method | P@5 | P@10 | R@30 | MAP |
|---|---|---|---|---|
| Jinrongjie Concepts | | | | |
| BM25 | 0.402 | 0.315 | 0.338 | 0.296 |
| Word2Vec | 0.450 | 0.367 | 0.380 | 0.332 |
| Word2Vec+ | 0.471 | 0.370 | 0.391 | 0.352 |
| Word2Vec++ | 0.478 | 0.375 | 0.396 | 0.359 |
| Rank | 0.467 | 0.376 | 0.402 | 0.365 |
| MineEvidence | 0.524 | 0.427 | 0.428 | 0.398 |
| Our Method | 0.601 | 0.472 | 0.478 | 0.435 |
| Tonghuashun Concepts | | | | |
| BM25 | 0.387 | 0.302 | 0.315 | 0.278 |
| Word2Vec | 0.437 | 0.347 | 0.360 | 0.327 |
| Word2Vec+ | 0.448 | 0.356 | 0.374 | 0.345 |
| Word2Vec++ | 0.453 | 0.362 | 0.380 | 0.351 |
| Rank | 0.458 | 0.373 | 0.381 | 0.356 |
| MineEvidence | 0.507 | 0.402 | 0.422 | 0.378 |
| Our Method | 0.562 | 0.441 | 0.452 | 0.430 |

| Method | P@5 | P@10 | R@30 | MAP |
|---|---|---|---|---|
| Jinrongjie Concepts | | | | |
| No Graph Embeddings | 0.571 | 0.455 | 0.439 | 0.410 |
| No Temporal Embeddings | 0.586 | 0.448 | 0.448 | 0.408 |
| No RSI Feature | 0.590 | 0.452 | 0.451 | 0.414 |
| No VWAP Feature | 0.595 | 0.455 | 0.450 | 0.419 |
| No Attention | 0.582 | 0.461 | 0.446 | 0.421 |
| Our Method | 0.601 | 0.472 | 0.478 | 0.435 |
| Tonghuashun Concepts | | | | |
| No Graph Embeddings | 0.543 | 0.421 | 0.441 | 0.381 |
| No Temporal Embeddings | 0.548 | 0.430 | 0.440 | 0.406 |
| No RSI Feature | 0.553 | 0.438 | 0.441 | 0.412 |
| No VWAP Feature | 0.556 | 0.436 | 0.442 | 0.415 |
| No Attention | 0.550 | 0.428 | 0.448 | 0.389 |
| Our Method | 0.562 | 0.441 | 0.452 | 0.430 |

| Ours | MineEvidence |
|---|---|
| New Energy Vehicles | |
| Founder Motor | Wanxiang Qianchao |
| Jiangling Motors | China National Machinery Industry Corporation |
| Tianjin Motor Dies Company | Tianjin Motor Dies Company |
| Shanghai Automobile & Electrical | Asia-Pacific Mechanical & Electrical |
| Great Wall Motors | Shanghai Lingang |
| Intelligent Logistics | |
| YTO Express | Fiyta |
| Fiyta | China Chengtong Holdings |
| Meiling | Eastcompeace |
| China Railway Tielong | YTO Express |
| STO Express | Hubei Feilihua Fiber |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).