Submitted: 13 September 2024
Posted: 13 September 2024
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. News Data Pre-Processing and Preparation
- Stopwords (high-frequency words with limited semantic meaning, such as "the" or "of") are removed to reduce noise and improve accuracy.
- Lowercasing (data conversion) converts all text to lowercase, so that words differing only in case are treated as the same token.
- Concatenation joins text strings into a single input, supporting feature engineering and financial data preparation.
- Tokenisation breaks text into manageable units (tokens), which improves learning model performance.
- Noise abatement removes random or irrelevant data that can mask market trends and reduce the precision of data analysis and machine learning models.
- Normalisation standardises text to improve the speed and quality of text analysis. Stemming and lemmatisation both reduce words to a root form: stemming strips affixes (typically suffixes), while lemmatisation maps each word to its dictionary form. Stemming algorithms rely on heuristic principles for efficiency and simplicity; such heuristics use pattern matching, rule-based simplifications, fixed-order operations, search space reduction, and statistical heuristics to guide problem-solving and decision-making [7]. Lemmatisation analyses context and grammatical components to generate a lemma (root word), which improves the accuracy and clarity of text analysis through contextual comprehension [8].
- Feature extraction transforms data into features that machine and deep learning algorithms can use [9]. It improves data interpretability and model performance and can reduce dimensionality. BERT embeddings were used for the BERT-based model (FinBERT) because of their contextual comprehension, transfer learning potential, and robustness in false news identification and sentiment analysis, while TF-IDF vectorisation was used for logistic regression because it is efficient and interpretable and handles sparse, high-dimensional data such as news text well.
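The pre-processing steps listed above can be sketched in Python as follows. This is a minimal, standard-library-only illustration, not the authors' pipeline: the stopword list, the suffix rules of the toy stemmer and the example headline are hypothetical, and lemmatisation is omitted because it needs a dictionary and part-of-speech information (libraries such as spaCy or NLTK provide this).

```python
import re

# Hypothetical, deliberately tiny stopword list; real pipelines use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is", "are"}

# Hypothetical suffix rules for a toy heuristic stemmer, tried in this fixed order.
SUFFIXES = ("ings", "ing", "ed", "es", "s")


def concatenate(headline: str, body: str) -> str:
    """Join two text fields into a single string."""
    return f"{headline} {body}"


def remove_noise(text: str) -> str:
    """Drop URLs, then replace anything that is not a lowercase letter, digit or whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-z0-9\s]", " ", text)


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(headline: str, body: str) -> list[str]:
    text = concatenate(headline, body).lower()          # concatenation + lowercasing
    text = remove_noise(text)                           # noise abatement
    tokens = text.split()                               # whitespace tokenisation
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [stem(t) for t in tokens]                    # normalisation by stemming


tokens = preprocess("Banks rallying!", "Shares of the lenders climbed on https://example.com today.")
print(tokens)  # ['bank', 'rally', 'shar', 'lender', 'climb', 'today']
```

With these rules "shares" is cut to the non-word "shar", which illustrates why lemmatisation, returning the dictionary form "share", can give cleaner features than heuristic stemming.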
2.2. Algorithm Selection and Computation for Financial News
2.3. FinBERT Architecture, Development and Training
2.4. GPT-4
2.5. Logistic Regression Architecture, Development and Training
3. Results
3.1. FinBERT
Best hyperparameters: learning_rate = 3.564937469182303e-05, batch_size = 16

| Test set metric | Value | Percentage (%) |
|---|---|---|
| Accuracy | 0.6333 | 63.33 |
| Precision | 0.6376 | 63.76 |
| Recall | 0.6333 | 63.33 |
| F1 score | 0.6330 | 63.30 |
| ROC AUC | 0.6559 | 65.59 |
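The test-set figures in this section are standard binary-classification metrics. A minimal sketch of how they can be computed from true labels, hard predictions and model scores is shown below; it assumes binary labels with 1 as the positive class, uses a hypothetical toy example, and is an illustration rather than the evaluation code used in the study. ROC AUC is computed here as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties count one half), which equals the area under the ROC curve.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three positive and three negative headlines.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
metrics = classification_metrics(y_true, y_pred)  # every metric is 2/3 here
auc = roc_auc(y_true, scores)                     # 8/9, about 0.889
```

The pairwise AUC loop is quadratic in the number of examples; for large test sets a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is preferable.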
3.1.1. Model Evaluation
3.1.2. Visual Inspection


3.2. GPT

3.2.1. Model Evaluation
Evaluation metrics for the predefined sentiment approach

| Metric | Value | Percentage (%) |
|---|---|---|
| Validation accuracy | 0.5720 | 57.20 |
| Test accuracy | 0.5419 | 54.19 |
| Test precision | 0.7266 | 72.66 |
| Test recall (sensitivity) | 0.3269 | 32.69 |
| Test F1 score | 0.4509 | 45.09 |
| Test AUC-ROC | 0.6537 | 65.37 |
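Because the F1 score is the harmonic mean of precision and recall, F1 = 2PR/(P + R), the reported GPT values can be checked against each other. The short sketch below plugs in the reported test-set precision (0.7266) and recall (0.3269) and reproduces the reported F1 score of 0.4509; the function name is illustrative only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set values reported for the GPT predefined sentiment approach.
gpt_f1 = f1_from_precision_recall(precision=0.7266, recall=0.3269)
print(round(gpt_f1, 4))  # 0.4509
```

The same check on the logistic regression figures (precision 0.8257, recall 0.8115) gives 0.8185, matching its reported F1 score.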
3.2.2. Visual Inspection


3.3. Logistic Regression
Best hyperparameters: C = 3.037005064126959, solver = liblinear, penalty = l2
Training accuracy: 0.8093 (80.93%)

| Test set metric | Value | Percentage (%) |
|---|---|---|
| Accuracy | 0.8183 | 81.83 |
| Precision | 0.8257 | 82.57 |
| Recall | 0.8115 | 81.15 |
| F1 score | 0.8185 | 81.85 |
| ROC AUC | 0.8976 | 89.76 |
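The configuration reported above (TF-IDF features, L2 penalty, C ≈ 3.04) can be illustrated with the standard-library sketch below. It is not the authors' code: the six tokenised headlines are hypothetical, IDF uses the smoothed formula ln((1 + n)/(1 + df)) + 1 that scikit-learn's `TfidfVectorizer` applies by default, and training uses plain full-batch gradient descent on 0.5·||w||² + C·Σ log-loss, the objective in which C has the same meaning as in scikit-learn (larger C, weaker regularisation). Unlike the liblinear solver, this sketch does not penalise the intercept.

```python
import math
from collections import Counter


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, C=1.0, lr=0.05, epochs=500):
    """Minimise 0.5*||w||^2 + C * sum(log-loss) by full-batch gradient descent."""
    w: dict[str, float] = {}
    b = 0.0  # intercept, left unpenalised in this sketch
    for _ in range(epochs):
        grad_w = dict(w)  # gradient of the L2 term 0.5*||w||^2 is w itself
        grad_b = 0.0
        for x, target in zip(X, y):
            p = sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))
            err = C * (p - target)  # gradient of C * log-loss w.r.t. the logit
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
            grad_b += err
        for t, g in grad_w.items():
            w[t] = w.get(t, 0.0) - lr * g
        b -= lr * grad_b
    return w, b


def predict_proba(x: dict[str, float], w: dict[str, float], b: float) -> float:
    """Probability of the positive class for one TF-IDF vector."""
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


# Hypothetical pre-processed headlines: 1 = positive, 0 = negative sentiment.
docs = [
    ["profit", "rise", "record"], ["shares", "rally", "profit"], ["strong", "growth", "rise"],
    ["loss", "fall", "slump"], ["shares", "plunge", "loss"], ["weak", "decline", "fall"],
]
labels = [1, 1, 1, 0, 0, 0]

idf = fit_idf(docs)
X = [tfidf(d, idf) for d in docs]
weights, bias = train_logreg(X, labels, C=3.037005064126959)  # C reported in Section 3.3

prob_pos = predict_proba(tfidf(["profit", "rise"], idf), weights, bias)
prob_neg = predict_proba(tfidf(["loss", "fall"], idf), weights, bias)
```

Here `prob_pos` ends up above 0.5 and `prob_neg` below it, because "profit" and "rise" occur only in positive examples and "loss" and "fall" only in negative ones. In practice, scikit-learn's `TfidfVectorizer` and `LogisticRegression(C=..., solver="liblinear", penalty="l2")` would replace these hand-written pieces.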
3.3.1. Model Evaluation
3.3.2. Visual Inspection



4. Discussion
| Test set metric | GPT predefined approach (%) | FinBERT (%) | Logistic regression (%) |
|---|---|---|---|
| Accuracy | 54.19 | 63.33 | 81.83 |
| Precision | 72.66 | 63.76 | 82.57 |
| Recall | 32.69 | 63.33 | 81.15 |
| F1 score | 45.09 | 63.30 | 81.85 |
| ROC AUC | 65.37 | 65.59 | 89.76 |
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Fatouros, G.; Soldatos, J.; Kouroumali, K. Transforming sentiment analysis in the financial domain with ChatGPT. Machine Learning with Applications. 2023. Available from: https://www.sciencedirect.com/science/article/pii/S2666827023000610.
- Shapiro, A.H.; Sudhof, M.; Wilson, D.J. Measuring news sentiment. Journal of Econometrics. 2022. Available from: https://www.sciencedirect.com/science/article/pii/S0304407620303535.
- Liu, Z.; Huang, D.; Huang, K.; Li, Z.; Zhao, J. FinBERT: A pre-trained financial language representation model for financial text mining. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. 2021. Available from: https://www.ijcai.org/proceedings/2020/0622.pdf.
- Leippold, M. Sentiment spin: Attacking financial sentiment with GPT-3. Finance Research Letters. 2023. Available from: https://www.sciencedirect.com/science/article/pii/S154461232300329X.
- Yang, J.; Wang, Y.; Li, X. Prediction of stock price direction using the LASSO-LSTM model combining technical indicators and financial sentiment analysis. PeerJ Computer Science. 2022. Available from: https://peerj.com/articles/cs-1148.pdf.
- Sidogi, T.; Mbuvha, R.; Marwala, T. Stock price prediction using FinBERT and LSTM. 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). 2021. Available from: https://ieeexplore.ieee.org/abstract/document/9659283.
- Gigerenzer, G. Simple heuristics to run a research group. PsyCh J. 2022, 11, 133–135.
- Bafitlhile, K.D. A Context-aware Lemmatization Model for Setswana Language using Machine Learning. Botswana International University of Science and Technology. 2022. Available from: http://repository.biust.ac.bw/handle/123456789/536.
- Taherdoost, H. What are different research approaches? Comprehensive review of qualitative, quantitative, and mixed method research, their applications, types, and limitations. J. Manag. Sci. Eng. Res. 2022, 5, 53–63.
- Priyatno, A.M.; Ningsih, L.; Noor, M. Harnessing machine learning for stock price prediction with random forest and simple moving average techniques. Journal of Engineering and Science Application. 2024, 1, 1–8. Available from: https://jesa.aks.or.id/index.php/jesa/article/view/1.
- Lin, F.; Cohen, W.W. Semi-Supervised Classification of Network Data Using Very Few Labels. IEEE Conf. 2010. Available from: https://ieeexplore.ieee.org/abstract/document/5562771/.
- Chen, T.; Zhang, Y.; Yu, G.; Zhang, D.; Zeng, L.; He, Q. EFSA: Towards Event-Level Financial Sentiment Analysis. arXiv preprint arXiv:2404.08681 (Computation and Language). 2024. Available from: https://arxiv.org/abs/2404.08681.
- Kirtac, K.; Germano, G. Sentiment trading with large language models. Finance Research Letters. Available from: https://www.sciencedirect.com/science/article/pii/S1544612324002575.
- Varghese, R.R.; Mohan, B.R. Dynamics of Nonlinear Causality: Exploring the Influence of Positive and Negative Financial News on the Indian Equity Market. In: Proceedings of Annual International Conference on Intelligent Systems and Signal Processing; 2023. Available from: https://ieeexplore.ieee.org/abstract/document/10420348/.
- Senapaty, M.K.; Ray, A.; Padhy, N. A Decision Support System for Crop Recommendation Using Machine Learning Classification Algorithms. Agriculture. 2024, 14, 1256.
- Bagate, R.; Joshi, A.; Trivedi, A.; Pandey, A.; Tripathi, D. Survey on algorithmic trading using sentiment analysis. In: Proceedings of the 6th International Conference on Advance Computing and Intelligent Engineering: ICACIE 2021; 2022 Sep; Singapore: Springer Nature Singapore. p. 241-252.
- Paripati, L.; Hajari, V.R.; Narukulla, N.; Prasad, N.; Shah, J.; Agarwal, A. Ethical Considerations in AI-Driven Predictive Analytics: Addressing Bias and Fairness Issues. Darpan Int Res Anal. 2024, 12, 34–50.
- Yang, H.; Ye, C.; Lin, X.; Zhou, H. Stock Market Prediction Based on BERT Embedding and News Sentiment Analysis. In: Wang, Z.; Wang, S.; Xu, H. (Eds.) Service Science. ICSS 2023. Communications in Computer and Information Science, Vol. 1844. Springer; 2023. p. 334–348.
| Reference | Data processing method | Significance | Contrast: reference | Contrast: our work |
|---|---|---|---|---|
| Liu Z. et al. [3] | FinBERT for financial text mining | Enhanced financial sentiment analysis with the domain-specific FinBERT model | Limited to FinBERT alone for sentiment analysis | Combines FinBERT with other deep and machine learning models for a broader analysis |
| Leippold M. [4] | GPT-3 for financial sentiment analysis | Explored adversarial attacks on financial sentiment predictions | Limited transparency of GPT-3's decision-making | Compares GPT-4 using financial news data features to improve interpretability |
| Yang J. et al. [5] | LASSO-LSTM with FinBERT | Combined technical indicators with FinBERT for stock prediction | Limited feature extraction | Improves feature extraction using NGX labels and financial news for stock market prediction and accuracy |
| Sidogi T. et al. [6] | FinBERT with LSTM for stock price prediction | Used FinBERT for financial sentiment analysis and LSTM to predict stock price movements from sentiment | Performance metrics limited to root mean square error (RMSE) and mean absolute error (MAE) | Adds further evaluation metrics and compares GPT-4 (deep learning) and classical machine learning for a broader perspective |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).