1. Introduction
In recent years, as financial markets have grown rapidly in size and complexity, the traditional capital asset pricing model (CAPM) has proved unable to fully capture the risk factors in the market or to predict stock returns. Since Fama and French proposed the three-factor model in 1993, scholars have worked to extend and refine it to improve the accuracy and explanatory power of stock return forecasts. Carhart extended the three-factor model into a four-factor model by adding a momentum factor, and Fama and French later proposed the five-factor model in 2015 [1].
Although these models are widely applied and recognized in academia and practice, their effectiveness on data from specific sectors or emerging markets remains to be verified. With the development of artificial intelligence, machine learning methods, especially those based on neural networks, have shown potential in financial forecasting [2-3]. Ke et al. (2024) used BP-GA to study the volatility and returns of indices and reported good results, which inspired us to explore neural networks after employing traditional models [4]. Meanwhile, Hu et al. (2024) successfully applied GAN models to Bitcoin returns, which confirmed the effectiveness of recent models and suggested a direction for our future research [5]. LSTM models, a special kind of recurrent neural network, have attracted much attention for their effectiveness in handling long-term dependencies in time series data, for example in return prediction and risk management [6-10]. GANs are another popular approach [11].
This study compares the predictive effectiveness of the three-factor, four-factor, and five-factor models in three major U.S. stock sectors through empirical analysis, and explores the application of LSTM models to stock return prediction, in order to provide insights into integrating traditional financial theory with modern machine learning methods. It aims to answer the following core questions:
Comparison of model effectiveness: How effective are the three-factor, four-factor, and five-factor models in predicting stock returns for each of the major U.S. stock market sectors? Do these models perform significantly differently across industry sectors?
Application of LSTM models: Can LSTM models provide predictive accuracy beyond that of traditional factor models in stock return forecasting? What are the strengths and possible challenges of LSTM models when dealing with stock market data?
Possibilities for model fusion: Is it possible to combine traditional factor models with LSTM models to improve the accuracy of stock return forecasting?
In order to answer the above research questions, the following analytical steps were taken in this study:
Data collection and processing: Data source: The databases of the three major U.S. stock markets (NYSE, AMEX, and NASDAQ) were selected as the main data source, from which monthly data from January 2004 to January 2024 were collected, including stock returns and related financial variables for the Manuf, Hitec, and Other sectors [12, 13]. Pre-processing: The collected data were cleaned by filling missing values and eliminating outliers, so that data quality meets the requirements of statistical analysis and machine learning models.
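The pre-processing step can be illustrated with a minimal Python sketch. The specific cleaning methods are not stated here, so forward filling, the median-absolute-deviation (MAD) rule, the threshold k = 5, and the function names fill_missing and drop_outliers are all assumptions made for this example; the sample returns are invented.

```python
from statistics import median


def fill_missing(values):
    """Forward-fill None entries; leading gaps take the first observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled, last = [], observed[0]
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def drop_outliers(values, k=5.0):
    """Keep (index, value) pairs within k scaled MADs of the median (assumed rule)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(enumerate(values))  # no spread, so nothing can be flagged
    limit = k * 1.4826 * mad  # 1.4826 makes the MAD comparable to a standard deviation
    return [(i, v) for i, v in enumerate(values) if abs(v - med) <= limit]


# Invented monthly returns with two gaps and one implausible spike.
monthly_returns = [0.012, None, -0.008, 0.021, 0.950, 0.004, None, -0.015]
filled = fill_missing(monthly_returns)
kept = drop_outliers(filled)
```

Keeping the original indices alongside the surviving values makes it possible to drop the same months from the factor series, so that returns and factors stay aligned.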
Model building and validation: The Fama-French three-factor and five-factor models and the Carhart four-factor model were constructed, and the corresponding regression coefficients and their statistical significance were calculated [14]. LSTM model building: The LSTM network architecture was designed, appropriate hyperparameters were determined, and the dataset was divided into a training set and a test set, used respectively to train the model and to validate its predictive performance.
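The two modelling steps can be sketched in dependency-free Python under stated assumptions. The first function estimates a factor regression of the form excess return = alpha + sum of beta times factor by ordinary least squares; the other two prepare a series for a sequence model by splitting it chronologically and cutting it into fixed-length input windows. The 80/20 split, the 12-month lookback, the function names, and the synthetic factor values are illustrative assumptions, not the settings used in this study.

```python
def ols(X, y):
    """Least-squares coefficients [alpha, beta_1, ...] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system (X'X | X'y).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("factors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def chronological_split(series, train_fraction=0.8):
    """Split without shuffling so the test set lies strictly after the training set."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, lookback=12):
    """Pair each run of `lookback` observations with the observation that follows it."""
    inputs = [series[i:i + lookback] for i in range(len(series) - lookback)]
    return inputs, list(series[lookback:])


# Synthetic monthly factor returns in percent: (MKT-RF, SMB, HML). Invented values.
factors = [(2.0, 1.0, -1.0), (-1.0, 0.0, 2.0), (3.0, -2.0, 1.0),
           (0.0, 1.0, 0.0), (-2.0, 2.0, -1.0), (1.0, -1.0, 3.0)]
excess = [0.1 + 1.1 * m + 0.4 * s - 0.2 * h for m, s, h in factors]
alpha, b_mkt, b_smb, b_hml = ols(factors, excess)

months = list(range(241))  # January 2004 to January 2024 inclusive
train, test = chronological_split(months)
windows, targets = make_windows(train)
```

Because the synthetic excess returns are generated without noise, the regression recovers the coefficients used to build them. The sketch reports point estimates only; assessing statistical significance would additionally require standard errors, for which a statistics package such as statsmodels is the usual choice.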
Model comparison: The goodness of fit and prediction accuracy of the models were assessed by comparing statistical metrics such as R-squared, RMSE (root mean square error), and MAE (mean absolute error) across models.
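For reference, the three metrics can be computed as in the following minimal Python sketch, which uses their standard definitions. The function names and the sample values are illustrative only.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented values for illustration.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "R2": r_squared(actual, predicted),
}
```

Lower RMSE and MAE indicate smaller forecast errors, with RMSE penalising large errors more heavily. When computed on a held-out test set, R-squared can be negative, which means the model predicts worse than the mean of the actual values.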
Analysis and interpretation of results: The contribution of the factors in each model and their explanatory power for industry stock returns were analysed. Model effectiveness: The performance of LSTM models in predicting stock returns, and their advantages and limitations relative to traditional factor models, were discussed [15, 16]. Industry-specific analysis: Differences in model performance across industry sectors were explored, together with how industry characteristics affect the predictive effectiveness of the models.
Conclusions and outlook: Summary: The research findings and the strengths and limitations of each model were summarised. Directions for future research: Based on the results of this study, possible directions for future research were suggested, such as exploring the application of other machine learning techniques in stock market forecasting.