Deep learning for Stock Market Prediction

Predicting the values of stock groups has always been attractive and challenging for shareholders. This paper concentrates on predicting the future values of stock market groups. Four groups from the Tehran Stock Exchange, namely diversified financials, petroleum, non-metallic minerals and basic metals, are chosen for experimental evaluation. Data are collected for the groups based on ten years of historical records. Predictions are made for 1, 2, 5, 10, 15, 20 and 30 days in advance. Machine learning algorithms are used to predict the future values of the stock market groups: we employed Decision Tree, Bagging, Random Forest, Adaptive Boosting (AdaBoost), Gradient Boosting, eXtreme Gradient Boosting (XGBoost), Artificial Neural Network (ANN), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). Ten technical indicators are selected as the inputs to each of the prediction models. Finally, the prediction results of each technique are presented based on three metrics. Among all the algorithms used in this paper, LSTM shows the most accurate results, with the highest model fitting ability. Also, among tree-based models, there is often intense competition between AdaBoost, Gradient Boosting and XGBoost.


Introduction
Predicting stock values is always a challenging problem [1] because of the unpredictable nature of the market. An older market hypothesis holds that it is impossible to predict stock values and that stocks behave randomly, but recent technical analyses suggest that most stock values are reflected in previous records, so movement trends are vital for predicting values effectively [2]. Moreover, stock market groups and their movements are affected by several economic factors, such as political events, general economic conditions, the commodity price index, investors' expectations, the movements of other stock markets and the psychology of investors [3]. The value of a stock group is computed from stocks with high market capitalization. Various technical parameters can be used to obtain statistical data from stock prices [4]. Generally, stock indices are derived from the prices of stocks with high market capitalization, and they often give an estimate of the economic status of each country. For example, findings show that a country's economic growth is positively impacted by its stock market capitalization [5].
The ambiguous nature of stock value movements makes investment highly risky for investors. Detecting the market status is also usually a major problem for governments. Stock values are generally dynamic, non-parametric and non-linear, which often leads to weak performance of statistical models and an inability to predict accurate values and movements [6,7]. Machine learning (ML) is a powerful tool that includes various algorithms which can be developed to perform effectively on a specific case study. ML is widely believed to have a significant ability to identify valid information and detect patterns in datasets [8].
In contrast with traditional ML methods, ensemble models combine several common algorithms to solve a particular problem, and they have been shown to outperform each of the individual methods when predicting time series [9][10][11]. For prediction problems in machine learning, boosting and bagging are effective and popular ensemble approaches. Tree-based models have recently progressed with the introduction of the Gradient Boosting and XGBoost algorithms, which have been widely employed by top data scientists in competitions. Moreover, deep learning (DL), a modern trend in ML, incorporates a deep nonlinear topology in its structure and has an excellent ability to extract relevant information from financial time series [12]. In contrast to simple artificial neural networks, recurrent neural networks (RNNs) have achieved considerable success in the financial area on account of their strong performance [13,14]. Stock market prediction depends not only on current information but also on earlier data, so training is insufficient if only the most recent data are used. An RNN can maintain a memory of recent events and build connections between the units of a network, which makes it well suited to economic prediction [15,16]. Long short-term memory (LSTM) is an improved variant of the RNN used in deep learning. LSTM uses three gates to address the problems of plain RNN cells, and it can process single data points as well as whole sequences of data.
Many academic studies have addressed market prediction, and there are various approaches to time series modeling. Exponential smoothing, moving average and ARIMA are common linear models for predicting future prices [17,18]. Several studies have made extensive predictions with Artificial Neural Networks (ANN), Genetic Algorithms (GA), fuzzy logic, etc. [19][20][21]. Zhang et al. [22] combined Improved Bacterial Chemotaxis Optimization (IBCO) with an artificial neural network. They showed that their proposed method can predict a stock index over both a short horizon (1 day ahead) and a long horizon (15 days ahead) with excellent results. Asadi et al. [1] combined data preprocessing methods with feed-forward neural networks trained using genetic algorithms and the Levenberg-Marquardt (LM) method. Preprocessing methods such as data transformation and input variable selection were used to improve the model performance. The final results demonstrated that the proposed method could deal with stock market fluctuations with suitable prediction accuracy. Shen, Guo, Wu et al. [23] introduced the Artificial Fish Swarm Algorithm (AFSA) for training a radial basis function neural network (RBFNN). Their experiments, based on data from the Shanghai Stock Exchange, showed that the RBFNN optimized by AFSA was a practical method with significant accuracy. Jigar et al. [24] predicted the Indian stock market index with a combination of machine learning methods; they compared a single-stage scenario with a hybrid combination of models, and the hybrid combination gave better results. S. Olaniyi et al. [25] proposed a linear regression method for analyzing stock market behavior, which successfully predicted stock prices based on two parameters.
This study concentrates on predicting the future values of stock market groups, which is crucial for investors. Predictions are evaluated for 1, 2, 5, 10, 15, 20 and 30 days in advance. The research background shows that most previous studies focused on classification problems rather than regression problems [26][27][28]. In light of this literature review, this research examines the prediction performance of a set of cutting-edge machine learning methods, including tree-based models and neural networks. Moreover, applying the full set of tree-based methods together with RNN and LSTM to regression problems in the stock market area is a novel contribution of this study.
The rest of this paper is organized into three sections. First, the methodology section presents the evolution of tree-based models, introduces each of them, and briefly describes the basic structure of neural networks and recurrent networks. Next, the research data section details the ten technical indicators and the selected parameters of each method. Finally, after three regression metrics are introduced, the machine learning results are reported for each group and the models' behavior is compared.

Tree-based models
Since the set of splitting rules used to divide the predictor space can be summarized in a tree, these models are known as decision-tree methods. Fig 1 shows the evolution of tree-based algorithms over several years. Decision Trees are a popular supervised learning technique for classification and regression tasks. The goal is to build a model that predicts a target value by learning simple decision rules inferred from the data features. The method has advantages such as being easy to understand and interpret and being able to handle multi-output problems; on the other hand, a fairly common disadvantage is the creation of over-complex trees, which results in overfitting. A schematic illustration of a decision tree is shown in
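To make the idea of learning decision rules concrete, the following is a minimal sketch of a regression tree in plain Python. It assumes CART-style splits that minimize the summed squared error of the two children; the function names and the `max_depth`/`min_samples` stopping parameters are illustrative choices, not the implementation used in this study.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Grow a regression tree whose nodes hold rules 'feature <= threshold'.

    Each split is chosen to minimize the summed squared error of the two
    children; leaves predict the mean target of their training samples.
    Limiting max_depth is one simple guard against over-complex trees.
    """
    if depth >= max_depth or len(y) < min_samples or len(set(y)) == 1:
        return {"leaf": mean(y)}
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both children are always non-empty.
        for thr in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, thr, left, right)
    if best is None:  # every feature is constant: nothing to split on
        return {"leaf": mean(y)}
    _, j, thr, left, right = best

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_samples)

    return {"feature": j, "threshold": thr, "left": child(left), "right": child(right)}


def predict(tree, row):
    """Follow the decision rules from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Usage: one feature, two clearly separated groups of targets.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
tree = build_tree(X, y, max_depth=1)
print(tree["threshold"], predict(tree, [2.5]), predict(tree, [11.5]))
```

With `max_depth=1` the tree learns the single rule "feature <= 3.0", and each side predicts the mean of its group.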

Bagging
A Bagging model (used here as a regressor) is an ensemble estimator that fits each base regressor on a random subset of the dataset and then aggregates their individual predictions, either by voting or by averaging, to form a final prediction. This meta-estimator is commonly used to reduce the variance of an estimator such as a decision tree, by introducing randomization into its construction procedure and then building an ensemble out of it. In this method, samples are drawn with replacement, and the predictions are combined by averaging for regression or by majority voting for classification.
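The bootstrap-and-average procedure can be sketched as follows. This is an illustrative version under stated assumptions: one-split regression trees ("stumps") stand in for the base regressor, and `bagging_fit`, `bagging_predict` and the defaults `n_estimators=25`, `seed=0` are hypothetical names and values, not the study's settings.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a one-split regression tree (a 'stump') by minimizing squared error."""
    best = (float("inf"), 0, float("inf"), mean(y), mean(y))  # fallback: constant model
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            lm, rm = mean(left), mean(right)
            cost = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if cost < best[0]:
                best = (cost, j, thr, lm, rm)
    _, j, thr, lm, rm = best
    return lambda row: lm if row[j] <= thr else rm


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Fit each base regressor on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Regression: average the individual predictions."""
    return mean(m(row) for m in models)


# Usage: a step-shaped target; each stump sees a different bootstrap sample.
X = [[float(i)] for i in range(20)]
y = [0.0] * 10 + [10.0] * 10
models = bagging_fit(X, y)
print(bagging_predict(models, [2.0]), bagging_predict(models, [17.0]))
```

Because each stump is trained on a different resample, their split points differ slightly, and averaging smooths out that variance.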

Random Forest
The random forest model is built from a large number of decision trees and simply averages the predictions of these trees, which together are called a forest. The model relies on randomness in two ways: each tree is trained on a random sample of the data points, and only a random subset of the features is considered when splitting each node of each decision tree. A schematic illustration of a random forest is shown in

AdaBoost
Boosting refers to a family of algorithms that convert weak learners into a strong learner. It is an ensemble approach for improving the predictions of any learning algorithm. The idea of boosting is to train weak learners sequentially, each correcting the errors of its predecessor. AdaBoost is a meta-estimator that begins by fitting a model on the original dataset and then fits additional copies of the model on the same dataset. During this process, the sample weights are adjusted according to the current prediction error, so subsequent models concentrate more on difficult samples.
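As one concrete instance of the reweighting step, the sketch below follows AdaBoost.R2 (Drucker, 1997), a common regression variant of AdaBoost. Whether the study used this exact variant is not stated, so treat it as an assumption, along with the linear loss and the helper names. It shows a single weight update and the weighted-median rule that AdaBoost.R2 uses to combine the models.

```python
import math


def adaboost_r2_update(weights, errors):
    """One reweighting round of AdaBoost.R2 with the linear loss.

    weights: current positive sample weights (summing to 1)
    errors:  absolute errors |y_i - h(x_i)| of the model just fitted
    Returns (new_weights, model_weight), or None when boosting should stop
    (a perfect fit, or an average loss of 0.5 or more).
    """
    max_err = max(errors)
    if max_err == 0:
        return None
    losses = [e / max_err for e in errors]  # each loss scaled into [0, 1]
    avg_loss = sum(w * l for w, l in zip(weights, losses))
    if avg_loss >= 0.5:
        return None
    beta = avg_loss / (1.0 - avg_loss)  # 0 < beta < 1; smaller means a better model
    # Well-predicted samples (small loss) are shrunk the most, so after
    # renormalizing, the hard samples carry more weight for the next model.
    new = [w * beta ** (1.0 - l) for w, l in zip(weights, losses)]
    total = sum(new)
    return [w / total for w in new], math.log(1.0 / beta)


def weighted_median(predictions, model_weights):
    """Combine the models' predictions by their weighted median."""
    pairs = sorted(zip(predictions, model_weights))
    half = sum(model_weights) / 2.0
    running = 0.0
    for pred, w in pairs:
        running += w
        if running >= half:
            return pred
    return pairs[-1][0]


# Usage: four equally weighted samples; only the last is badly predicted.
new_weights, alpha = adaboost_r2_update([0.25, 0.25, 0.25, 0.25], [0.0, 0.0, 0.0, 1.0])
combined = weighted_median([1.0, 2.0, 10.0], [1.0, 1.0, 0.5])
```

In the usage lines, the badly predicted last sample's weight rises from 0.25 to about 0.5, while each of the other three falls to about 1/6.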

Gradient Boosting
Like AdaBoost, Gradient Boosting sequentially adds predictors to an ensemble, each one correcting its predecessor. Unlike AdaBoost, however, Gradient Boosting fits each new predictor to the residual errors made by the previous predictors, using gradient descent to identify the shortcomings of the previous learner's predictions. As a result, the final model reduces the error step by step as base models are added.
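For the squared-error loss, "fitting the residuals" and "gradient descent" coincide: the negative gradient of (y - F)^2 / 2 with respect to the current prediction F is exactly the residual y - F. The sketch below illustrates this, assuming stumps as base learners and a shrinkage (learning-rate) factor of 0.1; these choices and all names are illustrative, not the study's configuration.

```python
from statistics import mean


def fit_stump(X, y):
    """Fit a one-split regression tree (a 'stump') by minimizing squared error."""
    best = (float("inf"), 0, float("inf"), mean(y), mean(y))  # fallback: constant model
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            lm, rm = mean(left), mean(right)
            cost = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if cost < best[0]:
                best = (cost, j, thr, lm, rm)
    _, j, thr, lm, rm = best
    return lambda row: lm if row[j] <= thr else rm


def gradient_boosting_fit(X, y, n_estimators=100, learning_rate=0.1):
    """Gradient boosting with the squared-error loss L = (y - F)^2 / 2.

    The negative gradient of L with respect to the current prediction F
    is the residual y - F, so each new stump is fitted to the residuals.
    """
    f0 = mean(y)  # start from a constant prediction
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [t - p for t, p in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Take a small step (shrinkage) in the direction of the new stump.
        pred = [p + learning_rate * stump(row) for p, row in zip(pred, X)]
    return f0, learning_rate, stumps


def gradient_boosting_predict(model, row):
    f0, lr, stumps = model
    return f0 + lr * sum(s(row) for s in stumps)


# Usage: a three-level target that no single stump can represent.
X = [[float(i)] for i in range(10)]
y = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 10.0, 10.0, 10.0]
model = gradient_boosting_fit(X, y)
```

A single stump can only produce two levels, but the sequence of stumps fitted to successive residuals recovers all three.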

XGBoost
XGBoost is an ensemble tree method that, like Gradient Boosting, applies the principle of boosting to weak learners. However, XGBoost was designed for better speed and performance. Built-in cross-validation, efficient handling of missing data, regularization to avoid overfitting, cache awareness, tree pruning and parallelized tree building are common advantages of the XGBoost algorithm.

ANN
In an artificial neural network, the following computation is performed for each of the hidden or output nodes: a node takes the weighted sum of its inputs, adds a bias value, and passes the result through an activation function (usually a non-linear function). The result is the output of the node, which becomes an input to the nodes of the next layer. The procedure moves from the input to the output, and the final output is determined by repeating this process for all nodes. Training the neural network means learning the weights and biases associated with all nodes.
Equation 1 shows the relationship between a node's inputs, weights and bias [29]: the weighted sum of the inputs to a layer is passed through a non-linear activation function to a node in the next layer.
$$z = f\left(\sum_{i=1}^{n} w_i X_i + b\right) \qquad (1)$$
Here $X_1, X_2, \dots, X_n$ are the inputs, $w_1, w_2, \dots, w_n$ are the respective weights, $b$ is the bias, $n$ is the number of inputs to the node, $f$ is the activation function and $z$ is the output; the inputs and weights can be interpreted as vectors.

Figure 5. An illustration of relationship between inputs and output for ANN
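Equation 1 for a single node, and its repeated application across a layer, can be sketched as below. The tanh activation, the linear (identity) output node and the example weights are assumptions for illustration; the bias is written `b`.

```python
import math


def node_output(x, w, b, f=math.tanh):
    """Equation 1 for one node: z = f(w1*x1 + ... + wn*xn + b)."""
    if len(x) != len(w):
        raise ValueError("expected one weight per input")
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)


def layer_output(x, weights, biases, f=math.tanh):
    """Evaluate every node of a layer; `weights` holds one weight list per node."""
    return [node_output(x, w, b, f) for w, b in zip(weights, biases)]


def identity(v):
    """Linear activation, often used for a regression output node."""
    return v


# Forward pass: 2 inputs -> 2 tanh hidden nodes -> 1 linear output node.
hidden = layer_output([1.0, 2.0], [[0.5, -0.25], [1.0, 1.0]], [0.1, -3.0])
output = node_output(hidden, [2.0, 5.0], 0.0, identity)
```

Each hidden node's output becomes an input of the output node, which is exactly the layer-to-layer flow described above.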
The training process that determines the weights and biases follows these steps: initialize the weights and biases of all nodes randomly; perform a forward pass with the current weights and biases and compute the output of each node; compare the final output with the actual target; and update the weights and biases accordingly by gradient descent in a backward pass, a procedure generally known as the backpropagation algorithm.
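The four training steps can be illustrated with a deliberately tiny network: one input, one hidden layer of tanh nodes and one linear output, trained by per-sample gradient descent on a squared-error loss. The architecture, learning rate, epoch count and toy linear target are all assumptions chosen for illustration; the comments mark where each step happens.

```python
import math
import random


def train_network(data, hidden=4, lr=0.05, epochs=2000, seed=1):
    """Train a 1-input, 1-hidden-layer (tanh), 1-output regression network
    with backpropagation and per-sample gradient descent on the loss
    (y - target)^2 / 2. Returns a prediction function.
    """
    rng = random.Random(seed)
    # Step 1: initialize all weights and biases randomly.
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = rng.uniform(-1.0, 1.0)

    def forward(x):
        h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
        return h, sum(v * hj for v, hj in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, target in data:
            # Step 2: forward pass with the current weights and biases.
            h, y = forward(x)
            # Step 3: compare the output with the target.
            dy = y - target  # dLoss/dy
            # Step 4: backward pass; the error flows from the output layer
            # back through tanh (derivative 1 - tanh^2) to the hidden layer.
            for j in range(hidden):
                dh = dy * w2[j] * (1.0 - h[j] ** 2)  # uses w2[j] before its update
                w2[j] -= lr * dy * h[j]
                w1[j] -= lr * dh * x
                b1[j] -= lr * dh
            b2 -= lr * dy

    return lambda x: forward(x)[1]


# Usage: learn the toy target 0.5 * x + 0.2 on 11 points in [-1, 1].
data = [(i / 5.0 - 1.0, 0.5 * (i / 5.0 - 1.0) + 0.2) for i in range(11)]
untrained = train_network(data, epochs=0)
trained = train_network(data)
```

Comparing the mean squared error of `untrained` and `trained` on `data` shows the effect of repeated forward and backward passes.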

RNN
RNN is a prominent type of neural network that is widely used in many applications. In a common neural network, an input is processed through a number of layers and an output is produced, and two consecutive inputs are assumed to be independent of each other. However, this assumption does not hold in all processes. For example, to predict the stock market at a certain time, it is crucial to consider the previous observations. An RNN is called recurrent because it performs the same task for every element of a sequence, with each output depending on previously computed values. Another important point is that an RNN has a memory that stores previously computed information. In theory, an RNN can use information from arbitrarily long sequences, but in practice it is limited to looking back only a few steps.

LSTM
Without going into too much detail, LSTM solves this problem by using dedicated gates to forget old information and learn new information. An LSTM layer is made of four neural network layers that interact in a specific way. A common LSTM unit is composed of a cell and three gates: an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals, and the gates regulate the flow of information into and out of the cell.
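A single time step of a common LSTM formulation is sketched below: the forget, input and output gates and the candidate layer are the four internal layers, and the cell state carries memory from step to step. The parameter shapes, the `params` layout and the `gate_params` helper are hypothetical, and the weights are hand-set rather than trained, purely to show how the gates control what the cell keeps.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, z):
    """W z + b, with W given as a list of rows."""
    return [sum(wij * zj for wij, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One time step of an LSTM cell with input, forget and output gates.

    params maps each of the four internal layers ('f', 'i', 'g', 'o') to a
    (W, b) pair; every W has one row per hidden unit and
    len(h_prev) + len(x) columns.
    """
    z = list(h_prev) + list(x)                           # [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*params["f"], z)]    # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], z)]    # input gate
    g = [math.tanh(v) for v in affine(*params["g"], z)]  # candidate values
    o = [sigmoid(v) for v in affine(*params["o"], z)]    # output gate
    # New cell state: keep part of the old memory, add part of the new.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]     # new hidden state
    return h, c


def gate_params(hidden, inputs, bias):
    """Hypothetical helper: zero weights and a constant bias for one layer."""
    return ([[0.0] * (hidden + inputs) for _ in range(hidden)], [bias] * hidden)


# With the forget gate held open and the input gate closed, the cell keeps
# its previous memory; with the forget gate closed, the memory is erased.
keep = {"f": gate_params(1, 2, 10.0), "i": gate_params(1, 2, -10.0),
        "g": gate_params(1, 2, 0.0), "o": gate_params(1, 2, 0.0)}
erase = dict(keep, f=gate_params(1, 2, -10.0))
_, c_keep = lstm_step([0.3, -0.7], [0.0], [1.0], keep)
_, c_erase = lstm_step([0.3, -0.7], [0.0], [1.0], erase)
```

In a trained network the gate weights are learned, so the cell decides from the data what to forget and what to store.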

Research data
This paper employs ten years of data, from November 2009 to November 2019, for four stock market groups: Diversified Financials, Petroleum, Non-metallic minerals and Basic metals. Ten technical indicators are calculated from the opening, closing, low and high prices of the groups. All data for the study are acquired from the www.tsetmc.com website. Importantly, to prevent indicators with larger values from dominating those with smaller values, the values of the ten technical indicators are normalized independently for all groups. The indicators include:
Simple n-day moving average $= \dfrac{C_t + C_{t-1} + \dots + C_{t-n+1}}{n}$
Accumulation/Distribution (A/D) oscillator
where:
$C_t$ is the closing price at time $t$;
$L_t$ and $H_t$ are the low and high prices at time $t$, respectively;
$LL_{t-n+1}$ and $HH_{t-n+1}$ are the lowest low and highest high prices in the last $n$ days, respectively;
$UP_t$ and $DW_t$ are the upward and downward price changes at time $t$, respectively.
The datasets used for all models except RNN and LSTM are identical. Each sample of the dataset has 10 features (the 10 technical indicators) and one target (the stock index of the group). As mentioned, all 10 features are normalized independently before being used to fit the models, to improve the performance of the algorithms.
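The simple moving average and the independent normalization of each indicator can be sketched as follows. The text does not say which normalization is used; min-max scaling to [0, 1] is assumed here, and the function names are hypothetical.

```python
def simple_moving_average(closes, n):
    """SMA_t = (C_t + C_{t-1} + ... + C_{t-n+1}) / n; None until n prices exist."""
    out = []
    for t in range(len(closes)):
        if t + 1 < n:
            out.append(None)
        else:
            out.append(sum(closes[t - n + 1 : t + 1]) / n)
    return out


def min_max_normalize(values):
    """Scale one indicator to [0, 1] on its own (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_columns(rows):
    """Normalize each feature column independently (rows are samples)."""
    cols = [min_max_normalize(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# Usage: a 3-day SMA, then two indicators on very different scales.
sma = simple_moving_average([1, 2, 3, 4, 5], 3)
scaled = normalize_columns([[1, 100], [2, 300], [3, 200]])
```

In a train/test setting, the scaling bounds would normally be computed on the training portion only, so that no information from the test period leaks into the features.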
Since the goal is to develop models that predict stock group values, the datasets are rearranged to pair the 10 features of each day with the target value n days ahead. In this study, models are trained and evaluated for predicting the target value 1, 2, 5, 10, 15, 20 and 30 days ahead.
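The rearrangement for an n-day horizon, together with the multi-day windows described for RNN and LSTM in the next paragraph, can be sketched as follows. The function names and the list-of-lists data layout are assumptions for illustration.

```python
def make_horizon_dataset(features, target, n):
    """Pair the features of day t with the target value of day t + n."""
    if n < 1:
        raise ValueError("horizon n must be at least 1 day")
    X = features[: len(features) - n]
    y = target[n:]
    return X, y


def make_windowed_dataset(features, target, n, window):
    """For RNN/LSTM: stack the features of `window` consecutive days
    (t - window + 1, ..., t) and pair them with the target of day t + n."""
    X, y = [], []
    for t in range(window - 1, len(features) - n):
        X.append(features[t - window + 1 : t + 1])
        y.append(target[t + n])
    return X, y


# Usage: six days, each with one feature (its day index) and a target of 10 * day.
features = [[d] for d in range(6)]
target = [10 * d for d in range(6)]
X2, y2 = make_horizon_dataset(features, target, 2)
Xw, yw = make_windowed_dataset(features, target, 1, 3)
```

The last n days have no future target and are therefore dropped, so longer horizons leave slightly fewer training samples.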
Each model has several parameters. For tree-based models, the number of trees (ntrees) is the design parameter, while the other common parameters are set identically for all models. The parameters and their values for each model are listed in Table 2. For RNN and LSTM networks, because of their time series behavior, the datasets are arranged to include the features of more than one day. While all ANN parameters except the number of epochs are constant, the variable parameters of the RNN and LSTM models are the number of days included in the training dataset and the corresponding number of epochs. As the number of days in the training set increases, the number of epochs is increased so that the models are trained adequately. Table 3 presents all valid parameter values for each model. For example, if 5 days are included in the training set for the RNN or LSTM models, the number of epochs is set to 300 in order to train the models thoroughly.

Mean absolute percentage error
Mean Absolute Percentage Error (MAPE) is often used to assess the performance of prediction methods. MAPE is a measure of prediction accuracy for forecasting methods in machine learning, and it commonly expresses accuracy as a percentage. Equation 2 shows its formula [30].
$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right| \qquad (2)$$
where $A_t$ is the actual value and $F_t$ is the forecast value. In the formula, the absolute difference between them is divided by $A_t$. These absolute values are summed over all forecasts and divided by the number of data points $n$. Finally, the result is multiplied by 100 to give the percentage error.
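The MAPE formula translates directly into code; the sketch below (function name hypothetical) rejects zero actual values, for which MAPE is undefined.

```python
def mape(actual, forecast):
    """Mean absolute percentage error: (100 / n) * sum |(A_t - F_t) / A_t|."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Usage: both forecasts are off by 10 percent, so MAPE is 10.
error = mape([100.0, 200.0], [110.0, 180.0])
```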

Mean absolute error
Mean absolute error (MAE) measures the difference between two sets of values: it is the average of the absolute differences between the predicted and the actual values. MAE is a common measure of prediction error for regression analysis in machine learning. The formula is shown in Equation 3 [30].
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|A_t - F_t\right| \qquad (3)$$
where $A_t$ is the true value, $F_t$ is the predicted value and $n$ is the number of samples. In the formula, the absolute differences are summed over all forecasts and divided by $n$.
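A matching sketch for the MAE formula (function name hypothetical). Unlike MAPE, MAE is expressed in the same units as the group index, so it is most meaningful when comparing models on the same group.

```python
def mae(actual, forecast):
    """Mean absolute error: (1 / n) * sum |A_t - F_t|."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Usage: absolute errors of 1, 0 and 2 average to 1.
error = mae([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```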

R²
R², also known as R squared or the coefficient of determination, measures the goodness of fit of prediction models. It is the proportion of the variance in a dependent variable that is explained by the independent variables in a regression analysis, and it typically lies between 0 (no fit) and 1 (perfect fit). It also indicates the strength of the relationship between the independent and dependent variables, showing how much of the observed variation can be explained by the regression model's inputs. The formula is shown in Equation 4 [30].
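$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \qquad (4)$$
where $SS_{res} = \sum_{t}(A_t - F_t)^2$ is the residual sum of squares (the unexplained variation) and $SS_{tot} = \sum_{t}(A_t - \bar{A})^2$ is the total sum of squares (the total variation).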
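The R² formula in code, with the residual and total sums of squares computed from the actual values, the forecasts and the mean of the actual values (function name hypothetical). The last usage line shows that predictions worse than simply forecasting the mean give a negative value, which is why R² is bounded above by 1 but not below by 0 on unseen data.

```python
def r_squared(actual, forecast):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))  # unexplained variation
    ss_tot = sum((a - mean_a) ** 2 for a in actual)               # total variation
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(actual, [1.0, 2.0, 3.0, 4.0])  # perfect fit
mean_only = r_squared(actual, [2.5] * 4)            # always predicting the mean
reversed_fit = r_squared(actual, [4.0, 3.0, 2.0, 1.0])
```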

Results
Six tree-based models, namely Decision Tree, Bagging, Random Forest, AdaBoost, Gradient Boosting and XGBoost, and three neural-network-based algorithms (ANN, RNN and LSTM) are employed to predict the four stock market groups. It is important to note that a comprehensive set of experiments is performed for each group and prediction model with various model parameters. The following tables show the best parameters, for which the minimum prediction error is obtained. The results also show that error values rise as the prediction models are built for more days ahead, a trend that appears for all algorithms.
Based on extensive experimental work and the reported values, the following results are obtained.
Among tree-based models:
• Decision Tree always ranks lowest for prediction.
• For the Diversified Financials and Petroleum groups, the AdaBoost regressor has the best average performance.
• For the Non-metallic minerals and Basic metals groups, the Gradient Boosting regressor has the best average performance.
• XGBoost is the best when accuracy, fitting strength and running time are considered together.
Among neural networks:
• ANN generally ranks lowest for forecasting.
• LSTM models significantly outperform RNN models.
Overall, LSTM is clearly the best model for predicting all stock market groups, with the lowest error and the best fitting ability, but its drawback is a long run time.

Conclusion
For all investors, it is always necessary to predict stock market changes in order to secure profits and reduce potential market risks. This study employed tree-based models (Decision Tree, Bagging, Random Forest, AdaBoost, Gradient Boosting and XGBoost) and neural networks (ANN, RNN and LSTM) to forecast the values of four stock market groups (Diversified Financials, Petroleum, Non-metallic minerals and Basic metals) as a regression problem. The predictions were made for 1, 2, 5, 10, 15, 20 and 30 days ahead. To the best of our knowledge, this study is a recent and successful research work that applies ensemble learning methods and deep learning algorithms to predicting stock groups, a popular application. More specifically, exponentially smoothed technical indicators and features were used as inputs for prediction. In this prediction problem, the methods achieved strong performance, and LSTM was the top performer compared with the other techniques. Overall, both tree-based and deep learning algorithms showed remarkable potential for regression problems in machine learning.