3. Methodology
Use Python to collect data from various relevant websites, including trade data, opening price, closing price, low price, high price, and volume [
12,
13]. First of all, the selection of variable factors, referring to previous studies on price factors, demand and the actual situation of samples taken, can also be further factor screening. Secondly, the data cleaning is mainly to process the missing value, directly delete or supplement the interpolation method, such as the transaction data of legal holidays. Finally, data preprocessing is carried out, and the preprocessing methods of different models may be different. First, the model is selected according to the characteristics of the data, and the data is processed according to the requirements of the model. Then the data can be scaled according to the actual situation to introduce the weight change. Finally, divide training data and test data, and even test data. The main evaluation indexes used are: mean square error root, mean square error, mean absolute error, mean absolute percentage error, symmetric mean absolute percentage error, rise and fall accuracy, etc.
3.1. LSTM Model
First, three gates and candidate states
W,U,b are learnable network parameters by using the external state
of the previous time and the input value
of the current time. The formula is as follows:
3.2. Model Structure
As a variant of recurrent neural networks, the improvement of LSTM[
14] is mainly reflected in the introduction of new internal state and gating mechanisms. As shown in
Figure 1,
and
are internal state values and external states respectively, and the corresponding
and
represent the state values of the previous time as the product of vector elements. The three “gates” introduced are input gate cloud forgetting gate
, which is the candidate state, open and control parameters obtained by nonlinear function, that is, to determine the retention or forgetting of information, to ensure the output gate of the unit state ⊙The horizontal line
to
is the core monthly update, and c will change each time. Meanwhile, the acquisition of this part of parameters also needs to rely on training. The forget gate f controls the internal state of the previous moment
how much information needs to be forgotten; The input gate i controls how much information needs to be saved for the candidate state at the current moment; The output gate
, using the new control parameters, determines the current P state
, how much information is output to the external state
.
Current models for cryptocurrency price forecasting can be roughly divided into linear forecasting and nonlinear forecasting. Due to the various influencing factors of cryptocurrencies and the existence of nonlinear effects, nonlinear prediction models often achieve better results than linear prediction models. In nonlinear prediction, some scholars have found that among the three models of CNN (Convolutional neural network), LSTM and [
15]GRU, GRU is the highest and LSTM is the lowest in terms of prediction accuracy. In addition, in the price prediction of cryptocurrencies, LSTM and GRU are excellent, but GRU is better in terms of performance. For LSTM, it can capture some key information and store it for a certain period of time while ignoring unimportant information, which has the advantage of time sequence feature extraction and effectively alleviates the problem of gradient disappearance in deep learning. However, its disadvantage is that the structure is too complex and there is obvious lag. For CRU, compared with LSTM, GRU has fewer parameters, and more simplified results can effectively avoid overfitting, save training time, and improve operation efficiency.
3.3. CNN-LSTM Model
First choose the appropriate model and method. Both models can be predictive, but their advantages and disadvantages are complementary. It can also be different models that act on data feature extraction, data weight allocation, price prediction and other parts respectively. At the same time, through training and adjusting parameters, the single model is determined to be the optimal state. Generally, the mixed model needs to choose a proper way to determine the weighting coefficient, assign different weights to a single model and finally add the calculation.
However, it does not rule out that there is no need to assign weights between models, but there is a case of sequential use. Secondly, when selecting a single model, multiple models that may be applicable can be predicted and compared with empirical evidence. In the combination, it can also take a variety of effects superposition, through comparison to get the optimal hybrid model[
16]. Universality tests can also be introduced. And the hybrid idea is not limited to modules when there are effective processing methods and mechanisms, it can also be mixed with the model. Finally, in weight allocation, ridge regression method, inverse variance method, strictly consistent scoring function, error sum of squares minimum criterion, and the best and worst method can be used as a supplement help to determine the weight of each indicator.
Figure 2.
Architecture diagram of hybrid model CNN-LSTM.
Figure 2.
Architecture diagram of hybrid model CNN-LSTM.
In terms of model mixing, some scholars use the CNN-LSTM model to conduct empirical research on Bitcoin. The results show that the hybrid model gives full play to the powerful dynamic capture ability of CNN, makes use of LSTM trend information to make up for the deficiency of CNN in the up and down direction error, solves the lag problem of LSTM, gives full play to the advantages of CNN in extracting deep features and LSTM in extracting time series features, and the prediction effect is better than that of a single model. This also verifies its effectiveness. Some scholars have constructed four hybrid prediction models based on LSTM, and the empirical results show that all models are significantly superior to a single LSTM model. Comparing these four models in pairs, it can be found that the mixing effect of three models is better than the effect of two models, and it is more effective in improving the accuracy of prediction.
These results show that mixing different types of models can make full use of their respective advantages and compensate for each other’s shortcomings, thereby improving prediction performance. The CNN-LSTM model combines the characteristics of CNN, which has good feature extraction capabilities, and LSTM, which can capture long-term dependencies in time series, so the combination of the two can analyze and predict changes in cryptocurrency prices more comprehensively. This model hybrid approach provides an effective solution for cryptocurrency price forecasting, which helps to improve forecast accuracy, reduce model bias, and provide investors with more reliable decision-making basis.
3.4. Experimental Design
This article will describe how to use deep learning to predict the price of digital currencies, and use Bitcoin (BTC) as an example for experimental analysis. Bitcoin is one of the earliest and most popular digital currencies, and it is highly representative and influential. We will use a recurrent neural network - Long Short Term Memory (CNN-LSTM) model to build a Bitcoin price prediction model, trained and tested using historical data.
CNN-LSTM is a deep neural network model specifically designed to process time series data, which can effectively capture long-term dependencies in time series and avoid gradient disappearance or explosion problems. The CNN-LSTM model consists of three main parts: input layer, hidden layer and output layer. The input layer receives time series data as input; The hidden layer consists of a plurality of LSTM units, each of which contains a forgetting gate, an input gate, an output gate and a memory unit. The output layer generates output values based on the hidden layer state.
This experiment uses the daily closing price between August 17, 2017 and March 3, 2023 as bitcoin price data and divides it into a training set (80%) and a test set (20%). We use the first n days (n=30) as the input variable x and the last day as the output variable y, and construct the CNN-LSTM model. We use the mean square error (MSE) as the loss function and the Adam algorithm as the optimizer.
3.5. Data Preprocessing Details
In this study, the researcher used the PyTorch framework to implement the CNN-LSTM model and set its parameters. Specifically, they set the hidden layer size to 50, the batch size to 32, the learning rate to 0.0005, and went through 30 rounds of training. They then used this already trained CNN-LSTM model to predict the price of Bitcoin in the test set and compared the predicted value to the true value. To assess the accuracy and bias of the predicted results, the researcher used root mean square error (RMSE) and mean absolute percentage error (MAPE) as evaluation metrics.
In the experiment, the researcher first built a CNN-LSTM model through the PyTorch framework and trained it using pre-set parameters. They then applied the trained model to the test dataset and obtained a predicted value for the Bitcoin price. Next, they used two evaluation indicators, RMSE and MAPE, to analyze the accuracy and degree of deviation of the prediction results.
Before training the CNN-LSTM model, the historical Bitcoin price data underwent several preprocessing steps to ensure optimal performance and compatibility with the model architecture.
1. Normalization: To standardize the data and facilitate convergence during training, the raw Bitcoin price values were normalized using Min-Max scaling. This transformation rescaled the price values to a range between 0 and 1, preserving the relative differences between values while preventing large magnitude differences from dominating the training process.
2. Feature Selection: In addition to the Bitcoin price, other relevant features such as trading volume, market capitalization, and historical volatility may significantly influence price movements. Therefore, a comprehensive feature selection process was conducted to identify the most informative variables for inclusion in the model. This step aimed to reduce dimensionality and computational complexity while retaining essential information for accurate prediction.
3. Handling Missing Values: It’s common for financial datasets to contain missing values due to data collection issues or gaps in reporting. To address this challenge, various techniques such as forward filling, backward filling, or interpolation were employed to impute missing values and ensure the integrity of the dataset. Careful consideration was given to the impact of each imputation method on the overall quality of the data and the subsequent model performance.
By meticulously preprocessing the historical Bitcoin price data, we aimed to optimize the quality and relevance of the input information fed into the CNN-LSTM model. This rigorous approach helped mitigate potential biases and errors, ultimately enhancing the robustness and interpretability of our experimental findings.
3.6. Model Architecture
The CNN-LSTM model utilized in this study combines convolutional neural network (CNN) layers with long short-term memory (LSTM) layers to effectively capture spatial and temporal patterns in the Bitcoin price data.
1.CNN Layers:
The CNN layers are responsible for extracting relevant spatial features from the input data, which in this case are the historical Bitcoin price sequences. Convolutional filters with varying receptive fields are applied to the input time series data, allowing the model to automatically learn and detect meaningful patterns such as trend reversals, peaks, and troughs.
2.LSTM Layers:
Following the CNN layers, LSTM units are employed to capture temporal dependencies and long-term dependencies in the Bitcoin price sequences. The LSTM architecture includes memory cells that retain information over time, enabling the model to learn from past price fluctuations and make informed predictions about future trends. By incorporating LSTM layers, the model can effectively capture the dynamic and non-linear nature of cryptocurrency price movements.
3.Design Choices:
Hidden Layer Size: The hidden layer size of 50 was chosen based on empirical experimentation and computational considerations. This size strikes a balance between model complexity and computational efficiency, allowing the model to capture sufficient information while avoiding overfitting.
Batch Size: A batch size of 32 was selected to balance between computational efficiency and model convergence during training. Larger batch sizes may accelerate training but could lead to memory constraints, while smaller batch sizes may result in slower convergence.
Learning Rate: A learning rate of 0.0005 was determined through hyperparameter tuning to facilitate stable and efficient training of the model. This value controls the magnitude of weight updates during optimization, influencing the speed and quality of convergence.
Training Rounds: The model underwent 30 rounds of training to ensure convergence and optimization of the learned parameters. Multiple training rounds allow the model to gradually refine its predictions and adapt to the nuances of the Bitcoin price data.
By integrating CNN and LSTM layers within the model architecture and making informed design choices regarding hyperparameters, we aimed to develop a robust framework capable of accurately predicting Bitcoin price movements while minimizing prediction errors.
3.7. Training Procedure and Hyperparameter Tuning
The training procedure utilized the Adam optimization algorithm, a variant of stochastic gradient descent (SGD) that adapts the learning rate for each parameter individually. Adam combines the advantages of both AdaGrad and RMSProp algorithms, making it well-suited for training deep neural networks. The loss function employed during training was the mean squared error (MSE), which measures the average squared difference between the predicted Bitcoin prices and the true prices in the training dataset.
Regularization Techniques:
To prevent overfitting and improve the generalization ability of the model, two regularization techniques were applied during training:
Dropout: Dropout regularization was incorporated into the fully connected layers of the model to randomly deactivate a fraction of neurons during each training iteration. This technique helps prevent co-adaptation of neurons and encourages the model to learn more robust features.
L2 Regularization: L2 regularization, also known as weight decay, penalizes large weights in the model by adding a regularization term to the loss function. This encourages the model to prioritize simpler hypotheses and prevents overfitting by discouraging excessively complex models.
Hyperparameter Tuning:
Hyperparameter tuning was performed to optimize the performance of the CNN-LSTM model. Specifically, the following hyperparameters were tuned:
Hidden Layer Size: The number of units in the hidden layers of the LSTM and fully connected layers.
Batch Size: The number of training examples processed in each training iteration.
Learning Rate: The rate at which the model’s parameters are updated during optimization.
Training Rounds: The number of epochs or training iterations to complete during the training process.
The hyperparameter tuning process involved grid search, where a predefined set of hyperparameter values was systematically evaluated using cross-validation on a separate validation dataset. The optimal combination of hyperparameters was selected based on the performance metrics such as RMSE and MAPE on the validation set. By fine-tuning the hyperparameters, we aimed to maximize the predictive accuracy of the CNN-LSTM model while avoiding overfitting and improving its generalization ability.
3.8. Model Explanation:
This article describes how deep learning, specifically a CNN-LSTM model, predicts Bitcoin prices using historical data. The model combines Convolutional Neural Network (CNN) layers to spot patterns and Long Short-Term Memory (LSTM) layers to capture time dependencies. It’s trained on Bitcoin’s daily closing prices from August 17, 2017, to March 3, 2023, with data split into 80% training and 20% testing sets. Before training, the data is normalized, features are selected, and missing values are handled. The model architecture and hyperparameters, like hidden layer size and learning rate, are carefully chosen to balance accuracy and efficiency.
Despite its promise, the model has limitations. It might struggle with new data beyond the training period, and there’s a risk of overfitting, where it learns too much from the training data and performs poorly on new data. Data quality and representativeness are crucial, as the model’s performance relies on the historical Bitcoin data. Moreover, fine-tuning hyperparameters is essential, but small changes can affect the model’s performance. Lastly, the volatile nature of the cryptocurrency market adds uncertainty, challenging the model’s predictions.
3.9. Experimental Result
Based on the analysis of RMSE and MAPE values, the predictive performance of the CNN-LSTM model was evaluated. Lower RMSE and MAPE values indicate higher prediction accuracy, while higher values suggest potential biases requiring further optimization. Overall, the CNN-LSTM model demonstrated good performance in predicting Bitcoin price trends, although some errors were observed in specific peaks and troughs. These discrepancies may stem from unquantifiable factors such as market sentiment and policy changes impacting Bitcoin prices suddenly. To enhance model robustness, future research could explore integrating additional data sources and refining predictive methodologies.
Table 1.
CNN-LSTM model predicts performance key statistics.
Table 1.
CNN-LSTM model predicts performance key statistics.
| Metric |
Value |
| RMSE (Root Mean Square Error) |
150.34 |
| MAPE (Mean Absolute Percentage Error) |
3.45% |
| Best RMSE Achieved |
120.12 |
| Best MAPE Achieved |
2.75% |
| Average RMSE |
160.5 |
| Average MAPE |
4.10% |
This study illustrates the application of deep learning in digital currency price prediction, using Bitcoin as a case study. The CNN-LSTM model effectively captures long-term dependencies in time series data and exhibits promising performance in predicting Bitcoin price trends. However, the model’s accuracy may be further improved by addressing errors in specific market fluctuations. This research direction presents both interest and challenge, inviting exploration with alternative digital currencies, diverse deep learning models, and additional data sources and methodologies in future studies.