5.1. Experimental Data Set
The following data sets were used in this experiment:
SZ-taxi Data Set: This is the Shenzhen taxi trajectory data set covering January 1st to 31st, 2015. It includes monitoring points on 156 main roads in Luohu District, Shenzhen, and records vehicle information at 15-minute intervals. It is accompanied by five external data sets that affect traffic flow: the Points of Interest (POI) data set, the weather data set, the day-of-the-week data set, the holiday data set, and the morning and evening traffic peak and off-peak data set. The POI types are classified into nine categories: catering, commerce, transportation, education, medicine, etc. The dominant type around each road section is taken as that section's POI characteristic. The weather data is divided into five types: cloudy, light rain, fog, sunny, and heavy rain. In this study, the day-of-the-week, holiday, and within-day traffic peak and valley period data sets were supplemented. The source data can be obtained from the website:
https://github.com/lehaifeng/T-GCN/tree/master/data.
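One way to turn the categorical external factors described above into model inputs is one-hot encoding. The sketch below is only an illustration under that assumption: the paper does not state how the factors are encoded, and the names `WEATHER_TYPES`, `one_hot`, and `encode_step`, as well as the day numbering, are hypothetical.

```python
# Weather categories as listed in the data set description.
WEATHER_TYPES = ["cloudy", "light rain", "fog", "sunny", "heavy rain"]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


def encode_step(weather, day_of_week, is_holiday, is_peak):
    """Encode the time-varying external factors of one time step.

    POI is a static property of each road section and is left out here.
    """
    days = list(range(7))  # assumed convention: 0 = Monday ... 6 = Sunday
    return (one_hot(weather, WEATHER_TYPES)
            + one_hot(day_of_week, days)
            + [int(is_holiday), int(is_peak)])


# Made-up example: a foggy Thursday, not a holiday, during a peak period.
vec = encode_step("fog", day_of_week=3, is_holiday=False, is_peak=True)
print(vec)
```

The resulting vector has 5 + 7 + 2 = 14 entries per time step; a learned embedding could replace the one-hot vectors without changing the surrounding pipeline.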
PEMS04 Data Set: This data set describes the San Francisco Bay Area, covering 3848 sensors on 29 roads. It contains traffic flow information for 59 days, from January 1st to February 28th, 2018, recorded at five-minute intervals at each monitoring point. It also includes day-of-the-week information. In this study, the corresponding regional weather, holiday, and within-day traffic peak and valley period data sets were supplemented. The source data can be obtained from the website:
https://github.com/Davidham3/ASTGCN/tree/master/data/PEMS04.
5.5. Experimental Result Analysis and Comparison
In this study, the ARIMA, AST-GCN, and T-GCN algorithms and the IM-GCN algorithm proposed in this paper were used to make predictions on the SZ-taxi and PEMS04 data sets. The first 70% of each data set was used as the training set, and the remaining 30% served as the testing set. The results and analysis of the experiment are presented below.
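The 70%/30% split can be sketched as follows. This is a minimal illustration that assumes the split is chronological, with no shuffling, as "first 70%" suggests; the function name `chronological_split` and the toy series are hypothetical and not taken from the original implementation.

```python
def chronological_split(series, train_ratio=0.7):
    """Split a time-ordered sequence into train and test parts without shuffling."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    split_idx = int(len(series) * train_ratio)
    return series[:split_idx], series[split_idx:]


# Toy series of 100 time steps (placeholder values, not real traffic data).
flow = list(range(100))
train, test = chronological_split(flow)
print(len(train), len(test))  # 70 30
```

Keeping the test portion strictly later in time than the training portion avoids leaking future observations into training.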
Figure 12 compares the predicted values with the actual values when the IM-GCN, AST-GCN, T-GCN, and ARIMA algorithms are used to predict the SZ-taxi data set. In the figure, the blue solid line represents the actual traffic flow values, and the red solid line represents the values predicted by the IM-GCN algorithm proposed in this paper. The figure shows that IM-GCN achieves the best prediction accuracy, which is attributed to its more comprehensive integration of data from multiple aspects and its larger receptive field. AST-GCN obtains the second-best prediction performance, and the traditional ARIMA algorithm has the lowest prediction accuracy.
On the SZ-taxi data set, the data of the previous 10 time steps (each time step being 15 minutes) is used to predict the next time step. The one-step prediction results of the model are shown in Table 3, where the proposed model is compared with ARIMA, HA, SVR, GCN, T-GCN, and AST-GCN. The proposed model is superior to the baseline models on all metrics.
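The input construction described above (10 past time steps used to predict the next one) can be sketched with a sliding window. This is an illustrative sketch for a single road's series; the names `make_windows`, `seq_len`, and `pre_len` are assumptions, not identifiers confirmed by the paper.

```python
def make_windows(series, seq_len=10, pre_len=1):
    """Build (input, target) pairs: seq_len past steps -> pre_len future steps."""
    inputs, targets = [], []
    for start in range(len(series) - seq_len - pre_len + 1):
        inputs.append(series[start:start + seq_len])
        targets.append(series[start + seq_len:start + seq_len + pre_len])
    return inputs, targets


# Toy series of 15 time steps (placeholder values, not real traffic data).
flow = [float(v) for v in range(15)]
X, y = make_windows(flow)
print(len(X))      # 5 windows
print(X[0], y[0])  # first 10 values -> [10.0]
```

With 15-minute time steps, setting `pre_len` to 1, 2, 3, or 4 would correspond to prediction horizons of 15, 30, 45, and 60 minutes.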
Table 3 shows that the RMSE and MAE values of the IM-GCN model are smaller than those of the other models, while its accuracy (ACC) and coefficient of determination (R2) are higher, indicating that the IM-GCN model has better prediction performance and higher accuracy. In particular, compared with traditional statistical methods for time series prediction (such as ARIMA), the IM-GCN model has stronger prediction ability, with an improvement of more than 40% in test performance. This may be due to the limitations of traditional methods when dealing with such high-complexity data.
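For reference, the four metrics discussed above can be computed as in the sketch below. RMSE, MAE, and R2 follow their standard definitions. The accuracy formula (one minus the ratio of the error norm to the norm of the actual values) is the definition commonly used with the SZ-taxi benchmark and is an assumption here, since this section does not restate the formula. The toy values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Assumed definition: 1 - ||y - y_hat|| / ||y|| (Euclidean norms)."""
    err = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)))
    norm = math.sqrt(sum(t ** 2 for t in y_true))
    return 1.0 - err / norm


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only (not taken from Table 3).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
print(accuracy(y_true, y_pred), r2(y_true, y_pred))
```

Lower RMSE and MAE and higher ACC and R2 indicate a better fit, which is the direction of comparison used in the tables.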
Compared with the second-best model, AST-GCN, the IM-GCN model improves RMSE, MAE, ACC, and R2 by 2.83%, 3.8%, 2.38%, and 1.89%, respectively.
Figure 13 compares the predicted values with the actual values when the IM-GCN, AST-GCN, T-GCN, and ARIMA algorithms are used to predict the PEMS04 data set. In the figure, the blue solid line represents the actual traffic flow values, and the red solid line represents the values predicted by the IM-GCN algorithm proposed in this paper. The figure shows that the IM-GCN algorithm achieves the best prediction accuracy, AST-GCN obtains the second-best prediction performance, and the traditional ARIMA algorithm has the lowest prediction accuracy.
Table 4 shows that the RMSE and MAE values of the IM-GCN model are smaller than those of the other models, indicating that the IM-GCN model has better prediction performance and higher accuracy. Compared with the second-best model, MAE, MAPE (Mean Absolute Percentage Error), and RMSE are improved by 1.54%, 2.81%, and 1.41%, respectively.
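Percentage improvements of this kind are usually computed relative to the baseline value. The sketch below assumes that convention, which the paper does not state explicitly, and also includes the standard MAPE definition. The function names and all numbers are hypothetical and are not taken from Table 3 or Table 4.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline, model, lower_is_better=True):
    """Improvement of `model` over `baseline`, in percent of the baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    diff = (baseline - model) if lower_is_better else (model - baseline)
    return 100.0 * diff / baseline


# Made-up numbers for illustration only.
print(mape([100.0, 200.0], [110.0, 190.0]))
print(relative_improvement(4.0, 3.9))                          # error metric
print(relative_improvement(0.80, 0.82, lower_is_better=False)) # score metric
```

Error metrics such as RMSE, MAE, and MAPE use `lower_is_better=True`, while ACC and R2 use `lower_is_better=False`.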
The AST-GCN model only concatenates some external factors that affect traffic flow with traffic information and fails to comprehensively capture the connections among these pieces of information. The IM-GCN model proposed in this paper captures the spatio-temporal characteristics of external information by embedding comprehensive traffic flow information, thus improving the prediction performance of the model.
Based on 1,000 iterations, the model in this paper is compared with AST-GCN. As shown in Table 5, the model in this paper is slightly superior to AST-GCN on all metrics.
Table 5 shows the performance evaluation results of three models (T-GCN, AST-GCN, and IM-GCN) on the SZ-taxi data set under different prediction horizons (15, 30, 45, and 60 minutes). The IM-GCN model outperforms the T-GCN and AST-GCN models at all horizons. Specifically, at the 45- and 60-minute horizons, compared with the T-GCN model, the IM-GCN model reduces the RMSE by 2.15% and 1.87%, respectively, and the MAE by 3.51% and 2.44%, respectively. The RMSE of the IM-GCN model is reduced by 1.24% and 1.17% compared with the AST-GCN model, respectively, and the MAE is reduced by 1.40% and 1.40% compared with the AST-GCN model, respectively, with a decrease of 1.40% and 1.19% respectively. It is worth noting that the prediction performance of all models declines as the prediction horizon increases. As the horizon grows, the prediction task becomes more difficult: the model needs to capture changes in the data over a longer time range, which may lead to overfitting and larger prediction errors. However, the decline in the metrics is not significant, indicating that the models possess a certain degree of robustness and prediction ability.
5.6. Ablation Experiment
To verify that each external factor affecting traffic flow is necessary for prediction, this study carried out ablation experiments on the weather factor, the distribution of surrounding Points of Interest (POI), the day-of-the-week factor, the holiday factor, and the morning and evening rush hour factor, removing one factor at a time.
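The set of ablation runs can be enumerated as in the sketch below: one full configuration plus one configuration per removed factor. This only illustrates the experimental design. The factor labels and the function name `ablation_variants` are hypothetical, and the way IM-GCN actually fuses the remaining factors is not reproduced here.

```python
# Hypothetical labels for the five external factors named in the text.
FACTORS = ["weather", "poi", "day_of_week", "holiday", "peak_period"]


def ablation_variants(factors):
    """Return run name -> list of factors kept, removing one factor per run."""
    variants = {"full": list(factors)}
    for removed in factors:
        variants[f"no_{removed}"] = [f for f in factors if f != removed]
    return variants


variants = ablation_variants(FACTORS)
for name, kept in variants.items():
    print(name, kept)
```

Each variant would then be trained and evaluated with the same split and metrics, so that any change in error can be attributed to the removed factor.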
Figure 14 shows the curves of the predicted values obtained by the IM-GCN algorithm in the ablation experiments on the external factors affecting traffic flow. This experiment was conducted on the PEMS04 traffic flow data set. The evaluation metrics calculated from the predicted data are shown in Table 6.
Table 6 compares the performance metrics of the model in the ablation experiments on the factors affecting traffic flow in the PEMS04 data set. When the weather factor is not considered, the mean absolute error (MAE) increases by 2.95 and the mean absolute percentage error (MAPE) increases by 1.73%. When the holiday factor is not considered, the MAE increases by 2.58 and the MAPE increases by 1.4%. More generally, when the weather, holiday, day-of-the-week, POI, or traffic peak and trough period factor is ablated individually, the performance metrics of the prediction model decline to varying degrees.
The ablation experiments in this study demonstrate that integrating the external factors affecting traffic flow, namely weather factors, holiday factors, weekday factors, POI factors, and traffic peak and trough period factors, for traffic flow prediction can effectively improve the performance of the prediction model. In other words, in order to obtain a more accurate traffic flow prediction model, it is necessary to integrate these external factors.