Error evolution in multi-step ahead streamflow forecasting for the operation of hydropower reservoirs

Georgia Papacharalampous*, Hristos Tyralis, and Demetris Koutsoyiannis

Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, Iroon Polytechniou 5, 157 80 Zografou, Greece

* Corresponding author, papacharalampous.georgia@gmail.com

Abstract: Multi-step ahead streamflow forecasting is of practical interest for the operation of hydropower reservoirs. We provide generalized results on the error evolution in multi-step ahead forecasting by conducting several large-scale experiments based on simulations. We also present a multiple-case study using monthly streamflow time series. Our findings suggest that some forecasting methods are more useful than others. However, the errors computed at each time step of a forecast horizon within a specific case study strongly depend on the case examined and can be either small or large, regardless of the forecasting method used and the time step of interest.


Introduction
The available methodologies for time series forecasting can be classified, with respect to the forecast horizon, into one- and multi-step ahead forecasting. There are five strategies for multi-step ahead forecasting, namely the recursive, direct, DirRec, MIMO and DIRMO strategies (Bontempi et al. 2013, Taieb et al. 2012). Despite its far more challenging nature in comparison to one-step ahead forecasting, multi-step ahead forecasting is a common practice in hydrology (e.g. Papacharalampous et al. (2017b), Valipour et al. (2013)) and beyond. Moreover, it is of particular importance for the operation of hydropower reservoirs (e.g. Ballini et al. (2001), Cheng et al. (2008), Coulibaly et al. (2000), Luna et al. (2009)) and, by extension, for the energy industry, especially if we consider that hydropower is a form of energy that is both reliable and sustainable (Koutsoyiannis 2011).
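To make the distinction between the two most common of these strategies concrete, the following Python sketch contrasts the recursive strategy (the one adopted later in this study) with the direct strategy, using a simple least-squares AR(1) model as a stand-in for the forecasting methods of Section 2.2. The function names and the AR(1) choice are illustrative assumptions, not part of the study's code.

```python
import numpy as np

def fit_ar1(series):
    """Least-squares fit of an AR(1) model x_t = a * x_{t-1} + b."""
    x, y = series[:-1], series[1:]
    a, b = np.polyfit(x, y, 1)
    return a, b

def recursive_forecast(series, horizon):
    """Recursive strategy: one one-step model is fitted once and then
    iterated, feeding each forecast back in as the next input."""
    a, b = fit_ar1(series)
    preds, last = [], series[-1]
    for _ in range(horizon):
        last = a * last + b
        preds.append(last)
    return np.array(preds)

def direct_forecast(series, horizon):
    """Direct strategy: a separate model is fitted for each lead time h,
    regressing x_{t+h} directly on x_t."""
    preds = []
    for h in range(1, horizon + 1):
        x, y = series[:-h], series[h:]
        a, b = np.polyfit(x, y, 1)
        preds.append(a * series[-1] + b)
    return np.array(preds)
```

The recursive strategy trains a single model but compounds its errors over the horizon, whereas the direct strategy avoids error feedback at the cost of fitting one model per lead time.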
Herein, we conduct several large-scale computational experiments based on simulations to provide generalized results on the error evolution in multi-step ahead forecasting. We additionally conduct a multiple-case study using monthly streamflow time series to highlight important facts, which are of greater interest when presented using real-world data. Our aim is to create a representative image of the underlying phenomena and, thus, we compare an adequate number of forecasting methods on a large number of time series. The latter are simulated according to a linear model of stationary stochastic processes, which is widely used for the modelling of hydrological processes.

Time series
We simulate time series according to the ARFIMA(p,d,q) model, where ARFIMA stands for Autoregressive Fractionally Integrated Moving Average. Although this specific modelling approach is accompanied by certain problems (Koutsoyiannis 2016), it is considered satisfactory for the present study and has been widely applied in the literature (e.g. Montanari et al. (1997)).
We use the fracdiff.sim algorithm of the fracdiff R package (Fraley et al. 2012) to simulate time series of the types stated in Table 1. For the real-world case study we use 92 mean monthly streamflow time series, which originate from catchments in Australia and are extracted from a larger data set (Peel et al. 2000). We use the deseasonalized time series for the application of the forecasting methods. To describe the long-term persistence of the deseasonalized time series we estimate their Hurst parameter H using the mleHK algorithm of the HKprocess R package (Tyralis 2016), which implements the maximum likelihood method (Tyralis and Koutsoyiannis 2011). The parameter H ranges in the interval (0, 1), and values greater than 0.5 indicate long-range dependence of the Hurst-Kolmogorov stochastic process, which is widely used for the modelling of geophysical processes instead of the ARFIMA(0,d,0) model. The estimated values range between 0.56 and 0.99, with a mean value of 0.78.
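The study performs the simulation with fracdiff.sim in R. For readers without R, a minimal Python sketch of the underlying idea for the ARFIMA(0, d, 0) case follows: the process is generated by applying the truncated MA(∞) expansion of (1 − B)^(−d) to Gaussian white noise. The function name and the burn-in length are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def simulate_arfima0d0(n, d, rng=None, burn=500):
    """Simulate ARFIMA(0, d, 0) via the truncated MA(inf) expansion of
    (1 - B)^(-d), whose weights satisfy the recursion
    psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j."""
    rng = np.random.default_rng(rng)
    m = n + burn
    psi = np.empty(m)
    psi[0] = 1.0
    for j in range(1, m):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    eps = rng.standard_normal(m)
    # x_t = sum_{j=0}^{t} psi_j * eps_{t-j}: truncate the full convolution
    x = np.convolve(eps, psi)[:m]
    # discard the burn-in period so early values are less affected by truncation
    return x[burn:]
```

For 0 < d < 0.5 the resulting process is stationary with long-range dependence; d = 0.30, used in the simulation experiments, corresponds to strong persistence.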

Forecasting methods
We use 16 forecasting methods originating from the implementation of several popular forecasting algorithms (see Table 2). Specifically, we apply the simple, auto_ARFIMA, state space, exponential smoothing and NN_3 methods using the R package forecast (Hyndman et al. 2017, Hyndman and Khandakar 2008), and the remaining forecasting methods using the R package rminer (Cortez 2010, 2016), as well as several built-in R algorithms (R Core Team 2017).
The R package rminer uses the nnet algorithm of the nnet R package (Venables and Ripley 2002), the randomForest algorithm of the randomForest R package (Liaw and Wiener 2002) and the ksvm algorithm of the kernlab R package (Karatzoglou et al. 2004) for the application of the neural networks, random forests and support vector machines respectively. The source code for the implementation of the forecasting methods, as well as generalized information about their performance when applied to linear stochastic processes, can be found in Papacharalampous et al. (2017a).

Methodology outline
We conduct six large-scale simulation experiments, determined by the simulated time series, as presented in Table 3. We additionally conduct a multiple-case study, which is composed of 92 single-case studies. Some basic information about the time series used in the present study is provided in Section 2.1. We apply several popular forecasting methods (see Section 2.2) to the time series. Regarding the application of the forecasting methods, we split each time series into a fitting set and a test set. The latter comprises the last 50 values for the simulation experiments and the last 12 observations for the multiple-case study. We fit the models to the fitting set and make predictions corresponding to the test set using the recursive multi-step ahead forecasting method. Next, we calculate the errors and the absolute errors at each time step of the forecast horizon. Within the simulation experiments we carry out a statistical analysis on the formed data sets and present the results accordingly. As regards the real-world time series, the fitting set is used after deseasonalization, which is performed using a multiplicative model of time series decomposition, while the seasonality is subsequently added back to the predicted time series. This practice is suggested for the improvement of forecast quality (Taieb et al. 2012).
We present the results of the multiple-case study in a qualitative form to facilitate the detection of systematic patterns.
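As a rough illustration of the pipeline described above, the Python sketch below deseasonalizes a monthly series with a simple multiplicative decomposition (seasonal indices estimated from calendar-month means) and computes the errors and absolute errors at each time step of the forecast horizon. The exact decomposition used in the study may differ, and all names here are illustrative.

```python
import numpy as np

def multiplicative_deseasonalize(series, period=12):
    """Estimate multiplicative seasonal indices as the mean of each
    calendar month, normalized to average one, and divide them out.
    The indices are re-applied to the forecasts afterwards."""
    series = np.asarray(series, dtype=float)
    month = np.arange(len(series)) % period
    indices = np.array([series[month == m].mean() for m in range(period)])
    indices /= indices.mean()
    return series / indices[month], indices

def horizon_errors(obs_test, pred_test):
    """Errors and absolute errors at each time step of the forecast
    horizon (here the test set, e.g. the last 12 monthly values)."""
    err = np.asarray(obs_test, dtype=float) - np.asarray(pred_test, dtype=float)
    return err, np.abs(err)
```

Collecting these per-step errors across many time series yields the per-horizon error distributions that the simulation experiments summarize.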

Simulation experiments
In Section 3.1 we present a representative part of the results of the simulation experiments to support our generalized findings. In more detail, in Figures 1 and 2 we present, respectively, the errors and the absolute errors computed at each time step of the forecast horizon within simulation experiment SE_1a for several of the forecasting methods, while in Figure 3 we present the median absolute errors within each of the simulation experiments for all of the forecasting methods.
In Figure 1 we observe that the error evolution can differ to a great extent from one forecasting method to another. However, all the error distributions tend to be approximately symmetric around zero. At the first few time steps ahead we observe an apparent increase in the median and interquartile range values. This increase is followed by a stabilization of the error distributions for most of the forecasting methods (e.g. Naïve and NN_3). On the contrary, when using the RW and ETS_s forecasting methods the errors seem to keep increasing until the last time step of the forecast horizon. Furthermore, the outliers are more frequent and lie farther from the median values when using specific forecasting methods (e.g. NN_3). This form of instability should also be considered when choosing a forecasting method. Moreover, in Figure 2 we observe that the auto_ARFIMA and Theta forecasting methods prove more accurate than the Naïve benchmark. The same applies to BATS, which however produces far outliers. The latter tend to lie farther from the median values as the time step increases. Finally, Figure 3 summarizes the basic information provided by the simulation experiments in a concise manner. As we observe, the results vary from one simulation experiment to another to an extent depending on the forecasting method. For instance, NN_1 can deliver either moderate or good performance depending on the simulation experiment, while auto_ARFIMA has proven the most accurate forecasting method within each of the simulation experiments conducted, a fact expected at the outset of our calculations (see Papacharalampous et al. (2017a)). Admittedly, the findings suggest that some forecasting methods are more useful than others. Additionally, we note that the absolute errors are in general higher for the ARFIMA(1,0.30,0) processes and lower for the ARFIMA(0,0.30,1) processes than they are for the ARFIMA(0,0.30,0) processes, while they are also lower for the fitting set of 300 values than for the fitting set of 100 values.

Multiple-case study
In Section 3.2 we present a part of the results of the multiple-case study. In more detail, in Figure 4 we present eight heatmaps, each corresponding to a specific single-case study, and in Figure 5 we present six heatmaps, each corresponding to the cross-case synthesis of the results of the 92 single-case studies when using a specific forecasting method. These heatmaps facilitate the comparison of the absolute errors within and across the various single-case studies in a representative manner. The relative magnitude of the absolute errors seems to strongly depend on the case examined, while the effect of the forecasting method used or of the time step of the forecast horizon on the error evolution cannot be extracted from these figures, nor from any other single- or multiple-case study. Therefore, although our simulation experiments show that some forecasting methods are more likely to produce accurate forecasts than others, and that the errors are more likely to be smaller at the first few time steps of the forecast horizon than at the subsequent time steps (see Section 3.1), within a specific single-case study the largest (or the smallest) absolute errors can result from the implementation of any forecasting method at any time step of the forecast horizon.

Conclusions
We deliver generalized results on the error evolution in multi-step ahead forecasting using the recursive technique by comparing the performance of 16 forecasting methods. The present study is an expansion of Papacharalampous et al. (2017a), as it provides complementary information about the forecasting methods also implemented in the latter. Our findings indicate that the error evolution can differ to a great extent from one forecasting method to another. This information can be used when deciding on a forecasting method, since some forecasting methods have proven more useful than others.
However, due to the stochastic nature of forecasting, the errors computed at each time step of a forecast horizon within a specific single-case study strongly depend on the individual case examined and can be either small or large, regardless of the forecasting method used and the time step of interest. In fact, the limitations accompanying time series forecasting, which were emphasized by Koutsoyiannis et al. (2008) and also by Papacharalampous et al. (2017a, 2017b), are highly perceivable here as well. These limitations might call for the implementation of probabilistic forecasting methodologies (e.g. using Bayesian statistics, as in Tyralis and Koutsoyiannis (2014)) instead of point forecasting.

Figure 2. Absolute errors computed at each time step of the forecast horizon within the simulation experiment SE_1a for the (a) Naïve, (b) auto_ARFIMA, (c) BATS and (d) Theta forecasting methods.


Figure 4. Heatmaps for the comparison of the absolute errors computed at each time step of the forecast horizon within several single-case studies. The values are scaled in the row direction, and the darker the colour, the better the forecast.

Figure 5. Heatmaps for the comparison of the absolute errors computed at each time step of the forecast horizon across the 92 individual cases when using the (a) Naïve, (b) auto_ARFIMA, (c) BATS, (d) ETS_s, (e) RF_1 and (f) SVM_2 forecasting methods. The values are scaled in the row direction, and the darker the colour, the better the forecast.

Table 1. Types of time series simulated in the present study.

Table 2. Forecasting methods used in the present study.

Table 3. Simulation experiments of the present study. The time series types are presented in Table 1.