4. TS Clustering Algorithms and Pivot Clustering, Empirical Evaluation
We have shown that when the retailer forecasts using clusters of its demand streams rather than the individual streams, its MSFE can range from close to the optimal value (obtained when forecasting with the individual streams) to close to the MSFE of a forecast based on the aggregate of all the streams. Hence, if the retailer wishes to forecast using clusters of its demand streams, the selection of the clusters is important. In general, a retailer might have customer-related information that could be used to generate the clusters. The benefit of generating clusters and forecasting them is that in some situations it would be cumbersome for the retailer to collect the individual demand streams and service them individually. Forecasting clusters would therefore be a second-best option.
We propose Pivot Clustering, a procedure for determining clusters that usually results in a relatively low MSFE among all assignments of streams to clusters. We consider two ways to obtain the subaggregated MSFE based on some clustering assignment. The first is to use the individual ARMA demand models appearing in (2) to compute the subaggregated theoretical MSFE as per Section 2.2, Section 2.3 and Section 2.4. We note that if there is only one cluster, then the MSFE is the one computed for a forecast based on the aggregate of all the streams, while if the number of clusters is equal to the number of streams, the MSFE is that of a forecast based on the disaggregated (individual) sequences. The second is to estimate the MSFE by generating demand realizations for each stream based on (2). Once the demand realizations are simulated for each stream, we subaggregate the realizations based on our choice of clusters. So if some cluster is made up of two streams, the realization of the cluster is the sum of the realizations of those two streams. We then estimate an ARMA(5,5) model (footnote 4) using each cluster's realization. Finally, we use the estimated models to obtain in-sample forecast errors and compute the covariance matrix of these forecast errors to estimate the MSFE for a particular assignment of streams to clusters. In the analysis below we see that the estimated MSFEs are close to the theoretical ones and often lead to similar choices of clusters.
Based on a predetermined number of clusters n, Pivot Clustering works as follows.
For each stream, randomly assign it to a cluster.
For each cluster,
    For each stream in the cluster,
        Compute or estimate the MSFE for the current assignment, along with the resulting MSFEs if the stream were moved to each of the other clusters.
        # The MSFE can be either estimated based on realizations of the demand streams or computed using Equations (28), (29), (41) and (42).
    Move each stream in the cluster to the cluster which leads to the largest overall MSFE reduction among all choices of clusters.
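The steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the optional `initial` argument, the choice to apply each move immediately and the `max_sweeps` stopping rule (repeat sweeps until no single move lowers the objective) are all assumptions. The `msfe` argument stands for any routine that returns the theoretical or estimated subaggregated MSFE of an assignment; `toy_objective` is a hypothetical stand-in (within-cluster sum of squares of made-up per-stream scores) used only so that the sketch runs without ARMA code.

```python
import random
from typing import Callable, List, Optional, Sequence


def pivot_clustering(
    n_streams: int,
    n_clusters: int,
    msfe: Callable[[Sequence[int]], float],
    rng: random.Random,
    initial: Optional[Sequence[int]] = None,
    max_sweeps: int = 100,  # assumption: the text does not state a stopping rule
) -> List[int]:
    """Greedily reassign streams to clusters so that msfe(assignment) decreases."""
    # Step 1: random initial assignment unless one is supplied;
    # assignment[i] is the cluster label of stream i.
    if initial is not None:
        assignment = list(initial)
    else:
        assignment = [rng.randrange(n_clusters) for _ in range(n_streams)]
    current = msfe(assignment)

    for _ in range(max_sweeps):
        moved = False
        for cluster in range(n_clusters):
            members = [i for i, c in enumerate(assignment) if c == cluster]
            for i in members:
                # Evaluate the objective with stream i placed in every other cluster.
                best_cluster, best_value = cluster, current
                for target in range(n_clusters):
                    if target == cluster:
                        continue
                    assignment[i] = target
                    value = msfe(assignment)
                    if value < best_value:
                        best_cluster, best_value = target, value
                # Keep the best placement (assumption: moves are applied at once).
                assignment[i] = best_cluster
                if best_cluster != cluster:
                    current = best_value
                    moved = True
        if not moved:  # no single move lowers the objective any further
            break
    return assignment


def toy_objective(assignment: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical stand-in for the MSFE: within-cluster sum of squares."""
    total = 0.0
    for cluster in set(assignment):
        members = [scores[i] for i, c in enumerate(assignment) if c == cluster]
        mean = sum(members) / len(members)
        total += sum((x - mean) ** 2 for x in members)
    return total
```

In practice `msfe` would wrap either the theoretical computation or the estimate based on fitted ARMA models; since every accepted move strictly lowers the objective, the procedure cannot cycle, but it may stop at a local rather than a global optimum.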
In the remainder of this section we perform various simulations to assess the efficacy of Pivot Clustering. We focus on ARMA(1,1) models, as these do not require too much runtime for Pivot Clustering based on theoretical MSFE and are complex enough to describe demand data such as in [10]. Additionally, forecasting an aggregate of ARMA(1,1) demand sequences has been studied by [11], where forecasts were based on exponential smoothing. The methods herein are, however, generally applicable to higher-order ARMA models. From a computational standpoint, it is possible to determine the theoretical MSFE based on the aggregate of up to twenty demand streams generated by ARMA(1,1) models. The burden lies in having to find the roots of high-degree polynomials in order to determine the ARMA model and shock variance that describe the aggregated sequence. To understand the computational requirements, we check the runtimes of Pivot Clustering when determining theoretical and estimated MSFEs. For a given number of streams (between 10 and 20), we carry out Pivot Clustering for twenty different combinations of ARMA(1,1) models, assigning the streams to four clusters, and plot the average runtimes in Figure 1. If estimated MSFEs are used, many more streams can be clustered, and Pivot Clustering has faster runtimes for larger numbers of streams. Upon checking, Pivot Clustering with estimated MSFEs for 200 streams takes approximately 20 minutes.
We can check the efficacy of Pivot Clustering through simulation. We begin by randomly assigning coefficients to twenty ARMA(1,1) models to produce twenty demand streams, as well as the covariance matrix of the shock sequences. We make sure that each assignment results in causal and invertible demand with respect to the shocks and that the resulting covariance matrix is positive definite. The AR and MA coefficients and the covariance matrix can be found at https://github.com/vkovtun84/Pivot-Clustering-of-Demand-Streams in Models.csv and covarmat.csv.
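One way to generate such a setup is sketched below. It is a minimal illustration, not the procedure used to produce Models.csv and covarmat.csv: the sampling ranges, the construction of the covariance matrix as A Aᵀ plus a small diagonal term, and all function names are assumptions. For an ARMA(1,1) model, causality and invertibility reduce to the AR and MA coefficients each being strictly less than one in absolute value.

```python
import random
from typing import List, Tuple


def random_arma11_models(n: int, rng: random.Random) -> List[Tuple[float, float]]:
    """Draw (AR, MA) coefficient pairs with |phi| < 1 and |theta| < 1."""
    # The bound 0.95 is an arbitrary illustrative choice.
    return [(rng.uniform(-0.95, 0.95), rng.uniform(-0.95, 0.95)) for _ in range(n)]


def random_covariance(n: int, rng: random.Random, jitter: float = 1e-3) -> List[List[float]]:
    """Build a symmetric positive definite matrix as A A^T + jitter * I."""
    a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return [
        [
            sum(a[i][k] * a[j][k] for k in range(n)) + (jitter if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def is_positive_definite(m: List[List[float]]) -> bool:
    """Check positive definiteness of a symmetric matrix via Cholesky factorization."""
    n = len(m)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:  # a non-positive pivot means the matrix is not PD
                    return False
                lower[i][j] = s ** 0.5
            else:
                lower[i][j] = s / lower[j][j]
    return True
```

A draw would be accepted only if `is_positive_definite` returns `True` for the covariance matrix; with the construction above this holds by design, so the check mainly guards against matrices obtained in other ways.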
After randomly assigning streams to one of four clusters, we compute both the estimated and theoretical one-step-ahead MSFEs based on this random assignment and use the assignment to start Pivot Clustering. We output the clusters found by Pivot as well as the MSFE of the forecast based on this set of clusters. We iterate this procedure 50 times to study how much the MSFE improves under Pivot Clustering for the different starting allocations. The MSFEs of the final clusters and random clusters can be found in MSFEresults.csv at https://github.com/vkovtun84/Pivot-Clustering-of-Demand-Streams. These can also be compared with the MSFEs of the forecast based on the individual (disaggregated) demand streams and the forecast based on fully aggregating the streams.
For the twenty demand streams and models used, the theoretical and estimated MSFEs when forecasting based on individual (disaggregated) streams are 102.1 and 96.2. The theoretical and estimated MSFEs when forecasting based on the fully aggregated streams are 231.3 and 220.6. For the 50 simulations of assigning streams to random clusters (used in the initialization step of Pivot), the averages of the theoretical and estimated MSFEs based on the subaggregated random clusters are 202.2 and 194.2. After Pivot Clustering is carried out to obtain a better set of subaggregated clusters in each of the 50 simulations, the averages of the theoretical and estimated MSFEs are 109.4 and 101.0. The various theoretical and estimated MSFEs for the different initializations are provided in Figure 2 and Figure 3. We note that regardless of the initial random assignment of streams to clusters, Pivot Clustering leads to a clustering of streams for which the subaggregated MSFE is very low. In fact, Pivot Clustering typically results in clusters for which the subaggregated MSFE ends up very close to the MSFE obtained when forecasts are based on the individual (disaggregated) streams.
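To put these averages in perspective, one can compute how much of the gap between the random-cluster MSFE and the disaggregated MSFE is closed by Pivot Clustering. The short sketch below uses only the averages reported above; the "gap closed" measure and the function name are illustrative conventions introduced here, not quantities defined in the paper.

```python
def gap_closed(random_msfe: float, pivot_msfe: float, disaggregated_msfe: float) -> float:
    """Fraction of the (random - disaggregated) MSFE gap removed by Pivot Clustering."""
    return (random_msfe - pivot_msfe) / (random_msfe - disaggregated_msfe)


# Averages over the 50 initializations reported in the text.
theoretical = gap_closed(202.2, 109.4, 102.1)
estimated = gap_closed(194.2, 101.0, 96.2)
print(f"theoretical: {theoretical:.1%}, estimated: {estimated:.1%}")
```

Under this measure, Pivot Clustering removes roughly 93% of the gap for the theoretical MSFEs and roughly 95% for the estimated MSFEs.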
We can compare our results with existing time-series clustering methods. Two distance measures that can be computed for time-series realizations are available in the TSclust package for R, namely AR.PIC and AR.LPC.CEPS. These distances can be used to perform hierarchical clustering such as average-linkage clustering. The final groups determined by these methods lead to MSFEs of 123.4 and 108.8 respectively, higher than those found by Pivot starting from random assignments. We note that the cluster assignments found by these methods can also be used in the initialization of Pivot Clustering, potentially leading to even better clusters.
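For reference, average-linkage clustering from a precomputed distance matrix can be sketched in a few lines of Python. This is a generic illustration rather than the TSclust workflow: the AR.PIC and AR.LPC.CEPS distances themselves are not implemented here, and `dist` is assumed to be any symmetric matrix of pairwise distances between the stream realizations.

```python
from typing import List, Sequence


def average_linkage(dist: Sequence[Sequence[float]], n_clusters: int) -> List[int]:
    """Agglomerative clustering: repeatedly merge the two clusters whose
    average pairwise distance is smallest until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Convert the list of clusters into one label per stream.
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

The resulting labels have the same form as a Pivot Clustering assignment, so they could serve directly as a non-random starting allocation.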
Since the previous simulations were carried out on only one set of twenty ARMA(1,1) demand models, we should also check the efficacy of Pivot Clustering for other sets of models. As such, we consider twenty simulations, where within each simulation a new set of twenty demand models is considered. We compare the estimated and theoretical MSFEs of one random assignment of streams to four clusters with the estimated and theoretical MSFEs of the four clusters obtained by Pivot Clustering. In each simulation we also compute the MSFEs that would be found when fully aggregating the streams or when basing forecasts on the individual streams, as well as the MSFEs that would be found when using the AR.PIC and AR.LPC.CEPS distances to hierarchically cluster the streams into four clusters. The results of these simulations are displayed in Figure 4 and Figure 5. We note that if forecasts are to be based on four clusters, the lowest MSFEs are obtained when the clusters are formed using Pivot Clustering. Furthermore, Pivot Clustering leads to forecasts whose MSFE is very close to the MSFE of the forecast based on the individual streams in every simulation.
We continue with twenty simulations, where in each simulation we consider a separate set of 20 streams being subaggregated into four clusters with 10 random initializations of Pivot Clustering. The means of the various theoretical and estimated MSFEs under the different clustering approaches are displayed in Figure 6 and Figure 7. We note again that for every set of twenty streams, the subaggregated MSFEs, averaged over the different initial random assignments of streams to clusters, are very close to the disaggregated MSFEs.
We continue by assessing how well Pivot Clustering does when compared against an exhaustive algorithm which checks all possible assignments of streams to clusters using theoretical MSFE calculations. To do so, we consider twenty simulations, where within each simulation we randomly generate 10 ARMA(1,1) streams (footnote 5) and compute the lowest MSFE possible among all assignments of the streams to 3 clusters. Furthermore, we consider 10 random initializations of Pivot each time new streams are considered. The results are displayed in Figure 8. We note in the first row of that table that for the first of the twenty simulations, 8 out of 10 initializations led Pivot Clustering to find the optimal solution. Among the 2 initializations that did not lead to the optimal solution, the mean ratio of the optimal MSFE to the MSFE of the grouping found by Pivot is 96.81%. The median was 96.81% while the minimum ratio was 96.24%. In some instances Pivot Clustering never found an optimal solution (such as in the 6th simulation); however, the average MSFE of the optimal solution was around 99.6% of the MSFE of the groupings found by Pivot. In the worst performance of Pivot (simulation 15), the best possible grouping led to an MSFE that was 80.27464% lower than the MSFE found by Pivot.
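An exhaustive search of this kind can be sketched as follows. The sketch assumes that cluster labels are interchangeable and that clusters may be left empty, so each distinct grouping is enumerated exactly once; these conventions, the function names and the `msfe` callable are assumptions rather than details taken from the paper. Under these conventions, 10 streams and 3 clusters give 9842 distinct groupings, compared with 3^10 = 59049 labeled assignments.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def canonical_assignments(n_streams: int, n_clusters: int) -> Iterator[Tuple[int, ...]]:
    """Yield every grouping of the streams into at most n_clusters clusters once.

    Stream 0 always gets label 0, and a new label may only be one larger than
    the largest label used so far, which removes relabelings of the same grouping.
    """
    def extend(prefix: List[int], used: int) -> Iterator[Tuple[int, ...]]:
        if len(prefix) == n_streams:
            yield tuple(prefix)
            return
        for label in range(min(used + 1, n_clusters)):
            prefix.append(label)
            yield from extend(prefix, max(used, label + 1))
            prefix.pop()

    yield from extend([], 0)


def exhaustive_best(
    n_streams: int,
    n_clusters: int,
    msfe: Callable[[Sequence[int]], float],
) -> Tuple[Tuple[int, ...], float]:
    """Return the grouping with the smallest objective value, and that value."""
    best = min(canonical_assignments(n_streams, n_clusters), key=msfe)
    return best, msfe(best)
```

The number of groupings grows very quickly with the number of streams, which is why such a search is only practical for small problems like the 10-stream case considered here.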
Finally, we consider the robustness of the Pivot algorithm to cases where the data generation process is not ARMA. To do so, we perform ten simulations, where in each simulation we simulate twenty demand stream realizations such that each stream follows an ARFIMA model with its own fractional differencing parameter d and with shocks that may be contemporaneously correlated across streams. Each realization, consisting of 1500 time periods, is used to fit an ARMA(5,5) model and compute an estimated one-step-ahead MSFE for the disaggregated series (appearing as a blue dot in Figure 9). Summing the realizations together and fitting an ARMA(5,5) model yields an estimated MSFE for the aggregated series (appearing as a black dot in Figure 9). Finally, the Pivot algorithm is carried out using five different random initializations of assigning streams to one of four clusters. The MSFEs for the subaggregated random clusters and Pivot clusters appear as red and green dots in Figure 9. We note that in the subaggregated case the number of ARMA models that need to be estimated is equal to the number of clusters.
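A simple way to simulate long-memory streams of this kind is sketched below. It is an illustration under strong assumptions and not the paper's data generation code: it assumes the pure fractional-noise case ARFIMA(0,d,0) with |d| < 0.5, truncates the infinite moving-average representation at `n_terms` lags, and induces contemporaneous correlation through a single common shock factor with pairwise correlation `rho`. All names and parameters are hypothetical.

```python
import random
from typing import List, Sequence


def frac_integration_weights(d: float, n_terms: int) -> List[float]:
    """Coefficients psi_0, psi_1, ... of (1 - B)^(-d), using the recursion
    psi_0 = 1 and psi_k = psi_{k-1} * (k - 1 + d) / k."""
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (k - 1 + d) / k)
    return weights


def simulate_streams(
    d_values: Sequence[float],
    length: int,
    rho: float,
    rng: random.Random,
    n_terms: int = 200,  # truncation lag of the MA(infinity) representation
) -> List[List[float]]:
    """Simulate one truncated ARFIMA(0,d,0) realization per entry of d_values."""
    total = length + n_terms
    common = [rng.gauss(0.0, 1.0) for _ in range(total)]
    streams = []
    for d in d_values:
        # Unit-variance shocks whose correlation across streams is rho.
        shocks = [
            rho ** 0.5 * c + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0) for c in common
        ]
        psi = frac_integration_weights(d, n_terms)
        series = [
            sum(psi[k] * shocks[t - k] for k in range(n_terms))
            for t in range(n_terms, total)
        ]
        streams.append(series)
    return streams
```

Summing the simulated streams within each cluster then gives the subaggregated realizations to which the ARMA(5,5) models would be fitted.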
As before, we note that Pivot Clustering provides a sharp reduction in MSFE compared to random cluster assignments as well as compared to the aggregated case. We also observe that when fitting ARMA models to non-ARMA data, it is possible for Pivot Clustering to yield clusters which lead to a subaggregated MSFE that is lower than the MSFE obtained using the individual (disaggregated) series. The exact cause of this is unclear; however, it may have to do with the larger number of misspecified ARMA models that are fit in the disaggregated case.
Figure 9.
Estimated MSFEs when using aggregated, disaggregated, and subaggregated series to forecast one-step-ahead demand for series that are not generated using an ARMA process. A total of 10 simulations is performed, where within each simulation twenty separate demand realizations are generated according to an ARFIMA model with a different d for each realization, such that the shocks appearing in the ARFIMA models are contemporaneously correlated. Pivot Clustering is carried out for 5 random initializations of assigning the streams to one of four clusters. We compute the estimated MSFEs for the disaggregated series (blue), aggregated series (black), subaggregated clusters generated using random assignment (red) and subaggregated clusters generated using the result of Pivot Clustering (green). Pivot Clustering yields lower MSFEs than random assignment and full aggregation in all ten simulations.