2. Model Framework
We consider a retailer with possibly many large customers. In general, retailers forecast their aggregate demand stream since the forecasting of the individual customer demand streams is often thought of as being cumbersome and time consuming. We demonstrate that a retailer observing multiple streams of ARMA demand can drastically reduce its Mean Squared Forecasting Error by forecasting the individual demand streams or aggregated clusters of similar demand streams. Specifically, consider a retailer that observes multiple demand streams for a single product
. Each demand stream
is ARMA with respect to a sequence of shocks
given by
such that
is invertible and causal with respect to
(see Brockwell and Davis, page 77 for a definition and discussion about causal and invertible ARMA models). We denote the variance of each shock sequence
. Furthermore we note that the shock sequences are potentially contemporaneously correlated with
. In general, this set up guarantees that the shocks
are the retailer’s Wold shocks and that the one-step-ahead MSFE of one-step-ahead leadtime demand (when using the disaggregated (individual) streams) is the sum of the elements in the covariance matrix
where
such that
(see Equation (
10).
The focus of this paper is evaluating the difference in one-step-ahead MSFEs at time t when the forecast of leadtime demand, given by , is based on the disaggregated (individual) sequences versus when the forecast is based on the aggregated sequence versus when the forecast is based on subaggregated clusters where and the superscript indicates that belongs to cluster . Studying one-step-ahead MSFEs is mathematically simpler than those for general leadtimes since the former does not depend on model parameters.
Our problem is related to the one posed by [
5] where a two-stage supply chain was considered with the retailer observing two demand streams. The focus of that paper was in evaluating information sharing between the retailer and supplier in a situation where the retailer forecasts each demand stream separately. Here we show the benefit to the retailer
1 in determining the separate forecasts, while considering the existence of (possibly) more than two demand streams. Kohn [
4] was the first to identify conditions under which using the separate ARMA streams leads to a better forecast than using the aggregate sequence, however he did not determine the MSFE in the two cases.
In this paper we extend the results of [
5] by providing formulas for computing MSFEs under the possibility of more than two streams and a general leadtime in the situation where a player’s (retailer’s) forecast can only be based on their Wold shocks (see [
2] pp 187-188 for a description of a Wold decomposition of a time series). We also describe how a retailer would forecast its demand by identifying clusters of similar demand streams and then forecasting each cluster after it is aggregated. Assuming general ARMA demand streams, we provide a pivot algorithm, which we call Pivot Clustering, for identifying a locally optimal assignment of streams into a fixed number of clusters. The algorithm will often find the best possible assignment. We also describe a fast clustering algorithm for assigning independent MA(1) streams to clusters which results in a globally optimal assignment. Our results show that after the algorithms are carried out, much of the benefit of forecasting individual streams can be obtained by forecasting the aggregated clusters.
2.1. Forecasting Using the Disaggregated (Individual) Sequences
In this section we are interested in forecasting
when the forecast is based on the individual sequences
. The forecasting contained here is a textbook multivariate time-series result along with the propagation described in [
3] as seen in (
7) .
Thus we consider the ARMA processes given by
with covariance matrix of the shock sequences given by
such that
.
The total MSFE of the BLF is given as the
sum of the elements in the
Nx
N matrix
which takes the form
We note that the diagonal elements are those obtained in [
3]. Specifically,
where
where
is the
coefficient appearing in the MA(
∞) representation of
with respect to
. That is
and
We note that .
Remark 1.
The off-diagonal elements of (5) can be summarized as
Thus the MSFE of the BLF of leadtime demand when using the individual sequences
is given by
where we observe the equivalence
.
Thus we compare (
10) with (
27) below to determine the value of using the individual sequences as opposed to the aggregate.
2.2. ARMA Representation of a Summed Sequence
In this subsection we determine the ARMA representation and the variance of the shocks appearing in the Wold representation of a series , which is the sum of several ARMA streams given by . This will allow us to determine the MSFE when forecasts are based on the fully aggregated series as well as when the forecasts are based on subaggregated clusters. In order to obtain the ARMA representation we first need to obtain the spectral density and the covariance generating function of .
Proposition 1.
Let . The spectral density of is given by
where the cross-spectrum is defined with .
Proof. We will prove this by induction. Note that when
,
is given by
and
is ARMA. This matches representation (
11).
Now suppose (
11) holds for
processes. Thus
Consider
. Since
is ARMA, we observe that
Note that starting with the definition of
,
Thus from (
13),
or equivalently
which can be simply written as
and the result is proved. □
Now consider the covariance generating function
of
. Here we use the equivalence
and note the following:
Thus from Proposition 1 we observe that
As described in Theorem 5 of [
5], the covariance generating function
can be factorized as the ratio
where
,
and
are Laurent polynomials, with
having all its roots on the unit circle and
and
having no roots on the unit circle. This result follows from the fact that each additive term in (
77) is a ratio of Laurent Polynomials and the fact that for any Laurent polynomial
, both
and
will be Laurent polynomials. Furthermore if
and
are Laurent polynomials then
and
will be Laurent polynomials as well.
We can now use the factorization provided in [
7] and described in [
5] to obtain the ARMA representation of
with respect to the shocks
appearing in its Wold representation. It should be noted that Remark 2 is simply a restatement of Theorem 5 of [
5] with the slight addition of determining the polynomials appearing in the ARMA representation of the aggregate sequence
.
Remark 2.
can be represented with respect to shocks as the ARMA model
where where are the roots of on or inside the unit circle and where are the roots of inside the unit circle. Furthermore
where is the coefficient of in and is the coefficient of in .
2.3. Forecasting Using the Fully Aggregated Sequence for A General Leadtime
We note that Remark 2 can be used to obtain the ARMA representation of
with respect to its Wold shocks
as well as
. We can therefore use Lemma 1 of [
3] and its proof to determine the best linear forecast (BLF) of
and its MSFE when forecasting using
. That is, if we consider the MA(
∞) representation of
with respect to
given by
where
, then the MSFE of the BLF when using
is
where
We note that
.
2.4. Forecasting Using Subaggregated Sequences
In this section we use the results in the previous section to create a forecast and compute its MSFE when some of the individual streams are subaggregated. That is, for , let cluster consist of streams such that . We are interested in the forecast and MSFE based on .
Section 2.2 describes how we can obtain the ARMA representation and variance of shocks appearing in the Wold representation for each sequence
. This can then be used to create a forecast from each subaggregated sequence, the sum of which can be taken as the forecast for
. The one-step ahead MSFE of this forecast is the sum of the entries of the covariance matrix of the shocks appearing in the Wold representation of
. Equation (
10) describes how we can also use the covariance matrix of the shocks of
to obtain the MSFE for multi-step ahead forecasts. The remainder of this section will focus on obtaining this covariance matrix.
Without loss of generality, consider two subaggregated series
and
with ARMA representations
where
and
are the shocks appearing in the Wold representation of
and
respectively. We note that the variances of
and
can be obtained using Remark 2. To obtain the covariance
consider the following.
We can rewrite the ARMA representations above as
The ARMA representations of
can also be rewritten as
Based on the definition of
and
we observe that
To obtain
we need to compute the expectation of the the product of the right-hand-sides of these two equations. Thus we need to consider the sum of terms such as
Note that we can write
and
where
and
can be obtained in the same way as the MA(
∞) coefficients in (
28). Hence the term in Equation (
36) can be rewritten as
Since the shock sequences are not correlated across time by assumption, this is equivalent to
or equivalently
Adding up these terms as required would yield the covariance. Hence
We note that this methodology can easily be extended to obtain any of the covariances in the covariance matrix
We will use the previous methodology for all theoretical MSFE computations found in this paper. In the next subsection, we provide an example describing the importance of forecasting leadtime demand based upon the individual sequences.
2.5. Example
Consider a retailer that observes Aggregate demand
where each individual demand stream is given by the following ARMA processes:
where the shock covariance matrix is given by
We use the results described in
Section 2.1 and
Section 2.2 to compare the MSFE of the forecasts of leadtime demand
when using the individual sequences
versus when using the aggregate sequence
.
The covariance generating function of
is given by
and the ARMA representation of
is given by
where
. We can check that the ARMA representation of
and its covariance generating function match up by noting that
with
and
being the MA and AR polynomials appearing in the ARMA representation of
as defined in Remark 2.
We note that the one-step ahead forecast then has a MSFE=5.629561 and a two-step ahead forecast (of ) has a MSFE=11.44056.
To obtain the MSFE when using the individual sequences we consider the multivariate-ARMA representation
with the shock-covariance matrix as stated earlier. Using Equation (
10) and noting that
in Equation (
7) we see that the one-step ahead forecast has a MSFE=1.5. We note that this forecast error is 4.129561 (73.3%) lower that the forecast error when using the aggregated series. We can likewise compute the list of elements
when forecasting
. In this case,
and
,
, and
. From (
10) this implies that the MSFE=7.311, which is also 4.129561 (36.1%) lower than the forecast error when using the aggregated series. We note that in general the difference in MSFE when forecasting using the aggregate or the individual series will be different for different lead-times however when considering individual series other than MA(1). In the next section we demonstrate, with example, the benefit of forecasting using subaggregated clusters (which would have been identified using a suitable technique).