Preprint

A peer-reviewed article of this preprint also exists.

David Fernando Muñoz

This version is not peer-reviewed.

When there is uncertainty in the value of the parameters of the input random components of a stochastic simulation model, two-level nested simulation algorithms can be used to estimate the expectation of performance variables of interest. In the outer level of the algorithm, (n) observations are generated for the parameters, and in the inner level, (m) observations of the simulation model are generated with the parameters fixed at the value generated in the outer level. In this article, we consider the case in which the observations at both levels of the algorithm are independent, showing how the variance of the observations can be decomposed into the sum of a parametric variance and a stochastic variance. Next, we derive central limit theorems that allow us to compute asymptotic confidence intervals to assess the accuracy of the simulation-based estimators for the point forecast and the variance components. Under this framework, we derive analytical expressions for the point forecast and the variance components of a Bayesian model to forecast sporadic demand, and we use these expressions to illustrate the validity of our theoretical results through simulation experiments with this forecast model.


Subject: Computer Science and Mathematics - Computational Mathematics

Simulation is widely recognized as an effective technique to produce forecasts, evaluate risk (see, e.g., [1]), and animate and illustrate the performance of a system over time (see, e.g., [2]). When there is uncertainty in a component of a simulation model, it is said to be a random component, and it is modeled using a probability distribution and/or a stochastic process that is generated during the simulation run, producing a stochastic simulation. A random component typically depends on the value of certain parameters; we will denote by $\theta $ a particular value for the vector of parameters of the random components of a stochastic simulation, and $\mathsf{\Theta}$ will denote the random vector that corresponds to the parameter values when there is uncertainty about these values.

In general, the output of a stochastic (dynamic) simulation can be regarded as a stochastic process $\{Y\left(s\right):s\ge 0;\mathsf{\Theta}\}$, where $Y\left(s\right)$ is a random vector (of arbitrary dimension d) representing the state of the simulation at time $s\ge 0$. The term transient simulation applies to a dynamic simulation that has a well-defined termination time, so that the output of a transient simulation can be viewed as a stochastic process $\{Y\left(s\right):0\le s\le T;\mathsf{\Theta}\}$, where T is a stopping time (possibly deterministic); see, e.g., [3] for a definition of stopping time. Note that this notation includes the case of a discrete-time output ${Z}_{0},{Z}_{1},\cdots $, if we assume that $Y\left(s\right)={Z}_{\lfloor s\rfloor}$, where $\lfloor s\rfloor $ denotes the integer part of s.

A performance variable W in transient simulation is a real-valued random variable (r.v.) that depends on the simulation output up to time T, i.e., $W=f\left(Y\right(s),0\le s\le T;\mathsf{\Theta})$, and the expectation of a performance variable W is a performance measure that we usually estimate through experimentation with the simulation model. When there is no uncertainty in the parameters of the random components, the standard methodology used to estimate a performance measure in transient simulation is the method of independent replications, which consists of running the simulation model to produce n replications ${W}_{1},{W}_{2},\cdots ,{W}_{n}$ that can be regarded as independent and identically distributed (i.i.d.) random variables (see Figure 1).

Under the method of independent replications, a point estimator for the expectation $\alpha =E\left[{W}_{1}\right]$ is the average $\widehat{\alpha}\left(n\right)=\frac{{\sum}_{i=1}^{n}{W}_{i}}{n}$. If $E\left[\left|{W}_{1}\right|\right]<\infty $, it follows from the classical Law of Large Numbers (LLN) that $\widehat{\alpha}\left(n\right)$ is consistent, i.e., it satisfies $\widehat{\alpha}\left(n\right)\Rightarrow \alpha $ as $n\to \infty $ (where ⇒ denotes weak convergence of random variables); see, e.g., [3] for a proof. Consistency guarantees that the estimator approaches the parameter as the number of replications n increases, and the accuracy of the simulation-based estimator $\widehat{\alpha}\left(n\right)$ is typically assessed by an asymptotic confidence interval (ACI) for the parameter. The expression for an ACI for a parameter of a stochastic simulation is usually obtained through a Central Limit Theorem (CLT) for the estimator (see, for example, chapter 3 of [4]). For the case of the expectation $\alpha $ in the algorithm of Figure 1, if $E\left[{W}_{1}^{2}\right]<\infty $, the classical CLT implies that

$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sigma}\Rightarrow N(0,1),\tag{1}$$

as $n\to \infty $, where ${\sigma}^{2}=E\left[{({W}_{1}-\alpha )}^{2}\right]$ and $N(0,1)$ denotes a r.v. distributed as normal with mean 0 and variance 1. Then, if $E\left[{W}_{1}^{2}\right]<\infty $, it follows from (1) and Slutsky's Theorem (see the Appendix) that

$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\widehat{\sigma}\left(n\right)}\Rightarrow N(0,1),$$

as $n\to \infty $, where $\widehat{\sigma}\left(n\right)$ denotes the sample standard deviation, i.e., ${\widehat{\sigma}}^{2}\left(n\right)=\frac{{\sum}_{i=1}^{n}{({W}_{i}-\widehat{\alpha}\left(n\right))}^{2}}{n-1}$. This CLT implies that

$$\underset{n\to \infty}{lim}P\left[\left|\widehat{\alpha}\left(n\right)-\alpha \right|\le {z}_{\beta}\widehat{\sigma}\left(n\right)/\sqrt{n}\right]=1-\beta ,$$

for $0<\beta <1$, where ${z}_{\beta}$ denotes the ($1-\beta /2$)-quantile of a N(0,1), which is sufficient to establish a $(1-\beta )100\%$ ACI for $\alpha $ with halfwidth

$$H{W}_{\alpha}={z}_{\beta}\widehat{\sigma}\left(n\right)/\sqrt{n}.\tag{2}$$

A halfwidth in the form of (2) is the typical measure used in simulation software (e.g., Simio, see [2]) to assess the accuracy of $\widehat{\alpha}\left(n\right)$ for the estimation of expectation $\alpha $.
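As a concrete illustration, the replication-and-halfwidth recipe above can be sketched in a few lines of Python; the exponential performance variable and the sample size below are hypothetical stand-ins for an actual simulation model, and z_beta = 1.645 gives a nominal 90% interval.

```python
import math
import random

def replication_aci(sample_w, n, z_beta=1.645):
    """Method of independent replications: run n i.i.d. replications of a
    performance variable and return the point estimator together with the
    ACI halfwidth z_beta * sigma_hat / sqrt(n)."""
    w = [sample_w() for _ in range(n)]
    alpha_hat = sum(w) / n                               # point estimator
    s2 = sum((x - alpha_hat) ** 2 for x in w) / (n - 1)  # sample variance
    return alpha_hat, z_beta * math.sqrt(s2 / n)         # (estimate, halfwidth)

random.seed(1)
# hypothetical performance variable: W ~ Exponential with mean 2
est, hw = replication_aci(lambda: random.expovariate(0.5), n=10_000)
```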

In contrast to the estimation of (output) performance measures, parameters of (input) random components of a simulation model are usually estimated from real-data observations (x) and, while most applications covered in the relevant literature assume that no uncertainty exists in the value of these parameters, the uncertainty can be significant when little data is available. In these cases, Bayesian statistics can be used to incorporate this uncertainty into the output analysis of simulation experiments via the use of a posterior distribution $p\left(\theta \right|x)$. A methodology currently proposed for the analysis of simulation experiments under parameter uncertainty is a two-level nested simulation algorithm (see, e.g., [6,7,8]). In the outer level, we simulate (n) observations for the parameters from a posterior distribution $p\left(\theta \right|x)$, while in the inner level we simulate (m) observations for the performance variable with the parameters fixed at the value ($\theta $) generated in the outer level (see Figure 2). In this paper, we focus on the output analysis of two-level simulation experiments for the case where the observations at the inner level are independent, showing how the variance of a simulated observation can be decomposed into parametric and stochastic variance components. Afterwards, we derive a CLT for both the estimator of the point forecast and the estimators of the variance components. Our CLTs allow us to compute an ACI for each estimator. Our results are validated through experiments with a forecast model for sporadic demand reported in [10]. This paper is an extended version of results initially reported in [11], and the proofs missing in [11] are provided.
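The two-level algorithm just described can be sketched as follows, with the posterior sampler and the conditional simulation model left as user-supplied functions; the Gamma posterior and exponential performance variable in the example are hypothetical placeholders, not part of the models discussed in this paper.

```python
import random

def nested_simulation(sample_theta, sample_w, n, m):
    """Two-level nested simulation (algorithm of Figure 2): draw n parameter
    values from the posterior (outer level) and, for each one, m conditionally
    independent observations of the performance variable (inner level).
    Returns the n x m matrix of observations W."""
    W = []
    for _ in range(n):                    # outer level: Theta_i ~ p(theta | x)
        theta = sample_theta()
        W.append([sample_w(theta) for _ in range(m)])   # inner level: W_ij
    return W

random.seed(7)
# hypothetical example: Gamma posterior for a rate, exponential observations
W = nested_simulation(
    sample_theta=lambda: random.gammavariate(20.0, 0.1),
    sample_w=lambda th: random.expovariate(th),
    n=1000, m=4)
```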

Following this introduction, we present the proposed methodology for the construction of an ACI for the point forecast and the variance components in a two-level simulation experiment. Afterwards, we present an illustrative example that has an analytical solution for the parameters of interest in this paper. This example is used in the next section to illustrate the application and validity of our proposed methodologies for the construction of an ACI. Finally, in the last section, we present conclusions and directions for future research.

To identify the variance components in each observation ${W}_{ij}$ of the algorithm illustrated in Figure 2, let $\mu \left(\mathsf{\Theta}\right)=E\left[{W}_{11}\right|\mathsf{\Theta}]$, and ${\sigma}^{2}\left(\mathsf{\Theta}\right)=E\left[{W}_{11}^{2}\right|\mathsf{\Theta}]-{\mu}^{2}\left(\mathsf{\Theta}\right)$. Under this notation, the point forecast is $\alpha =E\left[\mu \right(\mathsf{\Theta}\left)\right]$, and the variance of each ${W}_{ij}$ is:

$$V\left[{W}_{ij}\right]\stackrel{def}{=}E\left[{W}_{ij}^{2}\right]-E{\left[{W}_{ij}\right]}^{2}=E[E\left[{W}_{ij}^{2}\right|\mathsf{\Theta}]-\mu {\left(\mathsf{\Theta}\right)}^{2}]+E\left[\mu {\left(\mathsf{\Theta}\right)}^{2}\right]-E{\left[\mu \left(\mathsf{\Theta}\right)\right]}^{2}={\sigma}_{S}^{2}+{\sigma}_{P}^{2},\tag{3}$$

for $i=1,...,n$; $j=1,...,m$, where ${\sigma}_{P}^{2}=V\left[\mu \left(\mathsf{\Theta}\right)\right]\stackrel{def}{=}E\left[\mu {\left(\mathsf{\Theta}\right)}^{2}\right]-E{\left[\mu \left(\mathsf{\Theta}\right)\right]}^{2}$, and ${\sigma}_{S}^{2}=E\left[{\sigma}^{2}\left(\mathsf{\Theta}\right)\right]$. It is worth mentioning that, in the relevant literature, ${\sigma}_{S}^{2}$ is commonly referred to as the stochastic variance and ${\sigma}_{P}^{2}$ as the parametric variance.

In this paper, we are interested in the estimation of both the point forecast $\alpha =E\left[\mu \right(\mathsf{\Theta}\left)\right]$ and the variance components of each observation generated in the algorithm of Figure 2, as defined in (3); thus, we first consider the natural point estimators

$$\widehat{\alpha}\left(n\right)=\frac{1}{n}\sum _{i=1}^{n}{\widehat{\alpha}}_{i},\phantom{\rule{1em}{0ex}}{\widehat{\sigma}}_{T}^{2}\left(n\right)=\frac{1}{n-1}\sum _{i=1}^{n}{({\widehat{\alpha}}_{i}-\widehat{\alpha}\left(n\right))}^{2},\phantom{\rule{1em}{0ex}}{\widehat{\sigma}}_{S}^{2}\left(n\right)=\frac{1}{n}\sum _{i=1}^{n}{S}_{i}^{2},\tag{4}$$

where ${\widehat{\alpha}}_{i}={m}^{-1}{\sum}_{j=1}^{m}{W}_{ij}$, and ${S}_{i}^{2}={(m-1)}^{-1}{\sum}_{j=1}^{m}{({W}_{ij}-{\widehat{\alpha}}_{i})}^{2}$, $i=1,...,n$. Note that the ${\widehat{\alpha}}_{i}$'s are i.i.d. with expectation $E\left[{\widehat{\alpha}}_{1}\right]=\alpha $ and variance

$$\begin{array}{ccc}\hfill {\sigma}_{T}^{2}& \stackrel{def}{=}& E\left[{({\widehat{\alpha}}_{1}-\alpha )}^{2}\right]={m}^{-2}(mE\left[{({W}_{11}-\alpha )}^{2}\right]+m(m-1)E\left[({W}_{11}-\alpha )({W}_{12}-\alpha )\right])\hfill \\ & =& {m}^{-1}({\sigma}_{S}^{2}+{\sigma}_{P}^{2})+{m}^{-1}(m-1){\sigma}_{P}^{2}={\sigma}_{P}^{2}+{m}^{-1}{\sigma}_{S}^{2}.\hfill \end{array}\tag{5}$$

On the other hand, the ${S}_{i}^{2}$ are i.i.d. with expectation $E\left[{S}_{1}^{2}\right]={\sigma}_{S}^{2}$. Thus, the next proposition follows from the classical LLN.

Proposition 1. Given $m\ge 1$, if $E\left[{W}_{11}^{2}\right]<\infty $, then $\widehat{\alpha}\left(n\right)$ and ${\widehat{\sigma}}_{T}^{2}\left(n\right)$ are unbiased and consistent (as $n\to \infty $) estimators of α and ${\sigma}_{T}^{2}$ (as defined in (5)), respectively. Furthermore, if $m\ge 2$ and $E\left[{W}_{11}^{2}\right]<\infty $, then ${\widehat{\sigma}}_{S}^{2}\left(n\right)$ is an unbiased and consistent (as $n\to \infty $) estimator of ${\sigma}_{S}^{2}$ (as defined in (3)).
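The estimators in (4) can be computed directly from the n × m matrix of observations. The following sketch uses a small deterministic matrix only to exercise the formulas: identical rows make the inner-level means equal, so the sample variance of the means vanishes while the inner-level sample variances do not.

```python
def nested_estimators(W):
    """Point estimators from an n x m observation matrix of a nested
    simulation: alpha_hat (grand mean), sigma_T2_hat (sample variance of
    the inner-level means) and sigma_S2_hat (average of the inner-level
    sample variances); m >= 2 is required for sigma_S2_hat."""
    n, m = len(W), len(W[0])
    means = [sum(row) / m for row in W]                      # alpha_hat_i
    alpha_hat = sum(means) / n
    sigma_T2 = sum((a - alpha_hat) ** 2 for a in means) / (n - 1)
    s2 = [sum((w - a) ** 2 for w in row) / (m - 1)           # S_i^2
          for row, a in zip(W, means)]
    return alpha_hat, sigma_T2, sum(s2) / n

# degenerate check: identical rows, so the variability across rows vanishes
a, v_T, v_S = nested_estimators([[1.0, 3.0], [1.0, 3.0], [1.0, 3.0]])
```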

As established in Proposition 1, under mild assumptions the point estimators proposed in (4) are consistent, and thus converge to the corresponding parameter value as $n\to \infty $. Nonetheless, to assess the accuracy of these estimators, we must establish a CLT for each one in order to derive a valid expression for the corresponding ACI. Note that both $\widehat{\alpha}\left(n\right)$ and ${\widehat{\sigma}}_{S}^{2}\left(n\right)$ are averages of i.i.d. observations; thus, the next proposition follows from the classical CLT for i.i.d. observations.

Proposition 2. Given $m\ge 1$, if $E\left[{W}_{11}^{2}\right]<\infty $, then

$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{{\sigma}_{T}}\Rightarrow N(0,1),$$

as $n\to \infty $. Furthermore, if $m\ge 2$ and $E\left[{W}_{11}^{4}\right]<\infty $, then

$$\frac{\sqrt{n}({\widehat{\sigma}}_{S}^{2}\left(n\right)-{\sigma}_{S}^{2})}{\sqrt{{V}_{S}}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${V}_{S}=E\left[{({S}_{1}^{2}-{\sigma}_{S}^{2})}^{2}\right]$ denotes the variance of ${S}_{1}^{2}$. Since we have consistent estimators for ${\sigma}_{T}^{2}$ and ${V}_{S}$ (under mild assumptions), the next corollary follows from Proposition 1 and Slutsky's Theorem; details of the proof are given in the Appendix.

Corollary 1. Under the same notation and assumptions as in Proposition 2, for $m\ge 1$ we have

$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sqrt{{\widehat{\sigma}}_{T}^{2}\left(n\right)}}\Rightarrow N(0,1),$$

as $n\to \infty $, and for $m\ge 2$ we have

$$\frac{\sqrt{n}({\widehat{\sigma}}_{S}^{2}\left(n\right)-{\sigma}_{S}^{2})}{\sqrt{{\widehat{V}}_{S}\left(n\right)}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\widehat{\sigma}}_{S}^{2}\left(n\right)$ and ${\widehat{\sigma}}_{T}^{2}\left(n\right)$ are defined in (4), and

$${\widehat{V}}_{S}\left(n\right)=\frac{1}{n-1}\sum _{i=1}^{n}{({S}_{i}^{2}-{S}^{2})}^{2},\phantom{\rule{1em}{0ex}}{S}^{2}=\frac{1}{n}\sum _{i=1}^{n}{S}_{i}^{2}.$$

In order to obtain a CLT for ${\widehat{\sigma}}_{T}^{2}\left(n\right)$, note that this estimator is the sample variance of a set of i.i.d. observations; thus, we can use the following lemma, whose proof using the Delta Method (see, e.g., Proposition 2 of [9]) is provided in the Appendix.

Lemma 1. If ${X}_{1},{X}_{2},...$ is a sequence of i.i.d. random variables with $E\left[{X}_{1}^{4}\right]<\infty $, then

$$\frac{\sqrt{n}({S}^{2}\left(n\right)-{\sigma}_{1}^{2})}{\sqrt{{\sigma}_{2}^{2}}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\sigma}_{1}^{2}={\mu}_{2}-{\mu}_{1}^{2}$, ${\sigma}_{2}^{2}=8{\mu}_{1}^{2}{\mu}_{2}-4{\mu}_{1}^{4}-4{\mu}_{1}{\mu}_{3}+{\mu}_{4}-{\mu}_{2}^{2}$, ${\mu}_{k}=E\left[{X}_{1}^{k}\right]$, $k=1,2,3,4$; ${S}^{2}\left(n\right)={(n-1)}^{-1}{\sum}_{i=1}^{n}{({X}_{i}-{\widehat{\mu}}_{1})}^{2}$, and ${\widehat{\mu}}_{1}={n}^{-1}{\sum}_{i=1}^{n}{X}_{i}$.
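For completeness, ${\sigma}_{2}^{2}$ is simply the asymptotic variance $E[{({X}_{1}-{\mu}_{1})}^{4}]-{\sigma}_{1}^{4}$ of the sample variance written in raw moments, which can be checked by expanding:

$$E\left[{({X}_{1}-{\mu}_{1})}^{4}\right]-{\sigma}_{1}^{4}=({\mu}_{4}-4{\mu}_{1}{\mu}_{3}+6{\mu}_{1}^{2}{\mu}_{2}-3{\mu}_{1}^{4})-({\mu}_{2}^{2}-2{\mu}_{1}^{2}{\mu}_{2}+{\mu}_{1}^{4})=8{\mu}_{1}^{2}{\mu}_{2}-4{\mu}_{1}^{4}-4{\mu}_{1}{\mu}_{3}+{\mu}_{4}-{\mu}_{2}^{2}.$$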

Corollary 2. Under the same assumptions as in Lemma 1, we have

$$\frac{\sqrt{n}({S}^{2}\left(n\right)-{\sigma}_{1}^{2})}{\sqrt{{\widehat{\sigma}}_{2}^{2}\left(n\right)}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\widehat{\sigma}}_{2}^{2}\left(n\right)=8{\widehat{\mu}}_{1}^{2}{\widehat{\mu}}_{2}-4{\widehat{\mu}}_{1}^{4}-4{\widehat{\mu}}_{1}{\widehat{\mu}}_{3}+{\widehat{\mu}}_{4}-{\widehat{\mu}}_{2}^{2}$, and ${\widehat{\mu}}_{k}={n}^{-1}{\sum}_{i=1}^{n}{X}_{i}^{k}$.

Corollary 2 follows from the fact that ${\widehat{\mu}}_{k}$ is an unbiased and consistent estimator of ${\mu}_{k}$, and the next corollary follows from the fact that ${\widehat{\sigma}}_{T}^{2}\left(n\right)$ is the sample variance of the ${\widehat{\alpha}}_{i}$.
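As a quick numerical sanity check (not part of the original development), the plug-in estimator ${\widehat{\sigma}}_{2}^{2}$ can be evaluated on standard normal data, for which the true asymptotic variance is ${\mu}_{4}-{\sigma}_{1}^{4}=3-1=2$:

```python
import random

def sigma2_hat(xs):
    """Plug-in estimator 8*m1^2*m2 - 4*m1^4 - 4*m1*m3 + m4 - m2^2 of the
    asymptotic variance of the sample variance, from raw sample moments."""
    n = len(xs)
    m1, m2, m3, m4 = (sum(x ** k for x in xs) / n for k in (1, 2, 3, 4))
    return 8 * m1 ** 2 * m2 - 4 * m1 ** 4 - 4 * m1 * m3 + m4 - m2 ** 2

random.seed(11)
# 50,000 standard normal observations; estimate should be close to 2
v = sigma2_hat([random.gauss(0.0, 1.0) for _ in range(50_000)])
```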

Corollary 3. Given $m\ge 1$, if $E\left[{W}_{11}^{4}\right]<\infty $, then

$$\frac{\sqrt{n}({\widehat{\sigma}}_{T}^{2}\left(n\right)-{\sigma}_{T}^{2})}{\sqrt{{\widehat{V}}_{T}\left(n\right)}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\widehat{V}}_{T}\left(n\right)=8{\overline{\alpha}}_{1}^{2}{\overline{\alpha}}_{2}-4{\overline{\alpha}}_{1}^{4}-4{\overline{\alpha}}_{1}{\overline{\alpha}}_{3}+{\overline{\alpha}}_{4}-{\overline{\alpha}}_{2}^{2}$, ${\overline{\alpha}}_{k}={n}^{-1}{\sum}_{i=1}^{n}{\widehat{\alpha}}_{i}^{k}$.

Let $0<\beta <1$. Using Corollaries 1 and 3, we can establish a $100(1-\beta )\%$ ACI for the point forecast $\alpha $ and the variance components ${\sigma}_{S}^{2}$ and ${\sigma}_{T}^{2}={\sigma}_{P}^{2}+{m}^{-1}{\sigma}_{S}^{2}$; each ACI is centered at the corresponding point estimator ($\widehat{\alpha}\left(n\right)$, ${\widehat{\sigma}}_{S}^{2}\left(n\right)$ or ${\widehat{\sigma}}_{T}^{2}\left(n\right)$), and the corresponding halfwidth is given by:

$$H{W}_{\alpha}={z}_{\beta}\frac{\sqrt{{\widehat{\sigma}}_{T}^{2}\left(n\right)}}{\sqrt{n}},\phantom{\rule{1em}{0ex}}H{W}_{{\sigma}_{S}^{2}}={z}_{\beta}\frac{\sqrt{{\widehat{V}}_{S}\left(n\right)}}{\sqrt{n}},\phantom{\rule{4pt}{0ex}}\mathrm{and}\phantom{\rule{4pt}{0ex}}H{W}_{{\sigma}_{T}^{2}}={z}_{\beta}\frac{\sqrt{{\widehat{V}}_{T}\left(n\right)}}{\sqrt{n}},\tag{6}$$

for $\alpha $, ${\sigma}_{S}^{2}$ and ${\sigma}_{T}^{2}$, respectively, where ${\widehat{\sigma}}_{T}^{2}\left(n\right)$ is defined in (4), and ${\widehat{V}}_{S}\left(n\right)$ and ${\widehat{V}}_{T}\left(n\right)$ are defined in Corollary 1 and Corollary 3, respectively.
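The three halfwidths in (6) can be computed from the observation matrix as follows (m ≥ 2 is required); the estimate of ${V}_{T}$ uses the raw moments of the inner-level means, as in Corollary 3. The small deterministic matrix at the end only exercises the formulas.

```python
import math

def nested_halfwidths(W, z_beta=1.645):
    """ACI halfwidths in (6) for alpha, sigma_S^2 and sigma_T^2, computed
    from an n x m observation matrix with m >= 2; z_beta = 1.645 gives
    nominal 90% intervals."""
    n, m = len(W), len(W[0])
    means = [sum(row) / m for row in W]
    a_bar = sum(means) / n
    s2 = [sum((w - a) ** 2 for w in row) / (m - 1) for row, a in zip(W, means)]
    s2_bar = sum(s2) / n
    var_T = sum((a - a_bar) ** 2 for a in means) / (n - 1)   # sigma_T^2 hat
    v_S = sum((x - s2_bar) ** 2 for x in s2) / (n - 1)       # V_S hat
    a1, a2, a3, a4 = (sum(a ** k for a in means) / n for k in (1, 2, 3, 4))
    v_T = 8 * a1 ** 2 * a2 - 4 * a1 ** 4 - 4 * a1 * a3 + a4 - a2 ** 2  # V_T hat
    return tuple(z_beta * math.sqrt(v / n) for v in (var_T, v_S, v_T))

hw_alpha, hw_S, hw_T = nested_halfwidths([[1.0, 3.0], [2.0, 4.0], [3.0, 5.0]])
```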

Note that the ACIs proposed in (6) assume that the value of m in the algorithm of Figure 2 is fixed, and that the accuracy of the estimators improves as n (the number of observations in the outer level) increases (in turn, the halfwidth of the ACI gets smaller). Given that we can build a valid ACI for any value of m, a relevant question is how to find an adequate value of m that achieves an acceptable level of accuracy in a reasonable amount of running time. To answer this question for the case of the point estimator of $\alpha $, let us fix the total number of iterations in the algorithm of Figure 2 at $k=nm$, and note from (5) and Proposition 2 that the asymptotic variance of $(\widehat{\alpha}\left(n\right)-\alpha )$ is

$${n}^{-1}{\sigma}_{T}^{2}={k}^{-1}(m{\sigma}_{P}^{2}+{\sigma}_{S}^{2}),\tag{7}$$

which takes its minimal value at $m=1$, suggesting that the point estimator $\widehat{\alpha}\left(n\right)$ defined in (4) is most accurate when m is close to 1. Note that, for a fixed number of iterations $k=nm$, the choice $m=1$ is also convenient from the point of view of running time when the computation of ${W}_{ij}$ requires the same or more computation time than that of ${\mathsf{\Theta}}_{i}$, as suggested in the relevant literature (see, for example, [6]). Furthermore, if we allow m to increase with n, we obtain the following proposition (a proof using the Lindeberg-Feller Theorem is provided in the Appendix).

Proposition 3. Given $0<p\le 1$, if $m=\lfloor {n}^{-1+1/p}\rfloor $ and $E\left[{W}_{11}^{2}\right]<\infty $, then

$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sqrt{{\sigma}_{T}^{2}}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\sigma}_{T}^{2}$ is defined in (5).

Note that the last proposition implies that the ACI defined in (6) for the point forecast $\alpha $ is also valid under the assumptions of Proposition 3. If, once again, we set the total number of iterations in the algorithm of Figure 2 at $k=nm$, so that $n\approx {k}^{p}$ and $m\approx {k}^{1-p}$, it follows from Proposition 3 that the asymptotic variance of $\widehat{\alpha}\left(n\right)$ is ${n}^{-1}{\sigma}_{T}^{2}\approx {k}^{-1}({k}^{1-p}{\sigma}_{P}^{2}+{\sigma}_{S}^{2})$ for every $0<p\le 1$. Note that, for fixed k, ${n}^{-1}{\sigma}_{T}^{2}$ reaches its minimum value when $p=1$, that is, when $n=k$ and $m=1$. However, note that we need $m\ge 2$ in order to estimate ${\sigma}_{S}^{2}$. In the following section, we report empirical results that confirm our theoretical results. It is worth mentioning that the case $n=k$ and $m=1$ has been reported in the literature as the posterior sampling algorithm (see, e.g., [12,13]).
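To make the budget tradeoff concrete, note that by the law of total variance each inner-level mean has variance ${\sigma}_{P}^{2}+{m}^{-1}{\sigma}_{S}^{2}$, so with a fixed budget $k=nm$ the variance of $\widehat{\alpha}\left(n\right)$ is ${k}^{-1}(m{\sigma}_{P}^{2}+{\sigma}_{S}^{2})$. The sketch below evaluates this for $m=1$ versus $m\approx {k}^{1/3}$, using the analytical variance components of the sporadic-demand example discussed later:

```python
def var_alpha_hat(k, m, sigma_P2, sigma_S2):
    """Variance (m * sigma_P2 + sigma_S2) / k of the point estimator when
    the total simulation budget k = n * m is fixed."""
    return (m * sigma_P2 + sigma_S2) / k

# analytical variance components of the sporadic-demand example
sigma_S2, sigma_P2, k = 380.667, 558.598, 1000
v_m1 = var_alpha_hat(k, 1, sigma_P2, sigma_S2)
v_m10 = var_alpha_hat(k, round(k ** (1 / 3)), sigma_P2, sigma_S2)  # m = 10
```

As expected, the variance for $m=1$ is substantially smaller than for $m\approx {k}^{1/3}$ at the same total budget.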

The following model (reported in [10]) has been proposed to forecast sporadic demand by incorporating data on times between arrivals and customer demand, where uncertainty in the model parameters is incorporated using a Bayesian approach. For this model, we show analytical expressions for the performance measures defined in Section 2. These expressions are used in the following section to illustrate the validity of the ACIs proposed in the previous section.

Customer arrivals for a particular item in a shop follow a Poisson process, yet there is uncertainty in the arrival rate ${\mathsf{\Theta}}_{0}$, so that given $[{\mathsf{\Theta}}_{0}={\theta}_{0}]$, interarrival times between customers are i.i.d. with exponential density:

$$f\left(y\right|{\theta}_{0})=\left\{\begin{array}{cc}{\theta}_{0}{e}^{-{\theta}_{0}y},\hfill & y>0,\hfill \\ 0,\hfill & \mathrm{otherwise},\hfill \end{array}\right.\tag{8}$$

where ${\theta}_{0}\in {S}_{00}=(0,\infty )$. Every client can order j units of this item with probability ${\mathsf{\Theta}}_{1j}$, $j=1,...,q$, $q\ge 2$. Let ${\mathsf{\Theta}}_{1}=({\mathsf{\Theta}}_{11},...,{\mathsf{\Theta}}_{1(q-1)})$ and ${\mathsf{\Theta}}_{1q}=1-{\sum}_{j=1}^{q-1}{\mathsf{\Theta}}_{1j}$; then $\mathsf{\Theta}=({\mathsf{\Theta}}_{0},{\mathsf{\Theta}}_{1})$ is the parameter vector, and ${S}_{0}={S}_{00}\times {S}_{01}$ is the parameter space, where ${S}_{01}=\{({\theta}_{11},...,{\theta}_{1(q-1)}):{\sum}_{j=1}^{q-1}{\theta}_{1j}\le 1;{\theta}_{1j}\ge 0,j=1,...,q-1\}$.

Total demand during a period of length T is

$$D=\left\{\begin{array}{cc}\sum _{i=1}^{N\left(T\right)}{U}_{i},\hfill & N\left(T\right)>0,\hfill \\ 0,\hfill & \mathrm{otherwise},\hfill \end{array}\right.\tag{9}$$

where $N\left(s\right)$ is the number of customer arrivals during the interval $[0,s]$, $s\ge 0$, and ${U}_{1},{U}_{2},...$ are the individual demands (conditionally independent relative to $\mathsf{\Theta}$). The information about $\mathsf{\Theta}$ consists of i.i.d. observations $v=({v}_{1},...,{v}_{r})$, $u=({u}_{1},...,{u}_{r})$ of past customers, where ${v}_{i}$ is the interarrival time between customer i and customer ($i-1$), and ${u}_{i}$ is the number of units ordered by client i. By taking Jeffreys' non-informative prior as the prior density for $\mathsf{\Theta}$, we obtain the posterior density (see [10] for details) $p\left(\theta \right|x)=p\left({\theta}_{0}\right|v)p\left({\theta}_{1}\right|u)$, where ${x}_{i}=({v}_{i},{u}_{i})$, $i=1,...,r$, $x=({x}_{1},...,{x}_{r})$, and $\theta =({\theta}_{0},{\theta}_{1})$, as

$$p\left({\theta}_{0}\right|v)=\frac{{\theta}_{0}^{r-1}{\left({\sum}_{i=1}^{r}{v}_{i}\right)}^{r}{e}^{-{\theta}_{0}{\sum}_{i=1}^{r}{v}_{i}}}{(r-1)!},\phantom{\rule{1em}{0ex}}p\left({\theta}_{1}\right|u)=\frac{{(1-{\sum}_{j=1}^{q-1}{\theta}_{1j})}^{{c}_{q}-1/2}{\mathsf{\Pi}}_{j=1}^{q-1}{\theta}_{1j}^{{c}_{j}-1/2}}{B({c}_{1}+1/2,...,{c}_{q}+1/2)},\tag{10}$$

where ${c}_{j}={\sum}_{i=1}^{r}I[{u}_{i}=j]$, and $B({a}_{1},...,{a}_{q})={\mathsf{\Pi}}_{j=1}^{q}\Gamma \left({a}_{j}\right)/\Gamma \left({\sum}_{j=1}^{q}{a}_{j}\right)$, for ${a}_{1},...,{a}_{q}>0$. Using this notation, we can show that (see [10] for details)

$$\alpha =E\left[T{\mathsf{\Theta}}_{0}\right]\sum _{j=1}^{q}j{p}_{j},$$

$${\sigma}_{P}^{2}=\frac{E\left[{T}^{2}{\mathsf{\Theta}}_{0}^{2}\right]}{({q}_{0}+1)}\sum _{j=1}^{q}{j}^{2}{p}_{j}+\frac{E{\left[T{\mathsf{\Theta}}_{0}\right]}^{2}[({q}_{0}/r)-1]}{({q}_{0}+1)}{\left(\sum _{j=1}^{q}j{p}_{j}\right)}^{2},$$

$${\sigma}_{S}^{2}=E\left[T{\mathsf{\Theta}}_{0}\right]\sum _{j=1}^{q}{j}^{2}{p}_{j},$$

where $E\left[T{\mathsf{\Theta}}_{0}\right]=Tr{\left({\sum}_{i=1}^{r}{v}_{i}\right)}^{-1}$, $E\left[{T}^{2}{\mathsf{\Theta}}_{0}^{2}\right]={T}^{2}r(1+r){\left({\sum}_{i=1}^{r}{v}_{i}\right)}^{-2}$, ${p}_{j}={q}_{j}/{q}_{0}$, ${q}_{j}={c}_{j}+1/2$, $j=1,...,q$, ${q}_{0}={\sum}_{j=1}^{q}{q}_{j}$, and the ${c}_{j}$ are defined in (10).
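A single two-level draw for this model can be sketched as follows: the arrival rate is sampled from its Gamma posterior, the order-size probabilities from the Dirichlet posterior (via normalized Gamma variates with parameters ${c}_{j}+1/2$), and the total demand is then generated as a compound Poisson sum. The data used below are those of the numerical example of the next section, so the sample average should be near the point forecast $\alpha \approx 95.3$.

```python
import random

def sample_demand(T, r, sum_v, counts, rng):
    """One draw of total demand D: sample (theta0, theta1) from the
    posterior in (10), then simulate arrivals over [0, T] and their
    order sizes."""
    theta0 = rng.gammavariate(r, 1.0 / sum_v)        # Gamma(r, rate sum_v)
    gam = [rng.gammavariate(c + 0.5, 1.0) for c in counts]
    theta1 = [g / sum(gam) for g in gam]             # Dirichlet(c_j + 1/2)
    arrivals, t = 0, rng.expovariate(theta0)         # exponential interarrivals
    while t <= T:
        arrivals += 1
        t += rng.expovariate(theta0)
    # each arriving customer orders j units with probability theta1[j - 1]
    return sum(rng.choices(range(1, len(counts) + 1), weights=theta1, k=arrivals))

rng = random.Random(0)
demands = [sample_demand(15, 20, 10.0, [5, 3, 2, 3, 7], rng) for _ in range(2000)]
avg = sum(demands) / len(demands)
```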

To validate the ACIs proposed in (6), we conducted experiments with the Bayesian model of the previous section to illustrate the estimation of $\alpha $, ${\sigma}_{S}^{2}$ and ${\sigma}_{T}^{2}$. We considered the values $T=15$, $r=20$, ${\sum}_{i=1}^{r}{v}_{i}=10$, $q=5$, ${c}_{1}=5$, ${c}_{2}=3$, ${c}_{3}=2$, ${c}_{4}=3$, ${c}_{5}=7$. With this data, the point forecast is $\alpha \approx 95.333$, and the variance components are ${\sigma}_{S}^{2}\approx 380.667$ and ${\sigma}_{P}^{2}\approx 558.598$. The empirical results that we report below illustrate the typical behavior that we should expect for any other feasible data set.

In all the experiments reported in this section, we considered 1000 independent replications of the algorithm of Figure 2 for different numbers of observations in the outer level (n) and in the inner level (m); in each replication we computed the point estimators for $\alpha $, ${\sigma}_{S}^{2}$, and ${\sigma}_{T}^{2}$, and the corresponding halfwidths of 90% ACIs according to equation (6). Since we know the value of the parameters we are estimating, we were able to report (for given n and m) the empirical coverage (i.e., the fraction of independent replications in which the corresponding ACI covered the true parameter value), the average and standard deviation of the halfwidths, and the square root of the empirical mean squared error, defined by

$$RMSE=\sqrt{\frac{1}{{n}_{0}}{\displaystyle \sum _{i=1}^{{n}_{0}}}{({\widehat{\theta}}_{i}-\theta )}^{2}},$$

where ${\widehat{\theta}}_{i}$ denotes the value obtained in the i-th replication for the estimation of a parameter $\theta $, $i=1,2,\cdots ,{n}_{0}$ (${n}_{0}=1000$ in our experiments).
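The empirical coverage and RMSE computations can be sketched as follows; the Gaussian toy estimates below stand in for actual replications of the algorithm, with a known standard error so that the nominal 90% coverage can be checked.

```python
import math
import random

def coverage_and_rmse(estimates, halfwidths, true_value):
    """Empirical coverage (fraction of ACIs that contain the true value)
    and root mean squared error over independent replications."""
    n0 = len(estimates)
    cover = sum(abs(e - true_value) <= h
                for e, h in zip(estimates, halfwidths)) / n0
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n0)
    return cover, rmse

# toy replications: normal estimates with standard error 0.1, so a 90%
# ACI halfwidth is 1.645 * 0.1 and coverage should be close to 0.9
random.seed(3)
est = [random.gauss(5.0, 0.1) for _ in range(1000)]
cov, rmse = coverage_and_rmse(est, [1.645 * 0.1] * 1000, 5.0)
```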

In a first set of experiments, we considered $nm=200,2000,20000$, and $m=2,4,8$ for each value of $nm$, to compare the effect of increasing the number of observations in the inner level for a given value of $nm$. The results of this set of experiments are summarized in Figure 3, Figure 4 and Figure 5. Note that we do not consider $m=1$ in this set of experiments, so that an ACI for the stochastic variance ${\sigma}_{S}^{2}$ can be constructed.

In Figure 3 we illustrate the performance measures for the quality of the estimation procedure that we obtained for the estimation of the point forecast $\alpha $. As we observe from Figure 3, the coverages are acceptable (very close to the nominal value of 0.9, even for $n=100$). These results validate the ACI defined in (6) for the point forecast $\alpha $. We also observe from Figure 3 that the RMSE, average halfwidth and standard deviation of halfwidths improve (decrease) as the number of observations in the outer level (n) increases, as suggested by Corollary 1. Note also from Figure 3 that a smaller value of m provides smaller RMSE, average halfwidths and standard deviations of halfwidths, validating our theoretical results.

In Figure 4 we illustrate the performance measures for the quality of the estimation procedure that we obtained for the estimation of the stochastic variance ${\sigma}_{S}^{2}$. As we observe from Figure 4, the coverages are acceptable (very close to the nominal value of 0.9, even for $n=100$). These results validate the ACI defined in (6) for the stochastic variance ${\sigma}_{S}^{2}$. We also observe from Figure 4 that the RMSE, average halfwidth and standard deviation of halfwidths improve (decrease) as the number of observations in the outer level (n) increases, as suggested by Corollary 2. However, contrary to what we observe for the estimation of $\alpha $, a larger value of m provides smaller RMSE, average halfwidths and standard deviations of halfwidths, suggesting that, for a fixed value of $nm$, the quality of the estimation of the stochastic variance ${\sigma}_{S}^{2}$ improves as the number of observations in the inner loop (m) increases.

For the estimation of the total variance ${\sigma}_{T}^{2}$ (illustrated in Figure 5), we obtained results for the quality of the estimation similar to those for the point forecast $\alpha $, except that larger values of n are required to obtain reliable coverages. As we observe from Figure 5, the coverages are acceptable (very close to the nominal value of 0.9 for $n=1000$ and 10000). These results validate the ACI defined in (6) for the total variance ${\sigma}_{T}^{2}$. We also observe from Figure 5 that the RMSE, average halfwidth and standard deviation of halfwidths improve (decrease) as the number of observations in the outer level (n) increases, as suggested by Corollary 3. Note also from Figure 5 that a smaller value of m provides smaller RMSE, average halfwidths and standard deviations of halfwidths, validating our theoretical results.

In a second set of experiments we considered $nm=100,1000,10000$, with $m=1$ and $m\approx {\left(nm\right)}^{1/3}$ for each value of $nm$, to compare the quality of the estimation procedures using the value of m that we suggest as optimal for the estimation of point forecast $\alpha $ with the value of m suggested in [6] as an adequate choice for m in the case of biased estimators in the inner level of the algorithm of Figure 2. The results of this set of experiments are summarized in Figure 6 and Figure 7. Note that we are not considering the estimation of the stochastic variance ${\sigma}_{S}^{2}$ in this set of experiments because $m\ge 2$ is required to construct an ACI for the stochastic variance ${\sigma}_{S}^{2}$. Note also that we considered ${100}^{1/3}\approx 5$, ${1000}^{1/3}\approx 10$, and ${10000}^{1/3}\approx 20$, and we are using the same color for $m=5,10,20$ in Figure 6 and Figure 7.

In Figure 6 we illustrate the performance measures for the quality of the estimation procedure that we obtained for the estimation of the point forecast $\alpha $ in our second set of experiments. As we observe from Figure 6, the coverages are acceptable (very close to the nominal value of 0.9, even for $n=100$). These results validate the ACI defined in (6) for the point forecast $\alpha $, as well as the ACI suggested by Proposition 3. We also observe from Figure 6 that the RMSE, average halfwidth and standard deviation of halfwidths are worse for $m\approx {\left(nm\right)}^{1/3}$, confirming our finding that, for the same number of replications $nm$, $m=1$ produces better point estimators for $\alpha $ than $m\approx {\left(nm\right)}^{1/3}$, in agreement with Proposition 3.

Finally, in Figure 7 we show the results of our second set of experiments for the estimation of the total variance ${\sigma}_{T}^{2}$. We found results similar to those for the estimation of the point forecast $\alpha $: coverages are very good (even for $n=100$), and all performance measures for the ACI (RMSE, average and standard deviation of halfwidths) are worse for $m\approx {\left(nm\right)}^{1/3}$, suggesting that, for the same number of replications $nm$, $m=1$ produces better point estimators for ${\sigma}_{T}^{2}$ than $m\approx {\left(nm\right)}^{1/3}$.

In this paper, we propose methodologies to calculate point estimators (and their corresponding halfwidths), for both the point forecast and the variance components in two-level nested stochastic simulation experiments, for the case where the observations at both levels of the algorithm are independent. These methods can be applied to the construction of Bayesian forecasts based on experiments using a simulation model under parameter uncertainty.

Both our theoretical and our experimental results confirm that the proposed point estimators and their corresponding halfwidths are asymptotically valid, i.e., the point estimators converge to the corresponding parameter values and the empirical coverage of the ACIs approaches the nominal level as the number of replications (n) of the outer level increases.

Furthermore, given a fixed number of total observations ($nm$), we show that the choice of only one replication in the inner level ($m=1$) provides more accurate estimators for both the point forecast ($\alpha $), and the variance of the point forecast (${\sigma}_{T}^{2}$). However, $m\ge 2$ is required for the estimation of ${\sigma}_{S}^{2}$.

Directions for future research on this topic include experimentation with other point estimators, such as those based on quasi-Monte Carlo sampling or Simpson integration, with the objective of finding more accurate point estimators for the parameters considered in this paper.

This research was supported by the Asociación Mexicana de Cultura A.C. and the National Council of Science and Technology of Mexico under Award Number 1200/158/2022.

The raw data corresponding to our experiments are available in the repository of AppliedMath.

The author declares no conflict of interest.

For completeness, we first state three well known theorems. Proofs of Theorem A1 and Theorem A2 can be found, e.g., in [5], and a proof of Theorem A3 can be found, e.g., in [9]. In what follows, we write ⇒ for weak convergence (as $n\to \infty $, without explicit mention).

If ${X}_{n}\Rightarrow X$ and ${Y}_{n}\Rightarrow c$, where c is a constant, then:

- (i) ${X}_{n}+{Y}_{n}\Rightarrow X+c$
- (ii) ${X}_{n}{Y}_{n}\Rightarrow cX$
- (iii) ${X}_{n}/{Y}_{n}\Rightarrow X/c$, if $c\ne 0$

If $g:{\mathbb{R}}^{k}\to \mathbb{R}$ is differentiable at $\mu $ and the CLT

$$\sqrt{n}[\overline{Y}\left(n\right)-\mu ]\Rightarrow G{N}_{k}(0,I)$$

is satisfied, where $\overline{Y}\left(n\right)={n}^{-1}{\sum}_{i=1}^{n}{Y}_{i}$, and ${N}_{k}(0,I)$ denotes a (k-variate) normal distribution with mean 0 and variance I (the identity), then

$$\sqrt{n}[g\left(\overline{Y}\left(n\right)\right)-g\left(\mu \right)]\Rightarrow \sigma N(0,1),$$

where $\sigma =\sqrt{\nabla g{\left(\mu \right)}^{T}G{G}^{T}\nabla g\left(\mu \right)}$.

Since $\widehat{{\alpha}_{1}},\widehat{{\alpha}_{2}},...$ are i.i.d. with $E\left[{\widehat{\alpha}}_{1}^{2}\right]<\infty $, it follows from the Law of Large Numbers that ${n}^{-1}({\sum}_{i=1}^{n}{\widehat{\alpha}}_{i}^{2},{\sum}_{i=1}^{n}{\widehat{\alpha}}_{i})\Rightarrow (E\left[{\widehat{\alpha}}_{1}^{2}\right],E\left[{\widehat{\alpha}}_{1}\right])$. Therefore, by taking $g({x}_{1},{x}_{2})=\sqrt{{x}_{1}-{x}_{2}^{2}},$ for ${x}_{1}-{x}_{2}^{2}\ge 0$ in Theorem A2, we have $\sqrt{(n-1){\widehat{\sigma}}_{T}^{2}\left(n\right)/n}\Rightarrow \sqrt{{\sigma}_{T}^{2}}$, so that ${Y}_{n}=\sqrt{{\widehat{\sigma}}_{T}^{2}\left(n\right)}/\sqrt{{\sigma}_{T}^{2}}\Rightarrow 1$. Finally, by taking ${X}_{n}=\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )/\sqrt{{\sigma}_{T}^{2}}$ in Theorem A1, it follows from Proposition 2 that

$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sqrt{{\widehat{\sigma}}_{T}^{2}\left(n\right)}}\Rightarrow N(0,1)$$

Similarly, since ${S}_{1}^{2},{S}_{2}^{2},...$ are i.i.d. with $E\left[{S}_{1}^{4}\right]<\infty $, it follows from Theorem A1 and Proposition 2 that

$$\frac{\sqrt{n}({\widehat{\sigma}}_{S}^{2}\left(n\right)-{\sigma}_{S}^{2})}{\sqrt{{\widehat{V}}_{S}\left(n\right)}}\Rightarrow N(0,1).$$

□

Let $k=2$, ${Y}_{i}=({X}_{i},{X}_{i}^{2})$, and $\mu =({\mu}_{1},{\mu}_{2})$; then the CLT of Theorem A3 is satisfied for

$$G{G}^{T}=\left[\begin{array}{cc}{\mu}_{2}-{\mu}_{1}^{2}& {\mu}_{3}-{\mu}_{1}{\mu}_{2}\\ {\mu}_{3}-{\mu}_{1}{\mu}_{2}& {\mu}_{4}-{\mu}_{2}^{2}\end{array}\right]$$

By taking $g\left(\mu \right)={\mu}_{2}-{\mu}_{1}^{2}={\sigma}_{1}^{2}$, we have $g\left(\overline{Y}\left(n\right)\right)=(n-1){S}^{2}\left(n\right)/n$, $\nabla g{\left(\mu \right)}^{T}=(-2{\mu}_{1},1)$, and

$$\nabla g{\left(\mu \right)}^{T}G{G}^{T}\nabla g\left(\mu \right)=8{\mu}_{1}^{2}{\mu}_{2}-4{\mu}_{1}^{4}-4{\mu}_{1}{\mu}_{3}+{\mu}_{4}-{\mu}_{2}^{2}={\sigma}_{2}^{2}.$$

Then, it follows from Theorem A3 that

$$\sqrt{n}[(n-1){S}^{2}\left(n\right)/n-{\sigma}_{1}^{2}]\Rightarrow {\sigma}_{2}N(0,1),$$

and the final conclusion follows from Theorem A1. □

In this proof we follow the notation of the Lindeberg-Feller Theorem as stated in Theorem 7.2.1 of [3].

For n = 1, 2, ..., let ${m}_{n}=\lfloor {n}^{-1+1/p}\rfloor $ and ${\alpha}_{j}\left(n\right)=\left({\sum}_{i=1}^{{m}_{n}}{W}_{ij}\right)/{m}_{n},j=1,...,n$. Then ${\alpha}_{1}\left(n\right),{\alpha}_{2}\left(n\right),...,{\alpha}_{n}\left(n\right)$ are independent, and for ${X}_{nj}=({\alpha}_{j}\left(n\right)-\alpha )/\sqrt{n{\sigma}_{T}^{2}}$ we also have that ${X}_{n1},{X}_{n2},...{X}_{nn}$ are independent.

Then if ${Y}_{nj}=({\alpha}_{j}\left(n\right)-\alpha )/{\sigma}_{T}$, we have $E\left[{Y}_{nj}\right]=0$ and $E\left[{Y}_{nj}^{2}\right]=1$, so that given $\epsilon >0$ there exists ${\eta}_{0}>0$ such that ${\int}_{\left|y\right|\ge {\eta}_{0}}{y}^{2}dF_{{Y}_{nj}}\left(y\right)<\epsilon $.

Therefore, given $\eta >0$, for $n\ge max\{1,{({\eta}_{0}/\eta )}^{2}\}$ we have

$$\sum _{j=1}^{n}{\int}_{\left|x\right|\ge \eta}{x}^{2}dF_{{X}_{nj}}\left(x\right)\le \sum _{j=1}^{n}\frac{1}{n}{\int}_{\left|y\right|\ge {\eta}_{0}}{y}^{2}dF_{{Y}_{nj}}\left(y\right)<\epsilon ,$$

so that condition (1) of Theorem 7.2.1 of [3] is satisfied, and it follows from this Theorem that ${S}_{n}\Rightarrow N(0,1)$, where

$${S}_{n}=\sum _{j=1}^{n}{X}_{nj}=\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sqrt{{\sigma}_{T}^{2}}}.$$

□

- Muñoz, D.F. Simulation output analysis for risk assessment and mitigation. In Multi-Criteria Decision Analysis for Risk Assessment and Management; Ren, J., Ed.; Springer: Heidelberg, Germany, 2021.
- Smith, J.S.; Sturrock, D.T. Simio and Simulation: Modeling, Analysis, Applications, 6th ed.; Simio LLC: Sewickley, Pennsylvania, 2022.
- Chung, K.L. A Course in Probability Theory; Academic Press: Cambridge, Massachusetts, 2001.
- Asmussen, S.; Glynn, P.W. Stochastic Simulation: Algorithms and Analysis; Springer: Heidelberg, Germany, 2007.
- Serfling, R.J. Approximation Theorems of Mathematical Statistics; John Wiley & Sons: Hoboken, New Jersey, 2009.
- Andradóttir, S.; Glynn, P.W. Computing Bayesian means using simulation. ACM TOMACS 2016, 26, paper 10.
- L’Ecuyer, P. Quasi-Monte Carlo methods with applications in finance. Finance and Stochastics 2009, 13, 307–349.
- Zouaoui, F.; Wilson, J.R. Accounting for parameter uncertainty in simulation input modeling. IIE Transactions 2003, 35, 781–792.
- Muñoz, D.F.; Glynn, P.W. A batch means methodology for estimation of a nonlinear function of a steady-state mean. Management Science 1997, 43, 1121–1135.
- Muñoz, D.F.; Muñoz, D.F. Bayesian forecasting of spare parts using simulation. In Service Parts Management: Demand Forecasting and Inventory Control; Altay, N., Litteral, L.A., Eds.; Springer: Heidelberg, Germany, 2011.
- Muñoz, D.F. Estimation of expectations in two-level nested simulation experiments. In Proceedings of the 29th European Modeling and Simulation Symposium, Barcelona, Spain, 18–20 Sep 2017; 233–238.
- Russo, D.; Van Roy, B. Learning to optimize via posterior sampling. Mathematics of Operations Research 2014, 39, 1221–1243.
- Muñoz, D.F.; Muñoz, D.F.; Ramírez-López, A. On the incorporation of parameter uncertainty for inventory management using simulation. International Transactions in Operational Research 2013, 20, 493–513.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


Submitted: 08 May 2023

Posted: 09 May 2023


Simulation is widely recognized as an effective technique to produce forecasts, evaluate risk (see, e.g., [1]), and animate and illustrate the performance of a system over time (see, e.g., [2]). When there is uncertainty in a component of a simulation model, it is said to be a random component, and it is modeled using a probability distribution and/or a stochastic process that is generated during the simulation run, producing a stochastic simulation. A random component typically depends on the value of certain parameters; we will denote by $\theta $ a particular value for the vector of parameters of the random components of a stochastic simulation, and $\mathsf{\Theta}$ will denote the random vector that corresponds to the parameter values when there is uncertainty about the value of these parameters.

In general, the output of a stochastic (dynamic) simulation can be regarded as a stochastic process $\{Y(s):s\ge 0;\mathsf{\Theta}\}$, where $Y\left(s\right)$ is a random vector (of arbitrary dimension d) representing the state of the simulation at time $s\ge 0$. The term transient simulation applies to a dynamic simulation that has a well-defined termination time, so that the output of a transient simulation can be viewed as a stochastic process $\{Y(s):0\le s\le T;\mathsf{\Theta}\}$, where T is a stopping time (possibly deterministic); see, e.g., [3] for a definition of stopping time. Note that this notation includes the case of a discrete-time output ${Z}_{0},{Z}_{1},\cdots $, if we assume that $Y\left(s\right)={Z}_{\lfloor s\rfloor}$, where $\lfloor s\rfloor $ denotes the integer part of s.

A performance variable W in transient simulation is a real-valued random variable (r.v.) that depends on the simulation output up to time T, i.e., $W=f(Y\left(s\right),0\le s\le T;\mathsf{\Theta})$, and the expectation of a performance variable W is a performance measure that we usually estimate through experimentation with the simulation model. When there is no uncertainty in the parameters of the random components, the standard methodology used to estimate a performance measure in transient simulation is the method of independent replications, which consists of running the simulation model to produce n replications ${W}_{1},{W}_{2},\cdots ,{W}_{n}$ that can be regarded as independent and identically distributed (i.i.d.) random variables (see Figure 1).

Under the method of independent replications, a point estimator for the expectation $\alpha =E\left[{W}_{1}\right]$ is the average $\widehat{\alpha}\left(n\right)=\frac{{\sum}_{i=1}^{n}{W}_{i}}{n}$. If $E\left[\left|{W}_{1}\right|\right]<\infty $, it follows from the classical Law of Large Numbers (LLN) that $\widehat{\alpha}\left(n\right)$ is consistent, i.e., it satisfies $\widehat{\alpha}\left(n\right)\Rightarrow \alpha $, as $n\to \infty $ (where ⇒ denotes weak convergence of random variables); see, e.g., [3] for a proof. Consistency guarantees that the estimator approaches the parameter as the number of replications n increases, and the accuracy of the simulation-based estimator $\widehat{\alpha}\left(n\right)$ is typically assessed by an asymptotic confidence interval (ACI) for the parameter. The expression for an ACI for a parameter of a stochastic simulation is usually obtained through a Central Limit Theorem (CLT) for the estimator (see, for example, chapter 3 of [4]). For the case of the expectation $\alpha $ in the algorithm of Figure 1, if $E\left[{W}_{1}^{2}\right]<\infty $, the classical CLT implies that

$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sigma}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\sigma}^{2}=E\left[{({W}_{1}-\alpha )}^{2}\right]$ and $N(0,1)$ denotes a r.v. distributed as normal with mean 0 and variance 1. Then, if $E\left[{W}_{1}^{2}\right]<\infty $, it follows from (1) and Slutsky’s Theorem (see the Appendix) that

$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\widehat{\sigma}\left(n\right)}\Rightarrow N(0,1),$$

as $n\to \infty $, where $\widehat{\sigma}\left(n\right)$ denotes the sample standard deviation, i.e., ${\widehat{\sigma}}^{2}\left(n\right)=\frac{{\sum}_{i=1}^{n}{({W}_{i}-\widehat{\alpha}\left(n\right))}^{2}}{n-1}$. This CLT implies that

$$\underset{n\to \infty}{lim}P\left[\left|\widehat{\alpha}\left(n\right)-\alpha \right|\le {z}_{\beta}\widehat{\sigma}\left(n\right)/\sqrt{n}\right]=1-\beta ,$$

for $0<\beta <1$, where ${z}_{\beta}$ denotes the ($1-\beta /2$)-quantile of a N(0,1), which is sufficient to establish a $(1-\beta )100\%$ ACI for $\alpha $ with halfwidth

$$H{W}_{\alpha}={z}_{\beta}\widehat{\sigma}\left(n\right)/\sqrt{n}.$$

A halfwidth in the form of (2) is the typical measure used in simulation software (e.g., Simio, see [2]) to assess the accuracy of $\widehat{\alpha}\left(n\right)$ for the estimation of expectation $\alpha $.
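As an illustration, the point estimate and the halfwidth in (2) can be computed from i.i.d. replications as in the following minimal Python sketch (the replication values are hypothetical, and the normal quantile is taken from the standard library):

```python
import math
from statistics import NormalDist, mean, stdev

def aci_halfwidth(w, beta=0.10):
    """Point estimate of E[W] and the (1-beta) ACI halfwidth from i.i.d. replications w."""
    z = NormalDist().inv_cdf(1 - beta / 2)   # z_beta, the (1 - beta/2)-quantile of N(0,1)
    return mean(w), z * stdev(w) / math.sqrt(len(w))

# Hypothetical replications of a performance variable W
w = [9.8, 11.2, 10.5, 9.9, 10.7, 10.1, 10.4, 9.6]
alpha_hat, hw = aci_halfwidth(w)   # 90% ACI is alpha_hat +/- hw
```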

In contrast to the estimation of (output) performance measures, parameters of (input) random components of a simulation model are usually estimated from real-data observations (x) and, while most applications covered in the relevant literature assume that no uncertainty exists in the value of these parameters, the uncertainty can be significant when little data is available. In these cases, Bayesian statistics can be used to incorporate this uncertainty in the output analysis of simulation experiments via the use of a posterior distribution $p\left(\theta \right|x)$. A methodology currently proposed for the analysis of simulation experiments under parameter uncertainty is a two-level nested simulation algorithm (see, e.g., [6,7,8]). In the outer level, we simulate (n) observations for the parameters from a posterior distribution $p\left(\theta \right|x)$, while in the inner level we simulate (m) observations for the performance variable with the parameters fixed at the value ($\theta $) generated in the outer level (see Figure 2). In this paper, we focus on the output analysis of two-level simulation experiments for the case where the observations at the inner level are independent, showing how the variance of a simulated observation can be decomposed into parametric and stochastic variance components. Afterwards, we derive a CLT for both the estimator of the point forecast and the estimators of the variance components. Our CLTs allow us to compute an ACI for each estimator. Our results are validated through experiments with a forecast model for sporadic demand reported in [10]. This paper is an extended version of results initially reported in [11], and the proofs missing in [11] are provided.
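The two-level loop just described can be sketched as follows (a minimal Python sketch; the toy posterior and simulator passed in the usage line are hypothetical placeholders, not the model of this paper):

```python
import random

def nested_simulation(sample_theta, simulate_w, n, m, rng):
    """Two-level nested simulation: n outer draws of the parameter,
    m inner replications of the performance variable for each draw."""
    w = []
    for _ in range(n):                                        # outer level
        theta = sample_theta(rng)                             # theta ~ p(theta | x)
        w.append([simulate_w(theta, rng) for _ in range(m)])  # inner level, theta fixed
    return w                                                  # n x m matrix of W_ij

# Hypothetical toy model: posterior Theta ~ Exp(1), and W | theta ~ Exp(rate = theta)
w = nested_simulation(lambda rng: rng.expovariate(1.0),
                      lambda theta, rng: rng.expovariate(theta),
                      n=1000, m=2, rng=random.Random(1))
```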

Following this introduction, we present the proposed methodology for the construction of an ACI for the point forecast and the variance components in a two-level simulation experiment. Afterwards, we present an illustrative example that has an analytical solution for the parameters of interest in this paper. This example is used in the next section to illustrate the application and validity of our proposed methodologies for the construction of an ACI. Finally, in the last section, we present conclusions and directions for future research.

To identify the variance components in each observation ${W}_{ij}$ of the algorithm illustrated in Figure 2, let $\mu \left(\mathsf{\Theta}\right)=E\left[{W}_{11}\right|\mathsf{\Theta}]$, and ${\sigma}^{2}\left(\mathsf{\Theta}\right)=E\left[{W}_{11}^{2}\right|\mathsf{\Theta}]-{\mu}^{2}\left(\mathsf{\Theta}\right)$. Under this notation, the point forecast is $\alpha =E\left[\mu \right(\mathsf{\Theta}\left)\right]$, and the variance of each ${W}_{ij}$ is:
$$V\left[{W}_{ij}\right]\stackrel{def}{=}E\left[{W}_{ij}^{2}\right]-E{\left[{W}_{ij}\right]}^{2}=E[E\left[{W}_{ij}^{2}\right|\mathsf{\Theta}]-\mu {\left(\mathsf{\Theta}\right)}^{2}]+E\left[\mu {\left(\mathsf{\Theta}\right)}^{2}\right]-E{\left[\mu \left(\mathsf{\Theta}\right)\right]}^{2}={\sigma}_{S}^{2}+{\sigma}_{P}^{2},$$
for $i=1,...,n$; $j=1,...,m$, where ${\sigma}_{P}^{2}=V\left[\mu \left(\mathsf{\Theta}\right)\right]\stackrel{def}{=}E\left[\mu {\left(\mathsf{\Theta}\right)}^{2}\right]-E{\left[\mu \left(\mathsf{\Theta}\right)\right]}^{2}$, and ${\sigma}_{S}^{2}=E\left[{\sigma}^{2}\left(\mathsf{\Theta}\right)\right]$. It is worth mentioning that, in the relevant literature, ${\sigma}_{S}^{2}$ is commonly referred to as stochastic variance and ${\sigma}_{P}^{2}$ is commonly referred to as parametric variance.
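The decomposition in (3) can be checked numerically on a toy conditional model (a hypothetical illustration, not the model of this paper): if $W|\mathsf{\Theta}$ is normal with mean $\mathsf{\Theta}$ and variance 1, and $\mathsf{\Theta}$ is normal with mean 0 and variance 2, then ${\sigma}_{S}^{2}=1$, ${\sigma}_{P}^{2}=2$, and $V\left[W\right]=3$:

```python
import random
import statistics

# Toy model: W | Theta ~ Normal(Theta, 1), Theta ~ Normal(0, sqrt(2)),
# so sigma_S^2 = 1 (stochastic), sigma_P^2 = 2 (parametric), V[W] = 3.
rng = random.Random(7)
w = []
for _ in range(200_000):
    theta = rng.gauss(0.0, 2.0 ** 0.5)   # outer draw of the parameter
    w.append(rng.gauss(theta, 1.0))      # observation given the parameter
total_var = statistics.pvariance(w)      # close to sigma_S^2 + sigma_P^2 = 3
```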

In this paper, we are interested in the estimation of both the point forecast $\alpha =E\left[\mu \right(\mathsf{\Theta}\left)\right]$ and the variance components, defined in (3), of every observation generated in the algorithm of Figure 2; thus we first consider the natural point estimators
$$\widehat{\alpha}\left(n\right)=\frac{1}{n}\sum _{i=1}^{n}{\widehat{\alpha}}_{i},{\widehat{\sigma}}_{T}^{2}\left(n\right)=\frac{1}{n-1}\sum _{i=1}^{n}{({\widehat{\alpha}}_{i}-\widehat{\alpha}\left(n\right))}^{2},{\widehat{\sigma}}_{S}^{2}\left(n\right)=\frac{1}{n}\sum _{i=1}^{n}{S}_{i}^{2},$$
where ${\widehat{\alpha}}_{i}={m}^{-1}{\sum}_{j=1}^{m}{W}_{ij}$, and ${S}_{i}^{2}={(m-1)}^{-1}{\sum}_{j=1}^{m}{({W}_{ij}-{\widehat{\alpha}}_{i})}^{2}$, $i=1,...,n$. Note that the ${\widehat{\alpha}}_{i}$’s are i.i.d. with expectation $E\left[{\widehat{\alpha}}_{1}\right]=\alpha $ and variance
$$\begin{array}{ccc}\hfill {\sigma}_{T}^{2}& \stackrel{def}{=}& E\left[{({\widehat{\alpha}}_{1}-\alpha )}^{2}\right]={m}^{-2}(mE\left[{({W}_{11}-\alpha )}^{2}\right]+m(m-1)E\left[({W}_{11}-\alpha )({W}_{12}-\alpha )\right])\hfill \\ & =& {m}^{-1}({\sigma}_{S}^{2}+{\sigma}_{P}^{2})+{m}^{-1}(m-1){\sigma}_{S}^{2}={\sigma}_{S}^{2}+{m}^{-1}{\sigma}_{P}^{2}.\hfill \end{array}$$
On the other hand, the ${S}_{i}^{2}$ are i.i.d. with expectation $E\left[{S}_{1}^{2}\right]={\sigma}_{S}^{2}$. Thus, the next proposition follows from the classical LLN.

Given $m\ge 1$, if $E\left[{W}_{11}^{2}\right]<\infty $ then $\widehat{\alpha}\left(n\right)$ and ${\widehat{\sigma}}_{T}^{2}\left(n\right)$ are unbiased and consistent (as $n\to \infty $) estimators for α and ${\sigma}_{T}^{2}$ (as defined in (5)), respectively. Furthermore, if $m\ge 2$ and $E\left[{W}_{11}^{2}\right]<\infty $, then ${\widehat{\sigma}}_{S}^{2}\left(n\right)$ is an unbiased and consistent (as $n\to \infty $) estimator for ${\sigma}_{S}^{2}$ (as defined in (3)).
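The estimators in (4) can be sketched as follows (a minimal Python sketch, where `w` is assumed to be the $n\times m$ matrix of observations ${W}_{ij}$ produced by the algorithm of Figure 2):

```python
def nested_estimators(w):
    """Point estimators (4) from an n x m matrix w of observations W_ij."""
    n, m = len(w), len(w[0])
    alpha_i = [sum(row) / m for row in w]          # inner-level averages
    alpha_hat = sum(alpha_i) / n                   # estimator of the point forecast
    sigma_T2_hat = sum((a - alpha_hat) ** 2 for a in alpha_i) / (n - 1)
    sigma_S2_hat = None
    if m >= 2:                                     # S_i^2 requires m >= 2
        s2 = [sum((x - a) ** 2 for x in row) / (m - 1)
              for row, a in zip(w, alpha_i)]
        sigma_S2_hat = sum(s2) / n
    return alpha_hat, sigma_T2_hat, sigma_S2_hat
```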

As we established in Proposition 1, under mild assumptions the point estimators proposed in (4) are consistent, and thus converge to the corresponding parameter value (as $n\to \infty $). Nonetheless, to establish the level of accuracy of these estimators, we must derive a CLT for each estimator in order to obtain a valid expression for the corresponding ACI. Note that both $\widehat{\alpha}\left(n\right)$ and ${\widehat{\sigma}}_{S}^{2}\left(n\right)$ are averages of i.i.d. observations, thus the next proposition follows from the classical CLT for i.i.d. observations.

Given $m\ge 1$, if $E\left[{W}_{11}^{2}\right]<\infty $ then

$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{{\sigma}_{T}}\Rightarrow N(0,1),$$

as $n\to \infty $. Furthermore, if $m\ge 2$ and $E\left[{W}_{11}^{4}\right]<\infty $, then
$$\frac{\sqrt{n}({\widehat{\sigma}}_{S}^{2}\left(n\right)-{\sigma}_{S}^{2})}{\sqrt{{V}_{S}}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${V}_{S}=E\left[{({S}_{1}^{2}-{\sigma}_{S}^{2})}^{2}\right]$.

Since we have consistent estimators for ${\sigma}_{T}^{2}$ and ${V}_{S}$ (under mild assumptions), the next corollary follows from Proposition 1 and Slutsky’s Theorem; details of a proof are given in the Appendix.

Under the same notation and assumptions as in Proposition 2, for $m\ge 1$ we have
$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sqrt{{\widehat{\sigma}}_{T}^{2}\left(n\right)}}\Rightarrow N(0,1),$$

as $n\to \infty $, and for $m\ge 2$ we have
$$\frac{\sqrt{n}({\widehat{\sigma}}_{S}^{2}\left(n\right)-{\sigma}_{S}^{2})}{\sqrt{{\widehat{V}}_{S}\left(n\right)}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\widehat{\sigma}}_{S}^{2}\left(n\right)$ and ${\widehat{\sigma}}_{T}^{2}\left(n\right)$ are defined in (4), and
$${\widehat{V}}_{S}\left(n\right)=\frac{1}{n-1}\sum _{i=1}^{n}{({S}_{i}^{2}-{\widehat{\sigma}}_{S}^{2}\left(n\right))}^{2}.$$

In order to obtain a CLT for ${\widehat{\sigma}}_{T}^{2}\left(n\right)$, note that this estimator is the sample variance of a set of i.i.d. observations, thus we can use the following Lemma. A proof based on the Delta Method (see, e.g., Proposition 2 of [9]) is provided in the Appendix.

If ${X}_{1},{X}_{2},...$ is a sequence of i.i.d. random variables with $E\left[{X}_{1}^{4}\right]<\infty $, then
$$\frac{\sqrt{n}({S}^{2}\left(n\right)-{\sigma}_{1}^{2})}{\sqrt{{\sigma}_{2}^{2}}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\sigma}_{1}^{2}={\mu}_{2}-{\mu}_{1}^{2}$, ${\sigma}_{2}^{2}=8{\mu}_{1}^{2}{\mu}_{2}-4{\mu}_{1}^{4}-4{\mu}_{1}{\mu}_{3}+{\mu}_{4}-{\mu}_{2}^{2}$, ${\mu}_{k}=E\left[{X}_{1}^{k}\right]$, $k=1,2,3,4$; ${S}^{2}\left(n\right)={(n-1)}^{-1}{\sum}_{i=1}^{n}{({X}_{i}-{\widehat{\mu}}_{1})}^{2}$, ${\widehat{\mu}}_{1}={n}^{-1}{\sum}_{i=1}^{n}{X}_{i}$.

Under the same assumptions as in Lemma 1 we have
$$\frac{\sqrt{n}({S}^{2}\left(n\right)-{\sigma}_{1}^{2})}{\sqrt{{\widehat{\sigma}}_{2}^{2}\left(n\right)}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\widehat{\sigma}}_{2}^{2}\left(n\right)=8{\widehat{\mu}}_{1}^{2}{\widehat{\mu}}_{2}-4{\widehat{\mu}}_{1}^{4}-4{\widehat{\mu}}_{1}{\widehat{\mu}}_{3}+{\widehat{\mu}}_{4}-{\widehat{\mu}}_{2}^{2}$, ${\widehat{\mu}}_{k}={n}^{-1}{\sum}_{i=1}^{n}{X}_{i}^{k}$.

Corollary 2 follows from the fact that ${\widehat{\mu}}_{k}$ is an unbiased and consistent estimator of ${\mu}_{k}$, and the next corollary follows from the fact that ${\widehat{\sigma}}_{T}^{2}\left(n\right)$ is the sample variance of the ${\widehat{\alpha}}_{i}$.

Given $m\ge 1$, if $E\left[{W}_{11}^{4}\right]<\infty $ then
$$\frac{\sqrt{n}({\widehat{\sigma}}_{T}^{2}\left(n\right)-{\sigma}_{T}^{2})}{\sqrt{{\widehat{V}}_{T}\left(n\right)}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\widehat{V}}_{T}\left(n\right)=8{\overline{\alpha}}_{1}^{2}{\overline{\alpha}}_{2}-4{\overline{\alpha}}_{1}^{4}-4{\overline{\alpha}}_{1}{\overline{\alpha}}_{3}+{\overline{\alpha}}_{4}-{\overline{\alpha}}_{2}^{2}$, ${\overline{\alpha}}_{k}={n}^{-1}{\sum}_{i=1}^{n}{\widehat{\alpha}}_{i}^{k}$.

Let $0<\beta <1$. Using Corollaries 1 and 3, we can establish a $100(1-\beta )\%$ ACI for the point forecast $\alpha $ and the variance components ${\sigma}_{S}^{2}$ and ${\sigma}_{T}^{2}={\sigma}_{S}^{2}+{m}^{-1}{\sigma}_{P}^{2}$; each ACI is centered at the corresponding point estimator ($\widehat{\alpha}\left(n\right)$, ${\widehat{\sigma}}_{S}^{2}\left(n\right)$ or ${\widehat{\sigma}}_{T}^{2}\left(n\right)$) and the corresponding halfwidth is given by:
$$H{W}_{\alpha}={z}_{\beta}\frac{\sqrt{{\widehat{\sigma}}_{T}^{2}\left(n\right)}}{\sqrt{n}},H{W}_{{\sigma}_{S}^{2}}={z}_{\beta}\frac{\sqrt{{\widehat{V}}_{S}\left(n\right)}}{\sqrt{n}},\mathit{and}\phantom{\rule{4pt}{0ex}}H{W}_{{\sigma}_{T}^{2}}={z}_{\beta}\frac{\sqrt{{\widehat{V}}_{T}\left(n\right)}}{\sqrt{n}},$$
for $\alpha $, ${\sigma}_{S}^{2}$ and ${\sigma}_{T}^{2}$, respectively, where ${\widehat{\sigma}}_{T}^{2}\left(n\right)$ is defined in (4), and ${\widehat{V}}_{S}\left(n\right)$ and ${\widehat{V}}_{T}\left(n\right)$ are defined in Corollary 1 and Corollary 3, respectively.
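The three halfwidths in (6) can be computed from the raw $n\times m$ matrix of observations as in the following Python sketch (the seeded Gaussian data in the usage lines are hypothetical; ${\widehat{V}}_{T}$ is guarded against small-sample negativity before the square root):

```python
import math
import random
from statistics import NormalDist

def nested_halfwidths(w, beta=0.10):
    """Halfwidths (6) of the (1-beta) ACIs for alpha, sigma_S^2 and sigma_T^2,
    from an n x m matrix w of observations W_ij (m >= 2)."""
    n, m = len(w), len(w[0])
    z = NormalDist().inv_cdf(1 - beta / 2)                     # z_beta
    alpha_i = [sum(row) / m for row in w]                      # inner-level averages
    alpha_bar = sum(alpha_i) / n
    sT2 = sum((a - alpha_bar) ** 2 for a in alpha_i) / (n - 1)  # sigma_T^2-hat
    s2 = [sum((x - a) ** 2 for x in row) / (m - 1)
          for row, a in zip(w, alpha_i)]                       # S_i^2
    s2_bar = sum(s2) / n
    v_S = sum((s - s2_bar) ** 2 for s in s2) / (n - 1)         # V_S-hat (Corollary 1)
    mom = [sum(a ** k for a in alpha_i) / n for k in (1, 2, 3, 4)]
    v_T = max(0.0, 8 * mom[0] ** 2 * mom[1] - 4 * mom[0] ** 4
              - 4 * mom[0] * mom[2] + mom[3] - mom[1] ** 2)    # V_T-hat (Corollary 3)
    rn = math.sqrt(n)
    return z * math.sqrt(sT2) / rn, z * math.sqrt(v_S) / rn, z * math.sqrt(v_T) / rn

# Usage with hypothetical data (n = 200 outer draws, m = 2 inner replications)
rng = random.Random(3)
w = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
hw_alpha, hw_S, hw_T = nested_halfwidths(w)
```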

Note that the ACIs proposed in (6) assume that the value of m in the algorithm of Figure 2 is fixed, and the accuracy of the estimator improves as n (the number of observations in the outer level) increases (in turn, the halfwidth of the ACI gets smaller). Given that we can build a valid ACI for any value of m, a relevant question is how to find an adequate value of m to obtain an acceptable level of accuracy in a reasonable amount of running time. In order to answer this question for the case of the point estimator of $\alpha $, let us fix the total number of iterations in the algorithm of Figure 2 to $k=nm$, and note from (5) and Proposition 2 that the asymptotic variance of $(\widehat{\alpha}\left(n\right)-\alpha )$ is

$${n}^{-1}{\sigma}_{T}^{2}={k}^{-1}(m{\sigma}_{S}^{2}+{\sigma}_{P}^{2}),$$

which takes its minimal value for $m=1$, suggesting that the point estimator $\widehat{\alpha}\left(n\right)$ defined in (4) is more accurate as m approaches the value of 1. Note that for $m=1$, a fixed number of iterations $k=nm$ is convenient (from the point of view of running time) when the computation of ${W}_{ij}$ requires the same or more computation time as ${\mathsf{\Theta}}_{i}$, as suggested in the relevant literature (see, for example, [6]). Furthermore, if we allow m to increase with n, we can obtain the following proposition (a proof using the Lindeberg-Feller Theorem is provided in the Appendix).

Given $0<p\le 1$, if $m=\lfloor {n}^{-1+1/p}\rfloor $ and $E\left[{W}_{11}^{2}\right]<\infty $ then
$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sqrt{{\sigma}_{T}^{2}}}\Rightarrow N(0,1),$$

as $n\to \infty $, where ${\sigma}_{T}^{2}$ is defined in (5).

Note that the last proposition implies that the ACI defined in equation (6) for the point forecast $\alpha $ is also valid under the assumptions of Proposition 3. If, once again, we set the total number of iterations in the algorithm of Figure 2 to $k=nm$, with $n\approx {k}^{p}$ and $m\approx {k}^{1-p}$, it follows from Proposition 3 that the asymptotic variance of $\widehat{\alpha}\left(n\right)$ is ${n}^{-1}{\sigma}_{T}^{2}\approx {k}^{-p}({k}^{1-p}{\sigma}_{S}^{2}+{\sigma}_{P}^{2})$ for every $0<p\le 1$. Note that, for fixed k, ${n}^{-1}{\sigma}_{T}^{2}$ reaches its minimum value when $p=1$, that is, when $n=k$ and $m=1$. However, note that we need $m\ge 2$ in order to estimate ${\sigma}_{S}^{2}$. In the following section we report empirical results that confirm our theoretical results. It is worth mentioning that the case $n=k$ and $m=1$ has been reported in the literature as the posterior sampling algorithm (see, e.g., [12,13]).

The following model (reported in [10]) has been proposed to forecast sporadic demand by incorporating data on times between arrivals and customer demand; where uncertainty on the model parameters is incorporated using a Bayesian approach. For this model, we will show analytical expressions for the performance measures defined in Section 2. These expressions are used in the following section to illustrate the validity of the ACIs proposed in the previous section.

Customer arrivals for a particular item in a shop follow a Poisson process, yet there is uncertainty in the arrival rate ${\mathsf{\Theta}}_{0}$, so that given $[{\mathsf{\Theta}}_{0}={\theta}_{0}]$, interarrival times between customers are i.i.d. with exponential density:
$$f\left(y\right|{\theta}_{0})=\left\{\begin{array}{cc}{\theta}_{0}{e}^{-{\theta}_{0}y},& y>0,\\ 0,& \mathrm{otherwise},\end{array}\right.$$
where ${\theta}_{0}\in {S}_{00}=(0,\infty )$. Every customer can order j units of this item with probability ${\mathsf{\Theta}}_{1j}$, $j=1,...,q$, $q\ge 2$. Let ${\mathsf{\Theta}}_{1}=({\mathsf{\Theta}}_{11},...,{\mathsf{\Theta}}_{1(q-1)})$ and ${\mathsf{\Theta}}_{1q}=1-{\sum}_{j=1}^{q-1}{\mathsf{\Theta}}_{1j}$; then $\mathsf{\Theta}=({\mathsf{\Theta}}_{0},{\mathsf{\Theta}}_{1})$ is the parameter vector, and ${S}_{0}={S}_{00}\times {S}_{01}$ is the parameter space, where ${S}_{01}=\{({\theta}_{11},...,{\theta}_{1(q-1)}):{\sum}_{j=1}^{q-1}{\theta}_{1j}\le 1;{\theta}_{1j}\ge 0,j=1,...,q-1\}$.

Total demand during a period of length T is
$$D=\left\{\begin{array}{cc}\sum _{i=1}^{N\left(T\right)}{U}_{i},& N\left(T\right)>0,\\ 0,& \mathrm{otherwise},\end{array}\right.$$
where $N\left(s\right)$ is the number of customer arrivals during the interval $[0,s]$, $s\ge 0$, and ${U}_{1},{U}_{2},...$ are the individual demands (conditionally independent given $\mathsf{\Theta}$). The information about $\mathsf{\Theta}$ consists of i.i.d. observations $v=({v}_{1},...,{v}_{r})$, $u=({u}_{1},...,{u}_{r})$ of past customers, where ${v}_{i}$ is the interarrival time between customer i and customer ($i-1$), and ${u}_{i}$ is the number of units ordered by customer i. By taking Jeffreys’ non-informative prior as the prior density for $\mathsf{\Theta}$, we obtain the posterior density (see [10] for details) $p\left(\theta \right|x)=p\left({\theta}_{0}\right|v)p\left({\theta}_{1}\right|u)$, where ${x}_{i}=({v}_{i},{u}_{i})$, $i=1,...,r$, $x=({x}_{1},...,{x}_{r})$, $\theta =({\theta}_{0},{\theta}_{1})$, as
$$p\left({\theta}_{0}\right|v)=\frac{{\theta}_{0}^{r-1}{\left({\sum}_{i=1}^{r}{v}_{i}\right)}^{r}{e}^{-{\theta}_{0}{\sum}_{i=1}^{r}{v}_{i}}}{(r-1)!},p\left({\theta}_{1}\right|u)=\frac{{(1-{\sum}_{j=1}^{q-1}{\theta}_{1j})}^{{c}_{q}-1/2}{\mathsf{\Pi}}_{j=1}^{q-1}{\theta}_{1j}^{{c}_{j}-1/2}}{B({c}_{1}+1/2,...,{c}_{q}+1/2)},$$
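For the outer level of the nested algorithm, a draw of $\theta $ from this posterior can be sketched as follows: the posterior of ${\theta}_{0}$ is a Gamma density with shape r and rate ${\sum}_{i=1}^{r}{v}_{i}$, and the posterior of ${\theta}_{1}$ is a Dirichlet density, which can be sampled via normalized Gamma variates (the function name and interface are illustrative):

```python
import random

def sample_posterior(v, u, q, rng):
    """One draw of theta = (theta_0, theta_1) from the posterior (a sketch).

    theta_0 | v is Gamma with shape r and rate sum(v); theta_1 | u is
    Dirichlet with parameters c_j + 1/2, sampled via normalized Gammas."""
    r = len(v)
    theta0 = rng.gammavariate(r, 1.0 / sum(v))   # gammavariate takes a scale parameter
    counts = [sum(1 for ui in u if ui == j) for j in range(1, q + 1)]  # c_j
    g = [rng.gammavariate(cj + 0.5, 1.0) for cj in counts]
    total = sum(g)
    theta1 = [gi / total for gi in g]            # Dirichlet draw sums to 1
    return theta0, theta1

# Usage with hypothetical data: r = 20 interarrival times, demand sizes in {1,...,5}
rng = random.Random(0)
v = [0.5] * 20
u = [1] * 5 + [2] * 3 + [3] * 2 + [4] * 3 + [5] * 7
theta0, theta1 = sample_posterior(v, u, 5, rng)
```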
where ${c}_{j}={\sum}_{i=1}^{r}I[{u}_{i}=j]$, and $B({a}_{1},...,{a}_{q})={\mathsf{\Pi}}_{j=1}^{q}\Gamma \left({a}_{j}\right)/\Gamma \left({\sum}_{j=1}^{q}{a}_{j}\right)$, for ${a}_{1},...,{a}_{q}>0$. Using this notation, we can show that (see [1] for details)

$$\alpha =E\left[T{\mathsf{\Theta}}_{0}\right]\sum _{j=1}^{q}j{p}_{j},$$

$${\sigma}_{S}^{2}=E\left[T{\mathsf{\Theta}}_{0}\right]\sum _{j=1}^{q}{j}^{2}{p}_{j},$$

$${\sigma}_{P}^{2}=\frac{E\left[{T}^{2}{\mathsf{\Theta}}_{0}^{2}\right]}{({q}_{0}+1)}\sum _{j=1}^{q}{j}^{2}{p}_{j}+\frac{E{\left[T{\mathsf{\Theta}}_{0}\right]}^{2}[({q}_{0}/r)-1]}{({q}_{0}+1)}{\left(\sum _{j=1}^{q}j{p}_{j}\right)}^{2},$$

where $E\left[T{\mathsf{\Theta}}_{0}\right]=Tr{\left({\sum}_{i=1}^{r}{v}_{i}\right)}^{-1}$, $E\left[{T}^{2}{\mathsf{\Theta}}_{0}^{2}\right]={T}^{2}r(1+r){\left({\sum}_{i=1}^{r}{v}_{i}\right)}^{-2}$, ${p}_{j}={q}_{j}/{q}_{0}$, ${q}_{j}={c}_{j}+1/2$, $j=1,...,q$, ${q}_{0}={\sum}_{j=1}^{q}{q}_{j}$, and the ${c}_{j}$ are defined in (10).

To validate the ACIs proposed in (6), we conducted experiments with the Bayesian model of the previous section to illustrate the estimation of $\alpha $, ${\sigma}_{S}^{2}$ and ${\sigma}_{T}^{2}$. We considered the values $T=15$, $r=20$, ${\sum}_{i=1}^{r}{v}_{i}=10$, $q=5$, ${c}_{1}=5$, ${c}_{2}=3$, ${c}_{3}=2$, ${c}_{4}=3$, ${c}_{5}=7$. With this data, the point forecast is $\alpha \approx 95.333$, and the variance components are ${\sigma}_{S}^{2}\approx 380.667$, ${\sigma}_{P}^{2}\approx 558.598$. The empirical results that we report below illustrate a typical behavior that we should expect for any other feasible data set.
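The analytical expressions of the previous section can be evaluated directly for this data set, as in the following sketch; here we read the denominator in the second term of the ${\sigma}_{P}^{2}$ expression as the sample size $r$ (our interpretation of the printed formula):

```python
# Data of the example: T = 15, r = 20, sum of interarrival times = 10
T, r, sum_v = 15.0, 20, 10.0
c = [5, 3, 2, 3, 7]                                  # counts c_1,...,c_5 (q = 5)
qj = [cj + 0.5 for cj in c]                          # q_j = c_j + 1/2
q0 = sum(qj)
p = [x / q0 for x in qj]                             # p_j = q_j / q_0
m1 = sum((j + 1) * pj for j, pj in enumerate(p))     # sum_j j * p_j
m2 = sum((j + 1) ** 2 * pj for j, pj in enumerate(p))

e1 = T * r / sum_v                                   # E[T * Theta_0]
e2 = T ** 2 * r * (r + 1) / sum_v ** 2               # E[T^2 * Theta_0^2]

alpha = e1 * m1                                      # point forecast
sigma_S2 = e1 * m2                                   # stochastic variance
sigma_P2 = e2 * m2 / (q0 + 1) \
    + e1 ** 2 * (q0 / r - 1) * m1 ** 2 / (q0 + 1)    # parametric variance
```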

In all the experiments reported in this section, we considered 1000 independent replications of the algorithm of Figure 2 for different numbers of observations in the outer level (n) and in the inner level (m); in each replication we computed the point estimators for $\alpha $, ${\sigma}_{S}^{2}$, and ${\sigma}_{T}^{2}$, and the corresponding halfwidths of 90% ACIs according to equations (6). Since we know the value of the parameters we are estimating, we were able to report (for given n and m) the empirical coverage (i.e., the fraction of independent replications in which the corresponding ACI covered the true parameter value), the average and standard deviation of halfwidths, and the square root of the empirical mean squared error defined by
$$RMSE=\sqrt{\frac{1}{{n}_{0}}{\displaystyle \sum _{i=1}^{{n}_{0}}}{({\widehat{\theta}}_{i}-\theta )}^{2}},$$
where ${\widehat{\theta}}_{i}$ denotes the value obtained in the i-th replication for the estimation of a parameter $\theta ,i=1,2,\cdots ,{n}_{0}$ (${n}_{0}=1000$ in our experiments).
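Both performance measures are straightforward to compute from the replication output; the sketch below uses made-up numbers (not our experimental data) purely to illustrate the definitions:

```python
import math

def coverage_and_rmse(estimates, halfwidths, theta):
    """Empirical coverage of the ACIs and RMSE over independent replications."""
    n0 = len(estimates)
    covered = sum(abs(e - theta) <= h for e, h in zip(estimates, halfwidths))
    rmse = math.sqrt(sum((e - theta) ** 2 for e in estimates) / n0)
    return covered / n0, rmse

# Hypothetical replication output for illustration only:
est = [95.1, 96.0, 94.8, 95.6]
hw = [0.8, 0.5, 0.7, 0.2]
cov, rmse = coverage_and_rmse(est, hw, theta=95.333)  # cov = 0.5
```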

In a first set of experiments we considered $nm=200,2000,20000$, and $m=2,4,8$ for each value of $nm$, to compare the effect of increasing the number of observations in the inner level for a given value of $nm$. The results of this set of experiments are summarized in Figure 3, Figure 4 and Figure 5. Note that we did not consider $m=1$ in this set of experiments because $m\ge 2$ is required to construct an ACI for the stochastic variance ${\sigma}_{S}^{2}$.
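Although Figure 2 refers to the demand-forecast model, the two-level structure itself is generic: n outer draws of the parameters and m inner replications per draw. The sketch below uses a hypothetical stand-in model (normal parameter, normal observation), not the paper's demand model:

```python
import random
import statistics

def nested_simulation(n, m, draw_theta, simulate, seed=0):
    """Two-level nested estimator: n outer parameter draws, m inner
    replications per draw; returns the point estimate and the outer means."""
    rng = random.Random(seed)
    outer_means = []
    for _ in range(n):                    # outer level: parameter uncertainty
        theta = draw_theta(rng)
        reps = [simulate(theta, rng) for _ in range(m)]  # inner level
        outer_means.append(statistics.fmean(reps))
    return statistics.fmean(outer_means), outer_means

# Stand-in model: theta ~ N(0, 1), observation ~ N(theta, 1), so alpha = 0.
alpha_hat, means = nested_simulation(
    n=1000, m=4,
    draw_theta=lambda rng: rng.gauss(0, 1),
    simulate=lambda th, rng: rng.gauss(th, 1),
)
```

Any model with parameter uncertainty fits this template by swapping in its own `draw_theta` (the outer-level posterior sampler) and `simulate` (one run of the simulation model with the parameters fixed).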

In Figure 3 we illustrate the performance measures for the quality of the estimation procedure for the point forecast $\alpha $. As we observe from Figure 3, the coverages are acceptable (very close to the nominal value of 0.9, even for $n=100$). These results validate the ACI defined in (6) for the point forecast $\alpha $. We also observe from Figure 3 that the RMSE, average halfwidth and standard deviation of halfwidths improve (decrease) as the number of observations in the outer level (n) increases, as suggested by Corollary 1. Note also from Figure 3 that a smaller value of m provides smaller RMSE, average halfwidths and standard deviations of halfwidths, validating our theoretical results.

In Figure 4 we illustrate the performance measures for the quality of the estimation procedure for the stochastic variance ${\sigma}_{S}^{2}$. As we observe from Figure 4, the coverages are acceptable (very close to the nominal value of 0.9, even for $n=100$). These results validate the ACI defined in (6) for the stochastic variance ${\sigma}_{S}^{2}$. We also observe from Figure 4 that the RMSE, average halfwidth and standard deviation of halfwidths improve (decrease) as the number of observations in the outer level (n) increases, as suggested by Corollary 2. However, contrary to what we observe for the estimation of $\alpha $, a larger value of m provides smaller RMSE, average halfwidths and standard deviations of halfwidths, suggesting that, for a fixed value of $nm$, the quality of the estimation of the stochastic variance ${\sigma}_{S}^{2}$ improves as the number of observations in the inner level (m) increases.

For the estimation of the total variance ${\sigma}_{T}^{2}$ (illustrated in Figure 5) we obtained results similar to those for the estimation of the point forecast $\alpha $, except that larger values of n are required to obtain reliable coverages. As we observe from Figure 5, the coverages are acceptable (very close to the nominal value of 0.9) for $n=1000$ and 10000. These results validate the ACI defined in (6) for the total variance ${\sigma}_{T}^{2}$. We also observe from Figure 5 that the RMSE, average halfwidth and standard deviation of halfwidths improve (decrease) as the number of observations in the outer level (n) increases, as suggested by Corollary 3. Note also from Figure 5 that a smaller value of m provides smaller RMSE, average halfwidths and standard deviations of halfwidths, validating our theoretical results.

In a second set of experiments we considered $nm=100,1000,10000$, with $m=1$ and $m\approx {\left(nm\right)}^{1/3}$ for each value of $nm$, to compare the quality of the estimation procedures using the value of m that we suggest as optimal for the estimation of the point forecast $\alpha $ with the value of m suggested in [6] as an adequate choice in the case of biased estimators in the inner level of the algorithm of Figure 2. The results of this set of experiments are summarized in Figure 6 and Figure 7. Note that we do not consider the estimation of the stochastic variance ${\sigma}_{S}^{2}$ in this set of experiments because $m\ge 2$ is required to construct an ACI for ${\sigma}_{S}^{2}$. Note also that we took ${100}^{1/3}\approx 5$, ${1000}^{1/3}\approx 10$, and ${10000}^{1/3}\approx 20$, and we use the same color for $m=5,10,20$ in Figure 6 and Figure 7.

In Figure 6 we illustrate the performance measures for the quality of the estimation procedure for the point forecast $\alpha $ in our second set of experiments. As we observe from Figure 6, the coverages are acceptable (very close to the nominal value of 0.9, even for $n=100$). These results validate the ACI defined in (6) for the point forecast $\alpha $, as well as the ACI suggested by Proposition 3. We also observe from Figure 6 that the RMSE, average halfwidth and standard deviation of halfwidths are worse for $m\approx {\left(nm\right)}^{1/3}$, confirming our finding that, for the same total number of replications $nm$, $m=1$ produces better point estimators for $\alpha $ than $m\approx {\left(nm\right)}^{1/3}$, in agreement with Proposition 3.

Finally, in Figure 7 we show the results of our second set of experiments for the estimation of the total variance ${\sigma}_{T}^{2}$. We found results similar to those for the estimation of the point forecast $\alpha $: coverages are very good (even for $n=100$), and all performance measures for the ACI (RMSE, average and standard deviation of halfwidths) are worse for $m\approx {\left(nm\right)}^{1/3}$, suggesting that, for the same total number of replications $nm$, $m=1$ produces better point estimators for ${\sigma}_{T}^{2}$ than $m\approx {\left(nm\right)}^{1/3}$.

In this paper, we propose methodologies to calculate point estimators (and their corresponding halfwidths), for both the point forecast and the variance components in two-level nested stochastic simulation experiments, for the case where the observations at both levels of the algorithm are independent. These methods can be applied to the construction of Bayesian forecasts based on experiments using a simulation model under parameter uncertainty.

Both our theoretical and our experimental results confirm that the proposed point estimators and their corresponding halfwidths are asymptotically valid, i.e., the point estimators converge to the corresponding parameter values and the empirical coverage of the ACIs converges to the nominal level as the number of replications (n) in the outer level increases.

Furthermore, given a fixed number of total observations ($nm$), we show that the choice of only one replication in the inner level ($m=1$) provides more accurate estimators for both the point forecast ($\alpha $), and the variance of the point forecast (${\sigma}_{T}^{2}$). However, $m\ge 2$ is required for the estimation of ${\sigma}_{S}^{2}$.
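The advantage of $m=1$ for the point forecast can also be seen analytically: the outer-level sample means are i.i.d. with variance ${\sigma}_{P}^{2}+{\sigma}_{S}^{2}/m$, so (assuming this standard decomposition) the point estimator has variance $({\sigma}_{P}^{2}+{\sigma}_{S}^{2}/m)/n=(m{\sigma}_{P}^{2}+{\sigma}_{S}^{2})/(nm)$, which, for a fixed budget $nm$, is minimized at $m=1$. A quick check with the variance components of our example:

```python
# Variance of alpha_hat for a fixed total budget N = n*m, using
# Var(alpha_hat) = (m*sigma_P2 + sigma_S2) / N (standard decomposition).
sigma_P2, sigma_S2 = 568.598, 380.667   # components of the example in the text
N = 20000

def var_alpha_hat(m):
    return (m * sigma_P2 + sigma_S2) / N

print(var_alpha_hat(1))  # ≈ 0.04746, the smallest over m
print(var_alpha_hat(8))  # ≈ 0.24647
```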

Directions for future research on this topic include experimentation with other point estimators, such as quasi-Monte Carlo or Simpson integration, with the objective of finding more accurate point estimators for the parameters considered in this paper.

This research was supported by the Asociación Mexicana de Cultura A.C. and the National Council of Science and Technology of Mexico under Award Number 1200/158/2022.

The raw data corresponding to our experiments are available in the repository of AppliedMath.

The author declares no conflict of interest.

For completeness, we first state three well-known theorems. Proofs of Theorem A1 and Theorem A2 can be found, e.g., in [5], and a proof of Theorem A3 can be found, e.g., in [9]. In what follows, we write ⇒ for weak convergence (as $n\to \infty $, without explicit mention).

- (i) ${X}_{n}+{Y}_{n}\Rightarrow X+c$;
- (ii) ${X}_{n}{Y}_{n}\Rightarrow cX$;
- (iii) ${X}_{n}/{Y}_{n}\Rightarrow X/c$, if $c\ne 0$.

$$\sqrt{n}[\overline{Y}\left(n\right)-\mu ]\Rightarrow G{N}_{k}(0,I)$$

is satisfied, where $\overline{Y}\left(n\right)={n}^{-1}{\sum}_{i=1}^{n}{Y}_{i}$, and ${N}_{k}(0,I)$ denotes a ($k$-variate) normal distribution with mean 0 and variance $I$ (the identity), then

$$\sqrt{n}[g\left(\overline{Y}\left(n\right)\right)-g\left(\mu \right)]\Rightarrow \sigma N(0,1),$$

where $\sigma =\sqrt{\nabla g{\left(\mu \right)}^{T}G{G}^{T}\nabla g\left(\mu \right)}$.

Since ${\widehat{\alpha}}_{1},{\widehat{\alpha}}_{2},\dots $ are i.i.d. with $E\left[{\widehat{\alpha}}_{1}^{2}\right]<\infty $, it follows from the Law of Large Numbers that ${n}^{-1}({\sum}_{i=1}^{n}{\widehat{\alpha}}_{i}^{2},{\sum}_{i=1}^{n}{\widehat{\alpha}}_{i})\Rightarrow (E\left[{\widehat{\alpha}}_{1}^{2}\right],E\left[{\widehat{\alpha}}_{1}\right])$. Therefore, by taking $g({x}_{1},{x}_{2})=\sqrt{{x}_{1}-{x}_{2}^{2}}$ for ${x}_{1}-{x}_{2}^{2}\ge 0$ in Theorem A2, we have $\sqrt{(n-1){\widehat{\sigma}}_{T}^{2}\left(n\right)/n}\Rightarrow \sqrt{{\sigma}_{T}^{2}}$, so that ${Y}_{n}=\sqrt{{\widehat{\sigma}}_{T}^{2}\left(n\right)}/\sqrt{{\sigma}_{T}^{2}}\Rightarrow 1$. Finally, by taking ${X}_{n}=\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )/\sqrt{{\sigma}_{T}^{2}}$ in Theorem A1, it follows from Proposition 2 that
$$\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sqrt{{\widehat{\sigma}}_{T}^{2}\left(n\right)}}\Rightarrow N(0,1)$$

Similarly, since ${S}_{1}^{2},{S}_{2}^{2},...$ are i.i.d. with $E\left[{S}_{1}^{4}\right]<\infty $, it follows from Theorem A1 and Proposition 2 that
$$\frac{\sqrt{n}({\widehat{\sigma}}_{S}^{2}\left(n\right)-{\sigma}_{S}^{2})}{\sqrt{{\widehat{V}}_{S}\left(n\right)}}\Rightarrow N(0,1).$$

□

Let $k=2$, ${Y}_{i}=({X}_{i},{X}_{i}^{2})$, $\mu =({\mu}_{1},{\mu}_{2})$; then the CLT of Theorem A3 is satisfied for
$$G{G}^{T}=\left[\begin{array}{cc}{\mu}_{2}-{\mu}_{1}^{2}& {\mu}_{3}-{\mu}_{1}{\mu}_{2}\\ {\mu}_{3}-{\mu}_{1}{\mu}_{2}& {\mu}_{4}-{\mu}_{2}^{2}\end{array}\right]$$

By taking $g\left(\mu \right)={\mu}_{2}-{\mu}_{1}^{2}={\sigma}_{1}^{2}$, we have $g\left(\overline{Y}\left(n\right)\right)=(n-1){S}^{2}\left(n\right)/n$, $\nabla g{\left(\mu \right)}^{T}=(-2{\mu}_{1},1)$, and
$$\nabla g{\left(\mu \right)}^{T}G{G}^{T}\nabla g\left(\mu \right)=8{\mu}_{1}^{2}{\mu}_{2}-4{\mu}_{1}^{4}-4{\mu}_{1}{\mu}_{3}+{\mu}_{4}-{\mu}_{2}^{2}={\sigma}_{2}^{2}.$$

Then, it follows from Theorem A3 that
$$\sqrt{n}\left[(n-1){S}^{2}\left(n\right)/n-{\sigma}_{1}^{2}\right]\Rightarrow {\sigma}_{2}N(0,1),$$
and the final conclusion follows from Theorem A1. □
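As a numerical sanity check on the expression for ${\sigma}_{2}^{2}$ (our own verification, not part of the proof): in terms of raw moments ${\mu}_{k}$, the quantity $8{\mu}_{1}^{2}{\mu}_{2}-4{\mu}_{1}^{4}-4{\mu}_{1}{\mu}_{3}+{\mu}_{4}-{\mu}_{2}^{2}$ equals the central fourth moment minus the squared variance. For an Exp(1) distribution, where ${\mu}_{k}=k!$, both forms give 8:

```python
import math

# Raw moments of Exp(1): mu_k = k!
mu = {k: math.factorial(k) for k in (1, 2, 3, 4)}

# sigma_2^2 as in the proof above
sigma2_sq = (8 * mu[1]**2 * mu[2] - 4 * mu[1]**4
             - 4 * mu[1] * mu[3] + mu[4] - mu[2]**2)

# Equivalent form: central fourth moment minus variance squared
var = mu[2] - mu[1]**2
mu4_central = mu[4] - 4*mu[1]*mu[3] + 6*mu[1]**2*mu[2] - 3*mu[1]**4
print(sigma2_sq)               # 8
print(mu4_central - var**2)    # 8
```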

In this proof we follow the notation of Lindeberg-Feller Theorem as in Theorem 7.2.1 of [3].

For n = 1, 2, ..., let ${m}_{n}=\lfloor {n}^{-1+1/p}\rfloor $ and ${\alpha}_{j}\left(n\right)=\left({\sum}_{i=1}^{{m}_{n}}{W}_{ij}\right)/{m}_{n},j=1,...,n$. Then ${\alpha}_{1}\left(n\right),{\alpha}_{2}\left(n\right),...,{\alpha}_{n}\left(n\right)$ are independent, and for ${X}_{nj}=({\alpha}_{j}\left(n\right)-\alpha )/\sqrt{n{\sigma}_{T}^{2}}$ we also have that ${X}_{n1},{X}_{n2},...{X}_{nn}$ are independent.

Then if ${Y}_{nj}=({\alpha}_{j}\left(n\right)-\alpha )/{\sigma}_{T}$, we have $E\left[{Y}_{nj}\right]=0$ and $E\left[{Y}_{nj}^{2}\right]=1$, so that given $\epsilon >0$ there exists ${\eta}_{0}>0$ such that ${\int}_{\left|y\right|\ge {\eta}_{0}}{y}^{2}d{F}_{{Y}_{nj}}\left(y\right)<\epsilon $.

Therefore, given $\eta >0$, for $n\ge max\{1,{({\eta}_{0}/\eta )}^{2}\}$ we have
$$\sum _{j=1}^{n}{\int}_{\left|x\right|\ge \eta}{x}^{2}d{F}_{{X}_{nj}}\left(x\right)\le \sum _{j=1}^{n}\frac{1}{n}{\int}_{\left|y\right|\ge {\eta}_{0}}{y}^{2}d{F}_{{Y}_{nj}}\left(y\right)<\epsilon ,$$
so that condition (1) of Theorem 7.2.1 of [3] is satisfied, and it follows from this theorem that ${S}_{n}\Rightarrow N(0,1)$, where
$${S}_{n}=\sum _{j=1}^{n}{X}_{nj}=\frac{\sqrt{n}(\widehat{\alpha}\left(n\right)-\alpha )}{\sqrt{{\sigma}_{T}^{2}}}.$$

□

- [1] Muñoz, D.F. Simulation output analysis for risk assessment and mitigation. In Multi-Criteria Decision Analysis for Risk Assessment and Management; Ren, J., Ed.; Springer: Heidelberg, Germany, 2021.
- [2] Smith, J.S.; Sturrock, D.T. Simio and Simulation: Modeling, Analysis, Applications, 6th ed.; Simio LLC: Sewickley, Pennsylvania, 2022.
- [3] Chung, K.L. A Course in Probability Theory; Academic Press: Cambridge, Massachusetts, 2001.
- [4] Asmussen, S.; Glynn, P.W. Stochastic Simulation: Algorithms and Analysis; Springer: Heidelberg, Germany, 2007.
- [5] Serfling, R.J. Approximation Theorems of Mathematical Statistics; John Wiley & Sons: Hoboken, New Jersey, 2009.
- [6] Andradóttir, S.; Glynn, P.W. Computing Bayesian means using simulation. ACM TOMACS 2016, 26, paper 10.
- [7] L'Ecuyer, P. Quasi-Monte Carlo methods with applications in finance. Finance and Stochastics 2009, 13, 307–349.
- [8] Zouaoui, F.; Wilson, J.R. Accounting for parameter uncertainty in simulation input modeling. IIE Transactions 2003, 35, 781–792.
- [9] Muñoz, D.F.; Glynn, P.W. A batch means methodology for estimation of a nonlinear function of a steady-state mean. Management Science 1997, 43, 1121–1135.
- [10] Muñoz, D.F.; Muñoz, D.F. Bayesian forecasting of spare parts using simulation. In Service Parts Management: Demand Forecasting and Inventory Control; Altay, N., Litteral, L.A., Eds.; Springer: Heidelberg, Germany, 2011.
- [11] Muñoz, D.F. Estimation of expectations in two-level nested simulation experiments. In Proceedings of the 29th European Modeling and Simulation Symposium, Barcelona, Spain, 18–20 September 2017; pp. 233–238.
- [12] Russo, D.; Van Roy, B. Learning to optimize via posterior sampling. Mathematics of Operations Research 2014, 39, 1221–1243.
- [13] Muñoz, D.F.; Muñoz, D.F.; Ramírez-López, A. On the incorporation of parameter uncertainty for inventory management using simulation. International Transactions in Operational Research 2013, 20, 493–513.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


Estimation of Expectations and Variance Components in Two-Level Nested Simulation Experiments. David Fernando Muñoz, 2023.
