Preprint
Article

This version is not peer-reviewed.

The Empirical Bayes Estimators of the Variance Parameter of the Normal Distribution with a Normal-Inverse-Gamma Prior under Stein's Loss Function

Submitted:

11 December 2025

Posted:

12 December 2025


Abstract
For the hierarchical normal and normal-inverse-gamma model, we derive the Bayesian estimator of the variance parameter in the normal distribution under Stein's loss function---a penalty function that treats gross overestimation and underestimation equally---and compute the associated Posterior Expected Stein's Loss (PESL). Additionally, we determine the Bayesian estimator of the same variance parameter under the squared error loss function, along with its corresponding PESL. We further develop empirical Bayes estimators for the variance parameter using a conjugate normal-inverse-gamma prior, employing both the method of moments and Maximum Likelihood Estimation (MLE). Through numerical simulations, we examine five key aspects: (1) the consistency of moment-based and MLE-based hyperparameter estimators; (2) the influence of κ₀ on quantities of interest as functions of the most recent observation; (3) two inequalities involving the Bayesian estimators and their respective PESL values; (4) the model's goodness-of-fit to simulated data; and (5) graphical representations of marginal densities under different hyperparameter settings. The simulation results demonstrate that MLEs outperform moment estimators in estimating hyperparameters, particularly with respect to consistency and model fit. Finally, we apply our methodology to real-world data on poverty levels---specifically, the percentage of individuals living below the poverty line---to validate and illustrate our theoretical findings.

1. Introduction

The empirical Bayes method relies on conjugate prior modeling, where the hyperparameters are estimated from the observations and the "estimated prior" is then used as a regular prior in the subsequent inference. See [1,2,3] and the references therein. The empirical Bayes method was introduced in [4,5,6]. From a Bayesian perspective, it means that the sampling distribution is known, but the prior distribution is not. The marginal distribution is then used to recover the prior distribution from the observations. More literature on the empirical Bayes method can be found, for instance, in [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24].
The motivations of this paper are summarized as follows. Example 1.5.1 (p.20) of [25], Part I (pp.69-70) of [26], and [19] have considered the following hierarchical normal and normal-inverse-gamma model:
$$X_i \mid \mu, \theta \overset{\text{iid}}{\sim} N(\mu, \theta), \ i = 1, 2, \ldots, n, \qquad \mu \mid \theta \sim N(\mu_0, \theta/\kappa_0), \qquad \theta \sim \widetilde{IG}\!\left(\frac{\nu_0}{2}, \frac{\nu_0\sigma_0^2}{2}\right), \tag{1}$$
where $-\infty < \mu_0 < \infty$, $\kappa_0 > 0$, $\nu_0 > 0$, and $\sigma_0^2 > 0$ are known hyperparameters, and iid means independent and identically distributed. The $\widetilde{NIG}(\mu_0, \kappa_0, \nu_0, \sigma_0^2)$ distribution is a joint conjugate prior for $(\mu, \theta)$ of the normal distribution $N(\mu, \theta)$, so that the posterior distribution of $(\mu, \theta)$ is an $\widetilde{NIG}(\mu_n, \kappa_n, \nu_n, \sigma_n^2)$ distribution with updated hyperparameters. However, in reality, the hyperparameters are unknown. [19] estimated the hyperparameters of model (1) by the moment method and the Maximum Likelihood Estimation (MLE) method. Moreover, they obtained the Bayes estimators of the mean and variance parameters of model (1) under the squared error loss function. Finally, they obtained the empirical Bayes estimators of the mean and variance parameters of model (1) under the squared error loss function by the moment method and the MLE method. However, in their empirical Bayes estimators, the sample $x^{(n)} = (x_1, x_2, \ldots, x_n)$ is used twice. First, the sample $x^{(n)}$ is used to estimate the hyperparameters $\mu_0$, $\kappa_0$, $\nu_0$, and $\sigma_0^2$. Second, the sample $x^{(n)}$ is used to obtain the Bayes estimators. To avoid using the sample $x^{(n)}$ twice, and to be compatible with the usual empirical Bayes analysis, we will use the following hierarchical normal and normal-inverse-gamma model in this paper:
$$X_i \mid \mu_i, \theta_i \overset{\text{ind}}{\sim} N(\mu_i, \theta_i), \ i = 1, 2, \ldots, n, n+1, \qquad \mu_i \mid \theta_i \overset{\text{ind}}{\sim} N(\mu_0, \theta_i/\kappa_0), \qquad \theta_i \overset{\text{iid}}{\sim} \widetilde{IG}\!\left(\frac{\nu_0}{2}, \frac{\nu_0\sigma_0^2}{2}\right), \tag{2}$$
where $-\infty < \mu_0 < \infty$, $\kappa_0 > 0$, $\nu_0 > 0$, and $\sigma_0^2 > 0$ are hyperparameters to be determined, $-\infty < \mu_{n+1} < \infty$ and $\theta_{n+1} > 0$ are the unknown parameters of interest, $N(\mu_i, \theta_i)$ is a normal distribution with an unknown mean $\mu_i$ and an unknown variance $\theta_i$, the conditional conjugate prior distribution of $\mu_i$ given $\theta_i$ is $N(\mu_0, \theta_i/\kappa_0)$, a normal distribution with mean $\mu_0$ and unknown variance $\theta_i/\kappa_0$, and the marginal conjugate prior distribution of $\theta_i$ is $\widetilde{IG}(\nu_0/2, \nu_0\sigma_0^2/2)$, an inverse gamma distribution with shape parameter $\nu_0/2$ and scale parameter $\nu_0\sigma_0^2/2$. Note that the joint conjugate prior $\pi(\mu, \theta) \sim \widetilde{NIG}(\mu_0, \kappa_0, \nu_0, \sigma_0^2)$ is a normal-inverse-gamma distribution. As described in [7], the statistician observes data $(x_1, x_2, \ldots, x_{n+1}) = x^{(n+1)}$ and wishes to make an inference about $\mu_{n+1}$ and $\theta_{n+1}$. Therefore, $x_{n+1}$ provides direct information about the parameters $\mu_{n+1}$ and $\theta_{n+1}$, while supplementary information $(x_1, x_2, \ldots, x_n) = x^{(n)}$ is also available. The connection between the prime data $x_{n+1}$ and the supplementary information $x^{(n)}$ is provided by the common distributions $N(\mu_i, \theta_i)$, $N(\mu_0, \theta_i/\kappa_0)$, and $\widetilde{IG}(\nu_0/2, \nu_0\sigma_0^2/2)$. Moreover, since the variance parameter of the normal distribution is restricted to be positive, the squared error loss function is not appropriate. In contrast, we will choose Stein's loss function because it penalizes gross overestimation and gross underestimation equally, that is, an action $a$ incurs an infinite loss as it tends to $0$ or $\infty$. Note that the squared error loss function does not have this property. For more literature on Stein's loss function, we refer readers to [20,27,28,29,30].
The key contributions of this work are outlined below. For the hierarchical normal and normal-inverse-gamma (N-NIG) framework (2), we begin by deriving the posterior distributions and marginal density, as stated in Theorem 1. Subsequently, we derive Bayes estimators for the variance parameter of the normal distribution under two loss criteria: Stein’s loss and the squared error loss. We further compute the corresponding Posterior Expected Stein’s Losses (PESLs) associated with these two estimators. It is shown that the two Bayes estimators and their respective PESLs satisfy specific inequalities, namely (15) and (16). Additionally, we derive moment-based estimators for the hyperparameters of the N-NIG model (2) and establish their consistency in Theorem 2. Similarly, Maximum Likelihood Estimators (MLEs) for the same hyperparameters are derived, and their consistency is proven in Theorem 3. Building on these results, we construct empirical Bayes estimators for the variance parameter under Stein’s loss using both the method of moments and MLE, as presented in Theorem 4. The theoretical findings are supported by numerical experiments and an application to real-world data.
Comparing models (1) and (2) carefully, we find that the samples $x_1^{(n)} = (x_{11}, x_{12}, \ldots, x_{1n})$ and $x_2^{(n)} = (x_{21}, x_{22}, \ldots, x_{2n})$ generated from the two models are drawn from different distributions. On the one hand, the sample $x_1^{(n)}$ generated from (1) is iid from $N(\mu, \theta)$. Although the marginal density of each $x_{1i}$ is $t_{\nu_0}(\mu_0, \sigma_0^2(\kappa_0+1)/\kappa_0)$, so that each observation can be thought of as coming from this t distribution, the observations $x_{11}, x_{12}, \ldots, x_{1n}$ are not iid from the t distribution; they are dependent draws from it. On the other hand, the sample $x_2^{(n)}$ generated from (2) is iid from the t distribution. That is, $x_{21}, x_{22}, \ldots, x_{2n}$ are independent and identically distributed from the t distribution. The sample $x_1^{(n)}$ can be used to estimate the parameters $\mu$ and $\theta$ of $N(\mu, \theta)$, while the sample $x_2^{(n)}$ can be used to estimate the hyperparameters $\mu_0$, $\nu_0$, and $u_0$ of $t_{\nu_0}(\mu_0, u_0)$, where $u_0 = \sigma_0^2(\kappa_0+1)/\kappa_0$.
The structure of this paper is outlined as follows. Section 2 begins with the derivation of posterior distributions and the marginal likelihood for the hierarchical normal and normal-inverse-gamma model. Subsequently, we derive the corresponding Bayes estimators and PESLs, which are shown to fulfill two key inequalities: (15) and (16). Additionally, Theorem 4 presents the empirical Bayes estimators for the model’s variance component under Stein’s loss function, obtained via both the method of moments and MLE. In Section 3, we conduct numerical experiments from five different perspectives to evaluate performance. Section 4 illustrates the methodology using real-world data, specifically poverty rate data reflecting the proportion of individuals living below the poverty line. Finally, Section 5 offers concluding remarks and a discussion of the findings.

2. Theoretical Results

Suppose that $X \sim \widetilde{G}(\alpha, \beta)$ and $Y = 1/X \sim \widetilde{IG}(\alpha, \beta)$. The probability density functions (pdfs) of $X$ and $Y$ are respectively given by
$$f_X(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} \exp(-\beta x), \quad x > 0, \ \alpha, \beta > 0,$$
$$f_Y(y \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \left(\frac{1}{y}\right)^{\alpha+1} \exp\!\left(-\frac{\beta}{y}\right), \quad y > 0, \ \alpha, \beta > 0.$$
It is easy to calculate $E[X] = \alpha/\beta$ and $E[Y] = \beta/(\alpha - 1)$, for $\alpha > 1$ and $\beta > 0$.

2.1. The Bayes Estimators and the PESLs

For the hierarchical normal and normal-inverse-gamma model (2), we have the following theorem, which gives the posterior densities $\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1})$, $\pi(\theta_{n+1} \mid x_{n+1})$, $\pi(\mu_{n+1} \mid x_{n+1})$, and $\pi(\mu_{n+1} \mid \theta_{n+1}, x_{n+1})$, and the marginal density $m(x_{n+1} \mid \mu_0, \kappa_0, \nu_0, \sigma_0^2)$. The proof of the theorem can be found in Appendix A.
Theorem 1. 
For the hierarchical normal and normal-inverse-gamma model (2), the joint posterior density of $(\mu_{n+1}, \theta_{n+1})$ is
$$\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1}) \sim \widetilde{NIG}(\mu_1, \kappa_1, \nu_1, \sigma_1^2),$$
the marginal posterior density of $\theta_{n+1}$ is
$$\pi(\theta_{n+1} \mid x_{n+1}) \sim \widetilde{IG}(\alpha^*, \beta^*),$$
the marginal posterior density of $\mu_{n+1}$ is
$$\pi(\mu_{n+1} \mid x_{n+1}) \sim t_{\nu_1}\!\left(\mu_1, \frac{\sigma_1^2}{\kappa_1}\right),$$
and the conditional posterior density of $\mu_{n+1}$ is
$$\pi(\mu_{n+1} \mid \theta_{n+1}, x_{n+1}) \sim N(\mu_1, \theta_{n+1}/\kappa_1),$$
where
$$\mu_1 = \frac{x_{n+1} + \kappa_0\mu_0}{\kappa_0 + 1}, \quad \kappa_1 = \kappa_0 + 1, \quad \nu_1 = \nu_0 + 1, \quad \nu_1\sigma_1^2 = \nu_0\sigma_0^2 + \frac{\kappa_0}{\kappa_0 + 1}\left(x_{n+1} - \mu_0\right)^2, \tag{3}$$
$$\alpha^* = \frac{\nu_1}{2} = \frac{\nu_0 + 1}{2}, \tag{4}$$
and
$$\beta^* = \frac{\nu_1\sigma_1^2}{2} = \frac{1}{2}\left[\nu_0\sigma_0^2 + \frac{\kappa_0}{\kappa_0 + 1}\left(x_{n+1} - \mu_0\right)^2\right]. \tag{5}$$
Moreover, the marginal density of $x_{n+1}$ is given by
$$m(x_{n+1} \mid \mu_0, \kappa_0, \nu_0, \sigma_0^2) \sim t_{\nu_0}\!\left(\mu_0, \frac{\sigma_0^2(\kappa_0 + 1)}{\kappa_0}\right),$$
with probability density function (pdf) given by
$$m(x_{n+1} \mid \mu_0, \kappa_0, \nu_0, \sigma_0^2) = \frac{\Gamma\!\left(\frac{\nu_0 + 1}{2}\right)}{\Gamma\!\left(\frac{\nu_0}{2}\right)\sqrt{\pi\nu_0\sigma_0^2(\kappa_0 + 1)/\kappa_0}}\left[1 + \frac{1}{\nu_0}\frac{(x_{n+1} - \mu_0)^2}{\sigma_0^2(\kappa_0 + 1)/\kappa_0}\right]^{-\frac{\nu_0 + 1}{2}}, \tag{6}$$
for $-\infty < x_{n+1} < \infty$, $-\infty < \mu_0 < \infty$, $\kappa_0 > 0$, $\nu_0 > 0$, and $\sigma_0^2 > 0$.
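To make the marginal density (6) concrete, the following minimal R sketch evaluates it directly and, as a cross-check, through the standard t density, since $X$ is just $\mu_0$ plus $\sqrt{\sigma_0^2(\kappa_0+1)/\kappa_0}$ times a standard $t_{\nu_0}$ variable. The function names dmarginal() and dmarginal2() are ours, not from the paper.

```r
# Marginal density (6): a non-standardized t density with nu0 degrees of freedom,
# location mu0, and squared scale u0 = sigma0sq * (kappa0 + 1) / kappa0.
# Function and argument names are illustrative, not the authors' code.
dmarginal <- function(x, mu0, kappa0, nu0, sigma0sq) {
  u0 <- sigma0sq * (kappa0 + 1) / kappa0
  const <- gamma((nu0 + 1) / 2) / (gamma(nu0 / 2) * sqrt(pi * nu0 * u0))
  const * (1 + (x - mu0)^2 / (nu0 * u0))^(-(nu0 + 1) / 2)
}

# Cross-check via the standard t density: X = mu0 + sqrt(u0) * T with T ~ t_{nu0}.
dmarginal2 <- function(x, mu0, kappa0, nu0, sigma0sq) {
  u0 <- sigma0sq * (kappa0 + 1) / kappa0
  dt((x - mu0) / sqrt(u0), df = nu0) / sqrt(u0)
}

dmarginal(3.5, mu0 = 3, kappa0 = 1, nu0 = 6, sigma0sq = 1)   # the two values agree
dmarginal2(3.5, mu0 = 3, kappa0 = 1, nu0 = 6, sigma0sq = 1)
```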
In the following, we will calculate the Bayes estimator of $\theta_{n+1}$ under Stein's loss function, $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$, the Bayes estimator of $\theta_{n+1}$ under the usual squared error loss function, $\delta_2^{\pi,\theta_{n+1}}(x_{n+1})$, and the PESLs at $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$ and $\delta_2^{\pi,\theta_{n+1}}(x_{n+1})$, denoted by $\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1})$ and $\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1})$.
Similar to [28], the two Bayes estimators and the two PESLs are respectively given by
$$\delta_s^{\pi,\theta_{n+1}}(x_{n+1}) = \frac{1}{E\!\left[\left.\frac{1}{\theta_{n+1}}\right| x_{n+1}\right]}, \tag{7}$$
$$\delta_2^{\pi,\theta_{n+1}}(x_{n+1}) = E\left[\theta_{n+1} \mid x_{n+1}\right], \tag{8}$$
$$\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1}) = \log E\!\left[\left.\frac{1}{\theta_{n+1}}\right| x_{n+1}\right] + E\left[\log\theta_{n+1} \mid x_{n+1}\right], \tag{9}$$
$$\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1}) = E\left[\theta_{n+1} \mid x_{n+1}\right]E\!\left[\left.\frac{1}{\theta_{n+1}}\right| x_{n+1}\right] - 1 - \log E\left[\theta_{n+1} \mid x_{n+1}\right] + E\left[\log\theta_{n+1} \mid x_{n+1}\right]. \tag{10}$$
To calculate the two Bayes estimators and the two PESLs, it remains to calculate
$$E_1 = E\left[\theta_{n+1} \mid x_{n+1}\right], \quad E_2 = E\!\left[\left.\frac{1}{\theta_{n+1}}\right| x_{n+1}\right], \quad \text{and} \quad E_3 = E\left[\log\theta_{n+1} \mid x_{n+1}\right].$$
Similar to [28], it can be shown by exploiting Jensen's inequality that
$$\delta_s^{\pi,\theta_{n+1}}(x_{n+1}) \leq \delta_2^{\pi,\theta_{n+1}}(x_{n+1}). \tag{11}$$
Moreover,
$$\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1}) \leq \mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1}), \tag{12}$$
which is a direct consequence of the general methodology for finding a Bayes estimator: by construction, $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$ minimizes the Posterior Expected Stein's Loss (PESL).
Now let us analytically calculate the Bayes estimators $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$ and $\delta_2^{\pi,\theta_{n+1}}(x_{n+1})$, and the PESLs $\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1})$ and $\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1})$, under the hierarchical normal and normal-inverse-gamma model (2) from (7), (8), (9), and (10). The three expectations are calculated as
$$E\left[\theta_{n+1} \mid x_{n+1}\right] = \frac{\beta^*}{\alpha^* - 1}, \quad E\!\left[\left.\frac{1}{\theta_{n+1}}\right| x_{n+1}\right] = \frac{\alpha^*}{\beta^*}, \quad E\left[\log\theta_{n+1} \mid x_{n+1}\right] = \log\beta^* - \psi(\alpha^*),$$
for $\alpha^* > 1$, where $\alpha^*$ and $\beta^*$ are given by (4) and (5). From (7), the Bayes estimator of $\theta_{n+1}$ under Stein's loss function is given by
$$\delta_s^{\pi,\theta_{n+1}}(x_{n+1}) = \frac{1}{E\!\left[\left.\frac{1}{\theta_{n+1}}\right| x_{n+1}\right]} = \frac{\beta^*}{\alpha^*}. \tag{13}$$
From (8), the Bayes estimator of $\theta_{n+1}$ under the usual squared error loss function is given by
$$\delta_2^{\pi,\theta_{n+1}}(x_{n+1}) = E\left[\theta_{n+1} \mid x_{n+1}\right] = \frac{\beta^*}{\alpha^* - 1}, \tag{14}$$
for $\alpha^* > 1$. It is easy to show that
$$\delta_s^{\pi,\theta_{n+1}}(x_{n+1}) = \frac{\beta^*}{\alpha^*} < \frac{\beta^*}{\alpha^* - 1} = \delta_2^{\pi,\theta_{n+1}}(x_{n+1}), \tag{15}$$
which exemplifies the theoretical study of (11). Furthermore, from (9) and (10), the PESLs at $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$ and $\delta_2^{\pi,\theta_{n+1}}(x_{n+1})$ are respectively given by
$$\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1}) = \log\alpha^* - \psi(\alpha^*)$$
and
$$\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1}) = \frac{1}{\alpha^* - 1} + \log(\alpha^* - 1) - \psi(\alpha^*),$$
where
$$\psi(x) = \frac{\Gamma'(x)}{\Gamma(x)} = \frac{d}{dx}\log\Gamma(x) = \mathrm{digamma}(x)$$
is the digamma function. It can be shown that
$$\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1}) = \log\alpha^* - \psi(\alpha^*) \leq \frac{1}{\alpha^* - 1} + \log(\alpha^* - 1) - \psi(\alpha^*) = \mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1}), \tag{16}$$
which exemplifies the theoretical study of (12). It is worth noting that the PESLs $\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1})$ and $\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1})$ depend only on $\alpha^*$, but not on $\beta^*$. Therefore, the PESLs depend only on $\nu_0$, and not on $\mu_0$, $\kappa_0$, $\sigma_0^2$, or $x_{n+1}$.
We will exemplify the two inequalities (15) and (16) in the simulations section and the real data section. Moreover, we will exemplify that the PESLs depend only on $\nu_0$, but not on $\mu_0$, $\kappa_0$, $\sigma_0^2$, or $x_{n+1}$.
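For concreteness, the following minimal R sketch (the helper name bayes_quantities() is ours) computes the two Bayes estimators (13) and (14) and the two PESLs from the hyperparameters and the single observation $x_{n+1}$; it can be used to check the inequalities (15) and (16) numerically.

```r
# Bayes estimators (13)-(14) and the PESLs under model (2), for known
# hyperparameters and one observation x_{n+1}; a sketch, names are ours.
bayes_quantities <- function(x_np1, mu0, kappa0, nu0, sigma0sq) {
  alpha_star <- (nu0 + 1) / 2                                          # equation (4)
  beta_star  <- (nu0 * sigma0sq +
                 kappa0 / (kappa0 + 1) * (x_np1 - mu0)^2) / 2          # equation (5)
  list(
    delta_s = beta_star / alpha_star,                                  # equation (13)
    delta_2 = beta_star / (alpha_star - 1),                            # equation (14), requires alpha_star > 1
    PESL_s  = log(alpha_star) - digamma(alpha_star),
    PESL_2  = 1 / (alpha_star - 1) + log(alpha_star - 1) - digamma(alpha_star)
  )
}

q <- bayes_quantities(x_np1 = 6.5, mu0 = 3, kappa0 = 1, nu0 = 6, sigma0sq = 1)
q$delta_s < q$delta_2   # TRUE, inequality (15)
q$PESL_s  < q$PESL_2    # TRUE, inequality (16)
```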

2.2. The Empirical Bayes Estimators of θ n + 1

The hyperparameters of model (2) are $-\infty < \mu_0 < \infty$, $\kappa_0 > 0$, $\nu_0 > 0$, and $\sigma_0^2 > 0$. However, we cannot directly obtain estimators of all four hyperparameters of model (2) by the moment method. Let
$$u_0 = \frac{\sigma_0^2(\kappa_0 + 1)}{\kappa_0}. \tag{17}$$
Since $\kappa_0$ and $\sigma_0^2$ appear together in $u_0$, we cannot directly obtain estimators of $\kappa_0$ and $\sigma_0^2$ by the moment method. In other words, $\kappa_0$ and $\sigma_0^2$ are unidentifiable. In the empirical Bayesian statistical literature, there are two common approaches to addressing the unidentifiability of hyperparameters. One is to estimate the prior distribution through non-parametric or semi-parametric methods, avoiding strong assumptions about the functional form of the prior distribution and thereby circumventing the problem of unidentifiability ([31,32]). The other is to use auxiliary data, model structure constraints, or specific assumptions (such as sparsity or spatial correlation) to provide additional information, making the unidentifiable hyperparameters identifiable ([33,34,35]). We adopt the second approach to make the hyperparameters $\kappa_0$ and $\sigma_0^2$ identifiable. More specifically, when $\kappa_0$ is fixed at a known constant (our recommendation is $\kappa_0 = 1$, which will be made clear later in this subsection), then $\sigma_0^2$ is identifiable; otherwise, $\kappa_0$ and $\sigma_0^2$ are unidentifiable.
However, we can obtain an estimator of $u_0$ by the moment method. In the following, we are therefore interested in the hyperparameters $-\infty < \mu_0 < \infty$, $\nu_0 > 0$, and $u_0 > 0$. In terms of the hyperparameters $\mu_0$, $\nu_0$, and $u_0$, the marginal density (6) becomes
$$m(x_{n+1} \mid \mu_0, \nu_0, u_0) = \frac{\Gamma\!\left(\frac{\nu_0 + 1}{2}\right)}{\Gamma\!\left(\frac{\nu_0}{2}\right)\sqrt{\pi\nu_0 u_0}}\left[1 + \frac{1}{\nu_0}\frac{(x_{n+1} - \mu_0)^2}{u_0}\right]^{-\frac{\nu_0 + 1}{2}} \sim t_{\nu_0}(\mu_0, u_0). \tag{18}$$
The moment-based estimators for the hyperparameters μ 0 , ν 0 , and u 0 in model (2), denoted as μ 01 n , ν 01 n , and u 01 n , along with their consistency properties, are presented in the subsequent theorem. The proof of this theorem is provided in the Appendix A.
Theorem 2. 
The estimators of the hyperparameters $\mu_0$, $\nu_0$, and $u_0$ of model (2) by the moment method are
$$\mu_{01}(n) = A_1, \qquad \nu_{01}(n) = \frac{-\frac{14}{3}A_1^4 + 4A_1^2A_2 + 2A_2^2 - \frac{4}{3}A_4}{-\frac{2}{3}A_1^4 + A_2^2 - \frac{1}{3}A_4}, \qquad u_{01}(n) = \frac{-\frac{1}{3}\left(A_2 - A_1^2\right)\left(5A_1^4 - 6A_1^2A_2 + A_4\right)}{-\frac{7}{3}A_1^4 + 2A_1^2A_2 + A_2^2 - \frac{2}{3}A_4},$$
where
$$A_k = \frac{1}{n}\sum_{i=1}^{n}X_i^k, \quad k = 1, 2, 4,$$
is the sample kth moment of X. Moreover, the moment estimators are consistent estimators of the hyperparameters.
We remark that the moment estimators μ 01 n , ν 01 n , and u 01 n in Theorem 2 are the same as those in Theorem 2 in [19]. The reason for the same moment estimators is that for the two hierarchical normal and normal-inverse-gamma models (1) and (2), the marginal distributions are the same, and the population moments of X are the same. Moreover, in Theorem 2 of this paper, we have shown that the moment estimators are consistent estimators of the hyperparameters, and this result has not been derived in [19].
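The moment estimators of Theorem 2 are straightforward to compute from the sample raw moments $A_1$, $A_2$, and $A_4$. The following R sketch (the function name is ours) implements them directly; recall that $\nu_0 > 4$ is needed for the fourth population moment to exist.

```r
# Moment estimators of (mu0, nu0, u0) from Theorem 2, written in terms of the
# sample raw moments A1, A2, A4; a minimal sketch, not the authors' code.
moment_estimators <- function(x) {
  A1 <- mean(x); A2 <- mean(x^2); A4 <- mean(x^4)
  mu0_hat <- A1
  nu0_hat <- (-14 / 3 * A1^4 + 4 * A1^2 * A2 + 2 * A2^2 - 4 / 3 * A4) /
             (-2 / 3 * A1^4 + A2^2 - 1 / 3 * A4)
  u0_hat  <- (-1 / 3 * (A2 - A1^2) * (5 * A1^4 - 6 * A1^2 * A2 + A4)) /
             (-7 / 3 * A1^4 + 2 * A1^2 * A2 + A2^2 - 2 / 3 * A4)
  c(mu0 = mu0_hat, nu0 = nu0_hat, u0 = u0_hat)
}
```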
The MLE method is used to derive estimators for the hyperparameters μ 0 , ν 0 , and u 0 of the model (2), denoted as μ 02 n , ν 02 n , and u 02 n , respectively. The consistency properties of these estimators are presented in the subsequent theorem, with the detailed proof provided in the Appendix A.
Theorem 3. 
The estimators $\mu_{02}(n)$, $\nu_{02}(n)$, and $u_{02}(n)$ of the hyperparameters $\mu_0$, $\nu_0$, and $u_0$ of model (2) by the MLE method are the solutions to the following equations:
$$f_1(\mu_0, \nu_0, u_0) = \frac{1}{n}\sum_{i=1}^{n}\frac{(\nu_0 + 1)(x_i - \mu_0)}{\nu_0 u_0 + (x_i - \mu_0)^2} = 0, \tag{19}$$
$$f_2(\mu_0, \nu_0, u_0) = \frac{1}{2}\psi\!\left(\frac{\nu_0 + 1}{2}\right) - \frac{1}{2}\psi\!\left(\frac{\nu_0}{2}\right) - \frac{1}{2\nu_0} - \frac{1}{2}\cdot\frac{1}{n}\sum_{i=1}^{n}\left[\log\!\left(1 + \frac{(x_i - \mu_0)^2}{\nu_0 u_0}\right) - \frac{(\nu_0 + 1)(x_i - \mu_0)^2}{\left[\nu_0 u_0 + (x_i - \mu_0)^2\right]\nu_0}\right] = 0, \tag{20}$$
$$f_3(\mu_0, \nu_0, u_0) = -\frac{1}{2u_0} + \frac{1}{2}\cdot\frac{1}{n}\sum_{i=1}^{n}\frac{(\nu_0 + 1)(x_i - \mu_0)^2}{\left[\nu_0 u_0 + (x_i - \mu_0)^2\right]u_0} = 0. \tag{21}$$
Moreover, the MLEs are consistent estimators of the hyperparameters.
The MLEs of the hyperparameters μ 0 , ν 0 , and u 0 cannot be derived analytically by solving equations (19), (20), and (21). As a result, numerical methods must be employed. Newton’s method can be applied to solve these equations and thereby obtain the MLEs of the hyperparameters. It should be emphasized that the accuracy of the resulting MLEs is highly sensitive to the choice of initial values, and moment-based estimators have generally been found to serve as effective starting points.
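As an alternative to hand-coding Newton's method for the score equations (19)-(21), one simple route, sketched below under our own naming and not the authors' implementation, is to maximize the log-likelihood of the marginal $t_{\nu_0}(\mu_0, u_0)$ density (18) with optim(), starting from the moment estimators as recommended above. Optimizing $\log\nu_0$ and $\log u_0$ keeps the constraints $\nu_0 > 0$ and $u_0 > 0$ satisfied throughout the iterations.

```r
# MLE of (mu0, nu0, u0) by maximizing the marginal t log-likelihood (18);
# a sketch only. moment_estimators() is the helper sketched after Theorem 2.
mle_estimators <- function(x, start = moment_estimators(x)) {
  negloglik <- function(par) {
    mu0 <- par[1]; nu0 <- exp(par[2]); u0 <- exp(par[3])
    -sum(dt((x - mu0) / sqrt(u0), df = nu0, log = TRUE) - 0.5 * log(u0))
  }
  init <- c(start[["mu0"]], log(max(start[["nu0"]], 0.1)), log(max(start[["u0"]], 1e-6)))
  fit <- optim(init, negloglik, method = "BFGS")
  c(mu0 = fit$par[[1]], nu0 = exp(fit$par[[2]]), u0 = exp(fit$par[[3]]))
}
```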
Finally, the empirical Bayes estimates of the model (2)’s variance parameter under Stein’s loss function, derived using the method of moments and the MLE, are presented in the subsequent theorem.
Theorem 4. 
The empirical Bayes estimator of the variance parameter of the model (2) under Stein’s loss function by the moment method is given by (13) with the hyperparameters estimated by μ 01 n , ν 01 n , u 01 n in Theorem 2. Alternatively, the empirical Bayes estimator of the variance parameter of the model (2) under Stein’s loss function by the MLE method is given by (13) with the hyperparameters estimated by μ 02 n , ν 02 n , u 02 n numerically determined in Theorem 3.
Now let us discuss the selection of $\kappa_0$. We recommend choosing $\kappa_0 = 1$, and the reason is given as follows. From (13), (4), and (5), we have
$$\delta_s^{\pi,\theta_{n+1}}(x_{n+1}) = \frac{\beta^*}{\alpha^*} = \frac{\frac{1}{2}\left[\nu_0\sigma_0^2 + \frac{\kappa_0}{\kappa_0+1}(x_{n+1} - \mu_0)^2\right]}{\frac{\nu_0+1}{2}} = \frac{1}{\nu_0+1}\left[\nu_0\sigma_0^2 + \frac{\kappa_0}{\kappa_0+1}(x_{n+1} - \mu_0)^2\right] = \frac{\kappa_0}{\kappa_0+1}\cdot\frac{1}{\nu_0+1}\left[\nu_0\frac{\sigma_0^2(\kappa_0+1)}{\kappa_0} + (x_{n+1} - \mu_0)^2\right] = \frac{\kappa_0}{\kappa_0+1}\cdot\frac{1}{\nu_0+1}\left[\nu_0 u_0 + (x_{n+1} - \mu_0)^2\right]. \tag{22}$$
It is easy to show that the factor
$$\frac{\kappa_0}{\kappa_0+1} \in (0, 1),$$
for $\kappa_0 > 0$. Because we have little information about $\kappa_0$, we will choose
$$\frac{\kappa_0}{\kappa_0+1} = \frac{1}{2}, \tag{23}$$
which is in the middle of the above range. Hence,
$$\kappa_0 = 1.$$
Therefore, (22) reduces to
$$\delta_s^{\pi,\theta_{n+1}}(x_{n+1}) = \frac{1}{2}\cdot\frac{1}{\nu_0+1}\left[\nu_0 u_0 + (x_{n+1} - \mu_0)^2\right],$$
which can then be estimated once the hyperparameters $\mu_0$, $\nu_0$, and $u_0$ are estimated. From (17) and (23), we have
$$\sigma_0^2 = \frac{\kappa_0}{\kappa_0+1}u_0 = \frac{1}{2}u_0,$$
which can then be estimated once the hyperparameter $u_0$ is estimated.
Another reason to choose $\kappa_0 = 1$ is given below. Since $-\infty < \mu_{n+1} < \infty$, the squared error loss function is appropriate for $\mu_{n+1}$. The Bayes estimator of $\mu_{n+1}$ under the squared error loss function is given by
$$\delta_2^{\pi,\mu_{n+1}}(x_{n+1}) = E\left[\mu_{n+1} \mid x_{n+1}\right] = \mu_1 = \frac{x_{n+1} + \kappa_0\mu_0}{\kappa_0 + 1}. \tag{24}$$
In (24), $\kappa_0$ represents a strength of belief in $\mu_0$. If one harbors no belief in $\mu_0$, then $\kappa_0 = 0$, and thus $\delta_2^{\pi,\mu_{n+1}}(x_{n+1}) = x_{n+1}$, which depends only on the datum $x_{n+1}$. In contrast, if one harbors complete belief in $\mu_0$, then $\kappa_0 = \infty$, and thus $\delta_2^{\pi,\mu_{n+1}}(x_{n+1}) = \mu_0$, which depends only on $\mu_0$. However, if one believes that $x_{n+1}$ and $\mu_0$ are equally important, then $\kappa_0 = 1$ is a reasonable choice, and thus $\delta_2^{\pi,\mu_{n+1}}(x_{n+1}) = (x_{n+1} + \mu_0)/2$, which is a balanced combination of $x_{n+1}$ and $\mu_0$.
We remark that $\kappa_0$ and $\sigma_0^2$ affect $\beta^*$, $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$, $\delta_2^{\pi,\theta_{n+1}}(x_{n+1})$, and $\delta_2^{\pi,\mu_{n+1}}(x_{n+1})$, but they do not affect $\alpha^*$, $\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1})$, or $\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1})$.

3. Simulations

This section presents numerical simulations for the hierarchical normal and normal-inverse-gamma model (2). We examine five key aspects: the consistency of moment estimators and MLEs, the influence of κ 0 on quantities of interest as functions of x n + 1 , the validity of two inequalities concerning Bayes estimators and PESLs, the model’s goodness-of-fit to the generated data, and the marginal density distributions of the model parameters.
The simulated data are generated according to the hierarchical normal and normal-inverse-gamma model (2) with the hyperparameters specified by $\mu_0 = 3$, $\kappa_0 = 1$, $\nu_0 = 6$, and $\sigma_0^2 = 1$. These values are chosen because they satisfy $-\infty < \mu_0 < \infty$, $\kappa_0 > 0$, $\nu_0 > 0$, and $\sigma_0^2 > 0$; moreover, $\nu_0 > 4$ is required for the moment estimation of the hyperparameters. Other numerical values for the hyperparameters could likewise be used.
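Generating data from model (2) only requires drawing each $\theta_i$ from the inverse gamma prior, then $\mu_i$ given $\theta_i$, and then $x_i$ given $(\mu_i, \theta_i)$; a minimal R sketch (variable names are ours) is given below.

```r
# Simulating a sample of size n from the hierarchical model (2); a sketch.
set.seed(1)
n <- 1000
mu0 <- 3; kappa0 <- 1; nu0 <- 6; sigma0sq <- 1
theta <- 1 / rgamma(n, shape = nu0 / 2, rate = nu0 * sigma0sq / 2)  # theta_i ~ IG(nu0/2, nu0*sigma0sq/2)
mu    <- rnorm(n, mean = mu0, sd = sqrt(theta / kappa0))            # mu_i | theta_i
x     <- rnorm(n, mean = mu, sd = sqrt(theta))                      # x_i | mu_i, theta_i
# Marginally, x behaves like a t_{nu0}(mu0, u0) sample with u0 = sigma0sq*(kappa0+1)/kappa0 = 2.
```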

3.1. Consistencies of the Moment Estimators and the MLEs

Currently, both the moment estimators $\mu_{01}(n)$, $\nu_{01}(n)$, $u_{01}(n)$ and the MLEs $\mu_{02}(n)$, $\nu_{02}(n)$, $u_{02}(n)$ serve as consistent estimators of the hyperparameters $\mu_0$, $\nu_0$, $u_0$ of the hierarchical normal and normal-inverse-gamma model (2). The purpose of this subsection is to emphasize that, as established in Theorems 2 and 3, we have rigorously proven the consistency of both the moment-based and MLE-based estimators with respect to the true hyperparameters. It should be noted that all analyses here are based solely on the observed data $x^{(n)} = (x_1, x_2, \ldots, x_n)$.
First, we will numerically exemplify that the sample $x_1^{(n)} = (x_{11}, x_{12}, \ldots, x_{1n})$ generated from model (1) cannot be used to estimate the hyperparameters $\mu_0$, $\nu_0$, $u_0$, while the sample $x_2^{(n)} = (x_{21}, x_{22}, \ldots, x_{2n})$ generated from model (2) can be used to estimate the hyperparameters $\mu_0$, $\nu_0$, $u_0$, where $u_0 = \sigma_0^2(\kappa_0 + 1)/\kappa_0$. Moreover, we will exemplify that the moment estimators and the MLEs of $u_0$ can correctly estimate the true hyperparameter $u_0$ regardless of the $\kappa_0$ and $\sigma_0^2$ values.
The histograms of the samples and their density curves are plotted in Figure 1. From the figure, we observe the following facts.
  • Plot (a): the sample x 1 n generated from (1) are iid from N μ , θ with μ = 4.817 and θ = 1.866 .
  • Plots (b) and (c): the sample x 2 n are generated from (2) with μ 0 , κ 0 , ν 0 , σ 0 2 = 3 , 1 , 6 , 1 and u 0 = 2 . The sample x 3 n are generated from (2) with μ 0 , κ 0 , ν 0 , σ 0 2 = 3 , 2 , 6 , 4 / 3 and u 0 = 2 . Although κ 0 , σ 0 2 = 1 , 1 and 2 , 4 / 3 are different in the two plots, u 0 = 2 is the same. Therefore, the two samples x 2 n and x 3 n are from the same marginal distribution t ν 0 μ 0 , u 0 with μ 0 , ν 0 , u 0 = 3 , 6 , 2 .
The moment estimators and the MLEs of the hyperparameters μ 0 , ν 0 , u 0 = 3 , 6 , 2 for the samples x 1 n , x 2 n , and x 3 n are summarized in Table 1. From the table, we observe the following facts.
  • The moment estimators of the hyperparameters μ 0 , ν 0 , u 0 for sample x 1 n are far away from the true hyperparameters 3 , 6 , 2 , and thus the samples generated from the model (1) can not be used to estimate the hyperparameters μ 0 , ν 0 , u 0 .
  • For $x_1^{(n)}$, since the moment estimator of $\nu_0$ is $-159.725$, which is negative, the MLE method fails to iterate, and thus the MLEs of the hyperparameters $\mu_0$, $\nu_0$, $u_0$ are set equal to the moment estimators.
  • For x 2 n and x 3 n , the moment estimators and the MLEs of the hyperparameters μ 0 , ν 0 , u 0 are close to the true hyperparameters 3 , 6 , 2 , and thus the samples generated from the model (2) can be used to estimate the hyperparameters μ 0 , ν 0 , u 0 .
  • The sample x 2 n is generated from the model (2) with κ 0 , σ 0 2 = 1 , 1 , while the sample x 3 n is generated from the model (2) with κ 0 , σ 0 2 = 2 , 4 / 3 . Although κ 0 , σ 0 2 = 1 , 1 and 2 , 4 / 3 are different for the two samples x 2 n and x 3 n , u 0 = 2 is the same. We find that both the moment method and the MLE method correctly estimate the true hyperparameter u 0 = 2 for the two samples x 2 n and x 3 n .
  • For x 2 n and x 3 n , the MLEs are closer to the true hyperparameters μ 0 , ν 0 , u 0 = 3 , 6 , 2 than the moment estimators for this simulation.
Following the approach in [21], let $\xi$ represent one of the hyperparameters $\mu_0$, $\nu_0$, or $u_0$. Consistency is defined as the property that
$$\xi_i(n) \overset{p}{\longrightarrow} \xi \quad \text{as } n \to \infty,$$
for $i = 1, 2$, where $i = 1$ corresponds to the moment estimator and $i = 2$ to the MLE, with $\overset{p}{\longrightarrow}$ denoting convergence in probability and $n$ being the sample size. Equivalently, consistency can be expressed as
$$\lim_{n\to\infty}P\left(\left|\xi_i(n) - \xi\right| \geq \varepsilon\right) = 0,$$
for any $\varepsilon > 0$ and for all $\xi \in \Xi$. These probabilities are estimated using empirical frequencies from simulations:
$$P\left(\left|\xi_i(n) - \xi\right| \geq \varepsilon\right) \approx \frac{1}{M}\sum_{m=1}^{M}I\left(\left|\xi_i^{(m)}(n) - \xi\right| \geq \varepsilon\right) \triangleq F_{\xi_i}^{\varepsilon, M}(n),$$
where $I(A)$ is the indicator function taking value 1 if event $A$ occurs and 0 otherwise, and $M$ denotes the number of simulation replications. Hence, a decreasing trend in the simulated frequencies $F_{\xi_i}^{\varepsilon, M}(n)$ toward zero as $n$ increases provides empirical evidence of estimator consistency, for both estimation methods ($i = 1, 2$) and across all hyperparameters $\xi \in \{\mu_0, \nu_0, u_0\}$.
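A possible R sketch of this frequency computation is given below. It is our own helper, reusing moment_estimators(), mle_estimators(), and the data generator for model (2) sketched earlier, and it does not guard against the occasional replication in which the MLE iteration fails.

```r
# Simulated frequencies F_{xi_i}^{eps, M}(n) for the six estimators; a sketch.
consistency_freq <- function(n, M = 1000, eps = 0.5,
                             truth = c(mu0 = 3, nu0 = 6, u0 = 2),
                             mu0 = 3, kappa0 = 1, nu0 = 6, sigma0sq = 1) {
  miss <- matrix(0, nrow = M, ncol = 6)
  for (m in 1:M) {
    theta <- 1 / rgamma(n, shape = nu0 / 2, rate = nu0 * sigma0sq / 2)
    x <- rnorm(n, mean = rnorm(n, mu0, sqrt(theta / kappa0)), sd = sqrt(theta))
    est <- c(moment_estimators(x), mle_estimators(x))      # (mu01, nu01, u01, mu02, nu02, u02)
    miss[m, ] <- abs(est - rep(truth, 2)) >= eps            # indicator of |estimate - truth| >= eps
  }
  colMeans(miss)                                            # the six frequencies
}
```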
We now demonstrate that the moment estimators μ 01 ( n ) , ν 01 ( n ) , u 01 ( n ) and the MLEs μ 02 ( n ) , ν 02 ( n ) , u 02 ( n ) are consistent estimators of the hyperparameters μ 0 , ν 0 , u 0 . Table 2 presents the frequency distributions of these estimators for both moment-based and MLE for varying sample sizes n, under settings where ε = 1 , 0.5 , and 0.1 . The simulated data are generated from the hierarchical normal and normal-inverse-gamma model (2), with hyperparameter values set to μ 0 , κ 0 , ν 0 , σ 0 2 = 3 , 1 , 6 , 1 and u 0 = 2 . While other configurations of these hyperparameters could also be considered, the current choice serves to illustrate the behavior effectively. Analysis of the table reveals several key observations.
  • With ε = 1 , 0.5 , or 0.1 , the frequencies of the estimators approach zero as n tends to infinity, indicating that both the moment estimators and the MLEs are consistent for estimating the hyperparameters. In the case of ε = 0.1 , the frequencies of ν 01 ( n ) and ν 02 ( n ) remain relatively high (at least 0.2 across all scenarios). Nevertheless, a decreasing trend toward zero is evident as n grows large, suggesting eventual convergence.
  • By comparing the frequencies corresponding to ε = 1 , 0.5 , and 0.1 , we find that as ε decreases, the frequencies generally increase. This occurs because the constraints
    $$\left|\xi_i^{(m)}(n) - \xi\right| \geq \varepsilon, \quad \text{for } \xi = \mu_0, \nu_0, \text{ and } u_0, \quad i = 1, 2,$$
    become easier to satisfy when ε is smaller.
  • When comparing the moment estimators with the MLEs of the hyperparameters $\mu_0$, $\nu_0$, and $u_0$, it is observed that for large sample sizes $n$, the frequencies of the MLEs are smaller than those of the moment estimators. This indicates that, in terms of consistency, the MLEs perform more reliably in estimating the hyperparameters.

3.2. The Effect of κ 0 for Quantities of Interest as Functions of x n + 1

In this subsection, we will investigate the effect of $\kappa_0$ on the quantities of interest ($\beta^*$, $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$, $\delta_2^{\pi,\theta_{n+1}}(x_{n+1})$, $\delta_2^{\pi,\mu_{n+1}}(x_{n+1})$, $\alpha^*$, $\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1})$, and $\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1})$) as functions of $x_{n+1}$. The motivation of this subsection is that $\kappa_0$ is unidentifiable, and we want to know whether using $\kappa_0 = 1$ (our recommendation) for the computation will overestimate or underestimate the quantities of interest as functions of $x_{n+1}$ when the true $\kappa_0 > 0$ (e.g., 2 or 0.5) is different from $\kappa_0 = 1$.
Quantities of interest as functions of x n + 1 for κ 0 = 2 , 1 , 0.5 in the simulations are plotted in Figure 2. In all the plots, u 0 = σ 0 2 κ 0 + 1 / κ 0 = 2 is fixed. From the figure, we observe the following facts.
  • From the left column of the figure, we see that β * , δ s π , θ n + 1 x n + 1 , δ 2 π , θ n + 1 x n + 1 , and δ 2 π , μ n + 1 x n + 1 are affected by κ 0 . Moreover, β * , δ s π , θ n + 1 x n + 1 , and δ 2 π , θ n + 1 x n + 1 are quadratic functions of x n + 1 (note that their y ranges are different), while δ 2 π , μ n + 1 x n + 1 is a linear function of x n + 1 .
  • For plot (a), we observe that β * when using κ 0 = 1 will underestimate β * when using κ 0 = 2 , and it will overestimate β * when using κ 0 = 0.5 . In other words, β * is an increasing function of κ 0 for a fixed x n + 1 . Similar phenomena are observed for δ s π , θ n + 1 x n + 1 (plot (c)) and δ 2 π , θ n + 1 x n + 1 (plot (e)).
  • For δ 2 π , μ n + 1 x n + 1 (plot (g)), we observe an interesting phenomenon that when x n + 1 < μ 0 ( = 3 in the simulation), δ 2 π , μ n + 1 x n + 1 when using κ 0 = 1 will underestimate that when using κ 0 = 2 , and it will overestimate that when using κ 0 = 0.5 . In other words, δ 2 π , μ n + 1 x n + 1 is an increasing function of κ 0 when x n + 1 < μ 0 . However, when x n + 1 > μ 0 , we observe a reversed phenomenon that δ 2 π , μ n + 1 x n + 1 when using κ 0 = 1 will overestimate that when using κ 0 = 2 , and it will underestimate that when using κ 0 = 0.5 . In other words, δ 2 π , μ n + 1 x n + 1 is a decreasing function of κ 0 when x n + 1 > μ 0 . The phenomena can be explained by the theoretical analysis of
    $$\frac{\partial\,\delta_2^{\pi,\mu_{n+1}}(x_{n+1})}{\partial\kappa_0} = \frac{\mu_0 - x_{n+1}}{(\kappa_0 + 1)^2}\ \begin{cases} > 0, & \text{if } x_{n+1} < \mu_0, \\ = 0, & \text{if } x_{n+1} = \mu_0, \\ < 0, & \text{if } x_{n+1} > \mu_0. \end{cases}$$
  • From the right column of the figure, we see that α * , P E S L s π , θ n + 1 x n + 1 , and P E S L 2 π , θ n + 1 x n + 1 are not affected by κ 0 . Moreover, they also do not depend on x n + 1 .

3.3. Two Inequalities of the Bayes Estimators and the PESLs

This subsection aims to provide numerical illustrations of the two inequalities related to the Bayes estimators and the PESLs—specifically, (15) and (16)—under the oracle scenario where the hyperparameters μ 0 , κ 0 , ν 0 , σ 0 2 are known. The rationale for this section lies in the theoretical foundation established by these two inequalities, which we seek to validate through numerical examples.
First, we fix $\mu_0 = 3$, $\kappa_0 = 1$, $\nu_0 = 6$, and $\sigma_0^2 = 1$. Then we set the seed number to 1 in the R software and draw $\theta_{n+1} = 1.866135$ from $\widetilde{IG}(\nu_0/2, \nu_0\sigma_0^2/2)$. Next, we draw $\mu_{n+1} = 4.816593$ from $N(\mu_0, \theta_{n+1}/\kappa_0)$. After that, we draw $x_{n+1} = 6.554815$ from $N(\mu_{n+1}, \theta_{n+1})$. Figure 3 shows the histogram of $\theta_{n+1} \mid x_{n+1}$ and the density estimation curve of $\pi(\theta_{n+1} \mid x_{n+1})$. It is with respect to $\pi(\theta_{n+1} \mid x_{n+1})$ that $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$ minimizes the PESL. Numerical results show that
$$\delta_s^{\pi,\theta_{n+1}}(x_{n+1}) = 1.759765 < 2.463671 = \delta_2^{\pi,\theta_{n+1}}(x_{n+1})$$
and
$$\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1}) = 0.1496063 < 0.2131341 = \mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1}),$$
which exemplify the theoretical studies of (15) and (16).
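The oracle check above can be sketched in R as follows (our own script). The exact draws 1.866135, 4.816593, and 6.554815 depend on the random-number generator and on the particular inverse-gamma sampler used, so they need not be reproduced verbatim here, but the two inequalities must hold for any draw.

```r
# Single-draw oracle check of the inequalities (15) and (16); a sketch.
set.seed(1)
mu0 <- 3; kappa0 <- 1; nu0 <- 6; sigma0sq <- 1
theta_np1 <- 1 / rgamma(1, shape = nu0 / 2, rate = nu0 * sigma0sq / 2)  # theta_{n+1}
mu_np1    <- rnorm(1, mu0, sqrt(theta_np1 / kappa0))                    # mu_{n+1} | theta_{n+1}
x_np1     <- rnorm(1, mu_np1, sqrt(theta_np1))                          # x_{n+1} | mu_{n+1}, theta_{n+1}
alpha_star <- (nu0 + 1) / 2
beta_star  <- (nu0 * sigma0sq + kappa0 / (kappa0 + 1) * (x_np1 - mu0)^2) / 2
c(delta_s = beta_star / alpha_star,
  delta_2 = beta_star / (alpha_star - 1),
  PESL_s  = log(alpha_star) - digamma(alpha_star),
  PESL_2  = 1 / (alpha_star - 1) + log(alpha_star - 1) - digamma(alpha_star))
```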
We now consider the scenario where one of the five parameters— μ 0 , κ 0 , ν 0 , σ 0 2 , and x n + 1 —is allowed to vary while the others remain constant. Specifically, our focus lies on conducting a sensitivity analysis of the Bayes estimators and the PESLs with respect to these five parameters.
Figure 4 illustrates the Bayes estimators and the PESLs as varying with ν 0 . As shown in the left panel, the Bayes estimators exhibit dependence on ν 0 , demonstrating the validity of inequality (15). Specifically, δ s π , θ n + 1 x n + 1 increases as ν 0 increases, whereas δ 2 π , θ n + 1 x n + 1 decreases with rising ν 0 . The right panel reveals that the PESLs are also influenced by ν 0 , supporting the inequality (16), and both PESLs decrease monotonically as ν 0 grows. Additionally, numerical values corresponding to these quantities are provided in Table 3, which aligns with the graphical representation in Figure 4. Overall, both the figure and the table confirm the two theoretical inequalities: (15) and (16).
Figure 5 shows the Bayes estimators and the PESLs as functions of μ 0 , κ 0 , σ 0 2 , and x n + 1 . We see from the left plots of the figure that the Bayes estimators depend on μ 0 , κ 0 , σ 0 2 , and x n + 1 , and (15) is exemplified. Moreover, the Bayes estimators are first decreasing and then increasing functions of μ 0 and x n + 1 , and they are increasing functions of κ 0 and σ 0 2 . The right plots of the figure exhibit that the PESLs do not depend on μ 0 , κ 0 , σ 0 2 , and x n + 1 , and (16) is exemplified. Furthermore, Table 4, Table 5, Table 6 and Table 7 display the numerical values of the Bayes estimators and the PESLs in Figure 5. In summary, the results of Figure 5 and Table 4, Table 5, Table 6 and Table 7 exemplify the two inequalities (15) and (16).
In brief, the results of Figure 4 and Figure 5 exemplify that the PESLs depend only on ν 0 , but not on μ 0 , κ 0 , σ 0 2 , and x n + 1 .
Since the estimators $\delta_2^{\pi,\theta_{n+1}}(x_{n+1})$ and $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$, along with their corresponding PESLs, $\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1})$ and $\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1})$, are functions of the parameters $\alpha^*$ and $\beta^*$, where $\alpha^* > 1$ and $\beta^* > 0$, we visualize their behavior across the domain $D = (1, 10] \times (0, 10]$ using the persp3d() function from the R package rgl (refer to [20,21,36,37]). A key advantage of persp3d() over the standard persp() function in the graphics package is its ability to overlay multiple surfaces within the same plot and enable interactive rotation of 3D visualizations, thereby offering greater flexibility in graphical analysis. The resulting surface plots for the estimators, the PESLs, and their pairwise differences are displayed in Figure 6. As shown in the two leftmost panels, $\delta_2^{\pi,\theta_{n+1}}(x_{n+1})$ consistently exceeds $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$ for all $(\alpha^*, \beta^*) \in D$, providing empirical support for the inequality stated in (15). Similarly, the rightmost panels indicate that $\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1})$ remains greater than $\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1})$ throughout $D$, which visually confirms the theoretical result in (16). Overall, the numerical illustrations in the figure corroborate the analytical findings presented in (15) and (16).
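A sketch of such an overlay, assuming the rgl package is installed, is given below; it draws the two estimator surfaces over a grid approximating $D$ and can be rotated interactively. The PESL surfaces can be overlaid in exactly the same way.

```r
# Overlaid surfaces of delta_2 and delta_s over (a grid approximating) D; a sketch.
library(rgl)
alpha_star <- seq(1.05, 10, length.out = 60)
beta_star  <- seq(0.05, 10, length.out = 60)
delta_s <- outer(alpha_star, beta_star, function(a, b) b / a)         # equation (13)
delta_2 <- outer(alpha_star, beta_star, function(a, b) b / (a - 1))   # equation (14)
persp3d(alpha_star, beta_star, delta_2, col = "red", alpha = 0.6,
        xlab = "alpha*", ylab = "beta*", zlab = "estimator")
persp3d(alpha_star, beta_star, delta_s, col = "blue", alpha = 0.6, add = TRUE)
# The red surface (delta_2) lies above the blue surface (delta_s) on all of the
# grid, which illustrates inequality (15).
```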

3.4. Goodness-of-Fit of the Model: KS Test

This section focuses on evaluating the goodness-of-fit of the hierarchical normal and normal-inverse-gamma model (2) to the generated synthetic data. The primary objective is to assess how well the marginal distribution of the N-NIG model captures the characteristics of the simulated dataset. It should be noted that the analysis here is based solely on the observed data points x n = x 1 , x 2 , , x n .
The Kolmogorov-Smirnov (KS) statistic measures the maximum discrepancy, denoted as $D$, between the empirical cumulative distribution function (cdf), $F_n(x)$, and the theoretical population cdf, $F_0(x)$. Mathematically, this is expressed as
$$D = \sup_{-\infty < x < \infty}\left|F_n(x) - F_0(x)\right|. \tag{25}$$
In R, the KS test can be carried out using the built-in function ks.test() (refer to [38,39,40]). While the standard KS test is primarily designed for one-dimensional continuous distributions, various adaptations known as KS-type tests have also been proposed in the literature for handling discrete data ([41,42,43]).
The R function ks.test() performs the KS test, and its return value is a list containing the components statistic (the value of the test statistic, or the $D$ value) and p.value (the p-value of the test). In the literature, it is well known that a smaller $D$ value or a larger p-value indicates a better fit of the model to the simulated data. Based on and inspired by the $D$ value and the p-value, [21] propose five indices ($\overline{D}$, $\overline{p\text{-value}}$, $\min D\%$, $\max p\text{-value}\%$, $\mathrm{Accept}\ H_0\%$) to compare the three methods, that is, the oracle method, the moment method, and the MLE method, in simulations. $\overline{D}$ is the average of the $D$ values (25) over the $M$ simulations (the smaller the better). $\overline{p\text{-value}}$ is the average of the p-values over the $M$ simulations (the larger the better). $\min D\%$ is the percentage of the $M$ simulations in which a method attains the minimum $D$ value among the three methods (the larger the better); the three $\min D\%$ values should sum to 1. $\max p\text{-value}\%$ is the percentage of the $M$ simulations in which a method attains the maximum p-value among the three methods (the larger the better); the three $\max p\text{-value}\%$ values should sum to 1. $\mathrm{Accept}\ H_0\%$ is the percentage of accepting $H_0$ (defined as p-value $> 0.05$) in the $M$ simulations for each method (the larger the better); each $\mathrm{Accept}\ H_0\%$ value should lie between $0\%$ and $100\%$.
In our problem, the null hypothesis specifies that
$$H_0: X \sim \widetilde{NNIG}(\mu_0, \nu_0, u_0),$$
where $\widetilde{NNIG}(\mu_0, \nu_0, u_0)$ is the marginal distribution of the hierarchical normal and normal-inverse-gamma model (2). The marginal density of the $\widetilde{NNIG}(\mu_0, \nu_0, u_0)$ distribution is given by (18), which is obviously one-dimensional and continuous. Therefore, the KS test can serve as an indicator of how well the model fits the data.
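A minimal R sketch of this check is given below (helper names are ours). Since the marginal (18) is a location-scale t distribution, its cdf can be passed to ks.test() through pt(), with the hyperparameters replaced by their estimates; here x denotes a sample from the marginal and est a named estimate vector such as the output of the estimator sketches above.

```r
# KS goodness-of-fit check of the fitted marginal t_{nu0}(mu0, u0); a sketch.
ks_fit <- function(x, est) {
  pmarginal <- function(q) pt((q - est[["mu0"]]) / sqrt(est[["u0"]]), df = est[["nu0"]])
  ks.test(x, pmarginal)          # returns the D statistic and the p-value
}
ks_fit(x, moment_estimators(x))
```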
It should be noted that the data presented in this section are generated based on the hierarchical normal and normal-inverse-gamma model (2), where the hyperparameters are set to μ 0 , κ 0 , ν 0 , σ 0 2 = 3 , 1 , 6 , 1 and u 0 = 2 . Alternative values for these hyperparameters may also be chosen depending on the context.
Table 8 presents the goodness-of-fit results from the KS test for model (2) applied to simulated data. The simulations are based on a sample size of n = 1 , 000 and repeated M = 100 times. Hyperparameter estimation for μ 0 , ν 0 , and u 0 is conducted using three distinct approaches. The first is the oracle method, which assumes prior knowledge of the true values: μ 0 = 3 , ν 0 = 6 , and u 0 = 2 . The second approach employs the method of moments, utilizing moment estimators derived in Theorem 2. The third approach relies on MLE, where the hyperparameters are numerically computed using the MLEs provided in Theorem 3.
As shown in Table 8, the following observations can be made.
  • The $\overline{D}$ values corresponding to the three approaches are 0.0268, 0.0239, and 0.0204, respectively. This indicates that the MLE method performs best, followed by the moment method, while the oracle method yields the least satisfactory results. A plausible explanation for this outcome is that, as shown in (25), the empirical cdf $F_n(x)$ relies on the observed data, and similarly, the theoretical cdfs $F_0(x)$ used in the moment and MLE methods are also derived from the data. In contrast, the $F_0(x)$ associated with the oracle method does not depend on the actual data.
  • The p value ¯ corresponding to the three methods are 0.5230 , 0.6245 , and 0.7674 , respectively, indicating that the MLE method performs best, followed by the moment method, with the oracle method ranking last. A plausible explanation for this outcome is provided in the preceding paragraph.
  • The min D % values for the three approaches are 0.20 , 0.16 , and 0.64 , respectively. For the MLE method, the min D % value exceeds half of the M simulations. In terms of performance ranking, the methods are ordered as follows: MLE method, oracle method, and moment method.
  • The three methods yield max p value % values of 0.20 , 0.16 , and 0.64 , respectively. Since a lower D value is associated with a higher p-value, the method producing the minimum D value will also produce the maximum p-value. As a result, the min D % and max p value % values are identical across the three approaches. Based on performance, the methods can be ranked in descending order of preference as follows: MLE method, oracle method, and moment method.
  • The Accept H 0 % values for the three methods are 98 % , 99 % , and 99 % , respectively, indicating that these values are close to 100 % . This suggests that all three methods exhibit strong performance in terms of model fit.
  • To summarize, the MLE method consistently achieves the top ranking across all five evaluation metrics— D ¯ , p value ¯ , min D % , max p value % , and Accept H 0 % . When comparing the moment method with the MLE method, the results indicate that the MLE method outperforms the moment method on each of these five criteria.
Figure 7 presents the boxplots of the D values and p-values obtained from the three methods. The following observations can be made based on this figure.
  • The D values obtained by the oracle method are higher than those of the other two approaches. Since lower D values indicate better performance, the preferred ranking of the three methods is as follows: MLE method, moment method, and then the oracle method.
  • The oracle method yields the lowest p-values compared to the other two approaches. Since higher p-values are preferable, the methods can be ranked in descending order of preference as follows: MLE method, moment method, and finally the oracle method.
  • Large D values are associated with small p-values, whereas small D values are linked to large p-values.
  • Based on the D values and p-values, the MLE method outperforms the method of moments.

3.5. Marginal Densities for Various Hyperparameters

This subsection presents the marginal density plots of the hierarchical normal and normal-inverse-gamma model (2) under different settings of the hyperparameters μ 0 , κ 0 , ν 0 , and σ 0 2 . The primary objective is to investigate the types of data distributions that can be captured by this hierarchical model. Recall that the marginal density of X is defined by equation (6), which depends on four hyperparameters: μ 0 (ranging over the real line), κ 0 > 0 , ν 0 > 0 , and σ 0 2 > 0 . Our analysis focuses on how variations in these hyperparameters affect the shape of the marginal density, using the baseline configuration μ 0 = 3 , κ 0 = 1 , ν 0 = 6 , and σ 0 2 = 1 as a reference point. Alternative values for the hyperparameters may also be considered to further explore the model’s flexibility.
Figure 8 displays the marginal density functions under different values of μ 0 = 3 , 0 , 3 , 6 , 9 , with κ 0 = 1 , ν 0 = 6 , and σ 0 2 = 1 held constant. It can be observed that as μ 0 increases, the entire distribution shifts rightward without altering its shape, indicating that μ 0 acts as a location parameter. Furthermore, each marginal density is symmetric around its corresponding mean value μ 0 .
Figure 9 illustrates the marginal density functions under different values of κ 0 = 0.25 , 0.5 , 1 , 2 , 4 , with μ 0 = 3 , ν 0 = 6 , and σ 0 2 = 1 held constant. It can be observed that as κ 0 grows larger, the peak of the marginal density becomes more pronounced, indicating a reduction in its spread. This is consistent with the theoretical expression for the variance of X, given by
$$\mathrm{Var}[X] = \frac{\nu_0}{\nu_0 - 2}\cdot\frac{\sigma_0^2(\kappa_0 + 1)}{\kappa_0}, \tag{26}$$
which decreases as κ 0 increases. Additionally, each marginal density remains symmetric around the common mean μ 0 .
Figure 10 displays the marginal density functions under different values of ν 0 = 2 , 4 , 6 , 12 , 24 , with μ 0 = 3 , κ 0 = 1 , and σ 0 2 = 1 held constant. It can be observed that as ν 0 grows larger, the height of the density peak rises, indicating a reduction in the spread of the distribution. This is consistent with (26), which shows that the variance of the marginal density decreases as ν 0 increases. Additionally, each marginal density exhibits symmetry around the common mean μ 0 .
Figure 11 displays the marginal density functions under different values of σ 0 2 = 0.25 , 0.5 , 1 , 2 , 4 , with μ 0 = 3 , κ 0 = 1 , and ν 0 = 6 held constant. As observed in the plot, an increase in σ 0 2 leads to a reduction in the height of the density peak, indicating a broader distribution. This reflects a larger variance in the marginal density, which aligns with (26) showing that variance increases with σ 0 2 . Additionally, each marginal density remains symmetric around the mean value μ 0 .
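The following R sketch (our own script, reusing the dmarginal() helper defined after Theorem 1) overlays marginal densities in the spirit of Figure 8, varying $\mu_0$ while holding $\kappa_0 = 1$, $\nu_0 = 6$, and $\sigma_0^2 = 1$ fixed; the other hyperparameters can be varied analogously.

```r
# Overlaid marginal densities for mu0 in {-3, 0, 3, 6, 9}; a sketch.
x_grid <- seq(-15, 20, length.out = 500)
mu0_values <- c(-3, 0, 3, 6, 9)
plot(NULL, xlim = range(x_grid), ylim = c(0, 0.35),
     xlab = "x", ylab = "marginal density")
for (j in seq_along(mu0_values)) {
  lines(x_grid, dmarginal(x_grid, mu0 = mu0_values[j], kappa0 = 1,
                          nu0 = 6, sigma0sq = 1), col = j)
}
legend("topright", legend = paste0("mu0 = ", mu0_values),
       col = seq_along(mu0_values), lty = 1)
```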

4. A Real Data Example

In this section, we exploit the poverty level data. The data represent percentages of all persons below the poverty level. The sample is from a random collection of 80 = n + 1 cities in the Western U.S. Source: County and City Data Book 12th edition, U.S. Department of Commerce. The data can be downloaded from
under section title “Poverty Level". Note that there is an “_" between understandable and statistics.
The histogram of the sample $x^{(n+1)}$ and its density estimation curve are depicted in Figure 12. From the figure, we see that the data are roughly symmetric about 0.15.
The estimators of the hyperparameters μ 0 , ν 0 , and u 0 , the goodness-of-fit of the model, the empirical Bayes estimators of the variance parameter of the normal distribution with a conjugate normal-inverse-gamma prior and the PESLs, and the mean and variance of the real data by the moment method and the MLE method are summarized in Table 9. From the table, we observe the following facts.
  • The moment estimator of the hyperparameter $\mu_0$ is equal to the sample mean $\bar{x} = 0.1669620$ of the first $n$ observations. It is worth noting that the MLE of the hyperparameter $\mu_0$ is equal to 0.1619865, which is very close to the moment estimator of the hyperparameter $\mu_0$. Furthermore, the moment estimator and the MLE of the hyperparameter $u_0$ are also very similar, and both are close to 0.006. However, the moment estimator and the MLE of the hyperparameter $\nu_0$ are quite different. This does not mean that the moment estimator and the MLE are not consistent estimators of the hyperparameter $\nu_0$, nor that the hierarchical normal and normal-inverse-gamma model (2) does not fit the real data. The reason for the large difference between the two estimators is that the sample size $n = 79$ is small. Certainly, the MLE of the hyperparameter $\nu_0$ is more reliable, as guaranteed by the previous figures and tables in the simulations section.
  • The KS test is used as a measure of the goodness-of-fit. The p-value of the moment method is 0.5313 > 0.05 , which implies that the t ν 0 μ 0 , u 0 distribution with μ 0 , ν 0 , and u 0 estimated by their moment estimators fits the sample x n = x 1 , x 2 , , x n well. Moreover, the p-value of the MLE method is 0.8233 > 0.05 , which implies that the t ν 0 μ 0 , u 0 distribution with μ 0 , ν 0 , and u 0 estimated by their MLEs fits the sample x n even better. Comparing the moment method and the MLE method, we observe that the D value of the MLEs is smaller, and the p-value of the MLEs is larger, which means that the t ν 0 μ 0 , u 0 distribution with the hyperparameters μ 0 , ν 0 , and u 0 estimated by the MLEs has a better fit to the sample x n than that estimated by the moment estimators.
  • When the hyperparameters are estimated by the MLE method, we see that
    $$\delta_s^{\pi,\theta_{n+1}}(x_{n+1}) = 0.0032576 < 0.0038348 = \delta_2^{\pi,\theta_{n+1}}(x_{n+1})$$
    and
    $$\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1}) = 0.0771456 < 0.0912061 = \mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1}).$$
    We observe a similar phenomenon for the Bayes estimators and the PESLs when the hyperparameters are estimated by the moment method. Therefore, the two inequalities (15) and (16) are exemplified.
  • The mean of $X$ (the poverty level data) is estimated by $E[X] \approx \hat{\mu}_0$. The variance of $X$ is estimated by $\mathrm{Var}[X] \approx \frac{\hat{\nu}_0}{\hat{\nu}_0 - 2}\hat{u}_0$. It is useful to note that the mean and variance of $X$ obtained by the two methods are very similar, although the estimators of the hyperparameters are very different. Moreover, it is worth mentioning that
    $$E[X] \approx 0.1619865 = 16.19865\%, \qquad \mathrm{Var}[X] \approx 0.0080536 = 80.536\,(\%)^2$$
    for the MLE method. For the moment method, the mean and variance of X are similar. Hence, the variance of X is very small, not large!

5. Conclusions and Discussions

For the hierarchical normal and normal-inverse-gamma framework (2), we begin by deriving the posterior distributions $\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1})$, $\pi(\theta_{n+1} \mid x_{n+1})$, $\pi(\mu_{n+1} \mid x_{n+1})$, and $\pi(\mu_{n+1} \mid \theta_{n+1}, x_{n+1})$, along with the marginal distribution $m(x_{n+1} \mid \mu_0, \kappa_0, \nu_0, \sigma_0^2)$, as presented in Theorem 1. Subsequently, we obtain the Bayes estimators $\delta_s^{\pi,\theta_{n+1}}(x_{n+1})$ and $\delta_2^{\pi,\theta_{n+1}}(x_{n+1})$, as well as the corresponding PESLs, denoted by $\mathrm{PESL}_s^{\pi,\theta_{n+1}}(x_{n+1})$ and $\mathrm{PESL}_2^{\pi,\theta_{n+1}}(x_{n+1})$. These quantities fulfill two key inequalities, given in (15) and (16). Additionally, Theorem 2 provides the moment-based estimators of the model's hyperparameters, specifically $\mu_{01}(n)$, $\nu_{01}(n)$, and $u_{01}(n)$. In parallel, Theorem 3 summarizes the MLEs of the same hyperparameters, namely $\mu_{02}(n)$, $\nu_{02}(n)$, and $u_{02}(n)$. Lastly, Theorem 4 compiles the empirical Bayes estimators for the model's variance component under Stein's loss function, derived using both estimation approaches.
In the simulation study, we conduct numerical experiments on the hierarchical normal and normal-inverse-gamma model (2) from five perspectives: the consistency of moment estimators and MLEs, the influence of κ 0 on quantities of interest as functions of x n + 1 , the validation of two inequalities related to Bayes estimators and PESLs, the model’s goodness-of-fit to the generated data, and the examination of the model’s marginal density distributions.
In the real data analysis, we utilize poverty level data reflecting the proportion of individuals living below the poverty threshold. Table 9 presents a summary of several key results: estimates of the hyperparameters μ 0 , ν 0 , and u 0 ; model goodness-of-fit measures; empirical Bayes estimates for the variance parameter under a normal distribution with a conjugate normal-inverse-gamma prior; PESL values; as well as the mean and variance estimates of the poverty data computed via both the method of moments and MLE. The MLE-based estimates demonstrate a superior fit to the observed sample x n compared to moment-based estimates, as evidenced by a smaller D statistic and a higher corresponding p-value, indicating that the t ν 0 μ 0 , u 0 distribution using MLE-derived hyperparameters better captures the characteristics of the data.
In empirical Bayes approaches, hyperparameters are treated as unknown quantities, and their estimation relies on the marginal distribution derived from observed data. Two widely used techniques for this purpose are the method of moments and MLE. This study applies both methods to estimate the hyperparameters within a hierarchical normal and normal-inverse-gamma model (2).
The simulation results presented in the preceding figures and tables demonstrate that MLEs outperform moment-based estimators in estimating the hyperparameters μ 0 , ν 0 , and u 0 , particularly with respect to consistency and model fit. Nevertheless, these advantages come at a cost. Unlike moment estimators, MLEs are prone to numerical instability, impose higher computational demands, necessitate the constraints ν 0 > 0 and u 0 > 0 throughout all iterations, and lack closed-form solutions. Importantly, the performance of MLEs is highly sensitive to the choice of initial values, and moment estimators have been empirically shown to serve as reliable starting points. Furthermore, in scenarios where MLEs fail to converge or do not exist, moment estimators provide a justifiable and robust alternative.
Finally, we outline potential directions for future research. One possible extension involves adapting the hierarchical normal and normal-inverse-gamma model (2) to accommodate various non-conjugate prior distributions for the parameters of the normal distribution (refer to [1,44,45] and related works). In these cases, closed-form solutions may not be feasible, necessitating the development of numerical methods to compute the corresponding estimators.

Funding

This study was funded by the National Natural Science Foundation of China (Grant No. 12461051).

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proofs of Theorems

In the appendix, we will provide some proofs of the theorems.

Appendix A.1. The Proof of Theorem 1

Let $\eta_0 = (\mu_0, \kappa_0, \nu_0, \sigma_0^2)$ be the hyperparameters. The marginal distribution of $x_{n+1}$ in model (2) is
$$m(x_{n+1} \mid \eta_0) = \frac{f(x_{n+1} \mid \mu_{n+1}, \theta_{n+1})\,\pi(\mu_{n+1}, \theta_{n+1} \mid \eta_0)}{\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1}, \eta_0)}.$$
To lighten notations, the η 0 will be dropped in the densities. Some of the following derivations are quoted from Example 1.5.1 (p.20) of [25] and [19].
For the random variables, parameters, and hyperparameters, their domains are respectively given by
$$-\infty < x_{n+1}, \mu_{n+1}, \mu_0 < \infty$$
and
$$\theta_{n+1}, \kappa_0, \nu_0, \sigma_0^2 > 0.$$
By the Bayes Theorem, the joint posterior distribution of $\mu_{n+1}$ and $\theta_{n+1}$ is
$$\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1}) \propto f(x_{n+1} \mid \mu_{n+1}, \theta_{n+1})\,\pi(\mu_{n+1}, \theta_{n+1}).$$
The joint conjugate prior distribution of $\mu_{n+1}$ and $\theta_{n+1}$ is decomposed as
$$\pi(\mu_{n+1}, \theta_{n+1}) = \pi(\mu_{n+1} \mid \theta_{n+1})\,\pi(\theta_{n+1}),$$
which is a normal-inverse-gamma distribution. Hence,
$$\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1}) \propto f(x_{n+1} \mid \mu_{n+1}, \theta_{n+1})\,\pi(\mu_{n+1} \mid \theta_{n+1})\,\pi(\theta_{n+1}).$$
It is easy to see that
$$f(x_{n+1} \mid \mu_{n+1}, \theta_{n+1}) = \frac{1}{\sqrt{2\pi\theta_{n+1}}}\exp\!\left[-\frac{1}{2\theta_{n+1}}(x_{n+1} - \mu_{n+1})^2\right] \propto \theta_{n+1}^{-\frac{1}{2}}\exp\!\left[-\frac{1}{2\theta_{n+1}}(x_{n+1} - \mu_{n+1})^2\right],$$
$$\pi(\mu_{n+1} \mid \theta_{n+1}) = \frac{1}{\sqrt{2\pi\theta_{n+1}/\kappa_0}}\exp\!\left[-\frac{\kappa_0}{2\theta_{n+1}}(\mu_{n+1} - \mu_0)^2\right] \propto \theta_{n+1}^{-\frac{1}{2}}\exp\!\left[-\frac{\kappa_0}{2\theta_{n+1}}(\mu_{n+1} - \mu_0)^2\right],$$
and
$$\pi(\theta_{n+1}) = \frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\!\left(\frac{\nu_0}{2}\right)}\left(\frac{1}{\theta_{n+1}}\right)^{\frac{\nu_0}{2}+1}\exp\!\left(-\frac{\nu_0\sigma_0^2}{2}\cdot\frac{1}{\theta_{n+1}}\right) \propto \left(\frac{1}{\theta_{n+1}}\right)^{\frac{\nu_0}{2}+1}\exp\!\left(-\frac{\nu_0\sigma_0^2}{2}\cdot\frac{1}{\theta_{n+1}}\right).$$
Therefore,
$$\begin{aligned}
\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1}) &\propto \theta_{n+1}^{-\frac{1}{2}}\exp\!\left[-\frac{1}{2\theta_{n+1}}(x_{n+1} - \mu_{n+1})^2\right]\theta_{n+1}^{-\frac{1}{2}}\exp\!\left[-\frac{\kappa_0}{2\theta_{n+1}}(\mu_{n+1} - \mu_0)^2\right]\theta_{n+1}^{-\left(\frac{\nu_0}{2}+1\right)}\exp\!\left(-\frac{\nu_0\sigma_0^2}{2\theta_{n+1}}\right) \\
&= \theta_{n+1}^{-\frac{1}{2}}\,\theta_{n+1}^{-\left(\frac{\nu_0+1}{2}+1\right)}\exp\!\left\{-\frac{1}{2\theta_{n+1}}\left[(x_{n+1} - \mu_{n+1})^2 + \kappa_0(\mu_{n+1} - \mu_0)^2 + \nu_0\sigma_0^2\right]\right\}. \qquad (\mathrm{A}1)
\end{aligned}$$
The expression in the square brackets of (A1) changes to
$$\begin{aligned}
&(x_{n+1} - \mu_{n+1})^2 + \kappa_0(\mu_{n+1} - \mu_0)^2 + \nu_0\sigma_0^2 \\
&= \mu_{n+1}^2 - 2x_{n+1}\mu_{n+1} + x_{n+1}^2 + \kappa_0\mu_{n+1}^2 - 2\kappa_0\mu_0\mu_{n+1} + \kappa_0\mu_0^2 + \nu_0\sigma_0^2 \\
&= (\kappa_0+1)\mu_{n+1}^2 - 2(x_{n+1} + \kappa_0\mu_0)\mu_{n+1} + x_{n+1}^2 + \kappa_0\mu_0^2 + \nu_0\sigma_0^2 \\
&= (\kappa_0+1)\left[\mu_{n+1}^2 - 2\,\frac{x_{n+1} + \kappa_0\mu_0}{\kappa_0+1}\,\mu_{n+1}\right] + x_{n+1}^2 + \kappa_0\mu_0^2 + \nu_0\sigma_0^2 \\
&= (\kappa_0+1)\left(\mu_{n+1} - \frac{x_{n+1} + \kappa_0\mu_0}{\kappa_0+1}\right)^2 - \frac{(x_{n+1} + \kappa_0\mu_0)^2}{\kappa_0+1} + x_{n+1}^2 + \kappa_0\mu_0^2 + \nu_0\sigma_0^2 \\
&= (\kappa_0+1)\left(\mu_{n+1} - \frac{x_{n+1} + \kappa_0\mu_0}{\kappa_0+1}\right)^2 + \frac{\kappa_0(x_{n+1} - \mu_0)^2}{\kappa_0+1} + \nu_0\sigma_0^2,
\end{aligned}$$
where
$$\begin{aligned}
-\frac{(x_{n+1} + \kappa_0\mu_0)^2}{\kappa_0+1} + x_{n+1}^2 + \kappa_0\mu_0^2 &= \frac{(\kappa_0+1)x_{n+1}^2 + (\kappa_0+1)\kappa_0\mu_0^2 - (x_{n+1} + \kappa_0\mu_0)^2}{\kappa_0+1} \\
&= \frac{(\kappa_0+1)x_{n+1}^2 + (\kappa_0+1)\kappa_0\mu_0^2 - x_{n+1}^2 - \kappa_0^2\mu_0^2 - 2\kappa_0\mu_0x_{n+1}}{\kappa_0+1} \\
&= \frac{\kappa_0 x_{n+1}^2 + \kappa_0\mu_0^2 - 2\kappa_0\mu_0 x_{n+1}}{\kappa_0+1} = \frac{\kappa_0\left(x_{n+1}^2 + \mu_0^2 - 2x_{n+1}\mu_0\right)}{\kappa_0+1} = \frac{\kappa_0(x_{n+1} - \mu_0)^2}{\kappa_0+1}.
\end{aligned}$$
Let
$$\mu_1 = \frac{x_{n+1} + \kappa_0\mu_0}{\kappa_0+1}, \quad \kappa_1 = \kappa_0 + 1, \quad \nu_1 = \nu_0 + 1, \quad \nu_1\sigma_1^2 = \nu_0\sigma_0^2 + \frac{\kappa_0}{\kappa_0+1}(x_{n+1} - \mu_0)^2.$$
Then (A1) reduces to
$$\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1}) \propto \theta_{n+1}^{-\frac{1}{2}}\,\theta_{n+1}^{-\left(\frac{\nu_1}{2}+1\right)}\exp\!\left\{-\frac{1}{2\theta_{n+1}}\left[\kappa_1(\mu_{n+1} - \mu_1)^2 + \nu_1\sigma_1^2\right]\right\}.$$
It is shown that $\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1})$ is a normal-inverse-gamma distribution as follows. The joint posterior distribution $\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1})$ can be written as $\pi(\mu_{n+1} \mid \theta_{n+1}, x_{n+1})\,\pi(\theta_{n+1} \mid x_{n+1})$, where
$$\pi(\mu_{n+1} \mid \theta_{n+1}, x_{n+1}) \propto \left(\frac{\theta_{n+1}}{\kappa_1}\right)^{-\frac{1}{2}}\exp\!\left[-\frac{\kappa_1}{2\theta_{n+1}}(\mu_{n+1} - \mu_1)^2\right], \qquad \pi(\theta_{n+1} \mid x_{n+1}) \propto \theta_{n+1}^{-\left(\frac{\nu_1}{2}+1\right)}\exp\!\left(-\frac{1}{\theta_{n+1}}\cdot\frac{\nu_1\sigma_1^2}{2}\right).$$
That is,
$$\mu_{n+1} \mid \theta_{n+1}, x_{n+1} \sim N(\mu_1, \theta_{n+1}/\kappa_1), \qquad \theta_{n+1} \mid x_{n+1} \sim \widetilde{IG}\!\left(\alpha^* = \frac{\nu_1}{2}, \ \beta^* = \frac{\nu_1\sigma_1^2}{2}\right).$$
Therefore, the joint posterior distribution $\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1})$ is
$$\begin{aligned}
\pi(\mu_{n+1}, \theta_{n+1} \mid x_{n+1}) &= \pi(\mu_{n+1} \mid \theta_{n+1}, x_{n+1})\,\pi(\theta_{n+1} \mid x_{n+1}) \\
&= \frac{1}{\sqrt{2\pi\theta_{n+1}/\kappa_1}}\exp\!\left[-\frac{\kappa_1}{2\theta_{n+1}}(\mu_{n+1} - \mu_1)^2\right]\times\frac{\left(\frac{\nu_1\sigma_1^2}{2}\right)^{\frac{\nu_1}{2}}}{\Gamma\!\left(\frac{\nu_1}{2}\right)}\,\theta_{n+1}^{-\left(\frac{\nu_1}{2}+1\right)}\exp\!\left(-\frac{\nu_1\sigma_1^2}{2}\cdot\frac{1}{\theta_{n+1}}\right) \\
&= \sqrt{\frac{\kappa_1}{2\pi}}\,\frac{\left(\frac{\nu_1\sigma_1^2}{2}\right)^{\frac{\nu_1}{2}}}{\Gamma\!\left(\frac{\nu_1}{2}\right)}\,\theta_{n+1}^{-\frac{1}{2}}\,\theta_{n+1}^{-\left(\frac{\nu_1}{2}+1\right)}\exp\!\left\{-\frac{1}{2\theta_{n+1}}\left[\kappa_1(\mu_{n+1} - \mu_1)^2 + \nu_1\sigma_1^2\right]\right\} \\
&\sim \widetilde{NIG}(\mu_1, \kappa_1, \nu_1, \sigma_1^2). \qquad (\mathrm{A}3)
\end{aligned}$$
The joint prior distribution $\pi(\mu_{n+1}, \theta_{n+1})$ is
$$\begin{aligned}
\pi(\mu_{n+1}, \theta_{n+1}) &= \pi(\mu_{n+1} \mid \theta_{n+1})\,\pi(\theta_{n+1}) \\
&= \frac{1}{\sqrt{2\pi\theta_{n+1}/\kappa_0}}\exp\!\left[-\frac{\kappa_0}{2\theta_{n+1}}(\mu_{n+1} - \mu_0)^2\right]\times\frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\!\left(\frac{\nu_0}{2}\right)}\left(\frac{1}{\theta_{n+1}}\right)^{\frac{\nu_0}{2}+1}\exp\!\left(-\frac{\nu_0\sigma_0^2}{2}\cdot\frac{1}{\theta_{n+1}}\right) \\
&= \sqrt{\frac{\kappa_0}{2\pi}}\,\frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\!\left(\frac{\nu_0}{2}\right)}\,\theta_{n+1}^{-\frac{1}{2}}\,\theta_{n+1}^{-\left(\frac{\nu_0}{2}+1\right)}\exp\!\left\{-\frac{1}{2\theta_{n+1}}\left[\kappa_0(\mu_{n+1} - \mu_0)^2 + \nu_0\sigma_0^2\right]\right\} \\
&\sim \widetilde{NIG}(\mu_0, \kappa_0, \nu_0, \sigma_0^2). \qquad (\mathrm{A}4)
\end{aligned}$$
Comparing (A4) and (A3), we see that the normal-inverse-gamma distribution is a conjugate prior for $\left(\mu_{n+1},\theta_{n+1}\right)$ of $N\left(\mu_{n+1},\theta_{n+1}\right)$.
Now let us derive the marginal posterior density of μ n + 1 . We have
\begin{align*}
\pi\left(\mu_{n+1}\mid x_{n+1}\right)&=\int_0^\infty\pi\left(\mu_{n+1},\theta_{n+1}\mid x_{n+1}\right)d\theta_{n+1}\\
&=\sqrt{\frac{\kappa_1}{2\pi}}\frac{\left(\frac{\nu_1\sigma_1^2}{2}\right)^{\frac{\nu_1}{2}}}{\Gamma\left(\frac{\nu_1}{2}\right)}\int_0^\infty\theta_{n+1}^{-\left(\frac{\nu_1+1}{2}+1\right)}\exp\left\{-\frac{1}{2\theta_{n+1}}\left[\kappa_1\left(\mu_{n+1}-\mu_1\right)^2+\nu_1\sigma_1^2\right]\right\}d\theta_{n+1}\\
&=\sqrt{\frac{\kappa_1}{2\pi}}\frac{\left(\frac{\nu_1\sigma_1^2}{2}\right)^{\frac{\nu_1}{2}}}{\Gamma\left(\frac{\nu_1}{2}\right)}\cdot\frac{\Gamma\left(\frac{\nu_1+1}{2}\right)}{\left\{\frac{1}{2}\left[\kappa_1\left(\mu_{n+1}-\mu_1\right)^2+\nu_1\sigma_1^2\right]\right\}^{\frac{\nu_1+1}{2}}},
\end{align*}
by noting that the integrand of the above integral is the kernel of an inverse gamma distribution I G ˜ α 1 , β 1 with
\[
\alpha_1=\frac{\nu_1+1}{2},\qquad\beta_1=\frac{\kappa_1\left(\mu_{n+1}-\mu_1\right)^2+\nu_1\sigma_1^2}{2}.
\]
Therefore,
\begin{align*}
\pi\left(\mu_{n+1}\mid x_{n+1}\right)&=\frac{\Gamma\left(\frac{\nu_1+1}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)}\sqrt{\frac{\kappa_1}{2\pi}}\left(\frac{\nu_1\sigma_1^2}{2}\right)^{\frac{\nu_1}{2}}\left(\frac{\nu_1\sigma_1^2}{2}\right)^{-\frac{\nu_1+1}{2}}\left[1+\frac{1}{\nu_1}\frac{\left(\mu_{n+1}-\mu_1\right)^2}{\sigma_1^2/\kappa_1}\right]^{-\frac{\nu_1+1}{2}}\\
&=\frac{\Gamma\left(\frac{\nu_1+1}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\sqrt{\pi\nu_1\sigma_1^2/\kappa_1}}\left[1+\frac{1}{\nu_1}\frac{\left(\mu_{n+1}-\mu_1\right)^2}{\sigma_1^2/\kappa_1}\right]^{-\frac{\nu_1+1}{2}},
\end{align*}
which is the density of the non-standardized Student's $t$ distribution $t_{\nu_1}\left(\mu_1,\frac{\sigma_1^2}{\kappa_1}\right)$.
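To draw from the joint posterior just derived (e.g., for a plot such as Figure 3), one can sample by composition: first $\theta_{n+1}\mid x_{n+1}$ from the inverse gamma distribution, then $\mu_{n+1}\mid\theta_{n+1},x_{n+1}$ from the normal distribution. A minimal R sketch (the helper name rnig_post is ours, not from the paper):

```r
# Minimal sketch (ours): draw m pairs (mu, theta) from the posterior
# NIG(mu1, kappa1, nu1, sigma1^2) by composition:
#   theta | x ~ IG(nu1/2, nu1*sigma1^2/2)  (sampled as 1/Gamma(shape, rate)),
#   mu | theta, x ~ N(mu1, theta/kappa1).
rnig_post <- function(m, mu1, kappa1, nu1, sigma1_sq) {
  theta <- 1 / rgamma(m, shape = nu1 / 2, rate = nu1 * sigma1_sq / 2)
  mu    <- rnorm(m, mean = mu1, sd = sqrt(theta / kappa1))
  data.frame(mu = mu, theta = theta)
}
```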
Now let us calculate
\begin{align*}
f\left(x_{n+1}\mid\mu_{n+1},\theta_{n+1}\right)\pi\left(\mu_{n+1},\theta_{n+1}\right)&=\frac{1}{\sqrt{2\pi\theta_{n+1}}}\exp\left\{-\frac{\left(x_{n+1}-\mu_{n+1}\right)^2}{2\theta_{n+1}}\right\}\times\sqrt{\frac{\kappa_0}{2\pi}}\frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\left(\frac{\nu_0}{2}\right)}\theta_{n+1}^{-\left(\frac{\nu_0+1}{2}+1\right)}\exp\left\{-\frac{1}{2\theta_{n+1}}\left[\kappa_0\left(\mu_{n+1}-\mu_0\right)^2+\nu_0\sigma_0^2\right]\right\}\\
&=\left(2\pi\right)^{-1}\sqrt{\kappa_0}\frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\left(\frac{\nu_0}{2}\right)}\theta_{n+1}^{-\frac{1}{2}}\theta_{n+1}^{-\left(\frac{\nu_0+1}{2}+1\right)}\exp\left\{-\frac{1}{2\theta_{n+1}}\left[\left(x_{n+1}-\mu_{n+1}\right)^2+\kappa_0\left(\mu_{n+1}-\mu_0\right)^2+\nu_0\sigma_0^2\right]\right\}\\
&=\left(2\pi\right)^{-1}\sqrt{\kappa_0}\frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\left(\frac{\nu_0}{2}\right)}\theta_{n+1}^{-\left(\frac{\nu_1+1}{2}+1\right)}\exp\left\{-\frac{1}{2\theta_{n+1}}\left[\kappa_1\left(\mu_{n+1}-\mu_1\right)^2+\nu_1\sigma_1^2\right]\right\}.\tag{A5}
\end{align*}
Finally, let us derive the marginal density of x n + 1 . Combining (A5) and (A3), we have
\begin{align*}
m\left(x_{n+1}\mid\mu_0,\kappa_0,\nu_0,\sigma_0^2\right)&=\frac{f\left(x_{n+1}\mid\mu_{n+1},\theta_{n+1}\right)\pi\left(\mu_{n+1},\theta_{n+1}\right)}{\pi\left(\mu_{n+1},\theta_{n+1}\mid x_{n+1}\right)}\\
&=\frac{\left(2\pi\right)^{-1}\sqrt{\kappa_0}\frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\left(\frac{\nu_0}{2}\right)}\theta_{n+1}^{-\left(\frac{\nu_1+1}{2}+1\right)}\exp\left\{-\frac{1}{2\theta_{n+1}}\left[\kappa_1\left(\mu_{n+1}-\mu_1\right)^2+\nu_1\sigma_1^2\right]\right\}}{\left(2\pi\right)^{-\frac{1}{2}}\sqrt{\kappa_1}\frac{\left(\frac{\nu_1\sigma_1^2}{2}\right)^{\frac{\nu_1}{2}}}{\Gamma\left(\frac{\nu_1}{2}\right)}\theta_{n+1}^{-\left(\frac{\nu_1+1}{2}+1\right)}\exp\left\{-\frac{1}{2\theta_{n+1}}\left[\kappa_1\left(\mu_{n+1}-\mu_1\right)^2+\nu_1\sigma_1^2\right]\right\}}\\
&=\left(2\pi\right)^{-\frac{1}{2}}\sqrt{\frac{\kappa_0}{\kappa_1}}\frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}\Gamma\left(\frac{\nu_1}{2}\right)}{\left(\frac{\nu_1\sigma_1^2}{2}\right)^{\frac{\nu_1}{2}}\Gamma\left(\frac{\nu_0}{2}\right)}\\
&=\frac{1}{\sqrt{2\pi}}\sqrt{\frac{\kappa_0}{\kappa_0+1}}\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}\frac{\Gamma\left(\frac{\nu_0+1}{2}\right)}{\Gamma\left(\frac{\nu_0}{2}\right)}\left\{\frac{1}{2}\left[\nu_0\sigma_0^2+\frac{\kappa_0}{\kappa_0+1}\left(x_{n+1}-\mu_0\right)^2\right]\right\}^{-\frac{\nu_0+1}{2}}\\
&=\frac{\Gamma\left(\frac{\nu_0+1}{2}\right)}{\Gamma\left(\frac{\nu_0}{2}\right)\sqrt{2\pi\left(\kappa_0+1\right)/\kappa_0}}\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}\left(\frac{\nu_0\sigma_0^2}{2}\right)^{-\frac{\nu_0+1}{2}}\left[1+\frac{1}{\nu_0}\frac{\left(x_{n+1}-\mu_0\right)^2}{\sigma_0^2\left(\kappa_0+1\right)/\kappa_0}\right]^{-\frac{\nu_0+1}{2}}\\
&=\frac{\Gamma\left(\frac{\nu_0+1}{2}\right)}{\Gamma\left(\frac{\nu_0}{2}\right)\sqrt{\pi\nu_0\sigma_0^2\left(\kappa_0+1\right)/\kappa_0}}\left[1+\frac{1}{\nu_0}\frac{\left(x_{n+1}-\mu_0\right)^2}{\sigma_0^2\left(\kappa_0+1\right)/\kappa_0}\right]^{-\frac{\nu_0+1}{2}},
\end{align*}
which is the density of the non-standardized Student's $t$ distribution $t_{\nu_0}\left(\mu_0,\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right)$.
The proof of the theorem is complete. □
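As a quick numerical check of Theorem 1, the marginal density of $x_{n+1}$ can be evaluated through the standard $t$ density by the usual location-scale transformation. A minimal R sketch (the helper name dmarginal is ours, not from the paper):

```r
# Minimal sketch (ours): density of the marginal t_{nu0}(mu0, sigma0^2*(kappa0+1)/kappa0)
# from Theorem 1, written via the standard t density dt() and the scale sqrt(u0).
dmarginal <- function(x, mu0, kappa0, nu0, sigma0_sq) {
  s <- sqrt(sigma0_sq * (kappa0 + 1) / kappa0)  # scale = sqrt(u0)
  dt((x - mu0) / s, df = nu0) / s
}

# Example: density at x = 3 under (mu0, kappa0, nu0, sigma0^2) = (3, 1, 6, 1).
dmarginal(3, mu0 = 3, kappa0 = 1, nu0 = 6, sigma0_sq = 1)
```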

Appendix A.2. The Proof of Theorem 2

The hyperparameters of model (2) are $-\infty<\mu_0<\infty$, $\kappa_0>0$, $\nu_0>0$, and $\sigma_0^2>0$. By Theorem 1, we know that the marginal distribution of $X$ in model (2) is a non-standardized Student's $t$ distribution, that is,
\[
X\sim t_{\nu_0}\left(\mu_0,\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right).
\]
Since there are four hyperparameters, obtaining their moment estimators in model (2) requires at least the first four moments of $X$. By Lemma 4 in [19], the first six population moments of $X$ are as follows; equating the population moments to the corresponding sample moments $A_k=\frac{1}{n}\sum_{i=1}^nX_i^k$ gives
\begin{align*}
EX&=\mu_0=A_1,\\
EX^2&=\mu_0^2+\frac{\nu_0}{\nu_0-2}\sigma_0^2\frac{\kappa_0+1}{\kappa_0}=A_2,\quad\text{if }\nu_0>2,\\
EX^3&=\mu_0^3+3\mu_0\frac{\nu_0}{\nu_0-2}\sigma_0^2\frac{\kappa_0+1}{\kappa_0}=A_3,\quad\text{if }\nu_0>3,\\
EX^4&=\mu_0^4+6\mu_0^2\frac{\nu_0}{\nu_0-2}\sigma_0^2\frac{\kappa_0+1}{\kappa_0}+\frac{3\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}\left(\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right)^2=A_4,\quad\text{if }\nu_0>4,\\
EX^5&=\mu_0^5+10\mu_0^3\frac{\nu_0}{\nu_0-2}\sigma_0^2\frac{\kappa_0+1}{\kappa_0}+\frac{15\mu_0\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}\left(\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right)^2=A_5,\quad\text{if }\nu_0>5,\\
EX^6&=\mu_0^6+15\mu_0^4\frac{\nu_0}{\nu_0-2}\sigma_0^2\frac{\kappa_0+1}{\kappa_0}+\frac{45\mu_0^2\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}\left(\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right)^2+\frac{15\nu_0^3}{\left(\nu_0-2\right)\left(\nu_0-4\right)\left(\nu_0-6\right)}\left(\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right)^3=A_6,\quad\text{if }\nu_0>6.
\end{align*}
(The $k$th population moment exists only when $\nu_0>k$.)
From the first moment of X, we obtain the moment estimator of μ 0 as
\[
\mu_{01n}=A_1.\tag{A6}
\]
Let
\[
w_1=\frac{\nu_0}{\nu_0-2}\sigma_0^2\frac{\kappa_0+1}{\kappa_0},\quad
w_2=\frac{\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}\left(\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right)^2,\quad
w_3=\frac{\nu_0^3}{\left(\nu_0-2\right)\left(\nu_0-4\right)\left(\nu_0-6\right)}\left(\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right)^3.
\]
From the second moment of X, we obtain the moment estimator of w 1 as
\[
w_{1n}=A_2-A_1^2.
\]
From the third moment of X, we obtain the moment estimator of w 1 as
\[
w_{1n}=\frac{A_3-A_1^3}{3A_1}.
\]
Obviously, the two moment estimators of $w_1$ are not equal in general, so we must choose one of them as the moment estimator of $w_1$. For simplicity, we use the moment estimator of $w_1$ calculated from $EX^2$ and ignore the equation involving $EX^3$. Similarly, the equations involving $EX^4$ and $EX^5$ both involve $w_1$ and $w_2$; for simplicity, we use the equation involving $EX^4$ and ignore the one involving $EX^5$. To have four equations, we also use the equation involving $EX^6$. Therefore, the moment equations become
\begin{align*}
EX&=\mu_0=A_1,\\
EX^2&=\mu_0^2+w_1=A_2,\\
EX^4&=\mu_0^4+6\mu_0^2w_1+3w_2=A_4,\\
EX^6&=\mu_0^6+15\mu_0^4w_1+45\mu_0^2w_2+15w_3=A_6.
\end{align*}
Solving the above moment equations, we obtain
\begin{align*}
\mu_{01n}&=A_1,\\
w_{1n}&=\frac{\nu_0}{\nu_0-2}\sigma_0^2\frac{\kappa_0+1}{\kappa_0}=A_2-A_1^2\triangleq B_1,\\
w_{2n}&=\frac{\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}\left(\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right)^2=\frac{1}{3}\left(A_4-6A_1^2A_2+5A_1^4\right)\triangleq B_2,\\
w_{3n}&=\frac{\nu_0^3}{\left(\nu_0-2\right)\left(\nu_0-4\right)\left(\nu_0-6\right)}\left(\sigma_0^2\frac{\kappa_0+1}{\kappa_0}\right)^3=\frac{1}{15}\left(-61A_1^6+75A_2A_1^4-15A_4A_1^2+A_6\right)\triangleq B_3.
\end{align*}
Let
\[
u_0=\sigma_0^2\frac{\kappa_0+1}{\kappa_0}.
\]
Since $\kappa_0$ and $\sigma_0^2$ appear only through $u_0$, we cannot obtain separate moment estimators of $\kappa_0$ and $\sigma_0^2$; we can, however, obtain the moment estimator of $u_0$. In the following, we are therefore interested in the moment estimators of $-\infty<\mu_0<\infty$, $\nu_0>0$, and $u_0>0$. The moment equations involving $w_{1n}$, $w_{2n}$, and $w_{3n}$ become
\[
\frac{\nu_0}{\nu_0-2}u_0=B_1,\quad
\frac{\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}u_0^2=B_2,\quad
\frac{\nu_0^3}{\left(\nu_0-2\right)\left(\nu_0-4\right)\left(\nu_0-6\right)}u_0^3=B_3.
\]
Since there are three equations but only two parameters $\nu_0$ and $u_0$, for simplicity we use only the first two equations and ignore the third. Solving the first two equations for $\nu_0$ and $u_0$, we obtain the moment estimators of $\nu_0$ and $u_0$ as
\[
\nu_{01n}=\frac{2B_1^2-4B_2}{B_1^2-B_2},\qquad
u_{01n}=\frac{B_1B_2}{2B_2-B_1^2}.
\]
Substituting the expressions of $B_1$ and $B_2$ and simplifying, we obtain the expressions of $\nu_{01n}$ and $u_{01n}$ in terms of $A_i$ ($i=1,2,4$) as
\[
\nu_{01n}=\frac{-\frac{14}{3}A_1^4+4A_1^2A_2+2A_2^2-\frac{4}{3}A_4}{-\frac{2}{3}A_1^4+A_2^2-\frac{1}{3}A_4},\tag{A7}
\]
\[
u_{01n}=\frac{-\frac{1}{3}\left(A_2-A_1^2\right)\left(5A_1^4-6A_1^2A_2+A_4\right)}{-\frac{7}{3}A_1^4+2A_1^2A_2+A_2^2-\frac{2}{3}A_4}.\tag{A8}
\]
Therefore, the moment estimators of μ 0 , ν 0 , and u 0 are given by (A6), (A7), and (A8).
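For concreteness, a minimal R sketch (the helper name moment_est is ours, not from the paper) of the moment estimators (A6)-(A8); the example draws a sample from the marginal non-standardized $t$ distribution with $\mu_0=3$, $\nu_0=6$, and $u_0=2$:

```r
# Minimal sketch (ours): moment estimators of (mu0, nu0, u0) from a sample x,
# following (A6)-(A8), with B1 and B2 built from the sample moments A1, A2, A4.
moment_est <- function(x) {
  A1 <- mean(x); A2 <- mean(x^2); A4 <- mean(x^4)
  B1 <- A2 - A1^2
  B2 <- (A4 - 6 * A1^2 * A2 + 5 * A1^4) / 3
  c(mu0 = A1,
    nu0 = (2 * B1^2 - 4 * B2) / (B1^2 - B2),
    u0  = B1 * B2 / (2 * B2 - B1^2))
}

# Example: X ~ t_6(3, 2), i.e., X = 3 + sqrt(2) * T with T a standard t_6 variable.
set.seed(1)
x <- 3 + sqrt(2) * rt(1e5, df = 6)
moment_est(x)  # should be close to (3, 6, 2)
```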
Now let us show that the moment estimators are consistent estimators of the hyperparameters. It is easy to show that
\[
A_k=\frac{1}{n}\sum_{i=1}^nX_i^k\overset{p}{\longrightarrow}EX^k
\]
for $k=1,2,4$, where $\overset{p}{\longrightarrow}$ denotes convergence in probability. Hence,
\begin{align*}
A_1&\overset{p}{\longrightarrow}EX=\mu_0,\\
A_2&\overset{p}{\longrightarrow}EX^2=\mu_0^2+\frac{\nu_0}{\nu_0-2}u_0,\\
A_4&\overset{p}{\longrightarrow}EX^4=\mu_0^4+6\mu_0^2\frac{\nu_0}{\nu_0-2}u_0+\frac{3\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}u_0^2.
\end{align*}
Therefore,
\[
\mu_{01n}=A_1\triangleq f\left(A_1\right)\overset{p}{\longrightarrow}f\left(EX\right)=\mu_0,
\]
\begin{align*}
\nu_{01n}&=\frac{-\frac{14}{3}A_1^4+4A_1^2A_2+2A_2^2-\frac{4}{3}A_4}{-\frac{2}{3}A_1^4+A_2^2-\frac{1}{3}A_4}\triangleq g\left(A_1,A_2,A_4\right)\overset{p}{\longrightarrow}g\left(EX,EX^2,EX^4\right)\\
&=\frac{-\frac{14}{3}\mu_0^4+4\mu_0^2\left(\mu_0^2+\frac{\nu_0}{\nu_0-2}u_0\right)+2\left(\mu_0^2+\frac{\nu_0}{\nu_0-2}u_0\right)^2-\frac{4}{3}\left[\mu_0^4+6\mu_0^2\frac{\nu_0}{\nu_0-2}u_0+\frac{3\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}u_0^2\right]}{-\frac{2}{3}\mu_0^4+\left(\mu_0^2+\frac{\nu_0}{\nu_0-2}u_0\right)^2-\frac{1}{3}\left[\mu_0^4+6\mu_0^2\frac{\nu_0}{\nu_0-2}u_0+\frac{3\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}u_0^2\right]}=\nu_0,
\end{align*}
and
\begin{align*}
u_{01n}&=\frac{-\frac{1}{3}\left(A_2-A_1^2\right)\left(5A_1^4-6A_1^2A_2+A_4\right)}{-\frac{7}{3}A_1^4+2A_1^2A_2+A_2^2-\frac{2}{3}A_4}\triangleq h\left(A_1,A_2,A_4\right)\overset{p}{\longrightarrow}h\left(EX,EX^2,EX^4\right)\\
&=\frac{-\frac{1}{3}\left(\mu_0^2+\frac{\nu_0}{\nu_0-2}u_0-\mu_0^2\right)\left[5\mu_0^4-6\mu_0^2\left(\mu_0^2+\frac{\nu_0}{\nu_0-2}u_0\right)+\mu_0^4+6\mu_0^2\frac{\nu_0}{\nu_0-2}u_0+\frac{3\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}u_0^2\right]}{-\frac{7}{3}\mu_0^4+2\mu_0^2\left(\mu_0^2+\frac{\nu_0}{\nu_0-2}u_0\right)+\left(\mu_0^2+\frac{\nu_0}{\nu_0-2}u_0\right)^2-\frac{2}{3}\left[\mu_0^4+6\mu_0^2\frac{\nu_0}{\nu_0-2}u_0+\frac{3\nu_0^2}{\left(\nu_0-2\right)\left(\nu_0-4\right)}u_0^2\right]}=u_0.
\end{align*}
The proof of the theorem is complete. □

Appendix A.3. The Proof of Theorem 3

Now we derive the Maximum Likelihood Estimators (MLEs) of $\mu_0$, $\nu_0$, and $u_0$. By Theorem 1, the marginal density of $X$ in model (2) is
\[
m\left(x\mid\mu_0,\nu_0,u_0\right)=\frac{\Gamma\left(\frac{\nu_0+1}{2}\right)}{\Gamma\left(\frac{\nu_0}{2}\right)\sqrt{\pi\nu_0u_0}}\left[1+\frac{1}{\nu_0}\frac{\left(x-\mu_0\right)^2}{u_0}\right]^{-\frac{\nu_0+1}{2}},
\]
for $-\infty<x<\infty$, $-\infty<\mu_0<\infty$, $\nu_0>0$, and $u_0>0$. Then the likelihood function of $\mu_0$, $\nu_0$, and $u_0$ is
\begin{align*}
L\left(\mu_0,\nu_0,u_0\mid\boldsymbol{x}\right)&=m\left(\boldsymbol{x}\mid\mu_0,\nu_0,u_0\right)=\prod_{i=1}^nm\left(x_i\mid\mu_0,\nu_0,u_0\right)=\prod_{i=1}^n\frac{\Gamma\left(\frac{\nu_0+1}{2}\right)}{\Gamma\left(\frac{\nu_0}{2}\right)\sqrt{\pi\nu_0u_0}}\left[1+\frac{1}{\nu_0}\frac{\left(x_i-\mu_0\right)^2}{u_0}\right]^{-\frac{\nu_0+1}{2}}\\
&=\frac{\Gamma^n\left(\frac{\nu_0+1}{2}\right)}{\Gamma^n\left(\frac{\nu_0}{2}\right)\left(\pi\nu_0u_0\right)^{\frac{n}{2}}}\prod_{i=1}^n\left[1+\frac{\left(x_i-\mu_0\right)^2}{\nu_0u_0}\right]^{-\frac{\nu_0+1}{2}}.
\end{align*}
Consequently, the log-likelihood function of μ 0 , ν 0 , and u 0 is
\begin{align*}
\log L\left(\mu_0,\nu_0,u_0\mid\boldsymbol{x}\right)&=n\log\Gamma\left(\frac{\nu_0+1}{2}\right)-n\log\Gamma\left(\frac{\nu_0}{2}\right)-\frac{n}{2}\log\left(\pi\nu_0u_0\right)-\sum_{i=1}^n\frac{\nu_0+1}{2}\log\left[1+\frac{\left(x_i-\mu_0\right)^2}{\nu_0u_0}\right]\\
&=n\log\Gamma\left(\frac{\nu_0+1}{2}\right)-n\log\Gamma\left(\frac{\nu_0}{2}\right)-\frac{n}{2}\left(\log\pi+\log\nu_0+\log u_0\right)-\sum_{i=1}^n\frac{\nu_0+1}{2}\log\left[1+\frac{\left(x_i-\mu_0\right)^2}{\nu_0u_0}\right].
\end{align*}
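The log-likelihood above can also be maximized directly with a general-purpose optimizer; this is not the route taken in this paper (which solves the score equations by Newton's method below), but it provides a useful cross-check. A minimal R sketch (the helper name loglik is ours, not from the paper):

```r
# Minimal sketch (ours): log-likelihood of (mu0, nu0, u0) for the marginal t model;
# maximizing it with optim() gives an alternative route to the MLEs.
loglik <- function(par, x) {
  mu0 <- par[1]; nu0 <- par[2]; u0 <- par[3]
  if (nu0 <= 0 || u0 <= 0) return(-Inf)
  n <- length(x)
  n * lgamma((nu0 + 1) / 2) - n * lgamma(nu0 / 2) -
    (n / 2) * (log(pi) + log(nu0) + log(u0)) -
    ((nu0 + 1) / 2) * sum(log1p((x - mu0)^2 / (nu0 * u0)))
}

# Example (cross-check): optim(c(mean(x), 5, var(x)), function(p) -loglik(p, x))
```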
Taking partial derivatives with respect to $\mu_0$, $\nu_0$, and $u_0$ and setting them to zero, we obtain
\begin{align*}
\frac{\partial}{\partial\mu_0}\log L&=\sum_{i=1}^n\left(-\frac{\nu_0+1}{2}\right)\frac{1}{1+\frac{\left(x_i-\mu_0\right)^2}{\nu_0u_0}}\cdot\frac{1}{\nu_0u_0}\cdot2\left(x_i-\mu_0\right)\left(-1\right)=\sum_{i=1}^n\frac{\left(\nu_0+1\right)\left(x_i-\mu_0\right)}{\nu_0u_0+\left(x_i-\mu_0\right)^2}=0,\\
\frac{\partial}{\partial\nu_0}\log L&=n\psi\left(\frac{\nu_0+1}{2}\right)\frac{1}{2}-n\psi\left(\frac{\nu_0}{2}\right)\frac{1}{2}-\frac{n}{2}\cdot\frac{1}{\nu_0}-\frac{1}{2}\sum_{i=1}^n\left\{\log\left[1+\frac{\left(x_i-\mu_0\right)^2}{\nu_0u_0}\right]+\left(\nu_0+1\right)\frac{1}{1+\frac{\left(x_i-\mu_0\right)^2}{\nu_0u_0}}\cdot\frac{\left(x_i-\mu_0\right)^2}{u_0}\left(-\frac{1}{\nu_0^2}\right)\right\}\\
&=\frac{n}{2}\psi\left(\frac{\nu_0+1}{2}\right)-\frac{n}{2}\psi\left(\frac{\nu_0}{2}\right)-\frac{n}{2}\cdot\frac{1}{\nu_0}-\frac{1}{2}\sum_{i=1}^n\left\{\log\left[1+\frac{\left(x_i-\mu_0\right)^2}{\nu_0u_0}\right]-\frac{\left(\nu_0+1\right)\left(x_i-\mu_0\right)^2}{\left[\nu_0u_0+\left(x_i-\mu_0\right)^2\right]\nu_0}\right\}=0,
\end{align*}
and
\begin{align*}
\frac{\partial}{\partial u_0}\log L&=-\frac{n}{2}\cdot\frac{1}{u_0}-\sum_{i=1}^n\frac{\nu_0+1}{2}\cdot\frac{1}{1+\frac{\left(x_i-\mu_0\right)^2}{\nu_0u_0}}\cdot\frac{\left(x_i-\mu_0\right)^2}{\nu_0}\left(-\frac{1}{u_0^2}\right)\\
&=-\frac{n}{2}\cdot\frac{1}{u_0}+\frac{1}{2}\sum_{i=1}^n\frac{\left(\nu_0+1\right)\left(x_i-\mu_0\right)^2}{\left[\nu_0u_0+\left(x_i-\mu_0\right)^2\right]u_0}=0,
\end{align*}
where
\[
\psi\left(x\right)=\frac{\Gamma'\left(x\right)}{\Gamma\left(x\right)}=\frac{d}{dx}\log\Gamma\left(x\right)
\]
is the digamma function, which can be computed directly in R via digamma(x) [46]. Let
\[
g_i=\nu_0u_0+\left(x_i-\mu_0\right)^2.
\]
Thus,
\[
\left(x_i-\mu_0\right)^2=g_i-\nu_0u_0.
\]
After some algebra, the above equations reduce to
\begin{align*}
f_1\left(\mu_0,\nu_0,u_0\right)&=\frac{1}{n}\sum_{i=1}^n\frac{\left(\nu_0+1\right)\left(x_i-\mu_0\right)}{\nu_0u_0+\left(x_i-\mu_0\right)^2}=\frac{1}{n}\sum_{i=1}^n\frac{\left(\nu_0+1\right)\left(x_i-\mu_0\right)}{g_i}=0,\tag{A9}\\
f_2\left(\mu_0,\nu_0,u_0\right)&=\frac{1}{2}\psi\left(\frac{\nu_0+1}{2}\right)-\frac{1}{2}\psi\left(\frac{\nu_0}{2}\right)-\frac{1}{2\nu_0}-\frac{1}{2}\cdot\frac{1}{n}\sum_{i=1}^n\left\{\log\left[1+\frac{\left(x_i-\mu_0\right)^2}{\nu_0u_0}\right]-\frac{\left(\nu_0+1\right)\left(x_i-\mu_0\right)^2}{\left[\nu_0u_0+\left(x_i-\mu_0\right)^2\right]\nu_0}\right\}\\
&=\frac{1}{2}\psi\left(\frac{\nu_0+1}{2}\right)-\frac{1}{2}\psi\left(\frac{\nu_0}{2}\right)-\frac{1}{2\nu_0}-\frac{1}{2}\cdot\frac{1}{n}\sum_{i=1}^n\left[\log g_i-\log\left(\nu_0u_0\right)-\frac{\left(\nu_0+1\right)\left(g_i-\nu_0u_0\right)}{g_i\nu_0}\right]=0,\tag{A10}\\
f_3\left(\mu_0,\nu_0,u_0\right)&=-\frac{1}{2u_0}+\frac{1}{2}\cdot\frac{1}{n}\sum_{i=1}^n\frac{\left(\nu_0+1\right)\left(x_i-\mu_0\right)^2}{\left[\nu_0u_0+\left(x_i-\mu_0\right)^2\right]u_0}=-\frac{1}{2u_0}+\frac{1}{2}\cdot\frac{1}{n}\sum_{i=1}^n\frac{\left(\nu_0+1\right)\left(g_i-\nu_0u_0\right)}{g_iu_0}=0.\tag{A11}
\end{align*}
In (A9), (A10), and (A11), the expressions involving $g_i$ are used to simplify the R code.
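A minimal R sketch (the helper name score is ours, not from the paper) of the left-hand sides of (A9)-(A11), written with $g_i$ exactly as in the text; the common root of the three components is the MLE:

```r
# Minimal sketch (ours): the left-hand sides of (A9)-(A11) as a vector-valued
# function of par = (mu0, nu0, u0), using g_i = nu0*u0 + (x_i - mu0)^2.
score <- function(par, x) {
  mu0 <- par[1]; nu0 <- par[2]; u0 <- par[3]
  g  <- nu0 * u0 + (x - mu0)^2
  f1 <- mean((nu0 + 1) * (x - mu0) / g)
  f2 <- 0.5 * digamma((nu0 + 1) / 2) - 0.5 * digamma(nu0 / 2) - 1 / (2 * nu0) -
        0.5 * mean(log(g) - log(nu0 * u0) - (nu0 + 1) * (g - nu0 * u0) / (g * nu0))
  f3 <- -1 / (2 * u0) + 0.5 * mean((nu0 + 1) * (g - nu0 * u0) / (g * u0))
  c(f1 = f1, f2 = f2, f3 = f3)
}
```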
The MLEs of the hyperparameters $\mu_0$, $\nu_0$, and $u_0$ are the solutions of equations (A9), (A10), and (A11). These equations cannot be solved analytically, so we resort to numerical methods. We employ Newton's method to solve the equations and obtain the MLEs of $\mu_0$, $\nu_0$, and $u_0$. The iterative scheme of Newton's method is
\[
\boldsymbol{p}^{\left(k+1\right)}=\boldsymbol{p}^{\left(k\right)}-\left[\boldsymbol{J}\left(\boldsymbol{p}^{\left(k\right)}\right)\right]^{-1}\boldsymbol{f}\left(\boldsymbol{p}^{\left(k\right)}\right),\quad k=0,1,\ldots,
\]
where $\boldsymbol{J}\left(\boldsymbol{p}\right)$ is the Jacobian matrix of $\boldsymbol{f}\left(\boldsymbol{p}\right)=\left(f_1\left(\boldsymbol{p}\right),f_2\left(\boldsymbol{p}\right),f_3\left(\boldsymbol{p}\right)\right)^{\top}$ and $\boldsymbol{p}=\left(\mu_0,\nu_0,u_0\right)^{\top}$. Note that the MLEs of $\mu_0$, $\nu_0$, and $u_0$ are quite sensitive to the initial values, and the moment estimators usually serve as good initial values. The Jacobian matrix of $\boldsymbol{f}$ with respect to $\left(\mu_0,\nu_0,u_0\right)$ is given by
\[
\boldsymbol{J}=\begin{pmatrix}J_{11}&J_{12}&J_{13}\\J_{21}&J_{22}&J_{23}\\J_{31}&J_{32}&J_{33}\end{pmatrix}
=\begin{pmatrix}\dfrac{\partial f_1}{\partial\mu_0}&\dfrac{\partial f_1}{\partial\nu_0}&\dfrac{\partial f_1}{\partial u_0}\\[4pt]\dfrac{\partial f_2}{\partial\mu_0}&\dfrac{\partial f_2}{\partial\nu_0}&\dfrac{\partial f_2}{\partial u_0}\\[4pt]\dfrac{\partial f_3}{\partial\mu_0}&\dfrac{\partial f_3}{\partial\nu_0}&\dfrac{\partial f_3}{\partial u_0}\end{pmatrix},
\]
where
J 11 = f 1 μ 0 = 1 n i = 1 n ν 0 + 1 1 ν 0 u 0 + x i μ 0 2 x i μ 0 2 x i μ 0 1 ν 0 u 0 + x i μ 0 2 2 = 1 n i = 1 n ν 0 + 1 2 x i μ 0 2 ν 0 u 0 x i μ 0 2 ν 0 u 0 + x i μ 0 2 2 = 1 n i = 1 n ν 0 + 1 x i μ 0 2 ν 0 u 0 ν 0 u 0 + x i μ 0 2 2 = 1 n i = 1 n ν 0 + 1 g i 2 ν 0 u 0 g i 2 ,
J 12 = f 1 ν 0 = 1 n i = 1 n x i μ 0 1 · ν 0 u 0 + x i μ 0 2 ν 0 + 1 u 0 ν 0 u 0 + x i μ 0 2 2 = 1 n i = 1 n x i μ 0 x i μ 0 2 u 0 ν 0 u 0 + x i μ 0 2 2 = 1 n i = 1 n x i μ 0 g i ν 0 u 0 u 0 g i 2 ,
J 13 = f 1 u 0 = 1 n i = 1 n ν 0 + 1 x i μ 0 1 1 ν 0 u 0 + x i μ 0 2 2 ν 0 = 1 n i = 1 n ν 0 ν 0 + 1 x i μ 0 ν 0 u 0 + x i μ 0 2 2 = 1 n i = 1 n ν 0 ν 0 + 1 x i μ 0 g i 2 ,
J 21 = f 2 μ 0 = 1 2 1 n i = 1 n 1 1 + x i μ 0 2 ν 0 u 0 1 ν 0 u 0 2 x i μ 0 1 ν 0 + 1 ν 0 2 x i μ 0 1 g i x i μ 0 2 2 x i μ 0 1 g i 2 = 1 2 1 n i = 1 n 2 x i μ 0 g i + ν 0 + 1 ν 0 2 x i μ 0 g i 2 x i μ 0 3 g i 2 = 1 2 1 n i = 1 n 2 x i μ 0 g i + ν 0 + 1 ν 0 2 x i μ 0 ν 0 u 0 g i 2
= 1 2 1 n i = 1 n 2 x i μ 0 g i 2 ν 0 + 1 u 0 x i μ 0 g i 2 = 1 2 1 n i = 1 n 2 x i μ 0 g i 2 x i μ 0 ν 0 u 0 + u 0 g i 2 = 1 2 1 n i = 1 n 2 x i μ 0 g i ν 0 u 0 u 0 g i 2 = 1 n i = 1 n x i μ 0 g i ν 0 u 0 u 0 g i 2 = J 12 ,
J 22 = f 2 ν 0 = 1 2 ψ ν 0 + 1 2 1 2 1 2 ψ ν 0 2 1 2 1 2 1 ν 0 2 1 2 1 n i = 1 n 1 1 + x i μ 0 2 ν 0 u 0 x i μ 0 2 u 0 1 ν 0 2 x i μ 0 2 1 ν 0 2 g i 1 + 1 ν 0 u 0 g i 2 = 1 4 ψ ν 0 + 1 2 1 4 ψ ν 0 2 + 1 2 ν 0 2 + 1 2 1 n i = 1 n x i μ 0 2 g i ν 0 x i μ 0 2 1 ν 0 2 g i + 1 + 1 ν 0 u 0 g i 2 = 1 4 ψ ν 0 + 1 2 1 4 ψ ν 0 2 + 1 2 ν 0 2 + 1 2 1 n i = 1 n g i ν 0 u 0 g i ν 0 1 u 0 ν 0 ν 0 + 1 g i 2 ν 0 2 ,
where
x i μ 0 2 g i ν 0 x i μ 0 2 1 ν 0 2 g i + 1 + 1 ν 0 u 0 g i 2 = g i ν 0 u 0 g i ν 0 g i ν 0 u 0 1 ν 0 2 g i + 1 + 1 ν 0 u 0 g i 2 = g i ν 0 u 0 g i ν 0 g i ν 0 u 0 g i + u 0 ν 0 ν 0 + 1 g i 2 ν 0 2 = g i ν 0 u 0 g i ν 0 g i ν 0 u 0 g i + u 0 ν 0 ν 0 + 1 g i 2 ν 0 2 = g i ν 0 u 0 g i ν 0 g i u 0 ν 0 ν 0 + 1 g i 2 ν 0 2 = g i ν 0 u 0 g i ν 0 1 u 0 ν 0 ν 0 + 1 g i 2 ν 0 2 ,
J 23 = f 2 u 0 = 1 2 1 n i = 1 n 1 1 + x i μ 0 2 ν 0 u 0 x i μ 0 2 ν 0 1 u 0 2 ν 0 + 1 x i μ 0 2 ν 0 1 g i 2 ν 0 = 1 2 1 n i = 1 n x i μ 0 2 g i u 0 ν 0 + 1 x i μ 0 2 g i 2 = 1 2 1 n i = 1 n x i μ 0 2 ν 0 u 0 + x i μ 0 2 u 0 ν 0 + 1 x i μ 0 2 g i 2 u 0
= 1 2 1 n i = 1 n ν 0 u 0 x i μ 0 2 + x i μ 0 4 ν 0 u 0 x i μ 0 2 u 0 x i μ 0 2 g i 2 u 0 = 1 2 1 n i = 1 n x i μ 0 4 u 0 x i μ 0 2 g i 2 u 0 = 1 2 1 n i = 1 n x i μ 0 2 x i μ 0 2 u 0 g i 2 u 0 = 1 2 1 n i = 1 n g i ν 0 u 0 g i ν 0 u 0 u 0 g i 2 u 0 ,
J 31 = f 3 μ 0 = 1 2 1 n i = 1 n ν 0 + 1 u 0 2 x i μ 0 1 g i x i μ 0 2 2 x i μ 0 1 g i 2 = 1 2 1 n i = 1 n ν 0 + 1 u 0 2 x i μ 0 x i μ 0 2 g i g i 2 = 1 2 1 n i = 1 n ν 0 + 1 u 0 2 x i μ 0 ν 0 u 0 g i 2 = 1 n i = 1 n ν 0 ν 0 + 1 x i μ 0 g i 2 = J 13 ,
J 32 = f 3 ν 0 = 1 2 1 n i = 1 n x i μ 0 2 u 0 1 · g i ν 0 + 1 u 0 g i 2 = 1 2 1 n i = 1 n g i ν 0 u 0 g i ν 0 u 0 u 0 g i 2 u 0 = J 23 ,
J 33 = f 3 u 0 = 1 2 1 u 0 2 + 1 2 1 n i = 1 n ν 0 + 1 x i μ 0 2 1 ν 0 2 u 0 + x i μ 0 2 g i 2 u 0 2 = 1 2 u 0 2 1 2 1 n i = 1 n ν 0 + 1 x i μ 0 2 ν 0 u 0 + g i g i 2 u 0 2 = 1 2 u 0 2 1 2 1 n i = 1 n ν 0 + 1 g i ν 0 u 0 g i + ν 0 u 0 g i 2 u 0 2 = 1 2 u 0 2 1 2 1 n i = 1 n ν 0 + 1 g i 2 ν 0 2 u 0 2 g i 2 u 0 2 .
Note that
\[
\psi'\left(x\right)=\frac{d^2}{dx^2}\log\Gamma\left(x\right)
\]
is the trigamma function, which can be computed directly in R via trigamma(x) [46]. In $J_{ij}$ ($i,j=1,2,3$), the expressions involving $g_i$ are used to simplify the R code.
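Putting the pieces together, a minimal R sketch (ours, not from the paper) of the Newton iteration, started at the moment estimators; it reuses score() and moment_est() from the sketches above and, for brevity, approximates the Jacobian by central differences instead of coding the analytic entries $J_{ij}$ (the paper uses the analytic Jacobian):

```r
# Minimal sketch (ours): a plain Newton iteration for the system (A9)-(A11),
# with a central-difference Jacobian; no safeguards are included, so good
# starting values (the moment estimators) matter.
newton_mle <- function(x, p0, iter = 50, tol = 1e-8) {
  num_jac <- function(p, h = 1e-6) {
    J <- matrix(0, 3, 3)
    for (j in 1:3) {
      e <- numeric(3); e[j] <- h
      J[, j] <- (score(p + e, x) - score(p - e, x)) / (2 * h)
    }
    J
  }
  p <- p0
  for (k in 1:iter) {
    step <- solve(num_jac(p), score(p, x))
    p <- p - step
    if (max(abs(step)) < tol) break
  }
  p
}

# Example: x <- 3 + sqrt(2) * rt(2e4, df = 6); newton_mle(x, moment_est(x))
```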
Now let us show that the MLEs are consistent estimators of the hyperparameters. By Theorem 10.1.6 in [47], the MLEs are consistent estimators of the hyperparameters under the regularity conditions listed in Miscellanea 10.6.2 of [47]. The regularity conditions are as follows:
(C1). 
We observe $X_1,X_2,\ldots,X_n$, where the $X_i\sim m\left(x\mid\mu_0,\nu_0,u_0\right)$ are iid.
(C2). 
The parameter is identifiable; that is, if $\left(\mu_0,\nu_0,u_0\right)\neq\left(\mu_0',\nu_0',u_0'\right)$, then $m\left(x\mid\mu_0,\nu_0,u_0\right)\neq m\left(x\mid\mu_0',\nu_0',u_0'\right)$.
(C3). 
The densities m x | μ 0 , ν 0 , u 0 have common support, and m x | μ 0 , ν 0 , u 0 is differentiable in μ 0 , ν 0 , and u 0 .
(C4). 
The parameter space Ω contains an open set ω of which the true parameter value μ 00 , ν 00 , u 00 is an interior point.
It remains to show that the marginal distribution $m\left(x\mid\mu_0,\nu_0,u_0\right)$ satisfies all the regularity conditions. Let $\mathcal{X}=\left(-\infty,\infty\right)$ be the support of $m\left(x\mid\mu_0,\nu_0,u_0\right)$.
First, (C1) is satisfied, as X 1 , X 2 , , X n is a random sample from m x | μ 0 , ν 0 , u 0 .
Second, let us show that (C2) is satisfied. The parameter is identifiable:
\[
\left(\mu_0,\nu_0,u_0\right)\neq\left(\mu_0',\nu_0',u_0'\right)\Longrightarrow m\left(x\mid\mu_0,\nu_0,u_0\right)\neq m\left(x\mid\mu_0',\nu_0',u_0'\right)\ \text{for some }x\in\mathcal{X},
\]
which, by contraposition, is equivalent to
\[
m\left(x\mid\mu_0,\nu_0,u_0\right)=m\left(x\mid\mu_0',\nu_0',u_0'\right)\ \text{for all }x\in\mathcal{X}\Longrightarrow\left(\mu_0,\nu_0,u_0\right)=\left(\mu_0',\nu_0',u_0'\right).\tag{A12}
\]
We have
\[
m\left(x\mid\mu_0,\nu_0,u_0\right)=m\left(x\mid\mu_0',\nu_0',u_0'\right)\ \text{for all }x\in\mathcal{X}
\Longleftrightarrow\frac{m\left(x\mid\mu_0,\nu_0,u_0\right)}{m\left(x\mid\mu_0',\nu_0',u_0'\right)}=1\ \text{for all }x\in\mathcal{X}.\tag{A13}
\]
Note that
\[
m\left(x\mid\mu_0,\nu_0,u_0\right)=\frac{\Gamma\left(\frac{\nu_0+1}{2}\right)}{\Gamma\left(\frac{\nu_0}{2}\right)\sqrt{\pi\nu_0u_0}}\left[1+\frac{1}{\nu_0}\frac{\left(x-\mu_0\right)^2}{u_0}\right]^{-\frac{\nu_0+1}{2}},
\]
and thus
\[
m\left(x\mid\mu_0',\nu_0',u_0'\right)=\frac{\Gamma\left(\frac{\nu_0'+1}{2}\right)}{\Gamma\left(\frac{\nu_0'}{2}\right)\sqrt{\pi\nu_0'u_0'}}\left[1+\frac{1}{\nu_0'}\frac{\left(x-\mu_0'\right)^2}{u_0'}\right]^{-\frac{\nu_0'+1}{2}}.
\]
Therefore, (A13) is equivalent to
\[
\frac{\Gamma\left(\frac{\nu_0+1}{2}\right)}{\Gamma\left(\frac{\nu_0}{2}\right)\sqrt{\pi\nu_0u_0}}\cdot\frac{\Gamma\left(\frac{\nu_0'}{2}\right)\sqrt{\pi\nu_0'u_0'}}{\Gamma\left(\frac{\nu_0'+1}{2}\right)}\cdot\frac{\left[1+\frac{1}{\nu_0'}\frac{\left(x-\mu_0'\right)^2}{u_0'}\right]^{\frac{\nu_0'+1}{2}}}{\left[1+\frac{1}{\nu_0}\frac{\left(x-\mu_0\right)^2}{u_0}\right]^{\frac{\nu_0+1}{2}}}=1\ \text{for all }x\in\mathcal{X},
\]
which implies
\[
\frac{\Gamma\left(\frac{\nu_0+1}{2}\right)\Gamma\left(\frac{\nu_0'}{2}\right)\sqrt{\pi\nu_0'u_0'}}{\Gamma\left(\frac{\nu_0}{2}\right)\sqrt{\pi\nu_0u_0}\,\Gamma\left(\frac{\nu_0'+1}{2}\right)}=\frac{1}{c_1},\qquad
\frac{\left[1+\frac{1}{\nu_0'}\frac{\left(x-\mu_0'\right)^2}{u_0'}\right]^{\frac{\nu_0'+1}{2}}}{\left[1+\frac{1}{\nu_0}\frac{\left(x-\mu_0\right)^2}{u_0}\right]^{\frac{\nu_0+1}{2}}}=c_1\ \text{for all }x\in\mathcal{X},\tag{A14}
\]
where $c_1$ is a constant that does not depend on $x$ but may depend on $\mu_0$, $\nu_0$, $u_0$, $\mu_0'$, $\nu_0'$, and $u_0'$. From (A14), we obtain
\[
\mu_0=\mu_0'\tag{A15}
\]
and
\[
\nu_0u_0=\nu_0'u_0'.\tag{A16}
\]
Substituting (A15) and (A16) into (A14), we obtain
\[
\left[1+\frac{\left(x-\mu_0\right)^2}{\nu_0u_0}\right]^{\frac{\nu_0'-\nu_0}{2}}=c_1\ \text{for all }x\in\mathcal{X},
\]
which implies
\[
\nu_0'=\nu_0.\tag{A17}
\]
Substituting (A17) into (A16), we obtain
\[
u_0'=u_0.
\]
Therefore,
\[
\left(\mu_0,\nu_0,u_0\right)=\left(\mu_0',\nu_0',u_0'\right).
\]
Consequently, (A12) is correct, and (C2) is satisfied.
Third, (C3) is satisfied, as the densities $m\left(x\mid\mu_0,\nu_0,u_0\right)$ have common support $\mathcal{X}=\left(-\infty,\infty\right)$, and $m\left(x\mid\mu_0,\nu_0,u_0\right)$ is differentiable in $\mu_0$, $\nu_0$, and $u_0$.
Finally, (C4) is satisfied, as the true parameter value
\[
\left(\mu_{00},\nu_{00},u_{00}\right)\in\omega=\Omega=\left(-\infty,\infty\right)\times\left(0,\infty\right)\times\left(0,\infty\right),
\]
which is an open set.
Therefore, the marginal densities m x | μ 0 , ν 0 , u 0 satisfy all the regularity conditions, and the MLEs are consistent estimators of the hyperparameters.
The proof of the theorem is complete. □

References

  1. Berger, JO. Statistical Decision Theory and Bayesian Analysis, 2nd edition; Springer: New York, 1985. [Google Scholar]
  2. Maritz, JS; Lwin, T. Empirical Bayes Methods, 2nd edition; Chapman & Hall: London, 1989. [Google Scholar]
  3. Carlin, BP; Louis, A. Bayes and Empirical Bayes Methods for Data Analysis, 2nd edition; Chapman & Hall: London, 2000. [Google Scholar]
  4. Robbins, H. An empirical bayes approach to statistics. Proceedings of Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, University of California Press, 1955.
  5. Robbins, H. The empirical bayes approach to statistical decision problems. Annals of Mathematical Statistics 1964, 35, 1–20. [Google Scholar] [CrossRef]
  6. Robbins, H. Some thoughts on empirical bayes estimation. Annals of Statistics 1983, 1, 713–723. [Google Scholar] [CrossRef]
  7. Deely, JJ; Lindley, DV. Bayes empirical bayes. Journal of the American Statistical Association 1981, 76, 833–841. [Google Scholar] [CrossRef]
  8. Morris, C. Parametric empirical bayes inference: Theory and applications. Journal of the American Statistical Association 1983, 78, 47–65. [Google Scholar] [CrossRef]
  9. Maritz, JS; Lwin, T. Assessing the performance of empirical bayes estimators. Annals of the Institute of Statistical Mathematics 1992, 44, 641–657. [Google Scholar] [CrossRef]
  10. Carlin, BP; Louis, A. Empirical bayes: Past, present and future. Journal of the American Statistical Association 2000, 95, 1286–1290. [Google Scholar] [CrossRef]
  11. Pensky, M. Locally adaptive wavelet empirical bayes estimation of a location parameter. Annals of the Institute of Statistical Mathematics 2002, 54, 83–99. [Google Scholar] [CrossRef]
  12. Coram, M; Tang, H. Improving population-specific allele frequency estimates by adapting supplemental data: An empirical bayes approach. The Annals of Applied Statistics 2007, 1, 459–479. [Google Scholar] [CrossRef]
  13. Efron, B. Tweedie’s formula and selection bias. Journal of the American Statistical Association 2011, 106, 1602–1614. [Google Scholar] [CrossRef] [PubMed]
  14. van Houwelingen, HC. The role of empirical bayes methodology as a leading principle in modern medical statistics. Biometrical Journal 2014, 56, 919–932. [Google Scholar] [CrossRef]
  15. Ghosh, M; Kubokawa, T; Kawakubo, Y. Benchmarked empirical bayes methods in multiplicative area-level models with risk evaluation. Biometrika 2015, 102, 647–659. [Google Scholar] [CrossRef]
  16. Satagopan, JM; Sen, A; Zhou, Q; Lan, Q; Rothman, N; Langseth, H; Engel, LS. Bayes and empirical bayes methods for reduced rank regression models in matched case-control studies. Biometrics 2016, 72, 584–595. [Google Scholar] [CrossRef]
  17. Martin, R; Mess, R; Walker, SG. Empirical bayes posterior concentration in sparse high-dimensional linear models. Bernoulli 2017, 23, 1822–1847. [Google Scholar] [CrossRef]
  18. Mikulich-Gilbertson, SK; Wagner, BD; Grunwald, GK; Riggs, PD; Zerbe, GO. Using empirical bayes predictors from generalized linear mixed models to test and visualize associations among longitudinal outcomes. Statistical Methods in Medical Research 2018. [Google Scholar] [CrossRef]
  19. Zhang, YY; Rong, TZ; Li, MM. The empirical bayes estimators of the mean and variance parameters of the normal distribution with a conjugate normal-inverse-gamma prior by the moment method and the mle method. Communications in Statistics-Theory and Methods 2019, 48, 2286–2304. [Google Scholar] [CrossRef]
  20. Zhang, YY; Wang, ZY; Duan, ZM; Mi, W. The empirical bayes estimators of the parameter of the poisson distribution with a conjugate gamma prior under stein’s loss function. Journal of Statistical Computation and Simulation 2019, 89, 3061–3074. [Google Scholar] [CrossRef]
  21. Sun, J; Zhang, YY; Sun, Y. The empirical bayes estimators of the rate parameter of the inverse gamma distribution with a conjugate inverse gamma prior under stein’s loss function. Journal of Statistical Computation and Simulation 2021, 91, 1504–1523. [Google Scholar] [CrossRef]
  22. Zhou, MQ; Zhang, YY; Sun, Y; Sun, J; Rong, TZ; Li, MM. The empirical bayes estimators of the probability parameter of the beta-negative binomial model under zhang’s loss function. Chinese Journal of Applied Probability and Statistics 2021, 37, 478–494. [Google Scholar]
  23. Sun, Y; Zhang, YY; Sun, J. The empirical bayes estimators of the parameter of the uniform distribution with an inverse gamma prior under stein’s loss function. Communications in Statistics-Simulation and Computation 2024. [Google Scholar] [CrossRef]
  24. Zhang, YY; Zhang, YY; Wang, ZY; Sun, Y; Sun, J. The empirical bayes estimators of the variance parameter of the normal distribution with a conjugate inverse gamma prior under stein’s loss function. Communications in Statistics-Theory and Methods 2024, 53, 170–200. [Google Scholar] [CrossRef]
  25. Mao, SS; Tang, YC. Bayesian Statistics, 2nd edition; China Statistics Press: Beijing, 2012. [Google Scholar]
  26. Chen, MH. Bayesian Statistics Lecture. Statistics Graduate Summer School, School of Mathematics and Statistics, Northeast Normal University, Changchun, China July 2014.
  27. James, W; Stein, C. Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1961, 1, 361–380. [Google Scholar]
  28. Zhang, YY. The bayes rule of the variance parameter of the hierarchical normal and inverse gamma model under stein’s loss. Communications in Statistics-Theory and Methods 2017, 46, 7125–7133. [Google Scholar] [CrossRef]
  29. Xie, YH; Song, WH; Zhou, MQ; Zhang, YY. The bayes posterior estimator of the variance parameter of the normal distribution with a normal-inverse-gamma prior under stein’s loss. Chinese Journal of Applied Probability and Statistics 2018, 34, 551–564. [Google Scholar]
  30. Zhang, YY; Xie, YH; Song, WH; Zhou, MQ. Three strings of inequalities among six bayes estimators. Communications in Statistics-Theory and Methods 2018, 47, 1953–1961. [Google Scholar] [CrossRef]
  31. Good, IJ. Turing’s anticipation of empirical bayes in connection with the cryptanalysis of the naval enigma. Journal of Statistical Computation and Simulation 2000, 66(2), 101–111. [Google Scholar] [CrossRef]
  32. Noma, H; Matsui, S. Empirical bayes ranking and selection methods via semiparametric hierarchical mixture models in microarray studies. Statistics in Medicine 2013, 32(11), 1904–1916. [Google Scholar] [CrossRef] [PubMed]
  33. Pan, W; Jeong, KS; Xie, Y; Khodursky, A. A nonparametric empirical bayes approach to joint modeling of multiple sources of genomic data. Statistica Sinica 2008, 18(2), 709–729. [Google Scholar]
  34. Zhang, Q; Xu, Z; Lai, Y. An empirical bayes approach for the identification of long-range chromosomal interaction from hi-c data. Statistical Applications in Genetics and Molecular Biology 2021, 20(1), 1–15. [Google Scholar] [CrossRef]
  35. Soloff, JA; Guntuboyina, A; Sen, B. Multivariate, heteroscedastic empirical bayes via nonparametric maximum likelihood. Journal of the Royal Statistical Society Series B-Statistical Methodology 2024, 87(1), 1–32. [Google Scholar] [CrossRef]
  36. Adler, D; Murdoch, D; et al. rgl: 3D Visualization Using OpenGL; R package version 0.98.1, 2017. Available online: https://CRAN.R-project.org/package=rgl.
  37. Zhang, YY; Zhou, MQ; Xie, YH; Song, WH. The bayes rule of the parameter in (0,1) under the power-log loss function with an application to the beta-binomial model. Journal of Statistical Computation and Simulation 2017, 87, 2724–2737. [Google Scholar] [CrossRef]
  38. Conover, WJ. Pages 295–301 (one-sample kolmogorov test), 309–314 (two-sample smirnov test). In Practical Nonparametric Statistics; John Wiley & Sons: New York, 1971. [Google Scholar]
  39. Durbin, J. Distribution Theory for Tests Based on the Sample Distribution Function; SIAM: Philadelphia, 1973. [Google Scholar]
  40. Marsaglia, G; Tsang, WW; Wang, JB. Evaluating kolmogorov’s distribution. Journal of Statistical Software 2003, 8(18). [Google Scholar] [CrossRef]
  41. Dimitrova, DS; Kaishev, VK; Tan, S. Computing the kolmogorov-smirnov distribution when the underlying cdf is purely discrete, mixed or continuous 2017; City, University of London Institutional Repository.
  42. Aldirawi, H; Yang, J; Metwally, AA. Identifying appropriate probabilistic models for sparse discrete omics data. In IEEE EMBS International Conference on Biomedical and Health Informatics (BHI); IEEE, 2019. [Google Scholar]
  43. Santitissadeekorn, N; Lloyd, DJB; Short, MB; Delahaies, S. Approximate filtering of conditional intensity process for poisson count data: Application to urban crime. Computational Statistics & Data Analysis 2020, 144, 1–14, Document Number 106850. [Google Scholar]
  44. Berger, JO. The case for objective bayesian analysis. Bayesian Analysis 2006, 1, 385–402. [Google Scholar] [CrossRef]
  45. Berger, JO; Bernardo, JM; Sun, DC. Overall objective priors. Bayesian Analysis 2015, 10, 189–221. [Google Scholar] [CrossRef]
  46. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024; Available online: http://www.R-project.org/.
  47. Casella, G; Berger, RL. Statistical Inference, 2nd edition; Duxbury: Pacific Grove, CA, 2002. [Google Scholar]
Figure 1. The histograms of the samples and their density curves. (a) x 1 n generated from (1) with μ 0 , κ 0 , ν 0 , σ 0 2 = 3 , 1 , 6 , 1 and u 0 = 2 . (b) x 2 n generated from (2) with μ 0 , κ 0 , ν 0 , σ 0 2 = 3 , 1 , 6 , 1 and u 0 = 2 . (c) x 3 n generated from (2) with μ 0 , κ 0 , ν 0 , σ 0 2 = 3 , 2 , 6 , 4 / 3 and u 0 = 2 .
Figure 2. Quantities of interest as functions of x n + 1 for κ 0 = 2 , 1 , 0.5 in the simulations. u 0 = σ 0 2 κ 0 + 1 / κ 0 = 2 is fixed in all the plots. Left column: β * , δ s π , θ n + 1 x n + 1 , δ 2 π , θ n + 1 x n + 1 , and δ 2 π , μ n + 1 x n + 1 are affected by κ 0 . Right column: α * , P E S L s π , θ n + 1 x n + 1 , and P E S L 2 π , θ n + 1 x n + 1 are not affected by κ 0 .
Figure 3. The histogram of θ n + 1 | x n + 1 and the density estimation curve of π ( θ n + 1 | x n + 1 ) .
Figure 4. The Bayes estimators and the PESLs as functions of ν 0 . (a) Bayes estimators vs ν 0 . (b) PESLs vs ν 0 .
Figure 5. The Bayes estimators and the PESLs as functions of μ 0 , κ 0 , σ 0 2 , and x n + 1 . (a), (c), (e), (g) Bayes estimators vs μ 0 , κ 0 , σ 0 2 , and x n + 1 . (b), (d), (f), (h) PESLs vs μ 0 , κ 0 , σ 0 2 , and x n + 1 .
Figure 6. The domain for α * , β * is D = ( 1 , 10 ] × ( 0 , 10 ] for all the plots. a is for α * and b is for β * in the axes of all the plots. The red surface is for δ 2 π , θ n + 1 x n + 1 and the blue surface is for δ s π , θ n + 1 x n + 1 in the upper two plots. (a) The estimators as functions of α * and β * . δ 2 π , θ n + 1 x n + 1 > δ s π , θ n + 1 x n + 1 for all α * , β * on D. (b) The PESLs as functions of α * and β * . P E S L 2 π , θ n + 1 x n + 1 > P E S L s π , θ n + 1 x n + 1 for all α * , β * on D. (c) The surface of δ 2 π , θ n + 1 x n + 1 δ s π , θ n + 1 x n + 1 which is positive for all α * , β * on D. (d) The surface of P E S L 2 π , θ n + 1 x n + 1 P E S L s π , θ n + 1 x n + 1 which is also positive for all α * , β * on D.
Figure 7. The boxplots of the D values and the p-values for the three methods. (a) D values. (b) p-values.
Figure 8. The marginal densities for varied μ 0 = 3 , 0 , 3 , 6 , 9 , holding κ 0 = 1 , ν 0 = 6 , and σ 0 2 = 1 fixed.
Figure 9. The marginal densities for varied κ 0 = 0.25 , 0.5 , 1 , 2 , 4 , holding μ 0 = 3 , ν 0 = 6 , and σ 0 2 = 1 fixed.
Figure 10. The marginal densities for varied ν 0 = 2 , 4 , 6 , 12 , 24 , holding μ 0 = 3 , κ 0 = 1 , and σ 0 2 = 1 fixed.
Figure 11. The marginal densities for varied σ 0 2 = 0.25 , 0.5 , 1 , 2 , 4 , holding μ 0 = 3 , κ 0 = 1 , and ν 0 = 6 fixed.
Figure 12. The histogram of the sample x n + 1 and its density estimation curve.
Table 1. The moment estimators and the MLEs of the hyperparameters μ 0 , ν 0 , u 0 = 3 , 6 , 2 for the samples x 1 n , x 2 n , and x 3 n .
Moment estimators MLEs
x 1 n 4.826 , 159.725 , 1.872 4.826 , 159.725 , 1.872
x 2 n 3.025 , 7.306 , 2.127 3.020 , 6.011 , 1.967
x 3 n 3.021 , 7.367 , 2.138 3.018 , 6.179 , 1.990
Table 2. The frequencies of the moment estimators and the MLEs of the hyperparameters as n varies for ε = 1 , 0.5 , and 0.1 .
Moment estimators MLEs
n F μ 01 ε , M n F ν 01 ε , M n F u 01 ε , M n F μ 02 ε , M n F ν 02 ε , M n F u 02 ε , M n
ε = 1 1e4 0.00 0.24 0.00 0.00 0.03 0.00
2e4 0.00 0.23 0.00 0.00 0.00 0.00
4e4 0.00 0.06 0.00 0.00 0.00 0.00
8e4 0.00 0.01 0.00 0.00 0.00 0.00
16e4 0.00 0.00 0.00 0.00 0.00 0.00
ε = 0.5 1e4 0.00 0.48 0.00 0.00 0.16 0.00
2e4 0.00 0.40 0.00 0.00 0.04 0.00
4e4 0.00 0.33 0.00 0.00 0.00 0.00
8e4 0.00 0.19 0.00 0.00 0.00 0.00
16e4 0.00 0.10 0.00 0.00 0.00 0.00
ε = 0.1 1e4 0.00 0.81 0.36 0.00 0.84 0.03
2e4 0.00 0.87 0.34 0.00 0.62 0.00
4e4 0.00 0.82 0.21 0.00 0.52 0.00
8e4 0.00 0.81 0.10 0.00 0.43 0.00
16e4 0.00 0.70 0.04 0.00 0.27 0.00
Table 3. The numerical values of the Bayes estimators and the PESLs in Figure 4: ν 0 changes.
ν 0 2 3 4 5 6 7 8 9 10 11
δ s π , θ n + 1 x n + 1 0.8333 0.8750 0.9000 0.9167 0.9286 0.9375 0.9444 0.9500 0.9545 0.9583
δ 2 π , θ n + 1 x n + 1 2.5000 1.7500 1.5000 1.3750 1.3000 1.2500 1.2143 1.1875 1.1667 1.1500
P E S L s π , θ n + 1 x n + 1 0.3690 0.2704 0.2131 0.1758 0.1496 0.1302 0.1152 0.1033 0.0937 0.0856
P E S L 2 π , θ n + 1 x n + 1 1.2704 0.5772 0.3690 0.2704 0.2131 0.1758 0.1496 0.1302 0.1152 0.1033
Table 4. The numerical values of the Bayes estimators and the PESLs in Figure 5: μ 0 changes.
μ 0 5 4 3 2 1 0 1 2 3 4 5
δ s π , θ n + 1 x n + 1 6.6429 5.4286 4.3571 3.4286 2.6429 2.0000 1.5000 1.1429 0.9286 0.8571 0.9286
δ 2 π , θ n + 1 x n + 1 9.3000 7.6000 6.1000 4.8000 3.7000 2.8000 2.1000 1.6000 1.3000 1.2000 1.3000
P E S L s π , θ n + 1 x n + 1 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496
P E S L 2 π , θ n + 1 x n + 1 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131
Table 5. The numerical values of the Bayes estimators and the PESLs in Figure 5: κ 0 changes.
κ 0 1 2 3 4 5 6 7 8 9 10
δ s π , θ n + 1 x n + 1 0.9286 0.9524 0.9643 0.9714 0.9762 0.9796 0.9821 0.9841 0.9857 0.9870
δ 2 π , θ n + 1 x n + 1 1.3000 1.3333 1.3500 1.3600 1.3667 1.3714 1.3750 1.3778 1.3800 1.3818
P E S L s π , θ n + 1 x n + 1 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496
P E S L 2 π , θ n + 1 x n + 1 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131
Table 6. The numerical values of the Bayes estimators and the PESLs in Figure 5: σ 0 2 changes.
σ 0 2 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
δ s π , θ n + 1 x n + 1 0.5000 0.9286 1.3571 1.7857 2.2143 2.6429 3.0714 3.5000 3.9286 4.3571
δ 2 π , θ n + 1 x n + 1 0.7000 1.3000 1.9000 2.5000 3.1000 3.7000 4.3000 4.9000 5.5000 6.1000
P E S L s π , θ n + 1 x n + 1 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496
P E S L 2 π , θ n + 1 x n + 1 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131
Table 7. The numerical values of the Bayes estimators and the PESLs in Figure 5: x n + 1 changes.
x n + 1 5 4 3 2 1 0 1 2 3 4 5
δ s π , θ n + 1 x n + 1 5.4286 4.3571 3.4286 2.6429 2.0000 1.5000 1.1429 0.9286 0.8571 0.9286 1.1429
δ 2 π , θ n + 1 x n + 1 7.6000 6.1000 4.8000 3.7000 2.8000 2.1000 1.6000 1.3000 1.2000 1.3000 1.6000
P E S L s π , θ n + 1 x n + 1 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496 0.1496
P E S L 2 π , θ n + 1 x n + 1 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131 0.2131
Table 8. The results of the KS test goodness-of-fit of the model (2) to the simulated data.
oracle method moment method MLE method
D ¯ 0.0268 0.0239 0.0204
p value ¯ 0.5230 0.6245 0.7674
min D % 0.20 0.16 0.64
max p value % 0.20 0.16 0.64
Accept H 0 % 0.98 0.99 0.99
Table 9. The estimators of the hyperparameters, the goodness-of-fit of the model, the empirical Bayes estimators of the variance parameter of the normal distribution and the PESLs, and the mean and variance of the real data by the moment method and the MLE method.
Moment method MLE method
Estimators of μ 0 μ 01 n = 0.1669620 μ 02 n = 0.1619865
the hyperparameters ν 0 ν 01 n = 4.933 ν 02 n = 12.287
u 0 u 01 n = 0.005 u 02 n = 0.007
Goodness-of-fit D 0.0909 0.0708
of the model p-value 0.5313 0.8233
Empirical Bayes estimators δ s π , θ n + 1 x n + 1 0.0023464 0.0032576
and PESLs δ 2 π , θ n + 1 x n + 1 0.0035396 0.0038348
P E S L s π , θ n + 1 x n + 1 0.1779193 0.0771456
P E S L 2 π , θ n + 1 x n + 1 0.2753140 0.0912061
Mean and variance of E X 0.1669620 0.1619865
the poverty level data Var X 0.0080094 0.0080536