A Comparison of Estimation Methods for the Rasch Model

The Rasch model is one of the most prominent item response models. In this article, different item parameter estimation methods for the Rasch model are compared through a simulation study. The type of ability distribution, the number of items, and the sample size were varied. It is shown that variants of joint maximum likelihood estimation and conditional maximum likelihood estimation are competitive with marginal maximum likelihood estimation. Moreover, the efficiency losses of limited-information estimation methods are only modest. It can be concluded that in empirical studies using the Rasch model, the impact of the choice of an estimation method on item parameters is almost negligible for most estimation methods. Interestingly, this sheds a somewhat more positive light on old-fashioned joint maximum likelihood and limited-information estimation methods.


Rasch Model
The Rasch model [9,18] is likely the most important item response model. Selecting an appropriate estimation method is of interest in diverse applications, and a variety of estimation methods has been proposed. In this article, a comprehensive comparison of different estimation methods for the Rasch model is conducted. We manipulate the factors test length (i.e., number of items), sample size, and type of ability distribution.
For a number of items X_i (i = 1, ..., I) and a random variable θ (ability), the item response function of the Rasch model is given as

P(X_i = 1 | θ) = Ψ(θ − b_i),  θ ~ F,

where Ψ(x) = exp(x) / (1 + exp(x)) is the logistic link function, b_i is the item difficulty, and F is some distribution of the ability θ. In addition, the items X_i are assumed to be locally independent, that is, P(X_1, ..., X_I | θ) = ∏_{i=1}^{I} P(X_i | θ). Importantly, the sum score S = ∑_{i=1}^{I} X_i is a sufficient statistic for θ if maximum likelihood (ML) estimation is employed. Hence, all items are equally weighted in the estimation of θ, which eases the interpretation of Rasch model parameters. Moreover, because only a single parameter is estimated per item in the Rasch model, comparatively small sample sizes suffice for reliable estimation.
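As a minimal illustration (a Python sketch, not part of the article, which used R), the item response function and the local independence assumption can be written as:

```python
import math

def rasch_prob(theta, b):
    """Item response function of the Rasch model:
    P(X_i = 1 | theta) = Psi(theta - b_i) with the logistic link Psi."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_prob(theta, bs, xs):
    """Probability of a full response pattern under local independence:
    P(X_1, ..., X_I | theta) = prod_i P(X_i = x_i | theta)."""
    p = 1.0
    for b, x in zip(bs, xs):
        pi = rasch_prob(theta, b)
        p *= pi if x == 1 else (1.0 - pi)
    return p
```

For example, a person with θ = 0 facing an item of difficulty b = 0 has success probability 0.5, and the probability of solving two such items is 0.25 under local independence.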

Estimation Methods for the Rasch Model
A variety of estimation methods has been proposed for the Rasch model [16]. In the Rasch model, the item parameters b = (b_1, ..., b_I) and the distribution parameters of F are estimated. Assume that item responses x_pi are available for persons p = 1, ..., P and items i = 1, ..., I. Denote by x_p the vector of item responses and by s_p the sum score of person p.
In marginal maximum likelihood estimation (MML; [4]), the latent variable θ is integrated out by posing some distributional assumption G_γ for θ, where the distribution parameters γ are estimated simultaneously with b. The log-likelihood function l(b, γ) is maximized. The likelihood contribution for person p is given by

l_p(b, γ) = log ∫ ∏_{i=1}^{I} P(X_i = x_pi | θ) dG_γ(θ).

If G_γ differs from the data-generating distribution F, biased item parameters can occur. Frequently, a normal distribution for θ is posed (MML-N), and a standard deviation σ is estimated. The integral in the likelihood function is evaluated by numerical integration. Alternatively, a multinomial distribution for θ can be estimated. This approach starts with a fixed grid of θ points θ_1, ..., θ_C and estimates the probabilities γ_c = P(θ = θ_c). A log-linear smoothing of these probabilities has been proposed in the so-called general diagnostic model (MML-LM; [19,22]). Typically, smoothing is performed for up to three or four moments. In a located latent class model with C classes, the values of the grid points θ_c are estimated in addition to the probabilities γ_c (MML-LC; [7,10]). It has been shown that in the Rasch model with I items, at most C = I/2 latent classes can be identified. The MML-LC approach imposes the weakest assumptions about F.
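The marginal likelihood contribution under a normal ability distribution can be sketched by evaluating the integral on a quadrature grid. The following Python snippet is only an illustration of this idea (the article's implementation used R packages); the grid bounds and number of nodes are arbitrary choices:

```python
import math

def marginal_likelihood(xs, bs, sigma=1.0, n_nodes=61):
    """MML-N likelihood contribution of one person: the integral of
    prod_i P(X_i = x_i | theta) over N(0, sigma^2), approximated on
    an equally spaced quadrature grid covering +/- 6 sigma."""
    lo, hi = -6.0 * sigma, 6.0 * sigma
    step = (hi - lo) / (n_nodes - 1)
    total = 0.0
    for k in range(n_nodes):
        theta = lo + k * step
        # quadrature weight: normal density times grid spacing
        w = math.exp(-0.5 * (theta / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi)) * step
        p = 1.0
        for b, x in zip(bs, xs):
            pi = 1.0 / (1.0 + math.exp(-(theta - b)))
            p *= pi if x == 1 else (1.0 - pi)
        total += w * p
    return total
```

As a sanity check, the marginal probability of solving a single item with b = 0 under a standard normal ability distribution is 0.5 by symmetry.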
In conditional maximum likelihood estimation (CML; [1]), a conditioning step on the sum score S is performed that eliminates θ from the estimation equations. In more detail, l_p(b) = log P(X = x_p | S = s_p) is evaluated, which is independent of θ. In joint maximum likelihood estimation (JML; see [16] for an overview), persons are regarded as fixed effects, and the person parameters θ = (θ_1, ..., θ_P) are estimated simultaneously with the item parameters b. In practice, the JML algorithm alternates between θ and b estimation within each iteration. Because the number of estimated parameters grows with the sample size, a bias correction for item parameters is required [14,21]: with the obtained item parameter estimate b̂_i, the bias-corrected item parameter is computed as (I − 1)/I · b̂_i. In order to include all persons in the estimation (an ML estimate of θ is not defined for persons with extreme scores s_p = 0 or s_p = I), weighted likelihood estimation (WLE; [20]) can be used for the person parameter estimates. As an alternative to WLE, the ε-algorithm of Bertoli-Barsotti (JMLε; [3]) employs a modified likelihood that replaces the sufficient statistic s_p with ε + s_p(I − 2ε)/I for an appropriate ε > 0. In penalized JML (PJML; [5]), a ridge penalty term is added to the log-likelihood function. This approach corresponds to assuming a normal prior distribution θ ~ N(0, σ²_prior) with an appropriate choice of the regularization parameter σ_prior > 0, and it also circumvents the exclusion of persons with extreme scores that occurs in CML. It has been demonstrated that JML and CML can be considered particular variants of MML estimation [13].
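A toy sketch of the JML alternation and the (I − 1)/I bias correction is given below (illustrative Python, not the article's R implementation; the gradient step size and iteration count are arbitrary choices, and persons with extreme scores are simply dropped rather than handled by WLE or the ε-algorithm):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def jml_item_estimates(data, n_iter=200, lr=0.1):
    """Sketch of JML: alternate gradient ascent steps on person and
    item parameters, then apply the (I-1)/I bias correction to the
    centered item parameters. data: list of 0/1 response vectors."""
    I = len(data[0])
    data = [row for row in data if 0 < sum(row) < I]  # drop extreme scores
    thetas = [0.0] * len(data)
    bs = [0.0] * I
    for _ in range(n_iter):
        for p, row in enumerate(data):      # update person parameters
            g = sum(x - sigmoid(thetas[p] - bs[i]) for i, x in enumerate(row))
            thetas[p] += lr * g
        for i in range(I):                  # update item parameters
            g = sum(row[i] - sigmoid(thetas[p] - bs[i]) for p, row in enumerate(data))
            bs[i] -= lr * g
        mean_b = sum(bs) / I                # identify the scale: center items
        bs = [b - mean_b for b in bs]
    return [(I - 1) / I * b for b in bs]    # bias correction
```

On a tiny data set in which the first item is solved more often than the second, the estimate of the first difficulty comes out below that of the second, and the centered estimates sum to zero.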
Several simpler estimation alternatives (so-called limited-information methods) do not rely on the full item response pattern x_p. In pairwise MML (PMML; [15]), person contributions P(X_i = x_pi, X_j = x_pj) are considered by integrating out the latent variable θ as in MML. Typically, a normal distribution is employed. In pairwise CML (PCML; [23]), the conditional probability P(X_i = x_pi, X_j = x_pj | X_i + X_j = 1) is used for optimization, which also removes θ from the estimation equations as in CML. The row averaging approach (RA; [6]), the eigenvector method (EVM; [12]; see also [2]), as well as the MINCHI method [8] only rely on the evaluation of the bivariate frequencies P(X_i = x, X_j = y) (x, y = 0, 1) and do not require assumptions about the distribution F of θ.
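The key property exploited by pairwise CML, namely that θ cancels from the conditional probability of a pair given X_i + X_j = 1, can be checked numerically (illustrative Python, not from the article):

```python
import math

def cond_pair_prob(theta, b_i, b_j):
    """P(X_i = 1, X_j = 0 | X_i + X_j = 1) under the Rasch model,
    computed from the joint probabilities at a given theta. The theta
    terms cancel, leaving exp(b_j - b_i) / (1 + exp(b_j - b_i))."""
    p_i = 1.0 / (1.0 + math.exp(-(theta - b_i)))
    p_j = 1.0 / (1.0 + math.exp(-(theta - b_j)))
    p10 = p_i * (1.0 - p_j)   # X_i = 1, X_j = 0
    p01 = (1.0 - p_i) * p_j   # X_i = 0, X_j = 1
    return p10 / (p10 + p01)
```

Evaluating the function at very different θ values returns the same conditional probability, which is why PCML needs no assumption about F.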

Method
In the simulation study, item response data were generated from the Rasch model. We varied the number of items (I = 10 and I = 30) and the sample size (N = 100, 250, 500, and 1,000). We chose I equidistant item parameters in the interval [−1.5, 1.5]. Three types of ability distributions were simulated. First, we assumed a standard normal distribution N(0, 1) (Normal) for θ. Second, we simulated a standardized chi-square (Chi²) distribution with one degree of freedom. Third, we simulated a located latent class Rasch model with three classes (LC3) and θ points −0.790, 1.033, and 2.248 with corresponding probabilities .60, .35, and .05.
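The data-generating step can be sketched as follows (illustrative Python; the study itself was run in R, and the function names here are hypothetical):

```python
import math
import random

def equidistant_items(I):
    """I equidistant item difficulties in the interval [-1.5, 1.5]."""
    return [-1.5 + 3.0 * i / (I - 1) for i in range(I)]

def simulate_rasch(n_persons, item_params, theta_draw, seed=1):
    """Generate 0/1 response data from the Rasch model.
    theta_draw: function mapping an RNG to one ability draw, e.g. a
    standard normal draw for the 'Normal' condition of the study."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = theta_draw(rng)
        row = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
               for b in item_params]
        data.append(row)
    return data
```

For the Normal condition one would pass `lambda rng: rng.gauss(0, 1)`; the Chi² and LC3 conditions only change the `theta_draw` function.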
As analysis models, we implemented the estimation methods described in Section 2. For MML-LM estimation, we used a log-linear smoothing of up to three and four moments. We specified MML-LC with 3, 4, and 5 located latent classes. For JMLε estimation, we tried the values ε = 0.1, 0.3, and 0.5. In PJML estimation, we chose normal priors N(0, σ²_prior) with σ_prior = 1, 1.5, and 2. The whole simulation was carried out in R [17] utilizing the R packages immer, pairwise, and sirt. To enable comparisons of estimated item parameters across estimation methods, the set of item parameters was centered after estimation (i.e., to a mean of 0). In total, 5,000 replications were conducted in each cell of the simulation design. Bias, standard deviation (SD), and root mean square error (RMSE) were estimated for all item parameters. We consider two summary measures of item parameter recovery. First, the average absolute bias AAB(b) = I^{−1} ∑_{i=1}^{I} |Bias(b_i)| quantifies the average bias of the item parameters. Second, bias and variability are summarized in the average relative RMSE (RRMSE), defined as

RRMSE(b) = I^{−1} ∑_{i=1}^{I} RMSE(b_i) / SD_{MML−N}(b_i),

where SD_{MML−N}(b_i) is the SD of the item parameter estimate under MML-N estimation. Hence, MML estimation using the normal distribution serves as the reference method.
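The per-item criteria and the two summary measures can be computed from replicated estimates as follows (illustrative Python; `ref_sd` stands for the per-item SD under the MML-N reference method):

```python
import math

def summarize(estimates, true_b, ref_sd):
    """Compute per-item Bias, SD, and RMSE from replicated estimates,
    then the two summary measures: average absolute bias (AAB) and
    average relative RMSE (RRMSE) against a reference SD per item."""
    R = len(estimates)          # number of replications
    I = len(true_b)             # number of items
    bias, rmse = [], []
    for i in range(I):
        ests = [est[i] for est in estimates]
        mean = sum(ests) / R
        var = sum((e - mean) ** 2 for e in ests) / R
        bias.append(mean - true_b[i])
        rmse.append(math.sqrt(var + (mean - true_b[i]) ** 2))
    aab = sum(abs(b) for b in bias) / I
    rrmse = sum(r / s for r, s in zip(rmse, ref_sd)) / I
    return aab, rrmse
```

An unbiased estimator whose RMSE equals the reference SD yields AAB = 0 and RRMSE = 1, which is the baseline against which the methods are ranked.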

Results
For lack of space, we only report JMLε with ε = .3 and PJML with σ_prior = 1.5, which performed best on average across conditions. We also only state results for MML-LM with a smoothing of four moments (MML-LM4), which was superior to using only three moments. MML-LC is reported with 3 located latent classes (MML-LC3), but there were only low efficiency losses when using 4 or 5 classes.
The bias (i.e., the AAB) of item parameters was highest for JML using WLE (JMLW) for the short test length (I = 10) but vanished for the long test (I = 30). Moreover, MML using an incorrect normal distribution (MML-N) produced slightly biased item parameters in the case of non-normal distributions (Chi² and LC3). Surprisingly, the normal distributional misspecification in pairwise MML (PMML) had even worse consequences than in MML-N. Bias and RRMSE values were averaged across conditions for each method and ranked. These ranks are shown in Table 1. Overall, CML, the limited-information methods EVM, RA, and PCML, as well as MML-LC3 and MML-LM4 performed best in terms of bias. It may also be surprising that MML with located latent classes (MML-LC3) performs well even for continuous ability distributions.
In Table 1, the ranks of the estimation methods across all conditions are shown for the RRMSE for 10 items. The findings for 30 items were similar but less pronounced. Overall, JML estimation methods performed well, in particular the ε-algorithm JMLε. Notably, MML with more flexible distributions and CML also produced low RRMSE values. Interestingly, misspecified MML using a normal distribution (MML-N) outperformed the limited-information estimators (PMML, PCML, EVM, RA, MINCHI) with respect to variability. Hence, the potential bias introduced by MML-N compared to the latter estimation methods can be compensated by its smaller variability. It is likely that these findings also transfer to test designs with missing data.

Discussion
In this article, we compared several estimation methods for the Rasch model. It has been shown that the choice of the ability distribution impacts the estimated item parameters. However, differences between estimation methods are only modest, in particular for longer tests. Interestingly, joint maximum likelihood estimation methods outperformed conditional and marginal maximum likelihood estimation as well as limited-information estimation methods. Prior distributions for item parameters can further improve estimation in small samples [11].