Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

On the Treatment of Missing Item Responses in Educational Large-scale Assessment Data: The Case of PISA 2018 Mathematics

Version 1 : Received: 6 October 2021 / Approved: 6 October 2021 / Online: 6 October 2021 (12:52:48 CEST)

How to cite: Robitzsch, A. On the Treatment of Missing Item Responses in Educational Large-scale Assessment Data: The Case of PISA 2018 Mathematics. Preprints 2021, 2021100107 (doi: 10.20944/preprints202110.0107.v1). Robitzsch, A. On the Treatment of Missing Item Responses in Educational Large-scale Assessment Data: The Case of PISA 2018 Mathematics. Preprints 2021, 2021100107 (doi: 10.20944/preprints202110.0107.v1).

Abstract

Missing item responses are prevalent in educational large-scale assessment studies like the programme for international student assessment (PISA). The current operational practice scores missing item responses as wrong, but several psychometricians advocated a model-based treatment based on latent ignorability assumption. In this approach, item responses and response indicators are jointly modeled conditional on a latent ability and a latent response propensity variable. Alternatively, imputation-based approaches can be used. The latent ignorability assumption is weakened in the Mislevy-Wu model that characterizes a nonignorable missingness mechanism and allows the missingness of an item to depend on the item itself. The scoring of missing item responses as wrong and the latent ignorable model are submodels of the Mislevy-Wu model. This article uses the PISA 2018 mathematics dataset to investigate the consequences of different missing data treatments on country means. Obtained country means can substantially differ for the different scaling models. In contrast to previous statements in the literature, the scoring of missing item responses as incorrect provided a better model fit than a latent ignorable model for most countries. Furthermore, the dependence of the missingness of an item from the item itself after conditioning on the latent response propensity was much more pronounced for constructed-response items than for multiple-choice items. As a consequence, scaling models that presuppose latent ignorability should be refused from two perspectives. First, the Mislevy-Wu model is preferred over the latent ignorable model for reasons of model fit. Second, we argue that model fit should only play a minor role in choosing psychometric models in large-scale assessment studies because validity aspects are most relevant. Missing data treatments that countries can simply manipulate (and, hence, their students) result in unfair country comparisons.

Keywords

missing item responses; multiple imputation; item response model; PISA; country comparisons; Mislevy-Wu model; latent ignorability; nonignorable item responses

Comments (0)

We encourage comments and feedback from a broad range of readers. See criteria for comments and our diversity statement.

Leave a public comment
Send a private comment to the author(s)
Views 0
Downloads 0
Comments 0
Metrics 0


×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.