Preprint
Article

This version is not peer-reviewed.

Imputation Bias in ARIMA Air Quality Models

Submitted:

16 March 2026

Posted:

17 March 2026

Abstract
Missing data remains a pervasive challenge in air quality analysis: inappropriate imputation techniques can introduce hidden biases and compromise the reliability of time-series models such as the AutoRegressive Integrated Moving Average (ARIMA) model. This paper examines how linear interpolation and mean/median imputation affect ARIMA performance and bias in the prediction of fine particulate matter (PM2.5) concentrations. It also provides a detailed analysis of ARIMA-generated error metrics and their implications for the accuracy and reliability of the predictions. The findings reveal that package-default imputation significantly influences ARIMA forecasts, while mean/median imputation consistently delivers superior predictive performance, highlighting its robustness for handling missing environmental data. Moreover, imputation applied during the data transformation stage exerts a greater influence on model outcomes than methods applied at later analysis stages.
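As a rough illustration of the imputation strategies named in the abstract, the sketch below masks values in a synthetic hourly series and fills the gaps by linear interpolation, mean, and median. Everything in it is an assumption made for illustration: the synthetic data, the gap positions, and the names `truth`, `observed`, `imputed`, and `rmse` are hypothetical and are not taken from the paper. The ARIMA fitting step is omitted, so the sketch only shows how the imputations differ at the masked positions, not how they affect forecasts.

```python
import numpy as np
import pandas as pd

# Synthetic hourly PM2.5-like series (illustrative values, not the paper's data).
rng = np.random.default_rng(0)
hours = pd.date_range("2025-01-01", periods=96, freq="60min")
truth = pd.Series(
    35 + 10 * np.sin(2 * np.pi * np.arange(96) / 24) + rng.normal(0, 2, 96),
    index=hours,
    name="pm25",
)

# Remove a contiguous 6-hour gap plus three isolated readings (assumed pattern).
observed = truth.copy()
missing_idx = list(range(40, 46)) + [10, 20, 70]
observed.iloc[missing_idx] = np.nan
mask = observed.isna()

# Three candidate imputations of the same gaps.
imputed = {
    "linear": observed.interpolate(method="linear"),
    "mean": observed.fillna(observed.mean()),
    "median": observed.fillna(observed.median()),
}

def rmse(estimate: pd.Series, reference: pd.Series) -> float:
    """Root mean squared error between two aligned series."""
    return float(np.sqrt(np.mean((estimate - reference) ** 2)))

# Error of each imputation, evaluated only where values were masked.
scores = {name: rmse(series[mask], truth[mask]) for name, series in imputed.items()}
for name, score in scores.items():
    print(f"{name:>6}: RMSE at masked points = {score:.2f}")
```

In a fuller comparison, each imputed series would then be passed to the same ARIMA specification and the forecast error metrics compared on a held-out period, which is the kind of analysis the abstract describes.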
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
