Imputation Bias in ARIMA Air Quality Models

Ejaz Hussain; Yang Li; Atiqur Rahman Ahad

doi:10.20944/preprints202603.1325.v1

Submitted: 16 March 2026

Posted: 17 March 2026


Abstract

Missing data remain a pervasive challenge in air quality analysis, where inappropriate imputation can introduce hidden biases and compromise the reliability of time-series models such as the AutoRegressive Integrated Moving Average (ARIMA) model. This paper examines how linear interpolation and mean/median imputation affect ARIMA performance and bias in forecasts of fine particulate matter (PM2.5) concentration, and analyzes the ARIMA-generated error metrics and their implications for forecast accuracy and reliability. The findings reveal that package-default imputation significantly influences ARIMA forecasts, while mean/median imputation consistently delivers superior predictive performance, highlighting its robustness for handling missing environmental data. Moreover, imputation applied during the data transformation stage exerts a greater influence on model outcomes than methods applied at later analysis stages.
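To make the compared imputation strategies concrete, the following is a minimal sketch of linear interpolation and mean/median filling on a short series with gaps. It is not the authors' pipeline: the function names `impute_linear` and `impute_constant` are hypothetical, the sample values are invented for illustration and are not measurements from the study, and the ARIMA fitting step is omitted.

```python
from statistics import mean, median
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing reading


def impute_constant(values: Series, how: str = "mean") -> List[float]:
    """Fill every gap with the mean or median of the observed values."""
    if how not in ("mean", "median"):
        raise ValueError("how must be 'mean' or 'median'")
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed) if how == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def impute_linear(values: Series) -> List[float]:
    """Fill interior gaps on a straight line between neighbouring readings.

    Leading and trailing gaps have only one neighbour, so this sketch
    copies the nearest observed value into them (an assumption).
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        raise ValueError("no observed values to impute from")
    for i in range(known[0]):
        out[i] = out[known[0]]
    for i in range(known[-1] + 1, len(out)):
        out[i] = out[known[-1]]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


# Invented hourly readings with three missing values.
pm25: Series = [10.0, None, None, 40.0, 16.0, None, 22.0]

linear_filled = impute_linear(pm25)
mean_filled = impute_constant(pm25, "mean")
median_filled = impute_constant(pm25, "median")

print("linear:", linear_filled)
print("mean:  ", mean_filled)
print("median:", median_filled)
```

The two approaches shape the series differently: interpolation follows the local trend between neighbouring readings, whereas mean/median filling pulls every gap toward a single central value. Either choice changes the autocorrelation structure that an ARIMA model is later fitted to, which is the effect the paper evaluates.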

Keywords: bias; air quality; ARIMA; forecasting; imputation; data analysis; predictive analysis; bias mitigation

Subject: Computer Science and Mathematics - Analysis

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
