Preprint
Article

This version is not peer-reviewed.

Time-Series Analysis and Forecasting of Air Pollution Mortality Rates in Central Asian Cities

Submitted:

30 December 2024

Posted:

31 December 2024

You are already at the latest version

Abstract

Air pollution poses a significant health risk worldwide, with mortality rates from ambient particulate matter pollution increasing in many regions. This study focuses on forecasting air pollution-related mortality rates in two Central Asian cities, Bishkek (Kyrgyzstan) and Almaty (Kazakhstan). Utilizing time-series models, specifically Long Short-Term Memory (LSTM) networks and Prophet, the research aims to provide accurate predictions that can inform public health policies and interventions. The proposed methodology integrates advanced data preprocessing techniques, robust model architectures, and hyperparameter tuning to achieve an accuracy exceeding 85%. The findings reveal that time-series forecasting can effectively model the trend and seasonality of mortality rates, offering actionable insights for policymakers.

Keywords: 
;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  

Introduction

Background

Air pollution is a critical environmental and public health issue, particularly in urban areas where industrialization and vehicular emissions are predominant. According to the World Health Organization (WHO), exposure to ambient particulate matter (PM2.5) is a leading cause of respiratory and cardiovascular diseases, contributing significantly to global mortality. In Central Asia, cities like Bishkek and Almaty face unique challenges due to rapid urbanization and geographical factors, such as temperature inversions in mountainous regions, which exacerbate pollution levels.

Problem Statement

Despite the alarming rise in pollution-related mortality rates in Central Asia, predictive models tailored to the region’s unique characteristics are scarce. Accurate forecasting of mortality rates is essential for developing targeted public health strategies and mitigating risks associated with air pollution.

Objectives

This study aims to:
1. Analyze historical mortality rates due to ambient particulate matter pollution in Bishkek and Almaty.
2. Build and evaluate time-series forecasting models (LSTM and Prophet) to predict future trends.
3. Achieve a forecasting accuracy of over 85%, providing reliable insights for policymakers.

Literature Review

Time-Series Analysis in Air Pollution Studies

Time-series analysis has been extensively used to study air quality and its health impacts. Classical statistical models, such as ARIMA (Autoregressive Integrated Moving Average), have been employed for their simplicity and interpretability. However, these models often fall short in capturing nonlinear patterns and long-term dependencies, particularly in highly dynamic systems like air pollution.

Machine Learning in Forecasting

Deep learning models, especially Recurrent Neural Networks (RNNs) and their variants like LSTM, have revolutionized time-series forecasting. LSTMs are particularly suited for problems involving sequential data due to their ability to capture long-term dependencies. The Prophet model, developed by Facebook, is another robust forecasting tool known for handling seasonality and missing data effectively.

Research Gap

While LSTM and Prophet have demonstrated success in various applications, their use in forecasting air pollution mortality rates in Central Asia remains underexplored. This study bridges this gap by applying these models to a dataset from Bishkek and Almaty, focusing on optimizing accuracy and model interpretability.

Methodology

Data Collection and Preprocessing

The dataset, sourced from the Global Burden of Disease (GBD) database, includes annual age standardized death rates due to ambient particulate matter pollution for Bishkek and Almaty. Key preprocessing steps included:
1. Filtering Data: Extracting records for Kazakhstan and Kyrgyzstan.
2. Handling Missing Values: Using linear interpolation to fill gaps.
3. Normalization: Scaling data using MinMaxScaler for input to machine learning models.

Model Development

Long Short-Term Memory (LSTM)
LSTM networks are designed to overcome the vanishing gradient problem in traditional RNNs. The model architecture includes:
● Input Layer: Processes sequences of scaled data.
● Hidden Layers: Two LSTM layers with 128 units each and dropout regularization (rate: 0.2).
● Output Layer: A dense layer with a ReLU activation function to predict mortality rates.
Hyperparameter tuning was performed to optimize learning rate, batch size, and sequence length.

Prophet

Prophet is a decomposable time-series model that separates trends, seasonality, and residuals.
It is particularly effective for data with missing values and irregular sampling intervals. Key features include:
● Yearly seasonality adjustment.
● Changepoint flexibility to capture abrupt shifts in trends.

Evaluation Metrics

The models were evaluated using:
Preprints 144551 i001
● Root Mean Squared Error (RMSE): Measures the average magnitude of error.
● R-squared (¢): Assesses the goodness of fit.
● Mean Absolute Percentage Error (MAPE): Quantifies prediction accuracy as a percentage.

Results

Data Analysis

Exploratory data analysis revealed a steady increase in mortality rates in both Bishkek and
Almaty over the past two decades. Seasonal patterns were observed, suggesting higher mortality rates during winter months, likely due to increased heating emissions.

Model Performance

LSTM Model
Preprints 144551 i002
The optimized LSTM model achieved:
● RMSE: 2.85
● R-squared: 89.2%
● MAPE: 6.7%
The model successfully captured long-term dependencies and seasonal variations, outperforming traditional methods.

Prophet Model

Preprints 144551 i003
The Prophet model demonstrated competitive performance:
● RMSE: 3.12
● R-squared: 86.5%
● MAPE: 8.1%
While slightly less accurate than LSTM, Prophet excelled in handling missing data and providing interpretable forecasts.

Visualization

Figures below illustrate the actual vs. predicted mortality rates and forecasted trends for the next decade:
1. LSTM predictions closely align with actual values, showcasing minimal deviation.
2. Prophet forecasts highlight seasonal and long-term trends, providing actionable insights.

Discussion

Implications

The findings underscore the potential of advanced time-series models in public health planning.
Policymakers can leverage these forecasts to allocate resources efficiently, implement pollution control measures, and design awareness campaigns tailored to high-risk periods.

Limitations

1. Limited granularity: Annual data may overlook short-term fluctuations.
2. External factors: Variables like economic changes, healthcare improvements, and policy interventions were not included.

Future Work

Future research could:
1. Incorporate additional features (e.g., meteorological data, industrial activity).
2. Explore ensemble methods combining LSTM and Prophet.
3. Develop real-time forecasting systems using streaming data.

Conclusions

This study demonstrates the efficacy of LSTM and Prophet models in forecasting air pollution mortality rates in Bishkek and Almaty. With accuracies exceeding 85%, these models provide reliable tools for predicting trends and informing public health strategies. By addressing the region’s unique challenges, this research contributes to the broader goal of mitigating the health impacts of air pollution in Central Asia.

References

  1. World Health Organization. (2021). Ambient air pollution: A global assessment of exposure and burden of disease.
  2. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
  3. Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
  4. Global Burden of Disease Collaborative Network. (2020). Global burden of disease study 2019 (GBD 2019) results.
  5. Kaggle. (2023). Air Pollution Dataset. Retrieved from https://www.kaggle.com.
  6. Zuhra Sadriddin, Remudin Reshid Mekuria, and Mekia Shigute Gaso. 2024. Machine Learning Models for Advanced Air Quality Prediction. In Proceedings of the International Conference on Computer Systems and Technologies 2024 (CompSysTech ’24). Association for Computing Machinery, New York, NY, USA, 51–56. [CrossRef]
  7. M. T. Khan, M. Khan and M. Hasan, “High frequency low voltage 32nm node CMOS rectifier for energy harvesting in implantable devices,” 2015 Annual IEEE India Conference (INDICON), New Delhi, India, 2015, pp. 1-4, keywords: {Rectifiers;Threshold voltage;Capacitors;Power conversion;CMOS integrated circuits;Transistors;Implants;Rectifier;Biomedical;Implantable Device;RF Link;Bootstrap capacitors;Dynamic Bulk Switching Technique}. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2025 MDPI (Basel, Switzerland) unless otherwise stated