Preprint
Article

This version is not peer-reviewed.

A Multi-Source Sensor Dataset for Spain: Integrating Air Quality, Meteorological, Mobility and Calendar Records

Submitted: 27 May 2026

Posted: 28 May 2026


Abstract
Air quality forecasting and environmental health research at urban and regional scales depend on combining measurements from heterogeneous sensor networks, yet the construction of integrated multi-source datasets is rarely described or released as a self-contained deliverable. This paper presents an open dataset that combines four sensor-derived sources covering the whole of Spain over the period 2022 to 2024: hourly air quality observations from the 588 stations of the national network operated by the Ministerio para la Transición Ecológica y el Reto Demográfico (MITECO), daily meteorological records from the Agencia Estatal de Meteorología (AEMET), daily mobility indicators derived from anonymised mobile telephony events published at the municipality level by the Ministerio de Transportes y Movilidad Sostenible (MITMA), and a calendar of national and Autonomous-Community public holidays. The processing pipeline harmonises sources that differ in temporal resolution, spatial coding and quality regime into a tidy hourly table indexed by station and timestamp, with a fixed feature schema of 56 variables per record. Each air quality station is paired with its nearest AEMET station through a three-tier distance rule, and the daily exogenous features are aligned to the air quality time axis through a two-variant temporal-alignment scheme: lag-and-expand to the hourly grid for the hourly release, and a same-calendar-day join for the daily release. A complementary daily-resolution variant of the dataset is also released, with 72 columns and the same feature schema except for the air quality block, which is aggregated to daily mean, minimum and maximum. The integrated dataset contains approximately 14 million hourly records across the 588 stations and is released on Zenodo (DOI 10.5281/zenodo.20196221) under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It is intended as a substrate for research on air quality forecasting, environmental epidemiology and multi-source data fusion at nationwide scale.
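The two temporal-alignment variants named in the abstract can be illustrated with a minimal pandas sketch. It is not the authors' pipeline: the function names `lag_and_expand` and `same_day_join`, the column name `temp_mean`, and in particular the one-day lag are assumptions made here for illustration, since the abstract does not state the lag length.

```python
import pandas as pd


def lag_and_expand(daily: pd.DataFrame, hourly_index: pd.DatetimeIndex,
                   lag_days: int = 1) -> pd.DataFrame:
    """Attach daily features to an hourly grid (hypothetical helper).

    Each hour on calendar day D receives the daily values recorded on
    day D - lag_days. The default lag of one day is an assumption.
    """
    # Move every daily record forward by `lag_days`, so that looking up
    # day D returns the values observed on D - lag_days.
    shifted = daily.copy()
    shifted.index = shifted.index + pd.Timedelta(days=lag_days)
    # Map each hourly timestamp to its calendar day and look it up;
    # days with no lagged record come back as NaN.
    expanded = shifted.reindex(hourly_index.normalize())
    expanded.index = hourly_index
    return expanded


def same_day_join(daily_aq: pd.DataFrame, daily_exog: pd.DataFrame) -> pd.DataFrame:
    """Join daily exogenous features to daily air quality rows on the same date."""
    return daily_aq.join(daily_exog, how="left")


# Toy data: two days of a daily meteorological feature.
daily = pd.DataFrame(
    {"temp_mean": [10.0, 12.0]},
    index=pd.to_datetime(["2022-01-01", "2022-01-02"]),
)
# Three days of hourly timestamps (72 rows).
hours = pd.date_range("2022-01-01 00:00", "2022-01-03 23:00",
                      freq=pd.Timedelta(hours=1))
hourly = lag_and_expand(daily, hours)

# Toy daily air quality aggregate for the same two days.
daily_aq = pd.DataFrame({"no2_mean": [30.0, 25.0]}, index=daily.index)
daily_joined = same_day_join(daily_aq, daily)
```

In this sketch the hours of 1 January have no lagged value and stay missing, the hours of 2 January carry the record of 1 January, and the daily variant simply places same-date values side by side.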
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

