Submitted: 09 April 2025
Posted: 09 April 2025
Abstract
Keywords:
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Data Collection
3.2. INMET and the A711 São Carlos Station
- Monitored Parameters: The A711 Station automatically measures air temperature, relative humidity, precipitation, wind speed and direction, atmospheric pressure, and solar radiation, providing a comprehensive view of local weather conditions.
- Equipment Used: The station is equipped with high-precision sensors, including:
  - Digital Thermometer: Measures air temperature with high precision, essential for analyzing thermal variations that may influence precipitation formation.
  - Hygrometer: Captures relative humidity, a critical variable for precipitation forecasting and storm cloud formation.
  - Rain Gauge: Collects and quantifies precipitation, a key data point for analyzing rainfall trends.
  - Anemometer: Measures wind speed, aiding in storm prediction and identifying high-pressure areas.
  - Barometer: Measures atmospheric pressure, a crucial parameter for detecting changes in atmospheric conditions.
  - Pyranometer: Measures solar radiation, relevant for climate studies and evaporation analysis.
- Transmission Technology: The A711 Station transmits data automatically to INMET using radio and internet communication, enabling frequent and precise updates, essential for real-time analysis.
3.3. Libelium and the BME280 Module
- Measurements: The BME280 sensor measures temperature, relative humidity, and atmospheric pressure, with advanced specifications:
  - Temperature: Measures temperatures in the range of -40°C to 85°C, with an accuracy of ±1°C.
  - Humidity: Captures relative humidity between 0% and 100%, with an accuracy of ±3%, essential for analyzing weather conditions and identifying precipitation patterns.
  - Pressure: Measures atmospheric pressure from 300 hPa to 1100 hPa, with an accuracy of ±1 hPa, important for predicting climate variations and atmospheric patterns.
- Communication Interface: The sensor uses I2C and SPI interfaces, facilitating integration with microcontrollers and other IoT systems.
- Energy Consumption: The BME280 is highly efficient, drawing about 3.6 µA when sampling humidity, pressure, and temperature at 1 Hz and 0.1 µA in sleep mode. This efficiency suits devices deployed in remote environments for long-term monitoring.
- Dimensions: The sensor package is extremely compact, measuring 2.5 x 2.5 x 0.93 mm, making it easy to incorporate into portable devices and low-power systems.
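The operating ranges listed above lend themselves to a simple plausibility check before readings are stored. The following is a minimal sketch, assuming readings arrive as dictionaries; the field names (`temperature_c`, `humidity_pct`, `pressure_hpa`) and the helper `out_of_range_fields` are hypothetical and are not taken from the study's pipeline.

```python
# Operating ranges quoted in the text for the BME280 (field names are assumed).
BME280_RANGES = {
    "temperature_c": (-40.0, 85.0),
    "humidity_pct": (0.0, 100.0),
    "pressure_hpa": (300.0, 1100.0),
}


def out_of_range_fields(reading):
    """Return the names of fields that are missing or outside the sensor's range."""
    bad = []
    for field, (low, high) in BME280_RANGES.items():
        value = reading.get(field)
        if value is None or not (low <= value <= high):
            bad.append(field)
    return bad


# Hypothetical reading with an implausible pressure value.
sample = {"temperature_c": 23.4, "humidity_pct": 61.0, "pressure_hpa": 1250.0}
print(out_of_range_fields(sample))  # ['pressure_hpa']
```

A check of this kind only flags physically impossible values; it says nothing about readings that are in range but inaccurate.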
3.4. Storage Platform: MongoDB
3.5. Data Fusion Methods
3.5.1. Temporal Fusion
3.5.2. Exponential Smoothing Functionality
3.5.3. Algorithms Used in Temporal Fusion
ARIMA with Exponential Smoothing (sktime)
- Modeling with ARIMA: The model identifies the AR (autoregressive), I (integrated, i.e., differencing), and MA (moving average) components in the time series.
- Application of Smoothing: Exponential smoothing is applied to adjust the weight of more recent observations.
- Forecasting: The final model, combining ARIMA and smoothing, is used to predict future values, balancing long-term trends with recent fluctuations.
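The smoothing step can be illustrated with simple exponential smoothing, in which each smoothed value is a weighted average of the newest observation and the previous smoothed value. This is a minimal pure-Python sketch, not the sktime implementation; the text does not specify the smoothing factor or how the smoothed series is combined with ARIMA, so `alpha` and the sample data are assumptions.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].
    A larger alpha gives more weight to recent observations.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical hourly precipitation values in mm.
rain_mm = [0.0, 4.0, 8.0, 0.0]
print(exponential_smoothing(rain_mm, alpha=0.5))  # [0.0, 2.0, 5.0, 2.5]
```

With `alpha=1.0` the output equals the input, while values close to 0 produce a slowly varying series dominated by older observations.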
SARIMA with Exponential Smoothing (sktime)
- Identification of Seasonal Components: The model includes seasonal autoregressive, differencing, and moving average terms to capture repetitive patterns.
- Smoothing of Recent Data: Exponential smoothing is applied to adjust the weight of more recent seasonal data.
- Forecasting: The combination of SARIMA and smoothing generates forecasts that capture seasonal trends.
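The seasonal differencing term can be illustrated in isolation: each value is replaced by its change relative to the value one season earlier, which removes a repeating pattern of that length. The sketch below is an illustration only, assuming a known seasonal period; the period and data are hypothetical, and the function is not part of sktime.

```python
def seasonal_difference(series, period):
    """Return y[t] - y[t - period] for every t >= period."""
    if period < 1 or period >= len(series):
        raise ValueError("period must be >= 1 and shorter than the series")
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical series: a pattern of length 3 that drifts upward by 1 per cycle.
values = [1, 5, 2, 2, 6, 3, 3, 7, 4]
print(seasonal_difference(values, period=3))  # [1, 1, 1, 1, 1, 1]
```

After differencing, the repeating shape is gone and only the per-cycle drift remains, which is the kind of series the non-seasonal ARMA terms are then fitted to.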
Prophet
- Modeling of Trends and Seasonality: Terms for long-term trends and seasonal patterns are adjusted.
- Incorporation of Specific Events: External events are included, increasing accuracy in cases with holiday effects or specific dates.
- Forecasting: Generates projections adjusted for seasonality and events.
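The three steps above correspond to the additive decomposition on which Prophet is based. As a sketch, using the symbols from Prophet's published formulation rather than notation from this study:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here $g(t)$ is the trend, $s(t)$ the seasonal component, $h(t)$ the effect of holidays or other specified events, and $\varepsilon_t$ the residual error.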
Time Series K-Means (tslearn)
- Feature Extraction: The algorithm compares time series directly using a distance metric such as Dynamic Time Warping (DTW) or Euclidean distance.
- Clustering Process: The time series are iteratively assigned to clusters based on this similarity measure, minimizing intra-cluster variance.
- Pattern Identification: The final clusters reveal distinct precipitation trends, which can be further analyzed for forecasting insights.
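The distance computation and assignment step can be sketched in a few lines. The following is a minimal pure-Python illustration of DTW with an absolute-difference cost and of assigning a series to its nearest centroid; it is not the tslearn implementation (which also updates centroids iteratively), and the sample series are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance using absolute difference as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_centroid(series, centroids):
    """Return the index of the centroid with the smallest DTW distance to series."""
    distances = [dtw_distance(series, c) for c in centroids]
    return distances.index(min(distances))


# Hypothetical rainfall shapes: the same peak shifted in time, and a dry period.
shifted = [0, 0, 1, 2, 1, 0]
peak = [0, 1, 2, 1, 0, 0]
flat = [0, 0, 0, 0, 0, 0]
print(dtw_distance(shifted, peak))              # 0.0
print(nearest_centroid(shifted, [flat, peak]))  # 1
```

The example shows why DTW is often preferred over Euclidean distance for this task: two events with the same shape but different timing are treated as identical.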
3.6. Data Analysis
3.7. Prediction Method: Random Forest
3.8. Performance Metrics
- Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual values. Lower MAE values indicate better model performance.
- Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values. MSE penalizes larger errors more heavily, making it sensitive to outliers.
- Coefficient of Determination (R²): Indicates the proportion of variance in the dependent variable that is predictable from the independent variables. An R² value close to 1 indicates a model that explains a large portion of the variance.
- Cross-Validation MAE (CV_MAE): Measures MAE across multiple validation folds, providing a more reliable estimate of the model’s generalization capability and robustness against overfitting.
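The four metrics can be written out directly from their definitions. The sketch below is a pure-Python illustration under stated assumptions: the fold scheme (contiguous, equally sized folds) and the mean-value baseline `mean_baseline` are hypothetical choices made for the example, since the text does not specify how the folds were built.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def cv_mae(values, k, fit_predict):
    """Average MAE over k contiguous folds.

    fit_predict(train, n) must return n predictions for the held-out fold.
    Trailing values that do not fill a whole fold are ignored.
    """
    fold_size = len(values) // k
    scores = []
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        test = values[start:stop]
        train = values[:start] + values[stop:]
        scores.append(mae(test, fit_predict(train, len(test))))
    return sum(scores) / k


def mean_baseline(train, n):
    """Hypothetical stand-in model: always predict the training mean."""
    return [sum(train) / len(train)] * n


actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
print(mae(actual, predicted))  # 0.5
print(mse(actual, predicted))  # 0.5
print(r2(actual, predicted))   # 0.9
print(cv_mae(actual, k=2, fit_predict=mean_baseline))  # 4.0
```

For time series, a forward-chaining split that trains only on earlier observations is often preferred to the plain folds used here, because it avoids evaluating a model on data that precede its training window.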
4. Results
4.1. Temporal Fusion
4.2. Best Results for Temporal Fusion
4.3. Model Forecast Comparisons
4.4. Summary of Results
- Superior Performance of ARIMA with Exponential Smoothing: The ARIMA with exponential smoothing algorithm achieved the best results, with the lowest MAE and MSE values and the highest R² score, indicating high predictive accuracy.
- Importance of Temporal Patterns: Temporal fusion effectively captured seasonal trends and recurring patterns, essential for reliable precipitation forecasting [7].
- Algorithm Selection Matters: While Prophet achieved competitive MAE and MSE values, its lower R² score highlighted limitations in explaining variance in certain scenarios. This underscores the importance of selecting appropriate models for specific forecasting tasks.
5. Discussion
6. Conclusions
- MAE: Measures the average absolute difference between predicted and actual values. Lower values indicate better accuracy.
- MSE: Penalizes larger errors more heavily, making it sensitive to outliers. Lower MSE values suggest more consistent predictions.
- R²: Indicates the proportion of variance in the dependent variable explained by the model. Higher values signify better explanatory power.
- CV_MAE: Evaluates the model’s generalization capability by assessing its performance across multiple validation folds. Lower CV_MAE values indicate more reliable and robust predictions.
References
- Akanbi, A.; Masinde, M. A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring. Sensors 2020, 20, 3166.
- Roh, Y.; Heo, G.; Whang, S.E. A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective. IEEE Transactions on Knowledge and Data Engineering 2021, 33, 1328–1347.
- Fawzy, D.; Moussa, S.; Badr, N. The Spatiotemporal Data Fusion (STDF) Approach: IoT-Based Data Fusion Using Big Data Analytics. Sensors 2021, 21.
- Shi, P.; Li, G.; Yuan, Y.; Kuang, L. Data Fusion Using Improved Support Degree Function in Aquaculture Wireless Sensor Networks. Sensors 2018, 18.
- Valente, F.J.; Morijo, J.P.; Vivaldini, K.C.T.; Trevelin, L.C. Fog-Based Data Fusion for Heterogeneous IoT Sensor Networks: A Real Implementation. In Proceedings of the 2019 15th International Conference on Network and Service Management (CNSM); IEEE, 2019; pp. 1–5.
- Elbanoby, Y.; Aborizka, M.; Maghraby, F. Real-Time Data Management for IoT in Cloud Environment. In Proceedings of the 2019 IEEE Global Conference on Internet of Things (GCIoT); 2019; pp. 1–7.
- Kenda, K.; Kažič, B.; Novak, E.; Mladenić, D. Streaming Data Fusion for the Internet of Things. Sensors 2019, 19.









Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).