1. Introduction
Extreme weather events have become more severe in cities due to climate change. As temperatures rise due to global warming, the world is experiencing a shortage of natural resources, such as raw materials, energy sources, and water, especially in populated areas. Climate change is also increasing the natural evaporation of water supplies, leading to dry conditions. This serious problem should be alleviated in part by smart city solutions such as circular economy approaches, involving R9 strategies, which can help with scarce resource conditions [1].
To mitigate the negative effects of such weather conditions and decreased water consumption, it is crucial to monitor and predict climate conditions closely and implement effective water management strategies. Mountains provide freshwater globally and may influence water resource management, climate change adaptation, and water policy. Viviroli examined 11 case study regions to provide a global perspective, highlight research deficiencies, and propose recommendations for research, management, and policy [2].
Forecasting urban drought is a crucial strategy for municipalities and water management organizations. Urban extreme weather can be predicted using statistical forecasting approaches like time series analysis. Analyzing historical meteorological data can reveal city climate trends and patterns. These findings can help predict urban dryness and improve how we respond to extreme climatic events.
Time series models, such as ARIMA and SARIMA, have been widely used in drought prediction by analyzing historical precipitation and temperature data. For instance, Durdu (2010) developed ARIMA and SARIMA models to forecast drought in the Büyük Menderes River basin using the Standardized Precipitation Index (SPI), where the ARIMA model demonstrated superior performance. ARIMA models presume data stationarity and are linear, which can limit them when dealing with complex nonlinear interactions or drought dynamics [3].
In addition to classical statistical methods, ML models are also used for weather analysis. Precipitation forecasting is needed in water resources engineering, planning, and irrigation scheduling. For daily precipitation forecasting, Kisi and Shiri compared hybrid wavelet-genetic programming (WGEP) and wavelet-neuro-fuzzy (WNF) models. In the first phase, single genetic programming (GEP) and neuro-fuzzy (NF) models predicted daily precipitation from past values, with limited results. In the second phase, hybrid WGEP and WNF models that used wavelet coefficients as GEP and NF inputs improved accuracy, but the results remained unsatisfactory. In the third phase, the best inputs of the single and hybrid models were used to build new WGEP and WNF models. Kisi and Shiri demonstrated that the new hybrid WGEP models forecast daily precipitation better than the WNF models, which failed to learn the non-linear precipitation process [4].
To investigate drought variability in Denizli, a semi-arid region of Turkey, self-calibrated Palmer Drought Severity Index (sc-PDSI) values were utilized, and projections were conducted for various time horizons: short-term (1 month), mid-term (3 months and 6 months), and long-term (12 months) [5].
Decision-makers need to forecast droughts, which are largely caused by spatiotemporal precipitation imbalances, in order to design adaptive measures. Such forecasts should meet three requirements: the prediction must identify the impacted locations and their severity; the severity should be quantified; and the forecast should be completed quickly with minimal information. Ock-Jae Jang developed a drought forecasting method that merges the water balance model with a deep neural network to address these objectives [6].
Some authors have compared classical time series analysis models and ML models for weather forecasting. Drought forecast accuracy is critical for informed land and water resource management as climate change and land use changes increase droughts. One such study forecast drought using the Informer model and compared it with ARIMA, LSTM, and CNN models [7].
The Internet of Things (IoT) is the underlying concept that enables smart cities to forecast weather conditions. IoT devices, such as weather stations, building sensors, and smartphone apps, collect vast amounts of data on atmospheric conditions, temperature, humidity, and wind patterns. Data are then transferred to a central hub for fast processing and analysis. Machine learning algorithms turn collected data into accurate weather predictions, helping individuals, organisations, and governments make sensible decisions. Banara [8] examined the literature on smart city IoT weather monitoring systems, covering sensors, microcontrollers, and communication media. That study sought to improve weather monitoring systems by integrating the Internet of Things.
Sreenivasulu [9] demonstrated an efficient IoT-based weather forecasting method for smart cities utilising sensor data. The suggested method exceeded established methods in accuracy and speed. Atta-ur Rahman presented his framework for predicting precipitation in smart cities using fuzzy logic to fuse machine learning methodologies’ predictive accuracy [10]. The effects of soil moisture on seasonal temperature and precipitation prediction scores in Europe have been shown by Van Den Hurk [11].
The analysis of meteorological characteristics was utilized in numerous applications to evaluate gradual trends and forecast weather parameters [12,13,14,15].
Advances in time series forecasting, such as combining statistical methods with machine learning and signal processing, can improve urban drought predictions. These advanced models help managers to understand complex urban climate patterns and trends. Smart cities leverage technology and data analytics to boost productivity, sustainability, and quality of life, revolutionizing our work, lifestyle, and environment. Smart cities need accurate, real-time weather forecasting. Advanced sensors, satellites, and machine learning algorithms can help smart cities predict weather. This enables citizens and local authorities to strategize and prepare for weather-related incidents effectively. The primary advantage of precise weather prediction in smart cities is the ability to anticipate and respond to severe weather occurrences. The occurrence of heatwaves, storms, and floods has increased due to climate change. Precise weather predictions enable local government agencies to strategize, provide emergency aid effectively and, if necessary, evacuate areas with a higher danger risk.
Given that each place on Earth has distinct attributes and traits, the optimal approach to identifying the best model is to start with classical statistical time series models, which are discussed in this article. This paper applies, analyzes, and compares traditional statistical time series models. In the future, our objective is to use machine learning models to determine which one provides superior predictions based on the available data for specific regions.
Exponential smoothing outperformed complicated statistical methods like autoregressive integrated moving averages (ARIMA) in past time series forecasting contests. There are three types of exponential smoothing algorithms: simple (single), double, and triple. Many approaches use a stochastic process to create observable data, which is influenced by unseen but predictable components, such as local level, trend, and seasonality. The components must be modified when the time series structure changes [16].
Brown and Holt independently developed basic exponential smoothing. Winters expanded Holt’s exponential techniques to predict complex time series with seasonality. Their work is recognized by the term Holt-Winters exponential smoothing algorithms. The development of exponential smoothing algorithms relied on heuristics. Thus, model selection and error distribution modelling lacked a statistical basis. Enterprises quickly adopted them for inventory modelling due to their ease of use [17,18,19].
Tihi proposed an approach for collecting, preprocessing, and forecasting meteorological parameters with (S)ARIMA time series models. The forecasting models were built on data sets of average air temperature, precipitation, and soil moisture values from January 2014 to December 2018. A manual approach was used to choose the best (S)ARIMA models. This technique yielded 12 air temperature, 6 precipitation, and 12 soil moisture time series models with distinct components. The best models for all three time series were chosen using AIC, AICc, BIC, and Ljung-Box test p-values [20].
Todorov presented an evaluation-based forecasting model selection process [21]. The strategy was applied to an air temperature dataset using a SARIMA model. Seasonal model parameters were chosen automatically. Model performance was assessed with statistical metrics, including MAE, RMSE, and MAPE.
The study aims to collect and preprocess meteorological data, to compare and analyze the efficacy of statistical time series methods, and to select the optimal model for forecasting drought-related variables. This study employs univariate time series data related to drought, such as air temperature, precipitation, wind speed, and soil moisture, to build traditional forecasting models, compare them, and visualize their results. Upon comparison, the best predictive model is chosen based on the employed metrics and thereafter utilized for computing drought indexes to categorize the severity of drought and assist decision-makers in formulating drought-related initiatives.
2. Methods
Forecasting Methods
Time series forecasting involves predicting future values based on historical data, and it is a crucial subject in several fields, such as finance, economics, and meteorology. A time series refers to a set of observations or data points that pertain to a particular variable and are recorded over time. Time series forecasting uses a range of methods, of which the most popular are ARIMA and Holt-Winters. These methods can be used in various applications, from forecasting GDP per capita to forecasting primary energy consumption. These data-driven methods have also proven effective in meteorological forecasting applications [22,23,24,25,26].
When making predictions for time series data, it is crucial to assess the characteristics of the data, including trend, seasonality, correlations and any other factors that might impact the accuracy of the model. Model performance should be assessed by using statistical measures, such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), or Mean Absolute Percentage Error (MAPE).
(S)ARIMA Methods
Another popular method used in time series forecasting is ARIMA (AutoRegressive Integrated Moving Average) and its seasonal version called SARIMA (Seasonal ARIMA). ARIMA is a statistical model that considers autocorrelation and moving average components of time series data. By analyzing past data and trends, ARIMA can forecast future values with a high level of accuracy [27]. On the other hand, SARIMA is an extension of ARIMA that includes the seasonal component in the model. SARIMA allows for more accurate predictions of seasonal trends and patterns in the data. It can capture the dynamics of a linear system very well [28].
ARIMA forecasting is a commonly used technique for predicting future values of a time series by analyzing its past values. This statistical methodology combines autoregression (AR), moving average (MA) models, and differencing (I), where differencing is used to make the series stationary. The model integrates autoregressive, moving average, and integrated components to represent the interdependencies and patterns of the series precisely.
Achieving proficiency in ARIMA forecasting requires a thorough comprehension of time series theory and statistical concepts. The autoregressive component of the model refers to the dependence of current values on prior ones, where the autoregression order (p) determines the number of past values considered. The moving average component takes into account the impact of prior forecast errors on the current value, and the moving average order (q) determines the number of previous errors that are considered. Differencing (d) is used to achieve stationarity in a series. This is accomplished by subtracting a prior observation from the current one, thereby removing trend or seasonal influences.
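As a minimal illustration of differencing, the Python sketch below (not the R code used in the study) applies a first difference and a seasonal difference to two short, made-up series. The function name `difference` and all values are hypothetical and serve only to show the operation.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series with a linear trend: one regular difference (d = 1)
# leaves a constant series.
trend_series = [10.0, 12.0, 14.0, 16.0, 18.0]
first_diff = difference(trend_series)

# Hypothetical series with a repeating pattern of period 4 that drifts upward:
# one seasonal difference (D = 1, s = 4) removes the repeating pattern.
seasonal_series = [1.0, 5.0, 2.0, 8.0, 1.5, 5.5, 2.5, 8.5]
seasonal_diff = difference(seasonal_series, lag=4)

print(first_diff)
print(seasonal_diff)
```

Each difference shortens the series by `lag` observations, which is worth keeping in mind when only a few seasons of data are available.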
The conventional representation for the non-seasonal ARIMA model is ARIMA (p,d,q), where each parameter is determined by analyzing the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the time series.
The Moving average component indicates the relationship between the current observation and the residual error resulting from applying a moving average model to previous data.
The SARIMA model is used to predict seasonal data, and it incorporates three seasonal components, namely, P, D, and Q, in addition to the three previously described non-seasonal components.
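For reference, the ARIMA(p,d,q) and SARIMA(p,d,q)(P,D,Q)_s models are commonly written with the backshift operator B, where B y_t = y_{t-1}. The notation below is a standard textbook form and is an assumption about presentation, not taken from the source: c is a constant, ε_t is the error term, and s is the seasonal period.

```latex
\begin{align}
\phi(B) &= 1 - \phi_1 B - \dots - \phi_p B^{p}, \qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q} \\
\text{ARIMA}(p,d,q):\quad
\phi(B)\,(1-B)^{d}\,y_t &= c + \theta(B)\,\varepsilon_t \\
\text{SARIMA}(p,d,q)(P,D,Q)_s:\quad
\Phi(B^{s})\,\phi(B)\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t &= c + \Theta(B^{s})\,\theta(B)\,\varepsilon_t
\end{align}
```

Here Φ and Θ are the seasonal autoregressive and moving average polynomials of orders P and Q, defined like φ and θ but in powers of B^s.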
Holt-Winters Exponential Smoothing Method
Brown and Holt [17,18,29] independently developed the basic exponential smoothing approach. Winters subsequently modified the Holt exponential algorithms so that they could be used to predict complicated time series that included seasonality [19]. For this reason, the techniques were named Holt-Winters exponential smoothing, in recognition of their contributions.
Heuristics served as the foundation for the conceptualization of exponential smoothing algorithms in their first form. Consequently, they did not possess a statistical foundation for the selection of models, nor did they have a technique for modeling the distribution of errors.
This method is particularly useful for data that exhibit trend and seasonality. Holt-Winters Exponential Smoothing uses weighted averages of past data points to make forecasts for future values. By adjusting the weights for trend and seasonality, the method can provide reliable forecasts, even in the presence of noisy or irregular data. The Holt-Winters method is often preferred for short to medium-term forecasting, due to its simplicity and flexibility in capturing complex patterns in the data.
The Holt-Winters Exponential Smoothing method is a widely used methodology for predicting time series data that exhibit both trends and seasonality. This approach enhances the basic exponential smoothing technique by including three elements: level, trend, and seasonality. Each component has its own smoothing parameter (α, β, and γ, respectively), which regulates the impact of the component. These parameters are often established using optimization approaches. The level component denotes the smoothed value of the time series at the present moment. The trend component of a time series refers to the overall direction and speed of change, whereas seasonality refers to the recurring patterns in the data, such as daily, weekly, or monthly cycles.
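The equations for the Holt-Winters technique with trend and additive seasonality may be stated, in their standard form, as follows:
$$L_t = \alpha\,(y_t - S_{t-s}) + (1-\alpha)\,(L_{t-1} + b_{t-1})$$
$$b_t = \beta\,(L_t - L_{t-1}) + (1-\beta)\,b_{t-1}$$
$$S_t = \gamma\,(y_t - L_t) + (1-\gamma)\,S_{t-s}$$
where:
$y_t$-observed value at time t,
α (alpha)-level smoothing parameter,
β (beta)-trend smoothing parameter,
γ (gamma)-seasonal smoothing parameter.
Holt-Winters, commonly known as triple exponential smoothing, may be represented mathematically as:
$$F_{t+m} = L_t + m\,b_t + S_{t-s+m}$$
where:
$F_{t+m}$ is the forecast m periods ahead,
s is the length of seasonality,
$L_t$ is the level of the series,
$b_t$ is the trend component,
$S_t$ is the seasonal component.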
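The additive Holt-Winters recursions can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the R routine used in the study: the function name `holt_winters_additive`, the simple initialisation from the first two seasons, and the example values are all hypothetical, and the smoothing parameters are passed in rather than optimised.

```python
def holt_winters_additive(y, s, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; returns forecasts for 1..horizon steps ahead."""
    if len(y) < 2 * s:
        raise ValueError("need at least two full seasons of data")

    # Initialisation (an assumption; several other schemes are in use):
    # level = mean of season 1, trend = average per-step change between
    # the first two seasons, seasonal terms = deviations from the level.
    level = sum(y[:s]) / s
    trend = (sum(y[s:2 * s]) - sum(y[:s])) / (s * s)
    season = [y[i] - level for i in range(s)]

    for t in range(s, len(y)):
        prev_level = level
        s_old = season[t % s]  # seasonal term from one season ago, S_{t-s}
        # L_t = alpha (y_t - S_{t-s}) + (1 - alpha)(L_{t-1} + b_{t-1})
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + trend)
        # b_t = beta (L_t - L_{t-1}) + (1 - beta) b_{t-1}
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # S_t = gamma (y_t - L_t) + (1 - gamma) S_{t-s}
        season[t % s] = gamma * (y[t] - level) + (1 - gamma) * s_old

    n = len(y)
    # F_{t+m} = L_t + m b_t + (latest seasonal term for that position)
    return [level + m * trend + season[(n - 1 + m) % s]
            for m in range(1, horizon + 1)]


# Hypothetical, perfectly periodic series with season length 4.
pattern = [1.0, 3.0, 2.0, 6.0]
history = pattern * 3
forecasts = holt_winters_additive(history, s=4, alpha=0.3, beta=0.1,
                                  gamma=0.2, horizon=4)
print([round(f, 6) for f in forecasts])
```

On a trend-free, perfectly periodic input the forecasts should reproduce the seasonal pattern up to floating-point error, which makes this a convenient sanity check before applying the recursions to real data.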
Forecasting Metrics
Forecasting metrics are used to evaluate the accuracy and effectiveness of predictive models, particularly those used for time series forecasting. Some of the most used measures are:
The Mean Absolute Error (MAE) is the average of the absolute deviations between the projected and actual values.
Root Mean Squared Error (RMSE) is the square root of the Mean Squared Error (MSE), which is the average of the squared differences between the predicted and actual values. It is expressed in the same unit as the predicted and actual values, making it easier to interpret.
Mean Absolute Percentage Error (MAPE) is the average of the absolute percentage errors between the predicted and actual values.
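The three measures can be written compactly in code. The Python sketch below is illustrative only and is not the routine used in the study; the function names and sample values are hypothetical. Returning infinity when an actual value is zero is a design choice made here so that the division-by-zero case is explicit.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent.

    Undefined when any actual value is zero; this sketch returns infinity
    in that case instead of raising ZeroDivisionError.
    """
    if any(a == 0 for a in actual):
        return math.inf
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical observed and forecast values.
actual = [2.0, 4.0, 8.0]
predicted = [1.0, 5.0, 6.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Because MAPE divides by the actual value, a single zero observation is enough to make it unusable, whereas MAE and RMSE remain defined.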
3. Research
This research is focused on the development of a comprehensive approach for processing and analyzing raw data, with a particular focus on its application to meteorological datasets. The study highlights the integration of statistical methods to improve data quality, enhance predictive accuracy, and extract meaningful insights from complex time series patterns by comparing two traditional methods, namely (S)ARIMA and Holt-Winters. This research aims to serve as a preliminary investigation into the optimal prediction model for parameters that will subsequently be utilised in computing water balance, evapotranspiration, and drought indexes indicative of urban drought severity. The methodology for developing the optimal predictive model starts with data preparation and analysis, followed by the construction of mathematical models, which allows for the systematic examination of time-dependent meteorological factors. The primary objective of selecting the best predictive model is to provide decision-makers with accurate visualized forecasts that may be utilized for strategy development. Principal Component Analysis (PCA) was employed to identify significant relationships among all variables and thereby reduce the dimensionality of the initial dataset. The correlation matrix indicated that the relationships between variables were either moderately negative or weakly positive. Given the absence of strong positive or negative correlations, the decision was made to use univariate time series data for constructing the prediction models. The data were also analysed with the Mann-Kendall test to investigate the trend in all four time series.
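The Mann-Kendall trend test mentioned above can be sketched as follows. This Python version is an illustration under stated assumptions and not the routine used in the study: it uses the normal approximation, omits the correction for tied values, and the function name `mann_kendall` and the sample series are hypothetical.

```python
import math


def mann_kendall(x):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumes no tie correction in the variance (a simplification).
    """
    n = len(x)
    s = 0
    # S counts increasing pairs minus decreasing pairs over all i < j.
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)

    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value


# Hypothetical strictly increasing series: a clear upward trend.
s_stat, z_stat, p = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(s_stat, round(z_stat, 3), p < 0.05)
```

A positive Z with a small p-value points to an increasing monotonic trend, a negative Z to a decreasing one. Daily meteorological series often contain many ties (for example, zero-precipitation days), so a tie-corrected variance would be preferable in practice.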
3.1. Selection of Software
The research results were processed using R, a widely used programming language and environment for statistical computing, data analysis, and visualization, together with RStudio (version 2021.09.0 Build 351), an Integrated Development Environment (IDE) for R.
3.2. Data Acquisition
The data were recorded from January 2014 until December 2020 by the weather station "WH3", located on a higher river terrace, often called a loess or diluvial terrace, near Novi Sad, Serbia. The higher river terrace is minimally dissected, and its surfaces are suitable for the unhindered expansion of the settlement. Water supply is easy, most of the areas are safe from floods, and the conditions for agricultural production and traffic flow are optimal. The terrace covers about 38% of the area of Vojvodina.
This weather station, shown in Figure 1, consists of various sensors, which can be categorized into seven groups. The first group comprises six sensors that measure only soil moisture (SM1, SM2, SM3, SM4, SM5, SM6). These sensors are divided into three subgroups depending on the soil depth at which they are located. In the first subgroup, closest to the surface, are sensors SM1 and SM2, one on the left and one on the right side. At the next depth are sensors SM3 and SM4, and at the greatest depth are sensors SM5 and SM6. The other sensor groups are humidity (AH1), wind speed (WS1), wind direction (WD1), air temperature (AT1), precipitation (PP1), and battery power state (BT1). Since there are 12 sensors that record measurements hourly, there are 288 records in total per 24 hours across all sensors. For this research, only the measurements from the air temperature, precipitation, soil moisture, and wind speed sensors were taken. The air temperature was measured in degrees Celsius, precipitation in millimeters, wind speed in m/s, and soil moisture in percentage of volume. The data set consists of four attributes: time, device, ID value, and value. Descriptions of all four attributes are given in Table 1 [30].
3.3. Data Preparation
After the hourly measurements were retrieved from the online portal in Excel format, they were converted and stored in a data frame, a basic data structure used to store and organize tabular data. The first step was to prepare the data. The preparation process included filtering, aggregating, sorting, and, finally, saving the data as a time series. The air temperature, wind speed, precipitation, and soil moisture time series were aggregated to obtain the average daily values. After aggregation, the data were arranged in ascending order according to the observed date and then converted into time series objects. The dataset of 2553 observations was finally partitioned chronologically into two subsets using a straightforward and widely used time series splitting method called holdout, or fixed split: 80% of the original dataset was used as the training set, and the remaining 20% was used as the validation set for model verification.
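The aggregation and holdout steps can be sketched as follows. This Python example is illustrative only and is not the R code used in the study: the function names `daily_means` and `holdout_split`, the record layout of (timestamp, value) pairs, and the synthetic hourly data are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_means(records):
    """Average (timestamp, value) records per calendar day, sorted by date."""
    buckets = defaultdict(list)
    for timestamp, value in records:
        buckets[timestamp.date()].append(value)
    return [(day, sum(vals) / len(vals)) for day, vals in sorted(buckets.items())]


def holdout_split(series, train_fraction=0.8):
    """Chronological split: the first part trains, the remainder validates."""
    cut = int(len(series) * train_fraction)  # truncates toward zero
    return series[:cut], series[cut:]


# Synthetic hourly readings for 10 days (hypothetical values).
start = datetime(2014, 1, 1)
records = [(start + timedelta(hours=h), float(h % 24)) for h in range(240)]

daily = daily_means(records)
train, validation = holdout_split(daily)
print(len(daily), len(train), len(validation))
```

The split is not shuffled, so every validation observation is later in time than every training observation. Under this truncation rule, which may differ from the rounding used in the study, 2553 observations would be divided into 2042 training and 511 validation values.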
3.4. Data Modelling
After the data were prepared, partitioned, and analyzed for dimensionality reduction and trend, the next step was the modeling phase. Four (S)ARIMA and four Holt-Winters predictive models were built, one of each for every time series, using functions from the forecast package (version 8.24.0). Because optimal (S)ARIMA parameters can be very challenging to select, the most appropriate values for the seasonal ARIMA parameters were selected automatically [31].
5. Discussion
After the models were built, their results had to be compared. The comparison was conducted using three accuracy measures: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). The metric results of both models for the air temperature, precipitation, and soil moisture time series are shown in Table 4, Table 5 and Table 6, respectively.
Table 4 demonstrates that the ARIMA technique consistently outperforms the Holt-Winters model across all three error metrics. Although RMSE and MAPE both describe the size of the forecast errors, they are not on the same scale, so equivalent values cannot be expected. The root mean squared error (RMSE) often takes precedence over other statistical measures and serves as a reliable indication of model quality. However, when comparing multiple models, it is important to report not only the RMSE values but also the values of other performance indicators, such as mean absolute error (MAE) or mean absolute percentage error (MAPE). The Holt-Winters method may not capture the seasonality as adequately as the ARIMA model. Given the lower values for all three error metrics, it can be concluded that the ARIMA model is the more suitable model for fitting the air temperature data.
Table 5 shows the accuracy performance indicators of both models for the precipitation time series. The RMSE and MAE values of the HW model are higher than those of the ARIMA model, which indicates that the ARIMA model performs better when data are log-transformed. The mean absolute percentage error (MAPE) is calculated as the average of abs(actual-predicted)/abs(actual), and the lower the value is, the better the model is. One of its limitations is that this evaluation metric is unreliable when the data contain very extreme values. Since some of the actual average values in the time series are zero, or close to zero, due to little or no precipitation in that period, the function for calculating this measure returned an infinite value. To overcome this issue, MAPE should be replaced with an alternative regression metric, such as the symmetric mean absolute percentage error (SMAPE) or a weighted MAPE [32,33]. For this time series, ARIMA clearly outperforms Holt-Winters based on MAE and RMSE.
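The two alternatives named above can be sketched as follows. This Python example is a hedged illustration and is not drawn from the study: several SMAPE definitions exist, and the one used here (twice the absolute error over the sum of absolute actual and predicted values) is an assumption, as are the function names and sample values.

```python
import math


def smape(actual, predicted):
    """Symmetric MAPE in percent (one common definition).

    Terms where actual and predicted are both zero contribute zero error.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        if denom != 0:
            total += 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)


def weighted_mape(actual, predicted):
    """Total absolute error divided by total absolute actual value, in percent."""
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("all actual values are zero; metric is undefined")
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * total_error / total_actual


# Hypothetical daily precipitation with a dry day (actual value of zero).
actual = [0.0, 2.0, 4.0]
predicted = [0.0, 1.0, 5.0]
print(math.isfinite(smape(actual, predicted)),
      math.isfinite(weighted_mape(actual, predicted)))
```

Both measures stay finite when individual actual values are zero, which is the situation that makes plain MAPE infinite for precipitation data. The weighted form fails only when every actual value is zero.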
The accuracy results of both models for the soil moisture time series are shown in Table 6. All three accuracy performance indicators are lower for the Holt-Winters model than for the ARIMA model. This suggests that Holt-Winters makes more accurate predictions than the ARIMA model. The RMSE of ARIMA is notably higher than that of Holt-Winters, indicating that ARIMA may be overfitting or underfitting, or that it may have a higher variance. Therefore, Holt-Winters is the superior model in this comparison, and it may be capturing the underlying trend and seasonality in the data more effectively than ARIMA.
The results of both models’ measures for the wind speed time series are shown in Table 7. MAE is lower for ARIMA, which means that the average absolute difference between the predicted and actual values is smaller and suggests that this model provides more consistent forecasts. RMSE is also lower for the ARIMA model, indicating that this model is less prone to large deviations or extreme forecast errors. RMSE imposes a greater penalty on larger errors than MAE does, which reinforces the robustness of ARIMA. MAPE values for both models are infinite, possibly due to zero or near-zero actual values. Overall, ARIMA, with its ability to capture temporal dynamics and autoregressive behaviour, outperforms the Holt-Winters model for the wind speed time series.
6. Conclusion
Time series forecasting uses statistical approaches to predict future values of a changing dataset based on previous observations. Economics, finance, and meteorology use it to make informed decisions. ARIMA and its seasonal version called SARIMA are popular time series analysis methods. ARIMA uses autocorrelation and moving average components, while SARIMA uses seasonal components for better predictions. The Holt-Winters Exponential Smoothing approach is frequently used for short to medium-term forecasting due to its simplicity and versatility in capturing complicated data patterns.
This study demonstrated the methodology for selecting the optimal forecasting model using certain evaluation performance indicators, such as RMSE, MAPE, and MAE. Four methods were applied to the temperature, precipitation, soil moisture, and wind speed daily datasets. Three (S)ARIMA methods and one exponential smoothing method were used. Based on accuracy measures for each dataset, the Holt-Winters model with additive seasonality forecasted the soil moisture daily average time series best, while ARIMA was best for the precipitation and wind speed time series. SARIMA was best suited for forecasting the temperature time series.
To enhance the accuracy of forecasting models, future research should explore hybrid approaches, such as combining ARIMA with Artificial Neural Networks (ANN) [34]. This integration leverages the strengths of both models, potentially outperforming the use of a single model alone. For instance, one study developed a hybrid forecasting framework that integrates two different models to improve prediction capability [35]. Additionally, refining the existing prediction models through the acquisition of more comprehensive data and the fine-tuning of model parameters is crucial. A systematic review of hybrid forecasting methods emphasises the importance of parameter optimisation for enhancing model performance [36]. Furthermore, studies have shown that hybrid models combining different forecasting techniques can improve accuracy. For instance, one study proposed a hybrid model using Empirical Mode Decomposition (EMD) and ARIMA to predict long-term streamflow, which performed better than either model alone [37]. In summary, adopting hybrid modeling approaches and focusing on parameter optimization is a promising strategy for improving the precision of precipitation forecasting models.
The selection and development of algorithms play a fundamental role in all the research involving the application of mathematical models to energy systems and scientific investigations. Effective algorithms enable accurate simulations, optimize resource management, and enhance predictive capabilities; ultimately, they improve efficiency and decision-making in complex energy networks. By refining computation methods, researchers can better analyze system dynamics, forecast demand, and develop sustainable solutions tailored to real-world challenges [38,39,40,41,42].