Preprint
Article

This version is not peer-reviewed.

Artificial Intelligence (AI) Techniques Application for Modelling Expected Trends in Pan Evaporation in Slovak River Basins

A peer-reviewed article of this preprint also exists.

Submitted:

09 December 2024

Posted:

10 December 2024


Abstract

The modelling of pan evaporation (Ep) trends in Slovak river basins was performed using artificial intelligence (AI) techniques to forecast evaporation rates from daily climate data spanning 2010 to 2023 across eight sub-basins within the Slovak Republic. The findings derived from the AI modelling indicate that the river basins of Bodrog, Hornád, Ipeľ, Morava, Slaná, and Váh are experiencing increases in evaporation measurements, whereas the Dunaj and Hron river basins demonstrate declining trends. This divergence may suggest the presence of differing ecological factors that affect the evaporation dynamics associated with each river. In this study, a comprehensive set of 28 machine learning and deep learning models was employed, including: Machine Learning (ML): Linear Regression, Tree-Based, Support Vector Machines (both with and without Kernels), Ensemble, and Gaussian Process methods; Deep Learning (DL): Neural Networks (Narrow, Medium, Wide, Bilayered, and Trilayered). Stepwise Linear Regression yielded the best fit. The Minimum Redundancy Maximum Relevance (mRMR) method was used to rank the predictor variables by both relevance and redundancy. The results suggest that placing greater emphasis on relative humidity (RH) and minimum temperature (tmin) may significantly enhance the predictive accuracy of the model.


1. Introduction

Evaporation, which can also refer to evapotranspiration in the presence of plant cover, is a fundamental component of water circulation within landscapes. Evaporation from water surfaces plays a crucial role in the natural hydrological balance of a basin [1]. As the primary means of water loss in the hydrological cycle, it is vital to measure these losses accurately. The process of evaporation is complex and non-linear, driven by various climatic factors [2]. Numerous studies have noted a downward trend in evaporation rates [3,4,5].
The studies in question address "the pan evaporation paradox," a phenomenon characterized by the counterintuitive observation that, despite the increase in global temperatures, pan evaporation rates have been reported to decline in numerous regions. Conversely, other research indicates a trend of increasing pan evaporation rates [6]. According to [7], annual pan evaporation has exhibited a consistent decline since 1960 in various regions worldwide, including Russia, Siberia [8,9], England, and Scotland [10], Italy [11], and arid regions of China as well as eastern China [12,13,14,15]. Decreases in annual evapotranspiration potential (ETp) have been observed in northwest and southeast China [16,17], across all of China [18,19,20], and in India [21], while increases have been noted in Australia and New Zealand [22], south Florida [23], North America [24], as well as in northeast and southwest China [16,17] and Romania [20].
In the Slovak Republic, an analysis conducted by Damborská and Lapin (2023) [25] examined the changes and variability in evapotranspiration sums from 1951 to 2021. The findings highlight significant transformations in air temperature and precipitation during this timeframe. These climatic variations have directly impacted both potential evapotranspiration (PET) and actual evapotranspiration (AET), as well as soil moisture levels and runoff patterns across the country. It is important to note that actual evapotranspiration (AET) is significantly influenced by the availability of soil moisture, which has been declining over the years. This discrepancy between PET and AET poses challenges for effective water resource management. The conclusions drawn from this research build on previously published studies by Hrvoľ et al. (2009) [26] and Novák et al. (2018) [27]. However, these earlier analyses do not consider the varying trends in pan evaporation at the river basin scale, a factor that is crucial for the development of comprehensive water management strategies.
A promising method for enhancing the precision and effectiveness of water resource trends and predictions involves the modelling of pan evaporation through machine learning and deep learning techniques. The application of machine learning has shown significant potential in modelling climate change and forecasting extreme weather events, which has important implications for climate change research and policy development [28]. This study builds upon previously published research [29]. It is essential to understand pan evaporation trends at the river basin scale for various reasons, as this perspective can provide more relevant insights than examining broader geographic regions [30]. Identifying these trends within river basins is critical for effective water resource management, offering localized insights that inform irrigation planning strategies. This understanding can mitigate the adverse effects of climate change by evaluating the vulnerability of specific ecosystems and agricultural practices to climate variability. Pan evaporation (Ep) plays a vital role in the hydrological cycle, significantly influencing precipitation patterns and groundwater recharge. Thus, analysing trends within river basins is instrumental in understanding how local climatic conditions impact the overall water cycle.
This paper presents an assessment of pan-evaporation trends across the eight sub-basins of the Slovak Republic for which Ep measurements are available, out of ten sub-basins in total. The findings from this analysis are essential for a comprehensive understanding of the factors contributing to variations in Ep.
Machine learning (ML) applications for modelling Ep present several advantages compared to traditional empirical methods. As highlighted by Abed et al. (2022) [31], ML techniques demonstrate significant improvements in accuracy, adaptability to localized conditions, and the capacity to manage complex non-linear relationships among various variables. Given the ongoing impact of climate change on hydrological processes, the adoption of these advanced methodologies is essential for effective water resource management and the promotion of sustainable agricultural practices. Recent studies further underscore the efficacy of diverse artificial intelligence (AI) models in predicting pan evaporation across river basins of varying sizes [32,33]. These models employ meteorological data and sophisticated ML techniques to enhance the accuracy of evaporation forecasts, which are critical for effective water resources management [34]. The work entitled "A Comprehensive Survey of Machine Learning Methodologies with Emphasis on Water Resources Management" [35] summarizes the application of ML techniques in managing water resources, highlighting recent advancements and their implications for efficiency and sustainability, and reports the successful deployment of several ML approaches across different water management sectors. Abed et al. (2022) [31] present a distinctive approach that integrates Random Forest and deep learning algorithms for the modelling of monthly pan evaporation. 
This methodology constitutes a robust tool for the precise prediction of evaporation rates by effectively utilizing advanced machine learning techniques to capture intricate non-linear relationships within climatic data. The proposed strategy not only enhances prediction accuracy relative to conventional methods but also provides a versatile framework capable of adapting to a range of climatic conditions. Such adaptability renders it highly pertinent for the formulation of water resource management strategies. The findings from the aforementioned studies were meticulously evaluated, forming the basis for the assumptions of this research. A range of input attributes, also known as predictor variables, were analysed to determine the most effective variables for the machine learning models. Each approach underwent testing with multiple models, each characterized by distinct parameters and combinations of input variables [31].
This study is significant as it comprehensively considers all available pan evaporation measurements across the entirety of the Slovak Republic from 2010 to 2023. The collected data are integrated with artificial intelligence approaches at the river basin scale. The study aims to provide insights into i) which river basins are the most and least affected by changes in Ep within the Slovak Republic during the specified period; ii) which AI method, among the 28 ML and deep learning (DL) models considered, is the most appropriate for Ep modelling under Slovak conditions; and iii) which variables exert the greatest influence on Ep.
The findings of this study underscore the potential of artificial intelligence techniques to enhance the accuracy of evaporation forecasts in Slovak river basins. By utilizing advanced machine learning methodologies in conjunction with comprehensive meteorological data, the study delivers valuable insights that can support sustainable water management practices. The superior performance of AI models relative to traditional forecasting methods highlights their critical role in addressing contemporary challenges associated with water scarcity and climate variability.

2. Materials and Methods

The territory of Slovakia is situated within a temperate climate zone, characterized by distinct seasonal variations, a hallmark of middle latitudes. A comprehensive climatic map of Czechoslovakia was created by integrating several parameters, including temperature criteria, total precipitation, irrigation index, and phenological indicators. This map categorizes the region into three distinct climatic areas [36], as depicted in Figure 1.
A - Warm Area: This region is characterized by an annual average of over 50 summer days and initiates the harvest of winter rye prior to July 15. It is subdivided into six sub-regions based on the irrigation index and the average temperature for the month of January.
B - Moderately Warm Area: In this region, the number of summer days does not exceed 50, and the winter rye harvest commences after July 15. The defining temperature threshold is established by the July isotherm of 16 °C. This area is further divided into ten sub-regions, classified according to the irrigation index, altitude, January temperature, and geomorphological features.
C - Cold Area: This region exhibits an average temperature of less than 16 °C in July. It comprises three distinct subareas: C1 - Slightly Cold: characterized by July temperatures ranging from 12 to 16 °C; C2 - Cool: with July temperatures ranging from 10 to 12 °C; C3 - Cold Mountain: where July temperatures fall below 10 °C.
The geographical positioning of Slovakia contributes to a transitional climate that integrates both maritime and continental elements, although altitude significantly influences the climate profile [37]. Among the primary climatic factors, air temperature, alongside atmospheric precipitation, plays a crucial role in determining the climatic conditions of a particular area. The Danubian Lowland emerges as the warmest region, characterized by an average annual air temperature of nearly 10 °C, as indicated by extensive long-term temperature records. Conversely, the average air temperature in the East Slovak Plain region is slightly lower in comparison. The average annual air temperature in river basins and valleys that are interconnected with the lowlands, such as Považie, Ponitrie, and Pohronie, typically falls within the range of 7 to 9 °C. In contrast, the highest basins, including Popradská and Oravská Kotlina, as well as northern Spiš, record average annual temperatures of less than 6 °C. It is observed that as altitude increases, there is a corresponding decrease in the average annual air temperature. This trend becomes particularly evident in locations situated at approximately 2,000 m a. s. l. [37]. The geographic position, altitude, wind direction, and leeward aspect of mountains significantly influence atmospheric precipitation in the region. In Slovakia, the average annual precipitation exhibits considerable variation, ranging from approximately 2,000 mm in the High Tatras to less than 500 mm in areas such as Galanta, Senec, and the eastern portion of Žitný Island. The phenomenon known as the precipitation shadow created by the mountains results in relatively low rainfall totals in these regions. As a result, the Spiš basins experience a notable decrease in moisture, being shielded from moist air masses originating from the south by the Slovak Ore Mountains and from the southwest to the northwest by the High and Low Tatras. 
On average, the region receives less than 600 mm of precipitation annually. In Slovakia, there is a noted trend of increasing precipitation with elevation. The mountains situated in the northwestern and northern parts of Slovakia typically experience higher levels of atmospheric precipitation compared to those found in the central, southern, and eastern regions. This phenomenon can be attributed to their increased exposure to prevailing north-westerly wind patterns. Furthermore, elevated atmospheric precipitation levels may also occur in the windward areas of mountains located further south during southern cyclonic conditions, a situation that is particularly prevalent in the eastern Slovak region of Vihorlat. The climate characteristics referred to in this context are based on data from the period 1961 to 2010, as detailed in the Climatic Conditions of Slovak Republic (2022) [38]. The criteria for the zoning of the Slovak Republic are elaborated upon in the study by Novotná et al. (2022) [29].

2.1. An Analytical Review of Climate Data

All available climate stations in the period 2010-2023 were classified according to the river basin in Slovakia to which they belong (Figure 2). The Water Plan of Slovakia divides individual river basins according to size and importance, considering various factors such as hydrological characteristics and ecological requirements [35]. There are two main river basins in the Slovak Republic: the Danube and the Vistula. Climate stations that measured Ep in Slovakia during the monitored period were assigned to the basins in which they are located. There are 10 sub-basins in the Slovak Republic, but climate stations measuring Ep were present in only 8 of them during the monitored period. The Danube basin includes the basins of (1) Morava, (2) Danube (Dunaj), (3) Váh, (4) Hron, (5) Ipeľ, (6) Slaná, (7) Hornád, (8) Bodrog, and (9) Bodva. The Vistula basin comprises the Dunajec and Poprad basins. Due to the absence of climate stations with evaporation (Ep) measurements in the Bodva, Dunajec, and Poprad sub-basins, these regions have been excluded from the current study. The locations of individual stations by watershed are depicted in Figure 2 and detailed in Table 1. Table 2 presents a comprehensive list of all climate stations utilized in this study, along with the specific periods of measurement incorporated into the ML models.
The climate characteristics analysed on a daily basis are as follows: (1) daily pan evaporation in millimetres (Ep); (2) minimum temperature in degrees Celsius (tmin); (3) maximum temperature in degrees Celsius (tmax); (4) average temperature in degrees Celsius (taver); (5) relative humidity expressed as a percentage (RH); (6) average wind speed in metres per second (Sw); (7) total precipitation in millimetres (P); and (8) vapour pressure in hectopascals (E). The distribution of individual climate stations across the respective basins is provided in Table 2, with further insights illustrated in Figure 3. Together, these cover all climate stations in the territory of the Slovak Republic from which pan evaporation (Ep) data were analysed.

2.2. Data Description

The following eight meteorological indicators were employed in the development of the proposed predictive models: Ep, tmin, tmax, taver, RH, Sw, P, and E. The data collection process encompassed daily reports from the years 2010 to 2023, resulting in a comprehensive dataset covering a total of 14 years. Table 3 provides the monthly statistical metrics about the quantified meteorological data for the designated sub-basins. Furthermore, Figure 4 illustrates the monthly variations in each weather parameter throughout the period from 2010 to 2023.

2.3. Statistics and Machine Learning Toolbox Application

The data analysis conducted in this study utilized MATLAB R2024a with the Statistics and Machine Learning Toolbox [40]. MATLAB is a high-performance programming language and interactive environment that is widely employed for numerical calculations, data analysis, algorithm development, and data visualization. MATLAB R2024a offers an extensive range of machine learning models that are applicable for data analysis and predictive modelling. A total of 28 ML and DL models were implemented to fulfil the objectives of this study. These models include:

Linear Regression (LR) Methods

1. Linear Regression (LR): LR is a fundamental statistical technique employed to model the relationship between a dependent variable and one or more independent variables using a linear equation. This model forecasts outcomes based on a linear fit.
2. Interaction LR: This variant of linear regression incorporates interaction terms among variables. It evaluates not only the direct effects of predictor variables but also examines how the influence of one predictor variable varies with the level of another predictor.
3. Robust LR: This approach modifies traditional linear regression to minimize the influence of outliers within the dataset. Techniques such as Huber loss can be applied to enhance the model's resistance to extreme values.
4. Stepwise LR: Stepwise linear regression represents an automated technique for selecting a subset of predictors by methodically adding or removing variables based on established criteria, such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). The term "stepwise" denotes the iterative process of refining predictors to determine the most suitable model complexity.
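The stepwise idea in item 4 can be illustrated with a minimal Python sketch (the study itself used MATLAB). It performs forward selection only and scores candidate models with the Gaussian AIC. The function names, the toy coded data, and the choice of AIC are assumptions made for illustration; the criterion actually applied in the study is not stated.

```python
import math

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols_rss(columns, y):
    """Fit y ~ intercept + columns by least squares; return the residual sum of squares."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(design[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * row[j] for j in range(p)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; assumes rss > 0."""
    return n * math.log(rss / n) + 2 * k

def forward_stepwise(features, y):
    """Add, one at a time, the predictor that lowers AIC most; stop when none does."""
    n = len(y)
    selected = []
    best = aic(ols_rss([], y), n, 1)  # intercept-only model
    while True:
        scored = []
        for name in features:
            if name in selected:
                continue
            cols = [features[s] for s in selected + [name]]
            scored.append((aic(ols_rss(cols, y), n, len(cols) + 1), name))
        if not scored or min(scored)[0] >= best:
            return selected
        best, chosen = min(scored)
        selected.append(chosen)

# Toy coded predictors (hypothetical, not study data): y follows x1 plus a small disturbance.
features = {
    "x1": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x2": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x3": [-1, 1, -1, 1, -1, 1, -1, 1],
}
y = [-1.5, -1.5, -2.5, -2.5, 1.5, 1.5, 2.5, 2.5]
selected = forward_stepwise(features, y)
```

On this toy data only x1 lowers the AIC, so the selection stops after one step. A full stepwise procedure would also test whether removing an already selected predictor improves the criterion.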

Tree-Based Methods

5. Fine Tree: A fine tree is a type of decision tree characterized by its deep branching structure, which may lead to overfitting. While it excels at capturing intricate details within the dataset, its ability to generalize new data is often limited.
6. Medium Tree: The medium tree represents a decision tree that effectively balances depth and breadth. It demonstrates improved resistance to overfitting in comparison to a fine tree while still being capable of identifying significant patterns within the data.
7. Coarse Tree: A coarse tree is a simplified decision tree model featuring fewer splits and branches. This design enhances its generalizability and reduces the likelihood of overfitting when compared to both fine and medium trees.

Support Vector Machines (SVM)

8. Linear SVM: The linear support vector machine fits a linear function of the predictors. In the regression setting used here, deviations smaller than a tolerance (epsilon) are ignored and only larger errors are penalized. This approach is particularly suitable when the response depends approximately linearly on the predictors.
9. Quadratic SVM: This variant of support vector machine employs a quadratic kernel function, allowing the fitted function to capture relationships that are not linear in the original feature space.
10. Cubic SVM: The cubic support vector machine, akin to the quadratic variant, utilizes a cubic kernel function, thereby allowing more intricate non-linear relationships to be fitted.
11. Fine Gaussian SVM: This model uses a Gaussian Radial Basis Function (RBF) kernel. The term "fine" denotes a small kernel scale, which lets the model follow rapid, local variations in the data at a higher risk of overfitting.
12. Medium Gaussian SVM: The medium Gaussian support vector machine uses an intermediate kernel scale, balancing detail and generalization to mitigate the risks of overfitting and underfitting.
13. Coarse Gaussian SVM: This model uses a Gaussian kernel with a large kernel scale, which yields a smoother fitted function and reduces the likelihood of overfitting; however, it may also fail to detect subtle patterns present within the data.

Efficient Linear (EL) Methods

14. EL Least Squares: This methodology represents an advanced optimization of linear regression, specifically designed to manage larger datasets with greater effectiveness by employing sophisticated numerical techniques for least squares fitting.
15. EL SVM: This term denotes an enhanced implementation of Support Vector Machines (SVM) that emphasizes rapid convergence and computational efficiency, thereby rendering it well-suited for the analysis of large datasets.

Ensemble Methods (EM)

16. Ensemble: Boosted Trees: An ensemble method that combines weak learners (typically shallow trees) in a sequential manner, where each tree corrects the errors of the previous ones. This often leads to a powerful predictive model.
17. Ensemble: Bagged Trees: Utilizes bootstrap aggregating (bagging) to build multiple decision trees from random samples of the data. The final prediction is usually made by averaging or majority voting from all trees, enhancing robustness and reducing overfitting.

Gaussian Process Regression

18. Squared Exponential Gaussian Process Regression: This approach to Gaussian process regression utilizes a squared exponential kernel to effectively model smooth functions. It is particularly adept at generating continuous and smooth predictions.
19. Matern 5/2 Gaussian Process Regression: This variant of Gaussian process regression employs the Matern kernel with a smoothness parameter of 5/2, providing enhanced flexibility in modelling functions that display varying levels of smoothness.
20. Exponential Gaussian Process Regression: This methodology incorporates an exponential kernel within Gaussian process regression, rendering it suitable for modelling functions that may exhibit limited smoothness but demonstrate exponential decay behaviour.
21. Rational Quadratic Gaussian Process Regression: This Gaussian process utilizes the rational quadratic kernel, which can be viewed as a scale mixture of squared exponential kernels with different characteristic length scales. This allows for a high degree of flexibility in capturing patterns that vary over several length scales.
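For reference, the four covariance functions named in items 18 to 21 can be written as functions of the distance r between two input points. The Python sketch below uses their standard textbook forms; the parameter names sigma_f, length, and alpha are illustrative assumptions, and the parameterization used inside the toolbox may differ.

```python
import math

def squared_exponential(r, sigma_f=1.0, length=1.0):
    """Very smooth kernel: sigma_f^2 * exp(-r^2 / (2 * length^2))."""
    return sigma_f**2 * math.exp(-0.5 * (r / length) ** 2)

def matern52(r, sigma_f=1.0, length=1.0):
    """Matern kernel with smoothness 5/2 (twice-differentiable sample paths)."""
    a = math.sqrt(5.0) * r / length
    return sigma_f**2 * (1.0 + a + a * a / 3.0) * math.exp(-a)

def exponential(r, sigma_f=1.0, length=1.0):
    """Exponential kernel (Matern with smoothness 1/2): rough sample paths."""
    return sigma_f**2 * math.exp(-r / length)

def rational_quadratic(r, sigma_f=1.0, length=1.0, alpha=1.0):
    """Scale mixture of squared exponential kernels; alpha controls the mixture."""
    return sigma_f**2 * (1.0 + r * r / (2.0 * alpha * length**2)) ** (-alpha)
```

All four kernels equal sigma_f squared at r = 0 and decay with distance. For large alpha the rational quadratic kernel approaches the squared exponential kernel, which is one way to see how the two are related.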

Neural Networks (NN)

22. Narrow NN: This neural network configuration comprises fewer neurons within each layer. Although this structure may limit its ability to capture intricate relationships, it offers advantages such as expedited training processes and a reduced likelihood of overfitting.
23. Medium NN: This architecture incorporates a moderate number of neurons, aiming to establish a balance between model complexity and the dynamics of training.
24. Wide NN: A wide neural network features a larger number of neurons in one or more layers, enabling it to identify complex patterns. However, this complexity may lead to challenges, particularly concerning overfitting.
25. Bilayered NN: This architecture consists of two fully connected hidden layers followed by an output layer. The second hidden layer allows the network to combine the features learned by the first while the model remains comparatively simple.
26. Trilayered NN: This more elaborate structure includes three fully connected hidden layers between the input and the output. The inclusion of an additional layer enhances the network’s capacity to learn more comprehensive representations of the input data.

Kernels

27. SVM Kernel: The SVM Kernel model is a support vector machine fitted in a kernelized feature space. The kernel function implicitly maps the input data into a higher-dimensional space, in which a linear fit corresponds to a non-linear relationship in the original predictors. Common types of kernels utilized in this context include linear, polynomial, and radial basis function (RBF) kernels.
28. Least Squares Regression Kernel: The least squares regression kernel is utilized in scenarios involving least squares fitting within kernelized feature spaces. This approach allows for the efficient management of regression tasks in high-dimensional environments, significantly enhancing computational efficacy and accuracy.
In the MATLAB R2024a environment, a diverse array of models is available, allowing users to select optimal methodologies tailored to their specific data characteristics and analytical objectives. Each model type presents distinct advantages and applies to various predictive tasks across different domains. To fulfil the objectives of this study, all the previously mentioned machine learning models have been employed. The contribution of the individual predictor variables will be evaluated using the Minimum Redundancy Maximum Relevance (mRMR) method. This prominent feature selection technique identifies a subset of features that are relevant to the target variable while exhibiting minimal redundancy among themselves, which can enhance model performance. The approach proves valuable in the context of high-dimensional datasets, where irrelevant or redundant features can diminish model accuracy and elevate computational complexity [41].
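The greedy logic behind mRMR can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the toolbox implementation: mutual information is estimated from equal-width bins, the score is relevance minus mean redundancy (a difference criterion, whereas other variants use a quotient), and all names and the toy data are hypothetical.

```python
import math
from collections import Counter

def discretize(values, bins=4):
    """Map continuous values to equal-width bin labels 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant columns fall into a single bin
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(a, b):
    """Mutual information (in nats) between two equally long label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[z])) for (x, z), c in pab.items())

def mrmr_rank(features, target, bins=4):
    """Rank features greedily by relevance to the target minus mean redundancy."""
    binned = {name: discretize(col, bins) for name, col in features.items()}
    y = discretize(target, bins)
    relevance = {name: mutual_information(col, y) for name, col in binned.items()}
    ranked, remaining = [], set(binned)
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            shared = sum(mutual_information(binned[name], binned[s]) for s in ranked)
            return relevance[name] - shared / len(ranked)
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Toy data (hypothetical): the target is 2*a + c, and a_copy duplicates a.
a = [0, 0, 1, 1, 2, 2, 3, 3]
c = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * ai + ci for ai, ci in zip(a, c)]
ranking = mrmr_rank({"a": a, "a_copy": list(a), "c": c}, target, bins=8)
```

In this toy case the duplicated column is ranked last even though it is as relevant as the original, because everything it says about the target is already carried by a feature that has been selected.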

3. Results

3.1. Changes in Ep Across River Basins

Figure 5 presents the fundamental characteristics of Ep for the individual river basins. The Dunaj river basin exhibits the lowest average Ep values, in contrast to the Slaná and Hron river basins, which demonstrate the highest values. Table 3 summarizes the descriptive statistics for each river basin. The "observations number" denotes the total count of observations or data points collected for each river basin. The "mean" indicates the average value of these observations, while the "median" represents the central value when the observations are arranged in order. The median is particularly valuable for assessing central tendency, as it is less influenced by outliers than the mean. The "standard deviation (SD)" measures the extent of variation or dispersion within a set of observations; a higher SD indicates a greater spread of data points around the mean. The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean, expressed as a percentage. This metric is instrumental in comparing the degree of variation across different datasets, even when the means of these datasets differ significantly. The range of a dataset is determined by the difference between its maximum and minimum values. Quartiles are statistical measures that partition a dataset into four equal parts, providing valuable insights into its distribution. The lower quartile (1st quartile) and upper quartile (3rd quartile) are typically indicated by brackets. In the context of box plots, the "whiskers" represent the range of data points that lie within a specified distance from the quartiles, thereby reflecting the minimum and maximum values while excluding outliers.
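The summary measures defined above can be computed with the Python standard library, as in the following minimal sketch. The function name is hypothetical, the whisker rule of 1.5 times the interquartile range is the common box-plot convention and is assumed here rather than taken from the study, and quartile values depend on the interpolation method chosen.

```python
import statistics

def describe(values):
    """Summary statistics for one sample, e.g. the daily Ep values of one basin."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "n": len(values),
        "mean": mean,
        "median": median,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,       # SD relative to the mean
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),  # extreme non-outlier values
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }
```

For the made-up sample [1, 2, 3, 4, 5, 30], the mean (7.5) is pulled well above the median (3.5) by the single large value, which is the behaviour the text describes when it calls the median less sensitive to outliers.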
According to Table 3 and Figure 5, the Bodrog, Hornád, and Hron river basins demonstrate significant variability, as evidenced by their higher standard deviations and ranges. In contrast, the Ipeľ and Morava basins exhibit lower levels of variability. This discrepancy can be attributed to the distribution and number of climate stations within each river basin. Furthermore, the Slaná, Hron, and Bodrog river basins display potential outliers that may influence the interpretation of the data. The analysis of symmetry indicates that the means and medians of most basins are closely aligned, suggesting that the data distributions are not significantly skewed.
The findings presented in Table 3 demonstrate that the Bodrog basin consistently displays higher mean values in comparison to several other basins, particularly Morava, Slaná, and Váh, with all differences being statistically significant. Furthermore, analyses comparing the Dunaj and Hornád river basins also reveal several significant differences, thereby emphasizing the variability in mean values across these regions. Additionally, the Hron river basin exhibits significant negative differences when compared to the Ipeľ river basin, indicating that the Ipeľ river basin may have a lower mean value than the Hron river basin.

3.1.1. Ep Trend Analysis

The values represent the annual expression of the change in Ep. They point to the trend for 13 annual time series, related to individual river basins. The analysis of evaporation trends across various river basins reveals distinct patterns:
  • The Bodrog river basin demonstrates a steady increase in measurements, with an annual change of +0.01057 mm, indicating a consistent rise in evaporation rates over time.
  • Conversely, the Dunaj river basin exhibits a slight decline, with an annual change of -0.006843 mm, suggesting a downward trend in evaporation.
  • The Hornád river basin shows a larger increase of +0.01889 mm per year.
  • Like the Dunaj river basin, the Hron river basin shows a decrease, of -0.009121 mm annually.
  • The Ipeľ river basin records a modest increase of +0.01027 mm per year, indicating stable growth in its measurements, akin to that observed in the Bodrog river basin.
  • The Morava river basin is notable for exhibiting the largest increase at +0.05674 mm per year.
  • The Slaná river basin also demonstrates a robust increase of +0.01649 mm per year.
  • Finally, the Váh river basin displays a consistent annual increase of +0.01217 mm, suggesting steady growth; however, this increase is less pronounced than those observed in the Morava, Hornád, and Slaná river basins.
This analysis underscores the varied trends in evaporation observed within these river systems. The results indicate that while certain river basins exhibit a positive trend in Ep, others reveal a negative trend. However, overall, these trends are not statistically significant. Similar conclusions can be inferred from the analysis presented in Figure 6.
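An annual change of this kind is the slope of a trend line fitted to a yearly series. The text does not state which estimator was used, so the Python sketch below shows two common options as assumptions: the ordinary least-squares slope and the more outlier-resistant Sen's slope (the median of all pairwise slopes). The function names are hypothetical.

```python
import statistics
from itertools import combinations

def ols_trend(years, values):
    """Least-squares slope and intercept of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sen_slope(years, values):
    """Median of the slopes between every pair of points (Theil-Sen estimator)."""
    points = list(zip(years, values))
    return statistics.median(
        (y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in combinations(points, 2)
    )
```

For a series that rises by exactly 0.01 mm each year, both functions return a slope of 0.01 mm per year up to rounding. When a single year is anomalous, Sen's slope changes far less than the least-squares slope.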

3.2. AI Models’ Accuracy Evaluation

The validation scheme employed in the study was 5-fold cross-validation on 72,900 samples. Each instance in the dataset is used for both training and validation across different folds, which gives a more robust estimate of how well the models generalize and reduces the risk of favouring an overfitted model. Table 5 reports the following metrics. RMSE (Root Mean Square Error) is the square root of the mean squared difference between predicted and actual values; lower values indicate better model performance. MSE (Mean Squared Error) is the average squared error. R-squared represents the proportion of variance in the dependent variable that can be predicted from the independent variables; it typically ranges from 0 to 1, with higher values indicating a better fit. MAE (Mean Absolute Error) measures the average absolute error; lower values are better.
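The four metrics and the fold structure can be written out explicitly. The Python sketch below is illustrative only: the function names are hypothetical, and the folds are formed by simple interleaving rather than the random partition a toolbox would typically use.

```python
import math

def regression_metrics(actual, predicted):
    """RMSE, MSE, MAE and R-squared for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(mse),
        "MSE": mse,
        "MAE": mae,
        "R2": 1.0 - (mse * n) / ss_total,  # can be negative for a very poor model
    }

def k_fold_indices(n, k=5):
    """Yield (train, validation) index lists; every index is validated exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]
```

In cross-validation the model is refitted on each training part and scored on the held-out part, and the reported metric is computed from the held-out predictions. RMSE is expressed in the same units as Ep (mm), which makes it easier to interpret than MSE.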
The best fit was obtained by the LR models (1st and 2nd models in Table 5), which perform very well, with RMSEs of 0.805 to 0.821 and R² values of about 0.60 to 0.61, indicating that they capture the dependence well.
The NN models (Narrow, Medium, Wide, and Bilayered) have similar RMSE and MAE values; their results are also good, but not superior to those of the traditional regression methods.
The SVM models show relatively stable performance, with RMSE between 0.882 and 0.900 and R² between 0.538 and 0.556, which are average results.
The tree-based models (Fine, Medium, Coarse) perform worse, with the highest validation RMSE of 1.035 for the Fine Tree.
The Gaussian Process models achieve solid RMSE and MAE values, with the best RMSE of about 0.877. These models have the advantage of flexibility.

3.3. Climate Variables and Their Relation to the Ep

Figure 7 and Table 6 show the Minimum Redundancy Maximum Relevance (mRMR) scores of the climate variables. mRMR is a feature selection metric used in machine learning and data analysis to identify the features most relevant to a given prediction task while minimizing redundancy among them; balancing relevance against redundancy in this way helps improve machine learning models.
RH (Relative Humidity), score 0.3269: the highest mRMR score, indicating that relative humidity is the most informative feature of the model. It is strongly related to the target variable while maintaining low redundancy with other features.
tmin (Minimum Temperature), score 0.2219: also significant, with strong relevance. Although its score is lower than that of RH, it is still a crucial feature, second in importance. This suggests that minimum temperature has a meaningful impact on Ep.
Sw (Wind Speed), score 0.0757: a more modest score. Although it is informative, its contribution is smaller than that of RH and tmin, indicating that its relationship with the target variable may not be as strong.
tmax (Maximum Temperature), score 0.0568: lower relevance. It may still provide useful information but is less significant than the previous features.
taver (Average Temperature), score 0.0368: the lowest score among the listed features, suggesting that average temperature may be the least informative or unique feature in the context of the model. It likely overlaps significantly with the other temperature-related features.
  • Top Features: Relative humidity (RH) and minimum temperature (tmin) are the standout features, with RH being particularly critical to the analysis.
  • Moderate Importance: Wind speed (Sw) and maximum temperature (tmax) have moderate relevance but may not be as pivotal in influencing outcomes.
  • Lowest Importance: Average temperature (taver) could be a candidate for elimination or further consideration, depending on the modelling approach, as it is less relevant.
Taken together, the relationships between the individual climate variables and Ep suggest that focusing on RH and tmin could provide the best predictive power for the model, while the necessity of the other features should be evaluated carefully based on their contribution.
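The trade-off between relevance and redundancy can be illustrated with a small greedy ranking. This is a sketch under stated assumptions, not the procedure used in the study: it swaps in the absolute Pearson correlation as a simple stand-in for the relevance and redundancy measure (mRMR implementations are commonly based on mutual information), and the function names and toy data are hypothetical. Its scores are therefore not comparable with the values reported above.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def mrmr_rank(features, target):
    """Greedy mRMR-style ranking.

    features: dict mapping feature name -> list of values.
    Returns a list of (name, score) in selection order, where
    score = relevance - mean redundancy with already selected features.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking
    while remaining:
        best_name, best_score = None, -math.inf
        for name in remaining:
            if selected:
                redundancy = sum(
                    abs(pearson(features[name], features[chosen]))
                    for chosen, _ in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append((best_name, best_score))
        remaining.remove(best_name)
    return selected


# Toy data: x1 and x2 are uncorrelated drivers; x1_copy duplicates x1.
toy_features = {
    "x1": [-2, -2, 2, 2],
    "x1_copy": [-4, -4, 4, 4],
    "x2": [-1, 1, -1, 1],
}
toy_target = [a + b for a, b in zip(toy_features["x1"], toy_features["x2"])]

for name, score in mrmr_rank(toy_features, toy_target):
    print(f"{name}: {score:.4f}")
```

In the toy data, the duplicated feature is as relevant as the original but is ranked after the weaker, independent feature, because it adds no new information. The same reasoning would explain why a variable such as taver can score low when tmin and tmax are already present.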

4. Discussion

The degree of drought and the intensity of regional evaporation are both significantly influenced by Ep. The "pan evaporation paradox" refers to the fact that, even though increased evaporation is predicted, the opposite tendency has been observed in many regions of the world [42,43,44]. The examination of pan evaporation trends in the Bodrog, Hornád, Ipeľ, Morava, Slaná, and Váh river basins reveals increasing values, whereas the Dunaj and Hron river basins exhibit decreasing trends. This discrepancy may stem from various ecological factors influencing evaporation processes differently across these regions. The findings align with the research conducted by Abed et al. (2022) [31], who evaluated model outcomes using standard statistical measures to assess their effectiveness in predicting evaporation at four meteorological stations in Bayan Lepas, Malaysia. Their study emphasized the importance of accurately estimating evaporation for effective water resource management and agricultural planning. Similarly, Elbeltagi et al. (2023) [45] focused on selecting optimal models for validating monthly evaporation predictions at Basrah station in Iraq. Their work involved advanced ML techniques to enhance the accuracy of evaporation estimates across three distinct areas: Baghdad, Basrah, and Mosul. The study demonstrated that hybrid models could effectively address the complexities of hydrological relationships and improve prediction accuracy. These studies underscore the significance of employing robust modelling techniques to understand and predict evaporation dynamics influenced by ecological factors in different geographical contexts. Massari et al. (2022) [46] tested the hypothesis that runoff deficit exacerbation during droughts is a common feature across climates, driven by evaporation enhancement.
The reasons for this worsening of the runoff deficit during dry periods remain largely unknown, which calls into question how well its future occurrence and intensity can be predicted. Runoff deficit exacerbation refers to the phenomenon where the reduction in runoff during drought periods is disproportionately greater than the decline in precipitation. Water supplies may be severely impacted by this exacerbation, especially in areas that experience protracted dry spells.
The specific environmental, meteorological, and data characteristics of a landscape have to be taken into account to identify the best AI techniques for Ep modelling in that landscape. Based on the research results and the pertinent literature, the suggested machine learning (ML) and deep learning (DL) models could successfully address Ep modelling in Slovakia. Helm et al. (2020) [47] reviewed the definitions and applications of ML and AI and future AI directions. DL has been successfully applied to many application domains, yet its advantages have been slow to emerge for time series forecasting [48]. Solís et al. (2024) [49] also outline the situations in which time series forecasting is better off using DL models. However, the results of this study indicate that the ML models performed better than the DL models in predicting pan evaporation in river basins of different sizes. These findings are consistent with other relevant studies [31,44,50].
The relationship between the climate variables and Ep shows that focusing on tmin and RH can greatly improve the predictive ability of Ep models. By carefully assessing the other climate variables according to their contributions, more accurate and efficient models can be built that are suited to specific hydrological contexts. This approach not only enhances forecasting skill but also facilitates better decision-making in environmental planning and water resource management. These findings are consistent with Fu et al. (2009) [3], Kisi et al. (2022) [51], and Jin et al. (2024) [4]. However, because climatic, geographical, and local conditions vary widely, individual ML models need to be tested independently for each site, as concluded in our previous study [29] and in [52,53].

5. Conclusions

This study modelled expected trends in Ep in Slovak river basins using artificial intelligence (AI) techniques. Machine learning (ML) and deep learning (DL) algorithms were applied to daily climate data from 2010 to 2023 from eight sub-basins of the Slovak Republic in order to predict evaporation rates accurately. From this study, we conclude the following:
i.) The analysis of evaporation trends across the river basins reveals a diverse range of patterns, with both increases and decreases in evaporation rates. River basins such as Bodrog, Hornád, Ipeľ, Morava, Slaná, and Váh exhibit consistent increases in Ep, with Morava showing the most pronounced rise at +0.05674 mm per year. These upward trends might indicate broader environmental changes affecting water loss in these areas. Conversely, Dunaj and Hron are experiencing declines in evaporation. Dunaj's annual decrease of −0.006843 mm and Hron's of −0.009121 mm suggest that environmental factors may be contributing to reduced evaporation rates, which could have implications for water management and ecosystem health in these regions. Overall, the findings underscore the importance of monitoring evaporation trends, as they can signal broader environmental changes, water resource management needs, and the health of river ecosystems.
ii.) LR models are the most reliable method based on this analysis, while other models, particularly the tree-based ones, may require further refinement to improve their predictive accuracy. In terms of accuracy, the Linear Regression models (best performance) were followed by the Gaussian Process models, NN, SVM models, and tree-based models (worst performance). This ranking reflects how effectively each model type captures the relationships necessary for accurate Ep modelling in the given context.
iii.) The analysis of the association between the climate variables and Ep indicates that concentrating on RH and tmin may offer the model the most predictive power, while the importance of each remaining feature should be weighed carefully according to its contribution.
In summary, continued long-term monitoring of Ep at both regional and river basin levels is essential for generating insights that can inform water resource management and environmental policy. The study's conclusions provide useful information for managing water resources, including the design of effective water delivery networks, sustainable irrigation plans, and reservoir management. With knowledge of evaporation trends and model performance, stakeholders can make decisions that improve water conservation, optimize resource allocation, and support ecological balance. Putting these findings into practice can make water management more robust, meeting agricultural demands while preserving water supplies for the future.

Author Contributions

Conceptualization, B.N. and V.C.; methodology, B.N. and V.C.; software, V.C.; validation, V.C. and B.N.; formal analysis, B.N.; investigation, B.N.; resources, B.C.; data curation, B.C.; writing - original draft preparation, B.N.; writing - review and editing, B.N. and B.C.; visualization, V.C. and M.M.; supervision, B.N.; project administration, B.N.; funding acquisition, B.N. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Science Grant Agency under contract No. VEGA 1/0559/23: “Assessment of the Production and Regulatory Function of Agricultural Ecosystems Affected by Climate Change”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

Data will be made available on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Šuhájková, P., Kožín, R., Beran, A., Melišová, E., Vizina, A., & Hanel, M. Update of empirical relationships for calculation of free water surface evaporation based on observation at Hlasivo station. Vodohospod. Tech.-Ekon. Inf. 2019, 61, 4-11. https://www.vtei.cz/en/2019/08/update-of-empirical-relationships-for-calculation-of-free-water-surface-evaporation-based-on-observation-at-hlasivo-station/ [In Czech].
  2. Kohut, M., Rožnovský, J., & Knozová, G. Měření výparu z vodní hladiny výparoměrem GGI-3000 v České republice Práce a studie 35. Praha: Český hydrometeorologický ústav, 2013, 95 s. ISBN 978-80-87577-16-5, ISSN 1210-7557. [In Czech].
  3. Fu, G., Charles, S. P., & Yu, J. A critical overview of pan evaporation trends over the last 50 years. Climatic Change 2009, 97, 193–214. [CrossRef]
  4. Jin, Y., Zhang, Y., Yang, X., Zhang, M., Guo, X.-B., Deng, Y., Hu, Y.-H., Lu, H.-Z., & Tan, Z.-H. Observed decreasing trend in pan evaporation in a tropical rainforest region during 1959–2021. Journal of Plant Ecology 2024, 17(1), rtad033. [CrossRef]
  5. Abtew, W., Obeysekera, J., & Iricanin, N. Pan evaporation and potential evapotranspiration trends in South Florida. Hydrological Processes 2011, 25(6), 958-969. [CrossRef]
  6. Zhang, Q., Wang, W., Wang, S., & Zhang, L. Increasing Trend of Pan Evaporation over the Semiarid Loess Plateau under a Warming Climate. J. Appl. Meteorol. Climatol. 2008, 55, 2007–2020. [CrossRef]
  7. Bai, H., Lu, X., Yang, X., Huang, J., Mu, X., Zhao, G., Gui, F., & Yue, C. Assessing impacts of climate change and human activities on the abnormal correlation between actual evaporation and atmospheric evaporation demands in southeastern China. Sustainable Cities and Society 2020, 56, 102075. [CrossRef]
  8. Golubev, V. S., Lawrimore, J. H., Groisman, Y. P., Speranskaya, N. A., Zhuravin, S. A., Menne, M. J., Peterson, T. C., & Malone, R. W. Evaporation changes over the contiguous United States and the former USSR: A reassessment. Geophysical Research Letters 2001, 28, 2665. [CrossRef]
  9. Peterson, T., Golubev, V., & Groisman, P. Evaporation losing its strength. Nature 1995, 377, 687–688. [CrossRef]
  10. Stanhill, G., & Möller, M. Evaporative climate change in the British Isles. International Journal of Climatology 2007, 28, 1127. [CrossRef]
  11. Moonen, A. C., Ercoli, L., Mariotti, M., & Masoni, B. Climate change in Italy indicated by agrometeorological indices over 122 years. Agricultural and Forest Meteorology 2002, 111, 13. [CrossRef]
  12. Liu, B., Xu, M., Henderson, M., & Gong, W. A spatial analysis of pan evaporation trend in China, 1955-2000. Journal of Geophysical Research Atmospheres 2004, 109. [CrossRef]
  13. Shen, Y., Liu, C., Liu, M., Zeng, Y., & Tian, C. Change in pan evaporation over the past 50 years in the arid region of China. Hydrological Processes 2009, 24, 225. [CrossRef]
  14. Xu, Y.-P., Pan, S., Gao, C., Fu, G., & Chiang, Y.-M. Historical pan evaporation changes in the Qiantang River Basin, East China. International Journal of Climatology 2015, 36, 1928. [CrossRef]
  15. Yang, H., & Yang, D. Climatic factors influencing changing pan evaporation across China from 1961 to 2001. Journal of Hydrology 2012, 414-415, 184. [CrossRef]
  16. Chen, D., Gao, G., Xu, C.-Y., Guo, J., & Ren, G. Comparison of the Thornthwaite method and pan data with the standard Penman-Monteith estimates of reference evapotranspiration in China. Climate Research 2005, 28, 123. [CrossRef]
  17. Thomas, A. Spatial and temporal characteristics of potential evapotranspiration trends over China. International Journal of Climatology 2000, 20, 381. [CrossRef]
  18. Wang, K., & Dickinson, R. E. A review of global terrestrial evapotranspiration: Observation, modeling, climatology, and climatic variability. Reviews of Geophysics 2012, 50. [CrossRef]
  19. Zhang, L., Seydou, T., Yuanla, C., Yufeng, L., Ge, Z., Bo, L., Guy, F., Karthikeyan, R., & Vijay, S. Assessment of spatiotemporal variability of reference evapotranspiration and controlling climate factors over decades in China using geospatial techniques. Agricultural Water Management 2019, 213, 499. [CrossRef]
  20. Croitoru, A.-E., Piticar, A., Dragotă, C. S., & Burada, D. C. Recent changes in reference evapotranspiration in Romania. Global and Planetary Change 2013, 111, 127. [CrossRef]
  21. Chattopadhyay, N., & Hulme, M. Evaporation and potential evapotranspiration in India under conditions of recent and future climate change. Agricultural and Forest Meteorology 1997, 87, 55. [CrossRef]
  22. Hobbins, M. T., Dai, A., Roderick, M. L., & Farquhar, G. D. Revisiting the parameterization of potential evaporation as a driver of long-term water balance trends. Geophysical Research Letters 2008, 35. [CrossRef]
  23. Abtew, W., Obeysekera, J., & Iricanin, N. Pan evaporation and potential evapotranspiration trends in South Florida. Hydrological Processes 2010, 25, 958. [CrossRef]
  24. Hember, R. A., Coops, N. C., & Spittlehouse, D. L. Spatial and Temporal Variability of Potential Evaporation across North American Forests. Hydrology 2017, 4(1), 5. [CrossRef]
  25. Damborská, I., & Lapin, M. Changes and variability of evapotranspiration sums in Slovakia in 1951–2021. Contributions to Geophysics and Geodesy 2023, 53(3), 241-270. [CrossRef]
  26. Hrvoľ, J., Horecká, V., Škvarenina, J., Střelcová, K., & Škvareninová, J. Long-term results of evaporation rate in xerothermic Oak altitudinal vegetation stage in Southern Slovakia. Biologia 2009, 64, 605–609. [CrossRef]
  27. Novák, V., Danko, M., & Holko, L. Hydrological research in the conditions of ongoing climate change. Bratislava: Veda, 2018, pp. 51-93. ISBN 978-80-224-1691-7.
  28. Eyring, V., Collins, W. D., Gentine, P., et al. Pushing the frontiers in climate modelling and analysis with machine learning. Nat. Clim. Chang. 2024, 14, 916–928. [CrossRef]
  29. Novotná, B., Jurík, Ľ., Čimo, J., Palkovič, J., Chvíla, B., Kišš, V. Machine Learning for Pan Evaporation Modeling in Different Agroclimatic Zones of the Slovak Republic (Macro-Regions). Sustainability 2022, 14, 3475. [CrossRef]
  30. Ren, C., Ren, G., Zhang, P., Kealdrup, S., & Qin, T. Y. Urbanization Significantly Affects Pan-Evaporation Trends in Large River Basins of China Mainland. Land 2021, 10(4), 407.
  31. Abed, M., Imteaz, M. A., Ahmed, A. N., et al. Modelling monthly pan evaporation utilising Random Forest and deep learning algorithms. Sci Rep 2022, 12, 13132. [CrossRef]
  32. Amoo, T. O. Integrated hydrological modelling for sustainable water allocation planning: Mkomazi Basin, South Africa case study. 2018. [CrossRef]
  33. Benjamin, S. G., Brown, J. M., Brunet, G., Lynch, P., Saito, K., & Schlatter, T. W. 100 Years of Progress in Forecasting and NWP Applications. Meteorological Monographs 2019, 59, 13.1-13.67. [CrossRef]
  34. Mao, Y., Li, Y., Teng, F., Sabonchi, A. K. S., Azarafza, M., & Zhang, M. Utilizing Hybrid Machine Learning and Soft Computing Techniques for Landslide Susceptibility Mapping in a Drainage Basin. Water 2024, 16(3), 380. [CrossRef]
  35. Drogkoula, M., Kokkinos, K., & Samaras, N. A Comprehensive Survey of Machine Learning Methodologies with Emphasis in Water Resources Management. Appl. Sci. 2023, 13, 12147. [CrossRef]
  36. Ministry of Environment of the Slovak Republic Water Plan of the Slovak Republic – Update 2021. Summary Information. Porter, s.r.o. 2023. ISBN 978-80-8213-122-5. https://download.sazp.sk/2023/WPS-2021-Summary-Information-web.pdf.
  37. SHMÚ. Climate atlas of Slovakia. Slovenský hydrometeorologický ústav. Bratislava. 2015.
  38. Climatic Conditions of the Slovak Republic (Klimatické Pomery Slovenskej Republiky) 2022. https://www.shmu.sk/sk/?page=1064 (In Slovak).
  39. Slovak Environmental Agency. Atlas of the Slovak Republic - Web map application, 2024. https://app.sazp.sk/atlassr/.
  40. MathWorks. Statistics and Machine Learning Toolbox Analyze and model data using statistics and machine learning, 2024. https://www.mathworks.com/products/statistics.html.
  41. Zhao, Z., Anand, R., & Wang, M. Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform. 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 442-452.
  42. Alsumaiei, A. A. Utility of Artificial Neural Networks in Modeling Pan Evaporation in Hyper-Arid Climates. Water 2020, 12, 1508. [CrossRef]
  43. Brutsaert, W., & Parlange, M. Hydrologic Cycle Explains the Evaporation Paradox. Nature 1998, 396, 30. [CrossRef]
  44. Elbeltagi, A., Heddam, S., Katipoğlu, O., Alsumaiei, A., & Al-Mukhtar, M. Advanced long-term actual evapotranspiration estimation in humid climates for 1958–2021 based on machine learning models enhanced by the RReliefF algorithm. Journal of Hydrology Regional Studies 2024. [CrossRef]
  45. Elbeltagi, A., Al-Mukhtar, M., Kushwaha, N. L., Nadhir, A.-A., & Vishwakarma, D. K. Forecasting monthly pan evaporation using hybrid additive regression and data-driven models in a semi-arid environment. Appl Water Sci 2023, 13, 42. [CrossRef]
  46. Massari, C., Avanzi, F., Bruno, G., Gabellani, S., Penna, D., & Camici, S. Evaporation enhancement drives the European water-budget deficit during multi-year droughts. Hydrol. Earth Syst. Sci. 2022, 26, 1527–1543. [CrossRef]
  47. Helm, J. M., Swiergosz, A. M., Haeberle, H. S., Karnuta, J. M., Schaffer, J. L., Krebs, V. E., Spitzer, A. I., & Ramkumar, P. N. Machine Learning and Artificial Intelligence: Definitions, Applications, and Future Directions. Curr Rev Musculoskelet Med 2020, 13(1), 69-76. [CrossRef]
  48. Miller, J. A., Aldosari, M., Saeed, F., Barna, N. H., Rana, S., Arpinar, I. B., & Liu, N. A Survey of Deep Learning and Foundation Models for Time Series Forecasting. 1, 1. 2024, 35 pages. [CrossRef]
  49. Solís, M., & Calvo-Valverde, L.-A. Explaining When Deep Learning Models Are Better for Time Series Forecasting. Engineering Proceedings 2024, 68(1), 1. [CrossRef]
  50. Westergaard, G., Erden, U., Mateo, O. A., Lampo, S. M., Akinci, T. C., & Topsakal, O. Time Series Forecasting Utilizing Automated Machine Learning (AutoML): A Comparative Analysis Study on Diverse Datasets. Information 2024, 15(1), 39. [CrossRef]
  51. Kisi, O., Mirboluki, A., Naganna, S. R., Malik, A., Kuriqi, A., & Mehraein, M. Comparative evaluation of deep learning and machine learning in modelling pan evaporation using limited inputs. Hydrological Sciences Journal 2022. [CrossRef]
  52. Roderick, M. L., & Farquhar, G. D. Changes in Australian pan evaporation from 1970 to 2002. International Journal of Climatology 2004, 24, 1077. [CrossRef]
  53. Roderick, M. L., & Farquhar, G. D. Changes in New Zealand pan evaporation since the 1970s. International Journal of Climatology 2005, 25, 2031. [CrossRef]
Figure 1. Climatic regions of the Slovak Republic, as delineated in the Atlas of Slovak Republic [39].
Figure 2. Locations of individual climate stations within the designated river basins of the Slovak Republic.
Figure 3. A summary of all climate stations utilized for the analysis throughout the entire territory of the Slovak Republic.
Figure 4. The monthly variations of Ep and associated meteorological parameters examined in this study.
Figure 5. A comparative analysis of the individual river basins within the Slovak Republic, highlighting both the minimum and maximum values of Ep (mm). The box plot is delineated by the lower and upper quartiles, with the median represented by a line in the center. Additionally, the diamond symbol indicates the average value.
Figure 6. A comparative analysis of annual Ep trends in selected river basins: a) Bodrog, b) Dunaj, c) Hornád, d) Hron, e) Ipeľ, f) Morava, g) Slaná and h) Váh.
Figure 7. The Minimum Redundancy Maximum Relevance (mRMR) score as a feature selection metric used in machine learning and data analysis.
Table 1. Distribution of individual climate stations across the respective basins.
No. Station Name River Basin District
1 Ďubákovo Slaná Banskobystrický
2 Holíč Morava Trnavský
3 Bratislava, Mlynská Dolina Dunaj Bratislavský
4 Bratislava-Koliba Dunaj Bratislavský
5 Jaslovské Bohunice Váh Trnavský
6 Žihárec Váh Nitriansky
7 Moravský Svätý Ján Morava Trnavský
8 Dolný Hričov Váh Žilinský
9 Topoľčany Váh Nitriansky
10 Mochovce Váh Nitriansky
11 Hurbanovo Dunaj Nitriansky
12 Beluša Váh Trenčiansky
13 Prievidza Váh Trenčiansky
14 Rabča Váh Žilinský
15 Liptovský Mikuláš, Ondrášová Váh Žilinský
16 Dudince Ipeľ Banskobystrický
17 Želiezovce Hron Nitriansky
18 Banská Bystrica, Zelená Hron Banskobystrický
19 Sliač Hron Banskobystrický
20 Lom nad Rimavicou Slaná Banskobystrický
21 Liesek Váh Žilinský
22 Boľkovce Ipeľ Banskobystrický
23 Telgárt Hron Banskobystrický
24 Rožňava Slaná Košický
25 Spišské Vlachy Hornád Košický
26 Gánovce Hornád Prešovský
27 Prešov – Army Hornád Prešovský
28 Košice, Airport Hornád Košický
29 Tisinec Bodrog Prešovský
30 Trebišov, Milhostov Bodrog Košický
31 Somotor Bodrog Košický
32 Michalovce Bodrog Košický
33 Orechová Bodrog Košický
34 Kamenica nad Cirochou Bodrog Prešovský
35 Vysoká Nad Uhom Bodrog Košický
Table 2. Comprehensive overview of all utilized climate stations and identification of the associated climate data periods.
No. Code Station Name Climate data starting date Climate data ending date
1 665 Ďubákovo 1.1.2007 6/30/2016
2 11,8 Holíč 1.1.2007 6/30/2016
3 11,81 Bratislava – M. Dolina 1.1.2007 12/31/2023
4 11,813 Bratislava-Koliba 1.1.2007 12/31/2019
5 11,819 Jaslovské Bohunice 1.1.2007 12/31/2019
6 11,82 Žihárec 1.1.2007 12/31/2023
7 11,835 Moravský Svätý Ján 1.1.2007 12/31/2019
8 11,841 Dolný Hričov 1.1.2007 12/31/2023
9 11847 Topoľčany 1.1.2007 12/31/2023
10 11,856 Mochovce 1.1.2007 12/31/2019
11 11,858 Hurbanovo 1.1.2007 12/31/2023
12 11,862 Beluša 1.1.2007 12/31/2019
13 11,867 Prievidza 1.1.2007 12/31/2023
14 11,869 Rabča 1.1.2007 12/31/2019
15 11,878 Liptovský Mikuláš 1.1.2007 12/31/2023
16 11880 Dudince 1.1.2007 12/31/2023
17 11,881 Želiezovce 1.1.2007 12/31/2014
18 11,898 Banská Bystrica 1.1.2007 12/31/2019
19 11903 Sliač 1.1.2007 12/31/2023
20 11,91 Lom nad Rimavicou 1.1.2007 12/31/2019
21 11,918 Liesek 1.1.2007 12/31/2023
22 11927 Boľkovce 1.1.2007 12/31/2023
23 11,938 Telgárt 1.1.2007 12/31/2023
24 11,944 Rožňava 1.1.2007 10/30/2017
25 11,949 Spišské Vlachy 1.1.2007 12/31/2019
26 11,952 Gánovce 1.1.2007 12/31/2023
27 11955 Prešov-vojsko 1.1.2007 12/31/2023
28 11968 Košice, letisko 1.1.2007 12/31/2023
29 11976 Tisinec 1.1.2007 12/31/2023
30 11978 Trebišov, Milhostov 1.1.2007 12/31/2023
31 11,979 Somotor 1.1.2007 11/30/2015
32 11982 Michalovce 1.1.2007 12/31/2023
33 11984 Orechová 1.1.2007 12/31/2023
34 11,993 Kamenica N. Cirochou 1.1.2007 12/31/2023
35 11,995 Vysoká nad Uhom 1.1.2007 12/31/2019
36 11855 Nitra - Veľké Janíkovce 1.1.2020 12/31/2023
Legend
complete data
some records are missing
higher located stations - the measurement started later, e.g. in May
the station has stopped measuring or the data is incomplete
the station has been relocated several times
measurements finished
Table 3. A comprehensive overview of fundamental descriptive statistics, which include the number of observations, mean, median, standard deviation, coefficient of variation, and range for the assessed river basins.
River Basin Number of Observations Mean Median Standard Deviation Coefficient of Variation Range Quartiles Whiskers
Bodrog 16 444 2.51 2.23 1.5 0.6 13.7 [1.4 3.4] [0 6.4]
Dunaj 5 880 2.42 2.3 1.36 0.56 8.4 [1.4 3.35] [0 6.2]
Hornád 11 047 2.46 2.4 1.41 0.57 12.6 [1.4 3.3] [0 6.1]
Hron 8 827 2.3 2.1 1.42 0.62 14 [1.2 3.2] [0 6.2]
Ipeľ 5 901 2.48 2.3 1.29 0.52 11.2 [1.5 3.3] [0 6]
Morava 3 148 2.21 2 1.37 0.62 11.1 [1.2 3] [0.1 5.7]
Slaná 3 252 2.21 2.2 1.16 0.52 18.5 [1.5 2.8] [0 4.7]
Váh 21 237 2.2 2 1.32 0.6 11.2 [1.2 3] [0 5.7]
Table 4. An overview of the statistically significant differences in Ep among eight defined sub-basins: Bodrog, Dunaj, Hornád, Hron, Ipeľ, Morava, Slaná, and Váh.
River Basin Comparison Difference between Means Lower Limit Higher Limit
Bodrog vs Dunaj 0.0831 0.0166 0.1497
Bodrog vs Hornád 0.0481 -0.0076 0.1038
Bodrog vs Hron 0.2110 0.1511 0.2709
Bodrog vs Ipeľ 0.0301 -0.0339 0.0942
Bodrog vs Morava 0.2995 0.2148 0.3842
Bodrog vs Slaná 0.2966 0.2232 0.3699
Bodrog vs Váh 0.3041 0.2577 0.3504
Dunaj vs Hornád -0.0350 -0.1047 0.0346
Dunaj vs Hron 0.1279 0.0548 0.2009
Dunaj vs Ipeľ -0.0530 -0.1295 0.0235
Dunaj vs Morava 0.2163 0.1219 0.3108
Dunaj vs Slaná 0.2134 0.1290 0.2978
Dunaj vs Váh 0.2209 0.1585 0.2833
Hornád vs Hron 0.1629 0.0996 0.2262
Hornád vs Ipeľ -0.0179 -0.0852 0.0493
Hornád vs Morava 0.2514 0.1642 0.3386
Hornád vs Slaná 0.2485 0.1723 0.3246
Hornád vs Váh 0.2560 0.2053 0.3067
Hron vs Ipeľ -0.1809 -0.2516 -0.1101
Hron vs Morava 0.0885 -0.0014 0.1784
Hron vs Slaná 0.0855 0.0063 0.1648
Hron vs Váh 0.0931 0.0378 0.1483
Ipeľ vs Morava 0.2693 0.1766 0.3621
Ipeľ vs Slaná 0.2664 0.1840 0.3488
Ipeľ vs Váh 0.2739 0.2142 0.3337
Morava vs Slaná -0.0029 -0.1023 0.0964
Morava vs Váh 0.0046 -0.0769 0.0861
Slaná vs Váh 0.0075 -0.0621 0.0771
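One way to read Table 4 is through the sign of the interval limits. This is a minimal sketch, assuming that the lower and higher limits form a confidence interval for the difference between basin means (the confidence level and the test procedure are not stated in the table): a pair then differs significantly when the interval excludes zero. The function name `excludes_zero` is hypothetical; the rows are copied from Table 4.

```python
def excludes_zero(lower, upper):
    """True if the interval [lower, upper] lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# (comparison, lower limit, higher limit), copied from Table 4.
rows = [
    ("Bodrog vs Dunaj", 0.0166, 0.1497),
    ("Bodrog vs Hornád", -0.0076, 0.1038),
    ("Hron vs Ipeľ", -0.2516, -0.1101),
    ("Morava vs Slaná", -0.1023, 0.0964),
]

for name, lower, upper in rows:
    verdict = "differs" if excludes_zero(lower, upper) else "no significant difference"
    print(f"{name}: {verdict}")
```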
Table 5. A comparison and evaluation of the individual machine learning models, used to rank them from best fit to least suitable. The best model is the one with the smallest average squared error (ASE) value. RMSE (Root Mean Square Error) measures the typical magnitude of the error between predicted and actual values; lower values indicate better model performance. MSE (Mean Squared Error) is the average squared error. R-squared represents the proportion of variance in the dependent variable that can be predicted from the independent variables; it typically ranges from 0 to 1, with higher values indicating a better fit. MAE (Mean Absolute Error) measures the average absolute error.
| Model Number | Model Type | RMSE (Validation) | MSE (Validation) | R-squared (Validation) | MAE (Validation) | MAE (Test) | MSE (Test) | RMSE (Test) | R-squared (Test) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Linear Regression: Linear | 0.821 | 0.673 | 0.599 | 0.626 | 0.625 | 0.673 | 0.820 | 0.599 |
| 2 | Linear Regression: Interactions Linear | 0.805 | 0.649 | 0.613 | 0.611 | 0.611 | 0.647 | 0.805 | 0.614 |
| 3 | Linear Regression: Robust Linear | 0.821 | 0.675 | 0.598 | 0.624 | 0.624 | 0.674 | 0.821 | 0.598 |
| 4 | Stepwise Linear Regression: Stepwise Linear | 0.805 | 0.649 | 0.613 | 0.611 | 0.611 | 0.647 | 0.805 | 0.614 |
| 5 | Fine Tree | 1.035 | 1.072 | 0.388 | 0.777 | 0.453 | 0.443 | 0.666 | 0.747 |
| 6 | Medium Tree | 0.952 | 0.906 | 0.484 | 0.706 | 0.565 | 0.618 | 0.786 | 0.647 |
| 7 | Coarse Tree | 0.911 | 0.830 | 0.527 | 0.671 | 0.618 | 0.715 | 0.846 | 0.592 |
| 8 | Linear SVM | 0.900 | 0.810 | 0.538 | 0.660 | 0.660 | 0.810 | 0.900 | 0.538 |
| 9 | Quadratic SVM | 0.886 | 0.785 | 0.552 | 0.645 | 0.643 | 0.783 | 0.885 | 0.554 |
| 10 | Cubic SVM | 0.882 | 0.778 | 0.556 | 0.639 | 0.636 | 0.773 | 0.879 | 0.559 |
| 11 | Fine Gaussian SVM | 0.927 | 0.860 | 0.509 | 0.676 | 0.527 | 0.624 | 0.790 | 0.644 |
| 12 | Medium Gaussian SVM | 0.879 | 0.773 | 0.559 | 0.636 | 0.626 | 0.758 | 0.871 | 0.568 |
| 13 | Coarse Gaussian SVM | 0.886 | 0.786 | 0.552 | 0.643 | 0.642 | 0.783 | 0.885 | 0.553 |
| 14 | Efficient Linear: Efficient Linear Least Squares | 0.919 | 0.844 | 0.518 | 0.684 | 0.684 | 0.844 | 0.919 | 0.519 |
| 15 | Efficient Linear: Efficient Linear SVM | 0.922 | 0.851 | 0.515 | 0.682 | 0.685 | 0.854 | 0.924 | 0.513 |
| 16 | Ensemble: Boosted Trees | 0.895 | 0.801 | 0.543 | 0.655 | 0.646 | 0.780 | 0.883 | 0.555 |
| 17 | Ensemble: Bagged Trees | 0.889 | 0.791 | 0.549 | 0.654 | 0.481 | 0.488 | 0.699 | 0.722 |
| 18 | Squared Exponential Gaussian Process Regression | 0.881 | 0.776 | 0.557 | 0.643 | 0.640 | 0.770 | 0.877 | 0.561 |
| 19 | Matern 5/2 Gaussian Process Regression | 0.880 | 0.774 | 0.558 | 0.641 | 0.638 | 0.766 | 0.875 | 0.563 |
| 20 | Exponential Gaussian Process Regression | 0.877 | 0.770 | 0.561 | 0.638 | 0.601 | 0.693 | 0.832 | 0.605 |
| 21 | Rational Quadratic Gaussian Process Regression | 0.881 | 0.776 | 0.558 | 0.642 | 0.639 | 0.767 | 0.876 | 0.562 |
| 22 | Narrow Neural Network | 0.879 | 0.773 | 0.559 | 0.641 | 0.637 | 0.766 | 0.875 | 0.563 |
| 23 | Medium Neural Network | 0.879 | 0.772 | 0.560 | 0.640 | 0.631 | 0.754 | 0.868 | 0.570 |
| 24 | Wide Neural Network | 0.888 | 0.788 | 0.551 | 0.648 | 0.622 | 0.732 | 0.855 | 0.583 |
| 25 | Bilayered Neural Network | 0.881 | 0.776 | 0.557 | 0.641 | 0.636 | 0.762 | 0.873 | 0.565 |
| 26 | Trilayered Neural Network | 0.881 | 0.777 | 0.557 | 0.641 | 0.633 | 0.756 | 0.870 | 0.569 |
| 27 | SVM Kernel | 0.881 | 0.777 | 0.557 | 0.638 | 0.632 | 0.767 | 0.876 | 0.563 |
| 28 | Least Squares Regression Kernel | 0.878 | 0.772 | 0.560 | 0.640 | 0.636 | 0.762 | 0.873 | 0.566 |
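The error measures reported in Table 5 can be illustrated with a minimal Python sketch that computes MSE, RMSE, MAE, and R-squared from paired observed and predicted values. The function name `regression_metrics` and the toy numbers are hypothetical and are not taken from the study's data; the sketch uses the standard textbook definitions and does not reproduce the software the authors used.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared for paired observations."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Toy values for illustration only (NOT data from the study).
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.5, 4.5, 4.5]
metrics = regression_metrics(observed, predicted)
print(metrics)

# Consistency check on one reported row (model 4, validation):
# RMSE should equal sqrt(MSE) up to rounding of the published values.
rmse_from_mse = math.sqrt(0.649)
print(round(rmse_from_mse, 3), "vs reported 0.805")
```

Because RMSE is the square root of MSE, the two columns carry the same ranking information; MAE and R-squared add a view that is less sensitive to large individual errors and a scale-free view, respectively.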
Table 6. Minimum Redundancy Maximum Relevance (mRMR) scores used to identify the most relevant climate characteristics.
| Ranking | Feature | mRMR score |
|---|---|---|
| 1 | RH | 0.3269 |
| 2 | tmin | 0.2219 |
| 3 | Sw | 0.0757 |
| 4 | tmax | 0.0568 |
| 5 | taver | 0.0368 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.