Evaluation of Machine Learning Techniques for Daily Reference Evapotranspiration Estimation

The ASCE-EWRI reference evapotranspiration (ETo) equation is recommended as a standardized method for reference crop ETo estimation. However, various climate data as input variables to the standardized ETo method are considered limiting factors in most cases and restrict the ETo estimation. This paper assessed the potential of different machine learning (ML) models for ETo estimation using limited meteorological data. The ML models used to estimate daily ETo included Gene Expression Programming (GEP), Support Vector Machine (SVM), Multiple Linear Regression (LR), and Random Forest (RF). Three input combinations of daily maximum and minimum temperature (Tmax and Tmin), wind speed (W) with Tmax and Tmin, and solar radiation (Rs) with Tmax and Tmin were considered using meteorological data during 2003–2016 from six weather stations in the Red River Valley. To understand the performance of the applied models with the various combinations, station, and yearly based tests were assessed with local and spatial approaches. Considering the local and spatial approaches analysis, the LR and RF models illustrated the lowest rate of improvement compared to GEP and SVM. The spatial RF and SVM approaches showed the lowest and highest values of the scatter index as 0.333 and 0.457, respectively. As a result, the radiation-based combination and the RF model showed the best performance with higher accuracy for all stations either locally or spatially, and the spatial SVM and GEP illustrated the lowest performance among models and approaches.


Introduction
Evapotranspiration (ET) is a combination of two separate processes that transfers huge volumes of water and energy from the soil (evaporation) and vegetation (transpiration) to the atmosphere [1,2].Since ET plays a crucial rule in watershed and agricultural water management, accurate spatial and temporal estimation crop water requirements (ETa) at the scale of human influence is a critical need for a wide range of applications [3].Quantifying ETa from agricultural lands is vital to management of water resources.Measurement methods of ETa are available through water vapor transfer methods (e.g.Bowen ratio, eddy covariance) [4][5][6], water budget measurements (e.g.soil water balance, weighting lysimeters) [7][8][9], or remote sensing techniques [10][11][12].However, the application of field scale methods is limited due to the cost, complexity and maintenance.Estimating ETa using remote sensing models has developed in the recent years but cloud existence in many areas during the satellite passing dates limits the ability of imagery usefulness.Therefore, due to difficulties of ETa direct measurements, ETa can be estimated using by multiplying the calculated reference ET (ETo), using different calculation methods and meteorological data, with the crop coefficient (Kc).However, required meteorological data for ETo calculation methods are not readily available for many locations, and methods with fewer variables must be considered when basic meteorological data are available [13].However, the simplified and basic models are suited to estimate ETo on a weekly or monthly basis instead of daily basis [14].
The ETo calculation is a complex process due to the large number of associated meteorological variables, and it is hard to develop an accurate empirical model to overcome all complexity of the process.Over the last decades, machine learning (ML) techniques have attracted the interest of streams of researchers around the globe to overcome to the ETa estimation complexity.These methods are evolutionary computation techniques that can achieve the best relationship in a system with data driving tool.Due to the capability of the ML methods in tackling non-linear relationship between dependent and independent variables [15], numerous ML techniques have been applied and proposed to predict ETo for agricultural purposes including genetic programming (GP) [16,17], kernel based algorithms, e,g, support vector machine (SVM) [18,19], artificial neural network [20][21][22], wavelet neural network [13,23], random forest (RF) [24,25], and multiple linear regression (LR) [26,27].
Several authors have applied ML methods to detect ETo values with minimum variation from observed values [15,20,28].Among these models, the GEP and SVM illustrated better estimation than other models [19,29].The Gene Expression Programming (GEP) is comparable to GP and both involve different size and shapes of computer programs encoded in linear chromosomes of fixed length [16].The SVM method is a regression procedure that has been used successfully in hydrology context [30][31][32] and agro-hydrology for ETo modelling [18,33].GEP has several advantages compared to GP such as generating valid structure, multigenic nature, and ability of surpasses the old GP system.Shiri et al. [16] evaluated GEP to estimate daily evaporation through spatial and temporal data scanning.Shiri and Kişi [34] compared the GEP and Adaptive neuro-fuzzy inference system (ANFIS) techniques for predicting short and long-term river flows.Also, Shiri and Kişi [34] used a similar comparison to predict groundwater table fluctuations.[33] studied the potential of SVM model in ETo prediction and observed that the SVM model could be useful for ETo estimation and hydrological modeling studies.

Kişi and Çimen
The LR model is one of the commonly used ML methods in hydrology [27,35].It has been used to cover the study of the relationship between two or more hydrological variables and investigate the dependence and relationship between the successive values of hydrologic data.Some researchers have used the LR method to estimate the ETo rate for different regions due to the simplicity of the method compared to other numerical methods [36,37].Tabari et al. [26]compared the LR and SVM model's performance for semi-arid region and found that SVM model was superior to empirical and LR models.
Due to the high computational costs and complexity of the ML models, the tree-based ensemble models attract people by its simplicity and estimation power.The RF model as a ML tree-based model is able to produce a great result compared to the other ML models [24,38].This model known for its simplicity and the ability for both classification and regression tasks.The RF method is also widely used to predict the ET rate of different climate regions [39].Feng et al. [25] applied RF model for ETo estimation on a daily basis for southwest China and indicated that the RF model performed slightly better than the GRNN model.Shiri [40] evaluated the coupled RF with wavelet algorithm to estimate ETo for southern part of Iran and obtained that the coupled RF model showed great improvement compared to the conventional RF and empirical models.And to our knowledge, this model has not been applied in the Northern US for ETo studies.
According to the literatures, GEP and SVM models have been frequently applied among the world for various climate conditions for ETo estimation, while the LR and RF model's application were minimal.In addition, these models have not been compared with commonly used SVM and GEP models for sub-humid climate condition.Since the limited studies have been conducted to evaluate ML models for Northern part of the US (that experiences a high variability of weather conditions and a huge amount of agricultural production), the objectives of this study were to investigate the effect of different input combination of meteorological data on the accuracy of daily ETo estimation in sub-humid climate using GEP, SVM, LR, and RF methods, 2) compare the spatial and temporal prediction capabilities of the different ML models in ETo estimation, and 3) evaluate the performance of the models based on the various study years and meteorological station.

Study area climate and reference evapotranspiration (ETo)
The weather data for current study were obtained from the North Dakota Agricultural Weather Stations [41] located at the Prosper (ND), Galesburg (ND), Leonard (ND), Sabin (MN), Perley (MN), and Fargo (ND) for 17 years (January 2003-December 2016) (Table 1).To calculate the daily reference evapotranspiration (ETo) for each study stations, the ASCE standardized reference evapotranspiration equation (ASCE-EWRI) was used for alfalfa reference crop [42].This equation provides a standardized calculation of ETo demand that can be used in developing Kc and comparing with other methods.Equation 1 presents the form of the standardized ETo equation for all hourly and daily calculation time steps.
where, ETo is reference evapotranspiration rate (mm d -1 ), Rn is net solar radiation (MJ m -2 d - 1 ), G is soil heat flux (MJ m -2 d -1 ),  is sychrometric constant (KPa °C-1 ), T is the mean daily air temperature, U2 is average wind speed at 2 m height (m s -1 ), es is saturation vapor pressure (KPa), ea is actual vapor pressure (KPa), ∆ is the slope of the saturation vapor pressure temperature relationship (KPa °C-1 ), Cn and Cd are coefficients which are related to the crop and time step .The value for the constants Cn and Cd are 1600 and 0.38 for alfalfa reference crop.

Models structure and application
To process the GEP and SVM algorithm, GeneXpro program in Matlab, and to process the LR and RF models, scikit-learn module imbedded in Python 3.2 programming language were used.GEP is an extension of GP [43] developed by Ferreira [44] that create computer program to investigate a relationship between input and output variables.GEP are developed to find a better solution for a particular problem to solve the under-study phenomena [45].Application of GEP requires several steps.The fitness function must be determined in a first step with a random generation of chromosomes of a certain program (initial population) and evaluating against a set of fitness cases [46].Using weather station data as input variables (terminals) to model daily ETo involves the next general step.Selection of fitness functions (i.e.absolute error, relative error and correlation coefficient) depends on the experience and intuition of the user.The GEP model in current study was developed based on the recommended functions by Shiri et al. [16].In third step, the chromosomal architectural can be defined by having the weather variables as terminal and function set as chromosomes.The fourth step was to select linking function that relates genes to each other as addition linking the parse trees [16].Finally, genetic operators corresponding rates were chosen.Table 3 summarized the commonly used parameters for each run.The SVM is developed by Cortes and Vapnik [47] and known as classification and regression method [33] to solve problems with applying a flexible representation of the class boundaries and implementing an automatic complexity control to reduce over fitting.In SVM, the dependency of dependent variable to a set of independent variables is evaluated.In regression estimation with Support Vector Regression (SVR), which is used to define SVMs in the literature, a functional dependency () between a set of sampled input points  = { 1 ,  2 ,  3 , … ,   } (here, input sampled refer to meteorological variables) taken from   (input vector of n dimension) and target values  = { 1 ,  2 ,  3 , … ,   } (ETo as target values) with   ∈   .More detail on SVM can be found in Vapnik [48].
The LR is statistical method to describe quantitative relationship between a dependent variable and one or more independent variables [26,49].In LR, the function is a linear equation and expressed as: Where bo-bk are the fitting constants, yi is the dependent variable, and x1-xk are the independent variables for this system.
The RF method combines a group of decision trees for either classification or regression purposes.Although each decision tree may not capable of well learning, combination of decision trees results a strong learner.Each decision tree predicts the outcome individually, and RF votes among the outcomes for classification or averages the outcomes for regression.Each decision tree is trained on different subset of samples by a bagging extension of the RF model to reduce the risk of overfitting.Moreover, different subset of input variables can be used in each tree to make it more useful in prediction for datasets with higher dimensions [50].For this study, a small subset of data was used to find a good combination of parameters for the RF model.As a result, number of trees in the forest and minimum number of samples required for leaf node were 50 and 35, respectively.The mean square error criteria used as a procedure for estimation.
The calculated daily ETo for was used to feed the GEP, SVM, LR and RF models.
Three treatments including temperature, radiation, and mass transfer-based combinations were used as input to feed the models and each model of the combinations were assessed for spatial and temporal approaches.Different statistical analysis was performed to evaluate the accuracy and performance of the different combinations and approaches for each study stations.The combinations were as follow: (i) Tmin, Tmax: temperature based (GEP1, SVM1, LR1, RF1) (ii) Tmin, Tmax, Rs: radiation based (GEP2, SVM2, LR2, RF2) (iii) Tmin, Tmax, W: mass transfer based (GEP3, SVM3, LR3, RF3)

K-fold cross validation
Splitting the data to the sets of data for testing and training is a usual procedure for assessing the ML techniques.Using 10-30% of the complete data set as a single test set is a common way for GEP evaluation.Therefore, the K-fold cross validation technique was used to increase the evaluation performance and set of data for either training or testing purposes.Using K-fold cross validation, the data set was divided into K subsets, and training process was repeated K times leaving each time a distinct set of patterns for testing until a complete testing scan for the data set was fulfilled.Computation cost defines the minimum assemble size of the test set.Here, the minimum test size was fixed as one year for local modeling and one station for spatial modeling.
Consequently, at local scale, one year was held out each time for testing while the models were trained using the remaining 16 years; hence, a total of 612 models (17 years× 6 stations× 3 input combinations× 2 models) were established for the local k-fold testing.At the spatial scale, one station was considered as test block each time and the models were trained using the patterns from the remaining stations; hence, a total number of 36 models (6 stations× 3 input combinations× 2 models) were constructed.The temporal and spatial approaches were noted with T and S in the figures.

Evaluation criteria
To investigate the performance of models for each combinations and approaches, four statistical indicators were used, namely, the root mean square error (RMSE), the mean absolute error (MAE), correlation coefficient (r 2 ), and scatter index (SI), defined as follows: ) 2  (5) where

Results and Discussions
The local (or temporal) and spatial analysis of four models for six study stations are shown in table 3.According to the three combinations' performance, radiation-based method illustrated the highest accuracy for either local or spatial approaches compared to the other combinations.
The mass-transfer based combination was the next accurate combination.Results showed that the local trained models surpassed the spatially trained models because of using the same patterns for both training and testing models.However, the spatial models gave comparable results compared to the local model in some cases, especially for radiation-based combinations.
Differences of temperature among the stations was dramatically affected the performance of both temperature-based and mass transfer-based models.In all cases, minimum differences between the performance accuracy of the local and spatial models were belonged to the LR model.This can be inferred to the mathematical structure of this technique, where only linear relationships can be supposed between the input and target parameters with lower degree of flexibility compared to heuristic data driven models.The GEP and SVM models illustrated the great improvement rate for all three input combinations from spatial to local condition with the highest improvement of 21 percent for mass-transfer based combination.Specifically, the GEP model showed an improvement from spatial to temporal approach, however, the percentage of improvement was 2.3, 3.9, and 0.13 for radiation, temperature and mass-transfer based combinations, respectively.In term of obtained improvement for RMSE and MAE from spatial to local approaches, both of the GEP and SVM models gained the similar results.The correlation coefficient of the SVM model decreased from spatial to local approaches for about 6.3, 6.5, and 10.9 percent for radiation, temperature and mass-transfer based combinations, respectively.By using local radiation data for training the models, the SI indicator for the GEP and SVM models showed an improvement of 8.2 and 10 percent from spatial to local approach, respectively.This improvement was about 6.6 and 8.7 percent for mass-transfer based and 15 and 10.9 percent for temperature-based combinations, respectively.
Statistical analysis revealed the similar performance of the local GEP and SVM models.For RMSE and MAE statistical variables, GEP and SVM models showed the greater improvement in performance for mass-transfer and temperature-based combinations, respectively.By considering correlation coefficient values, it can be concluded that the improvement in accuracy of either GEP or SVM approaches were not significant and all illustrated the ability to estimate with an acceptable accuracy.Therefore, if temperature data were not available at the station, but they do for other stations, the GEP and SVM approaches can be useful to estimate ETo.However, due to the higher mapping ability of the GEP models, using either local or spatial GEP are preferable.
The models relying on the mass-transfer combination had slightly higher accuracy than the temperature-based approach, but lower accuracy compared to the radiation-based combination.
All of the local and spatial GEP and SVM methods illustrated lower improvement compared to than that for the temperature-based approach.This showed that wind speed can have a significant effect on accurate estimation of spatial and local ETo.Due to the flat topography of the study area and facing with lots of high-speed winds during the growing season and almost other seasons, including the wind as a parameter to build the model architecture and estimating the ETo can increase the accuracy of the approach.
Overall, the RF and the LR models illustrated the best performance among the four models, and comparing the GEP and SVM models, the GEP model showed better performance than SVM model for all three input combinations.
A breakdown of the models' performance accuracy at each station are shown in Figures 1-3 for all of the three input combinations, respectively.In case of the temperature-based combination (Figure 1), the local GEP and SVM models (shown as: TGEP and TSVM) gave more accurate results than the spatial (shown as: SGEP and SSVM) models.For the LR and RF models, the difference of accuracy between local (TLR and TRF) and spatial approaches (SLR and SRF) were not significant and both showed better performance than GEP and SVM models since they relied on the patterns of the same location used for training and testing the models.
According to table 2, station 6 (Fargo) had highest and station 2 (Galesburg) had lowest range of recorded temperature among the study stations.This range may be caused to have the lowest performance for station 2; however, it was difficult to evaluate the model performance in the climate context of each station due to the few numbers of study stations.The RF model showed the best performance with higher accuracy for all stations either temporally or spatially, and the spatial SVM and GEP illustrated the lowest performance among other models and approaches.The RF and LR methods showed the lowest range of SI compared to the spatial and local GEP and SVM methods.For temperature-based combination, the spatial and local LR approaches had minimum SI ranges of 0.018 and 0.020, respectively, and the spatial SVM and GEP methods illustrated the highest SI ranges of 0.113 and 0.119, respectively.The spatial RF approaches with an average of 0.333, and spatial SVM with an average of 0.457 showed the lowest and highest SI rate, respectively.Therefore, spatial RF approaches may be the practical way to estimate the missing meteorological data of the study stations.
Figure 2 shows evaluation result of the radiation-based combination for the four models with spatial and local approaches.The amount of received radiation for all study stations were similar.
According to the global performance of the defined models, radiation-based combination gained the best performance among the three input combinations.In addition, the radiation-based combination had the lowest rate of RMSE, MAE, and SI, and the highest rate of r 2 for each of the study stations.Among the spatial and local scenarios, local approach had the better performance than the spatial approach.For the radiation-based combination, the spatial RF and local RF models had an accurate estimation of ETo, respectively.For station 3 and 4 (Leonard and Sabin) either spatial or local approaches of GEP and SVM models gained lower performance than the other stations.This can be due to the slightly higher magnitude and variations of solar radiation (table 2) among the other stations during the study period.study stations showed that the spatial RF and temporal RF had the lowest and the spatial GEP and spatial SVM had the highest rate of SI indicators.Therefore, either spatial or temporal RF method could be useful to estimate the missing values for any of the stations.
Figure 3 shows the statistical indices of the mass-transfer based combination.Similar to the previous combinations, the spatial and local RF gave the accurate estimation than other methods.
On the other hand, the local SVM approach showed better estimation than spatial SVM and GEP methods for all stations except station 2, which had the lowest range of temperature variation.
The fluctuations of the indices among the stations were higher than radiation-based combination and lower than that for temperature-based combination, which showed the mediocre accuracy compared to the other combinations.showed that the temporal and local LR had the highest and spatial and local RF had the lowest SI values, respectively.Therefore, by having the lowest range of SI and lowest value of SI for spatial RF approach, it might be more practical to apply the spatial RF for other stations without training a specific model for each station.Accordingly, no local dataset could be needed to train the local models.This could be helpful to estimate the ETo for stations with partial or missing meteorological data.
To understand the performance of the applied models on yearly based at each of the study stations, the models were assessed per test years.decrease.Because of the lower input values, the variance rate would be low, and this type of input only would be valid for periods with very specific trends.This explanation may clarify the performance of spatial approaches, where relationship encountered might be site specific.
The comparison of the three input combinations showed that the performance of some approaches was similar for some years while the performance of methods for some years were far from each other's.For example, the SVM model showed the most improvement from temperature to mass-transfer based combination which became like the RF method.However, depending on the station and test year, this similarity become even closer.All the test years and stations showed an improvement from temperature to radiation or mass-transfer based combination except for the Prosper station.On the other hand, the Prosper station showed the best improvement for SVM model for radiation-based approaches.Considering that all of the input combinations rely on the temperature and another variable (solar radiation or wind speed), it might be thought that the performance differences could be due to the effect of the second variable in the estimation of the output.In addition, when the performance of the models is similar, the impact of secondary variable might be less than the primary variable (temperature).
However, when the gap between the performance or SI indicator increase, it proves the crucial impact of the secondary variable on the model performance for the specific test year or station.
Similar conclusion could be drawn for the comparison between the input combinations.If each of the combinations shows the better performance than the other combination method for specific year and station, the second variable effect should be important and might have significant influence in the explanation of the output for that test year.

Conclusions
The aim of this paper was to evaluate the different ML methods including GEP, SVM, LR and RF to estimate ETo in the Red River Valley.Global comparison of performance accuracy of the applied models revealed that the RF model was the best for all combinations among the four defined models.Furthermore, the RF model illustrated the best performance for spatial and local conditions for all input combinations.In general, the LR, GEP and SVM models were improved when a local approach was used, with an exception of the RF model, which was less accurate with a local approach.The radiation-based combination was the most accurate predictor among all models tested.As a result, this combination showed the lowest rate of improvement due to better performance at the first step.
The results showed that due to the flat topography of the study area with high wind speeds during the growing season, including the wind as a parameter to build the model architecture and estimate the ETo can increase the accuracy of the prediction.In addition, it might be more practical to apply the spatial RF model for stations with missing meteorological data without the e and ET o are simulated and calculated reference evapotranspiration at the i-th time step, respectively, n is number of time steps, ET e ̅̅̅̅̅ and ET o ̅̅̅̅̅ are mean values of simulated and calculated ETo, respectively.The RMSE describes the average magnitude of errors and can take on values from 0 to ∞ to indicate perfect and worst fit, respectively, and the MAE scores the error magnitudes without any specific weight to larger/smaller errors.Therefore, the lower value of the RMSE and MAE are desirable.The r 2 values around 1 indicates a perfect linear relationship between estimated and calculated values, which the closer value to zero demonstrate the poor relationship between simulation and calculation.Finally, SI is a dimensionless index of RMSE that gives a good insight to compare the performance of different models.

Figure 1 .
Figure 1.Statistical indices of the temperature-based combination for four applied models (GEP,

Figure 2 .
Figure 2. Statistical indices of radiation-based combination for four applied models (GEP,

Figure 3 .
Figure 3. Statistical indices of mass transfer-based combination for four applied models (GEP,

Figure 4
illustrates the SI values obtained from the three input combinations for each study years of the study stations.The SI values of the models fluctuated considerably for almost all stations during the test years.As shown in figure 4, the SI values fluctuated considerably within test years for all input combinations and approaches.Among the study stations, Prosper and Sabin stations showed the average maximum range for the SI values.The minimum average of the SI value 0.223 was observed for the RF radiation-based combination for the Fargo station, and the maximum average of the SI was obtained for the SVM temperature-based combination (SVM1) for the Galesburg station.The Galesburg station had the lowest temperature range among the study stations.According to the figure 4, the RF2 (radiation-based combination) model showed the lowest fluctuation compared to the other approaches with similar trend among the study stations.For all of the study stations, test years, and input combinations, the RF models gave the best performance with lowest SI values compared to the other models and combinations.The SVM and GEP models showed the worst SI averages for the temperature-based combinations.In this case, the order of the accuracy of the models was similar to that obtained from the station-based analysis.The radiation-based combination gave the most accurate results and estimation in comparison with the temperature or mass-transfer based combination models.

Figure 4 .
Figure 4. Scatter index (SI) values of GEP, SVM, LR, and RF models per test year and study

Table 1
Study weather data locations measured at the NDAWN automated stations

Table 3
Parameters used per run of GEP.

Table 3
Global average performance indicators of the gene expression programming (GEP), (table3).Comparing the GEP, SVM, LR and RF models, the RF model illustrated the lowest rate

preprints.org) | NOT PEER-REVIEWED | Posted: 7 August 2019 doi:10.20944/preprints201908.0097.v1 of
RMSE and MAE with the best performance for radiation-based approaches.However, the RF model improved 4.37, 5.76 and 1.49 percent from local to spatial approaches for temperature, radiation, and mass-transfer based combinations, respectively, which was in contrast with the improvement's direction of the other models.Considering the models on the basis of radiation combination, the spatial RF model exhibited the highest linear relationship (r 2 =0.927) between calculated and estimated ETo in comparison with the other models.The local RF method was the next accurate approach to estimate ETo based on radiation-based data.This observation illustrated the ability of the RF algorithms to estimate ETo using data from local stations for training.Furthermore, the LR model had significant improvement for RMSE and MAE from spatial to local approaches for all three types of input combinations.For LR model, the r 2 value was not changed for radiation-based and temperature-based combination from spatial to local approaches and the change was 0.13 percent for mass-transfer based model.Therefore, the LR model illustrated almost similar result for both spatial and local approaches among all models.