Comparison of the Automatically Calibrated Google Evapotranspiration Application-EEFlux and the Manually Calibrated METRIC Application

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 July 2018 doi:10.20944/preprints201807.0040.v1 © 2018 by the author(s). Distributed under a Creative Commons CC BY license.

Reliable evapotranspiration (ET) estimation is a key factor for water resources planning, attaining sustainable water resources use, irrigation water management, and water regulation. During the past few decades, researchers have developed a variety of remote sensing techniques to estimate ET. The Earth Engine Evapotranspiration Flux (EEFlux) application uses Landsat imagery archives on the Google Earth Engine platform to calculate the daily evapotranspiration at the local field scale (30 m). Automatically calibrated for each Landsat image, the EEFlux application design is based on the widely vetted Mapping Evapotranspiration at high Resolution with Internalized Calibration (METRIC) model and produces ET estimation maps for any Landsat 5, 7 or 8 scene in a matter of seconds. In this research we evaluate the consistency and accuracy of EEFlux products that are produced when standard US and global assets are used. Processed METRIC products for 58 scenes distributed around the western and central United States were used as the baseline for comparison. The goal of this paper is to compare the results from EEFlux with the standard METRIC applications to illustrate the utility of the EEFlux products as they currently stand. Given that EEFlux is derived from METRIC, differences are expected to occur due to differing calibration methods (automatic versus manual) and differing input datasets. The products compared include the fraction of reference ET (ETrF), actual ET (ETa), and surface energy balance components net radiation (Rn), ground heat flux (G), and sensible heat flux (H), as well as Ts, albedo and NDVI. The product comparisons show that the intermediate products of Ts, albedo, and NDVI, and also Rn, have similar values and behavior for both EEFlux and METRIC. Larger differences were found for H and G. Despite the more significant differences in H and G, results show that EEFlux is able to calculate ETrF and ETa values comparable to the values from trained expert METRIC users for agricultural areas. For non-agricultural areas such as semi-arid rangeland and forests, the automated EEFlux calibration algorithm needs to be improved in order to reproduce ETrF and ETa values that are similar to the manually calibrated METRIC products.


Introduction
Reliable and accurate estimates of water consumption are essential for water rights management, water resources planning and water regulation, especially for agricultural fields that may have specifically attached water rights [1]. Over the past few decades, a variety of remote sensing techniques have been used to quantify evapotranspiration (ET) at the field and larger scales over a large range of agricultural and non-agricultural land uses [1-6]. Among remote sensing ET models, surface energy balance techniques are among the more popular methods. The Mapping Evapotranspiration at high Resolution with Internalized Calibration (METRIC) application [7,8] is one of the more widely used surface energy balance models in operational practice, and employs principles and techniques that originated with the Surface Energy Balance Algorithms for Land (SEBAL) [9].
The accuracy of METRIC ET has been evaluated using ET measured by lysimeter, Bowen ratio and eddy covariance towers in a range of locations in the U.S. [10-16]. Because results of comparisons between METRIC ET and measured ET have been promising, and due to the physically-based employment of surface energy balance algorithms, METRIC is considered to be a well-established model that has been routinely applied as part of water resources management operations in a number of states and federal agencies [17]. However, applying METRIC can often be time-consuming, since a well-trained expert is typically needed to calibrate and run the model. Calibration of METRIC is required for each Landsat scene and image date and entails the determination and assignment of extreme ranges in ET (high and low) to locations within an image. This step calibrates temperature-impacted components of the surface energy balance to reproduce the assigned ET range. Different users who might not be equally experienced can produce different results. To reduce the uncertainties associated with the calibration process, and to save time and money, automated calibration algorithms were designed for the METRIC model [15,18] to generate ET estimates comparable to those manually produced by well-trained users. Comparison results have suggested that an automated calibration algorithm can estimate ET comparable to the ET estimated by trained users, and the variation within populations of ET produced with automated calibrations has mimicked the variation produced manually between different users [15].
Although the automated calibration of the METRIC application reduces some of the expertise requirements of ET production, users still have to accrue and assemble a variety of inputs, including the satellite image, land cover map, digital elevation map, local weather data, and soils map, from a variety of sources and platforms. There can be a significant amount of pre-processing required for the different inputs before applying the algorithms. The input and data handling can be one of the most time-consuming parts of the overall process. As a means to automate data assembling and handling and to speed the ET computation process, the Earth Engine Evapotranspiration Flux (EEFlux) application was designed and developed on the Google Earth Engine (GEE) platform based on the METRIC model [7]. EEFlux utilizes Landsat imagery archives stored on GEE, a cloud-based platform (see Allen et al. [10]). A web-based interface provides users with the ability to request ET estimation maps for any Landsat 5, 7 or 8 scene in a matter of seconds. EEFlux also provides rapid generation of intermediate product maps, such as surface temperature (Ts), normalized difference vegetation index (NDVI) and albedo maps for a given Landsat scene, that may be useful for other applications besides ET.
The goal of this paper is to compare the results from EEFlux with standard manually calibrated METRIC products to assess the utility and accuracy of EEFlux products as they currently stand. Though METRIC does not represent ground-truth, its standing in the scientific community is established, making it a reasonable benchmark for comparison.
Further, given that EEFlux is derived from METRIC, it is useful to examine the differences between their products. Differences are expected due to the differing energy balance calibrations (automatic versus manual), versions of METRIC, geographic location and differing input datasets. Because of the continuing evolution of both METRIC and EEFlux, there are algorithmic differences beyond the energy balance calibrations, but these generally tend to have more minor impacts on the final ET products relative to calibration and input differences. Therefore, this paper does not seek to trace each algorithmic difference but touches on some of the significant known differences. The products compared include the fraction of reference ET (ETrF), actual ET (ETa), net radiation (Rn), ground heat flux (G), sensible heat flux (H), Ts, albedo and NDVI. Those products were gathered from 58 METRIC scenes in the western and central United States that were produced by trained individuals.

Study Area
A suite of images from different parts of the western and central U.S. was chosen to compare the performance of automatically calibrated EEFlux to manually calibrated METRIC, and locations within agricultural fields and non-agricultural land areas were examined. These areas were selected due to the importance of water in the areas and the significant impacts of water on the study areas' economies. In this comparison analysis, we used existing processed METRIC images that had been developed to identify or address particular water resources issues in key areas. Analyzing different regions of the U.S. provided a basis for examining regional differences in comparison statistics.
In total, 58 Landsat image dates were evaluated in this study (Figure 1).

Methods
Because the objective of this study was the comparison of the automatically calibrated EEFlux products with manually produced METRIC products, we discuss the primary differences between the two applications and refer readers to primary documents that explain the details of the METRIC model (e.g., [1,7-9,17]). We note that the GEE-based EEFlux application is still being actively developed by the University of Nebraska-Lincoln (UNL), University of Idaho (UI) and Desert Research Institute (DRI). EEFlux production data from version 0.9.4 were used in this study.
In this section, we briefly explain the sampling methods we used and introduce the criteria used to compare EEFlux and METRIC products. We note that METRIC algorithms have been improved upon and have evolved over time, with applications of METRIC in the study areas occurring over a number of different years (2002-2016) and using different versions of METRIC algorithms. The different versions of METRIC include differences in produced energy balance components that are generally minor, for example, in the calculation of ground heat flux and aerodynamic roughness.

Similarities and Differences between EEFlux and METRIC
EEFlux employs primary METRIC algorithms that conduct a full energy balance at the land surface and calculate latent heat energy (LE, W/m²) on a pixel-by-pixel basis as a residual of the surface energy balance equation:

$$LE = R_n - G - H \qquad (1)$$

where LE is heat energy used by water in its phase change from liquid to gas during the ETa process; Rn is net radiation flux density (W/m²); G is the ground heat flux density (W/m²) representing sensible heat conducted into the ground; and H is the sensible heat flux density (W/m²) convected into the air. LE is estimated at the exact time of the satellite overpass for each pixel. ETa is then calculated as an instantaneous flux by dividing LE by the latent heat of vaporization:

$$ET_{inst} = 3600 \, \frac{LE}{\lambda \, \rho_w} \qquad (2)$$

where ETinst is the instantaneous ET flux (mm h⁻¹); 3600 converts seconds to hours; ρw is the density of water (~1000 kg m⁻³); and λ is the latent heat of vaporization (J kg⁻¹) that can be computed using Ts, which is the surface temperature (K):

$$\lambda = [2.501 - 0.00236 \, (T_s - 273.15)] \times 10^6 \qquad (3)$$

The ETrF is calculated for each pixel as the ratio of the computed ETinst from each pixel to the instantaneous tall crop reference evapotranspiration (ETr):

$$ET_rF = \frac{ET_{inst}}{ET_r} \qquad (4)$$

ETrF is used as a vehicle for extrapolating ET from the instant of the overpass to the [...] [23,24] and the Climate Forecast System Reanalysis (CFSR) (http://cfs.ncep.noaa.gov/cfsr/) [25] gridded weather data for all calculations.
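As a minimal sketch of the per-pixel chain described above (energy balance residual, latent heat of vaporization from equation (3), instantaneous ET, and the ratio to ETr), the following Python functions may help. The function names and the input values are hypothetical and are not taken from METRIC or EEFlux code. One labeled assumption: since 1 kg of water per m² equals a depth of 1 mm, the sketch computes mm/h directly as 3600·LE/λ, which is the depth implied by dividing by ρw and converting metres to millimetres.

```python
import numpy as np

def latent_heat_of_vaporization(ts_kelvin):
    """Latent heat of vaporization (J/kg) from surface temperature (K), eq. (3)."""
    ts = np.asarray(ts_kelvin, dtype=float)
    return (2.501 - 0.00236 * (ts - 273.15)) * 1e6

def instantaneous_et(rn, g, h, ts_kelvin):
    """Instantaneous ET (mm/h) from the energy balance residual LE = Rn - G - H."""
    le = np.asarray(rn, dtype=float) - g - h          # W/m^2
    lam = latent_heat_of_vaporization(ts_kelvin)      # J/kg
    # LE / lambda is kg m^-2 s^-1, i.e. mm/s of water; 3600 converts to mm/h.
    return 3600.0 * le / lam

def etrf(et_inst, etr_inst):
    """Fraction of reference ET: instantaneous ET divided by instantaneous ETr."""
    return np.asarray(et_inst, dtype=float) / etr_inst

# Hypothetical single-pixel values (W/m^2, K, mm/h), for illustration only.
et = instantaneous_et(rn=600.0, g=80.0, h=150.0, ts_kelvin=300.0)
print(float(et), float(etrf(et, etr_inst=0.7)))
```

The functions accept NumPy arrays as well as scalars, so the same sketch could be applied to whole rasters of Rn, G, H and Ts.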
The use of gridded weather data in EEFlux can explain, to some extent, differences between METRIC and EEFlux final products, including estimates for daily ETa. This is discussed in more detail in the following sections. More detailed information on METRIC and EEFlux ETr calculations is found elsewhere [10,26,27].
During calibration, METRIC and EEFlux solve the energy balance equation by applying an estimate for ETa at low ET and high ET conditions and solving for H = Rn - G - LE. The low and high ET calibration end-points are referred to as hot and cold pixels. In METRIC, these end-points are searched for automatically or manually, and in EEFlux, they are determined automatically. LE is computed by multiplying ETr by the assumed fraction of ETr at the calibration points (typically between 0 and 0.1 for the hot pixel and between 1 and 1.05 for the cold pixel). The estimate for instantaneous ETr does not have a large effect on the ETrF or ETa values, since ETrF is assigned to the end-point conditions. However, it does have an impact on the internally computed H, which is used to absorb and later correct for systematic biases in the other parameters, including Rn, G, albedo, aerodynamic roughness and ETr [7].
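The end-point step described above can be illustrated with a short Python sketch, offered under stated assumptions rather than as the METRIC or EEFlux implementation. It converts an assigned ETrF and an instantaneous ETr (mm/h) to LE in W/m², then solves H = Rn - G - LE at a hot and a cold pixel. The function name and all numeric inputs are hypothetical; only the ETrF ranges (0 to 0.1 and 1 to 1.05) come from the text.

```python
def anchor_sensible_heat(rn, g, etr_inst, etrf_anchor, ts_kelvin):
    """Sensible heat flux H (W/m^2) implied at a calibration end-point.

    rn, g        : net radiation and ground heat flux at the pixel (W/m^2)
    etr_inst     : instantaneous reference ET (mm/h)
    etrf_anchor  : ETrF assigned to the end-point (about 0-0.1 hot, 1-1.05 cold)
    ts_kelvin    : surface temperature (K), used for latent heat of vaporization
    """
    lam = (2.501 - 0.00236 * (ts_kelvin - 273.15)) * 1e6  # J/kg
    # mm/h of water equals kg m^-2 h^-1; multiplying by lambda and dividing
    # by 3600 s/h converts the assigned ET to a latent heat flux in W/m^2.
    le = etrf_anchor * etr_inst * lam / 3600.0
    return rn - g - le

# Hypothetical end-point values, for illustration only.
h_hot = anchor_sensible_heat(rn=450.0, g=110.0, etr_inst=0.75,
                             etrf_anchor=0.05, ts_kelvin=320.0)
h_cold = anchor_sensible_heat(rn=620.0, g=50.0, etr_inst=0.75,
                              etrf_anchor=1.05, ts_kelvin=298.0)
print(h_hot, h_cold)
```

Because ETrF is fixed at the end-points, a higher ETr estimate raises the assigned LE and lowers the H that the calibration must reproduce, which is the mechanism the paragraph above describes.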
A significant internal difference between EEFlux and METRIC is in the way they calculate G. Some of the METRIC versions evaluated calculated G using equations that depend on the pixel leaf area index (LAI) value, where LAI is estimated from surface-corrected NDVI. Due to the differences in calculation of G, the G products often do not match well between METRIC and EEFlux. These differences are carried into the calibration of H, as previously described, but are generally factored back out during calculation of ETa due to the internal bias correction of METRIC and EEFlux. This is shown later in the results.
METRIC and EEFlux use similar methods for estimating the aerodynamic roughness length for momentum transfer, zom, which is used to calculate aerodynamic resistance in the calculation of H, the sensible heat flow from the surface to the air. zom is estimated as a function of estimated LAI for agricultural land classes and as fixed values for non-agricultural classes. METRIC and EEFlux apply a Perrier roughness function [28] for trees, where roughness is a convex function of the amount of ground cover. Some versions of METRIC provide for local modification of land cover maps to specify orchard, vineyard and tall (corn) crops so that special estimation can be made for zom as well as albedo and surface temperature to account for shadowing in deep canopies.

Sampling method and comparison criteria
For the comparisons, the images with the highest cloud-free percentage were selected for the five locations and, for the few images having minor cloud cover, a cloud mask was applied to avoid sampling from clouded areas. A minimum thermal threshold of 270 K was used to further screen sampling pixels to avoid thermal pixels lying near the edges of cloud masks or at the edge of gaps in Landsat 7 images caused by the Scan Line Corrector failure.
Occasionally, thermal pixels in Landsat 7 images are contaminated by cubic convolution-averaged non-data values stemming from the original native thermal resolution of 60 m.
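The screening described above, followed by a random draw of sample pixels from the remaining valid area, could be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name, the boolean mask inputs, the fixed random seed, the default sample size and the choice to treat the 270 K threshold as inclusive are all assumptions.

```python
import numpy as np

def sample_valid_pixels(ts, cloud_mask, aoi_mask,
                        n_samples=1000, min_ts=270.0, seed=0):
    """Randomly pick pixel indices that pass the cloud and thermal screens.

    ts         : 2-D array of surface temperature (K); NaN marks no-data
    cloud_mask : 2-D boolean array, True where a pixel is flagged as cloud
    aoi_mask   : 2-D boolean array, True inside the area of interest
    Returns (rows, cols) index arrays of the sampled pixels.
    """
    valid = aoi_mask & ~cloud_mask & np.isfinite(ts) & (ts >= min_ts)
    rows, cols = np.nonzero(valid)
    n = min(n_samples, rows.size)
    if n == 0:
        return rows[:0], cols[:0]
    rng = np.random.default_rng(seed)
    # Sample without replacement so no pixel is counted twice.
    idx = rng.choice(rows.size, size=n, replace=False)
    return rows[idx], cols[idx]
```

Applying the same `(rows, cols)` indices to the co-registered EEFlux and METRIC rasters would yield paired samples for each product.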
For the comparison, we randomly chose 1000 pixels from specified areas of interest in the Landsat scenes. These areas targeted primary agricultural areas and adjacent non-agricultural

Results
Five locations in the United States, comprising nine Landsat image scenes, were used to compare the automatically calibrated EEFlux products to the manually calibrated METRIC products. Although the final and primary products of the applications are ETrF and ETa, we also compared intermediate products from the models, including Ts, albedo, and NDVI, and the primary components of the energy balance: Rn, G, and H. EEFlux is a user-friendly web-based platform that enables users to download the intermediate products of Ts, albedo, and NDVI in addition to ETrF and ETa. Therefore, it is useful to confirm similarity with METRIC for those additional products.
We compared the intermediate and final products for each location and calculated R², RMSE, and slopes relative to the METRIC products.

[...] environments. The general aridity of synoptic weather data, with generally lower humidity content and higher air temperature than experienced under irrigated conditions, especially in semiarid and arid climates [32,33], causes overstatement of ETr by the Penman-Monteith combination reference equation, which presumes a well-watered surface and associated air temperature and humidity parameters [20]. This is discussed more in a later section.

Overall Summary of EEFlux vs METRIC comparisons
A summary of comparisons over all 58 images and five locations was compiled by combining all sampled data and calculating overall R², RMSE, and slope values. For individual image and location comparisons, the reader is referred to Supplemental Tables 1-6, which provide statistics for both agricultural and non-agricultural areas for each image date. [...] [27,34]. This bias is the basis for ongoing studies and development of methods to identify and condition gridded data sets to remove aridity bias prior to calculation of reference ET, which represents near maximum ET in well-watered environments [32]. We further explored the ETr biases for each individual date and location, as described later in the discussion section.

ETrF and ETa examples
For most applications, the primary products of EEFlux and METRIC that are of most interest are ETrF and ETa. Therefore, this results section focuses on those two products. In the following section, we explore the differences between EEFlux and METRIC by discussing average statistics determined for ETrF and ETa for each of five locations.

EEFlux ETrF vs METRIC ETrF for Individual Locations
Table 2 provides a statistical summary for ETrF comparisons for each of the nine Landsat path and row locations evaluated, which were located in five general USA locations. Statistics are provided for agricultural and non-agricultural land uses. As shown in Table 2 and Figures 4 and 5, there was minor underestimation of ETrF values by EEFlux, relative to METRIC, within agricultural land uses for some locations. However, the results were generally good, and EEFlux, on average, is judged to have produced reasonably accurate and useful ETrF imagery, particularly in southern California, southern Oregon, the Green River area of Wyoming, and southern Idaho, with average R² values higher than 0.84 and average slope values larger than 0.93, and where, in some of the areas, slopes were nearly 1.00. Moreover, the RMSE values in these areas were almost all less than 10% of the average magnitudes of ETrF values (0-1.05). RMSE values of 10% are considered by Allen et al. [29] and Jensen and Allen [32] to be common to ET estimation and ET measurement. Within the agricultural fields in Nebraska, EEFlux performance was not as good or consistent as for the other locations. However, RMSE and R² values are still within our acceptable range, except for one scene area which had an ETrF RMSE value of 0.28 and R² value of 0.69. This was previously illustrated in Figure 3 and is explained by the impact of recent rains, where EEFlux underestimated ETrF for agricultural areas for several dates in central Nebraska.

ETrF equals instantaneous ETrF, as is done for agricultural land uses [7]. The typically stronger ETr from gridded weather data impacts this transformation. Causes of these differences, with location, continue to be investigated.

EEFlux ETa vs METRIC ETa for Individual Locations
Table 3 provides a statistical summary for ETa comparisons for the nine Landsat path and row locations evaluated, for both agricultural and non-agricultural land uses. Figures 6 and 7 show average slopes and RMSE values for ETa. Supplemental Figure 10 provides similar plots for average R² values for ETa. As shown in Table 3 and Figures 6 and 7, slope values increased over those for ETrF for both agricultural and non-agricultural areas for most of the locations investigated. As discussed previously, that is largely a consequence of ETr overestimation by use of the gridded weather data set [27,34].

Time dependency of EEFlux performance
Because the study area in southern California had the broadest time series of processed images, we chose this location to explore the time dependency of EEFlux performance and to assess the impact of time of year on performances of the two processing systems. As described earlier, we evaluated 13 processed Landsat 8 images for the southern California location. The first and last images evaluated were from the 26th of January 2014 and the 10th of November 2014, respectively. Figure 8 shows R², slope, and RMSE values for ETrF and ETa for agricultural and non-agricultural land uses for different comparison dates. Generally, there was no statistical correlation between time of year and the performance of EEFlux relative to METRIC. While R² values for both ETrF and ETa were always higher for agricultural land uses than for non-agricultural land uses, no trends through time were detected. The slope values were similar over time for both agricultural and non-agricultural land uses. However, slopes for non-agricultural ETrF and ETa do show a slight trend, decreasing from March through November. RMSE values for ETrF, like R² and slope values, did not follow any visible trend during 2014 in the agricultural land uses in southern California. However, as observed in the bottom plot of Figure 8, RMSE values for ETa increased for both land covers during summer, indicating larger differences between EEFlux ETa values and METRIC values during the primary growing season when ETa was higher.

Discussion
Based on the comparison results, we conclude that the implementation of EEFlux on

In non-agricultural land uses, EEFlux did not match METRIC as well as it did for agricultural land uses. This may be partially due to differences among G and H products and DEM sources used. As noted earlier, we evaluated EEFlux version 0.9.4 and, as EEFlux is still in progress, the automated calibration algorithms are expected to be improved in the future, which should result in even more accurate ETrF and ETa estimates.

Source of Reference ET Estimation
Besides using ETr for internal energy balance calibration and computation, EEFlux uses gridded weather data to extrapolate instantaneous ETrF values to the 24-hour period; the 24-hour ETrF is then multiplied by 24-hour ETr to calculate daily ETa values. Figure 9 shows ratios of gridded ETr values to the single ETr values generally used in METRIC computations for each image date and location. As shown in Figure 9, for most dates and locations, the average gridded ETr values used in EEFlux were higher than the associated single ETr values used by METRIC, with variation within each location from about 0.9 to 1.3. As discussed earlier, the average EEFlux gridded ETr was larger than the METRIC-calculated, ground-based ETr values by an average ratio of 1.10 and 1.09 for agricultural and non-agricultural land uses, respectively. The higher 24-hour ETr estimation in EEFlux, due to the gridded weather data source, leads to some degree of daily ETa overestimation.

While R² values of ETrF and ETa are higher than 0.89 for both days, the RMSE and slope values are considered to be acceptable only for July 18th and are not in the acceptable range for September 4th. The average R² values of ETrF and ETa for the combination of all the data were 0.78 and 0.73, respectively. The combined slope values were 0.9 for ETrF and 1.07 for ETa, which do fall within the acceptable ranges. Scatter in the comparisons is due to small differences in the METRIC version used or in internal parameter settings in METRIC, such as corrections for low albedo in crops such as corn that have deep canopies [7].
Combined RMSE values were 0.

Conclusions
The consistency and accuracy of ET products from the automatically calibrated GEE

Figure 1 shows the Landsat scene locations and study areas of the research. In central Nebraska, areas along the Platte River were the focus of study, where 15 Landsat images (Paths 29-30 and Rows 31-32) from summer 2002 were utilized. In western Wyoming, agricultural areas along the Green River were evaluated. That area falls into two Landsat rows on a single path (Path 37 and Rows 30-31). We utilized 9 Landsat images from summer 2011 for the comparison. Southern California was the third study area (Path 39 and Row 37). Due to its very dry climate, the California location had the highest frequency of cloudless images, so that we were able to evaluate 13 Landsat images from late January 2014 to early November 2014. A large irrigated area in southern Idaho comprised a fourth area containing 15 Landsat image dates from year 2016 (Path 40 and Row 30). That location represents a large irrigated region receiving irrigation water from the Snake River and from the Snake Plain Aquifer. The fifth location comprised agricultural areas in the Klamath basin of southern Oregon and northern California, where we evaluated 6 Landsat images (Path 45 and Row 31) from the growing season of year 2004.

Figure 1. Locations of Landsat scenes evaluated in this study.
of METRIC calculated G as a function of sensible heat flux for LAI > 0.5 and equation 6b otherwise. Very recent versions of METRIC calculate G as a function of LAI only. The version of EEFlux evaluated calculated G as:

Figure 2 shows an example comparison for each product sampled from within agricultural fields in Path 29 Row 32 in central Nebraska for a Landsat 5 (2002/06/28) image. Additional graphs of the same format as Figure 2 are included for each location studied in Supplemental Figures 1-8.

Figure 3 illustrates ETrF and ETa correlations and behavior between EEFlux and METRIC over individual sample points for two locations (central Nebraska and southcentral Idaho) and two Landsat systems for agricultural areas. The top two rows of graphs show good EEFlux calibration and estimation relative to the METRIC calibration and estimation, producing relatively good R², RMSE, and slope values. The lower row of graphs illustrates a poorer calibration where EEFlux substantially underestimated ETrF and ETa, especially at the lower end of the ET spectrum, as reflected in poor R², RMSE, and slope values. The poor agreement for the particular location and date indicates that the EEFlux automated calibration algorithms can fail under some conditions. As previously noted, those algorithms are under continued improvement by the UNL and UI developers. While the automated calibration of EEFlux is prone to producing poor calibrations under some circumstances, it should be noted

Figure 4 illustrates average slope values for ETrF for the different locations and Figure 5 presents average RMSE values for ETrF. Supplemental Figure 9 provides similar plots showing average R² values for ETrF.

R², slope and RMSE values in Table 2 and Figures 4 and 5 indicate that EEFlux ETrF values did not match METRIC ETrF values as strongly for non-agricultural land uses as they did for agricultural land uses. EEFlux tended to underestimate ETrF for all non-agricultural land covers sampled and produced RMSE values that were higher than those for agricultural land uses within the same Landsat scene. Some of the differences are due to different means for estimating soil heat flux and the aerodynamic roughness of natural vegetation systems, and potentially due to impacts of the digital elevation model (DEM) used to estimate solar radiation and aerodynamic behavior in the complex terrain that is characteristic of natural systems. Differences are also attributed to the weather data sources used in the application of the evaporative fraction (EF) function to non-agricultural land uses, where a ratio of ETa to Rn - G is used to transform ETrF to 24-hour ETrF values, rather than assuming that 24-hour

Figure 4. Average slope values for ETrF for EEFlux vs. METRIC for different locations and scenes for agricultural and nonagricultural land uses.

Figure 7. Average RMSE values (mm/d) for ETa for EEFlux vs. METRIC for different locations and scenes for agricultural and nonagricultural land uses.

Figure 8. a) R², b) slope and c) RMSE values for ETrF and ETa products for EEFlux vs. METRIC for a series of comparison dates (Path 39 Row 37).

Figure 9. Ratios of calculated 24-hour ETr used in EEFlux (based on gridded weather data) to that used in the METRIC model (calculated from ground-based weather station data) for five different Landsat scene locations and comparison days.

Figure 10. Comparison between METRIC products (ETrF and ETa) that were manually calibrated and produced by 2 different METRIC users. The top two comparisons are for the 18th of July and the bottom two are for the 4th of September.
areas comprised of rangeland or forests. National Land Cover Database (NLCD) (https://www.mrlc.gov/) raster data were used to distinguish between agricultural and non-agricultural areas. Root Mean Square Error (RMSE) and Coefficient of Determination (R²) were calculated for each set of data to compare EEFlux products with the same products from METRIC. In addition, slopes of EEFlux products vs. METRIC products with zero intercept were calculated to indicate when EEFlux underestimated or overestimated the products, on average, compared to METRIC. In this study, R² values higher than 0.8, RMSE values less than 15% of the average magnitude of each product, and slope values between 0.9 and 1.1 were considered acceptable, in terms of expected error common to operationally produced spatial ET products [1,7,29-31].
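The comparison criteria above can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the function name is hypothetical, R² is taken here as the squared Pearson correlation, and the "average magnitude of each product" is assumed to be the mean absolute METRIC value. The source does not specify either definition, so both are assumptions.

```python
import numpy as np

def comparison_stats(eeflux, metric):
    """RMSE, R^2 and zero-intercept slope of EEFlux vs. METRIC samples."""
    y = np.asarray(eeflux, dtype=float)   # EEFlux values (dependent)
    x = np.asarray(metric, dtype=float)   # METRIC values (baseline)
    rmse = float(np.sqrt(np.mean((y - x) ** 2)))
    # Assumption: R^2 as the squared Pearson correlation coefficient.
    r2 = float(np.corrcoef(x, y)[0, 1] ** 2)
    # Least-squares slope of y = b * x with the intercept forced to zero.
    slope = float(np.sum(x * y) / np.sum(x * x))
    # Assumption: relative RMSE uses the mean absolute METRIC value.
    rel_rmse = rmse / float(np.mean(np.abs(x)))
    acceptable = bool(r2 > 0.8 and rel_rmse < 0.15 and 0.9 <= slope <= 1.1)
    return {"rmse": rmse, "r2": r2, "slope": slope,
            "rel_rmse": rel_rmse, "acceptable": acceptable}
```

A slope below 1 indicates that EEFlux underestimated the product on average relative to METRIC, and a slope above 1 indicates overestimation.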

Table 1 presents the overall R², RMSE, and slope values for all products for agricultural and non-agricultural areas. Intermediate products of Ts, albedo, and NDVI were relatively similar between agricultural and non-agricultural classes, with R² and slope values close to 1 and with relatively small RMSE values. Rn estimates by EEFlux correlated well with those by METRIC, with an average R² value of 0.93 and slope of 1.02 for agricultural areas and average R² of 0.87 and slope of 1.02 for non-agricultural areas. Relative RMSE for Rn was less than 5%, on average, for both land covers. The other two energy balance components sampled (G and H) did not match as well between EEFlux and METRIC. The poor agreement for G is attributed to the previously noted differences between METRIC and EEFlux equations for G. Although the equations for G differed between EEFlux and the various METRIC versions, the average RMSE and slope indicate that EEFlux still calculated [...] removed from the ET estimates during the ET production steps, due to the internal, systematic bias correction of METRIC and EEFlux. Differences in H are also traceable to the sources used to compute instantaneous ETr, as noted previously, where generally higher estimates of ETr in EEFlux produce lower values for H during the surface energy balance calibration.

Because METRIC typically uses ground-based weather data for hourly and daily ETr calculation, and EEFlux uses gridded weather data sets to derive ETr, the calculated ETr values used in computations can differ due to differences in the origin of weather data and aridity biases common to the gridded weather data sets. While several of the METRIC applications applied only a single ETr value for an entire Landsat image for both energy balance calibration and interpolation to 24-hour periods, ETr values used in EEFlux can vary across the image through the gridded weather data, which have an approximately 12 km grid spacing for NLDAS-2 hourly data, for CONUS, and 4 km grid spacing for [...] hour data. In order to explore differences among ETr values used in METRIC and EEFlux, we calculated averages of gridded ETr values for each image date and associated ratios of those average values to the typically single scene-wide METRIC ETr values. Table 1 summarizes average slopes of 24-hour EEFlux ETr values to METRIC ETr values. On average, over all five locations and the dates evaluated, the grid-based ETr ran higher than ground-based calculated ETr by ratios of 1.10 and 1.09 for agricultural and non-agricultural land uses, respectively. The approximately 10% higher ETr estimation by the gridded data suggests that general ET applications with EEFlux can be biased 10% high solely due to the aridity bias of the gridded data sets.
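A short worked example may clarify why a 10% bias in 24-hour ETr carries directly into daily ETa. The Python sketch below assumes that daily ETa is the product of the 24-hour ETrF and the 24-hour ETr and that both applications produce the same ETrF for the pixel. The function name and the ETrF and ETr values are hypothetical; only the 1.10 ratio comes from the text.

```python
def daily_eta(etrf_24, etr_24):
    """Daily actual ET (mm/d) as 24-hour ETrF times 24-hour reference ET."""
    return etrf_24 * etr_24

# Hypothetical pixel: same ETrF in both applications, differing only in ETr.
etrf_24 = 0.9
etr_ground = 8.0                 # mm/d, station-based ETr (assumed value)
etr_gridded = etr_ground * 1.10  # mm/d, gridded ETr using the reported ratio

eta_metric = daily_eta(etrf_24, etr_ground)
eta_eeflux = daily_eta(etrf_24, etr_gridded)
# The ETa ratio equals the ETr ratio because ETrF cancels out.
print(eta_eeflux / eta_metric)
```

Because the relation is a simple product, any multiplicative bias in ETr appears unchanged in ETa wherever ETrF agrees between the two applications.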

Table 1. Average values for R², RMSE, and slope for EEFlux vs. METRIC, based on a comparison over all data (Ag sample size = 47838, Non-Ag sample size = 35110).

Table 2. Average values for R², slope and RMSE for ETrF for each Landsat scene location.

R² and slope values were generally within the acceptable accuracy range for agricultural areas. R² values were mostly larger than 0.8 and RMSE values were generally in the range of 0.9 to 1.1 mm/d, except for one location where RMSE was 0.69 mm/d. Most R² values were less than 0.8 for non-agricultural land uses, and RMSE values in all locations except southern California and southern Idaho were larger for non-agricultural land uses than for agricultural lands. Slope values show that EEFlux tended to underestimate ETa for non-agricultural land uses everywhere except southern Idaho. In general, ETa was substantially lower in non-agricultural land uses than in agricultural areas due to limits on ET imposed by precipitation amount. The agricultural areas sampled were generally all irrigated.

Table 3. Average values for R², slope and RMSE for 24-hour ETa for each Landsat scene location evaluated. RMSE values have units of mm/d.

Average slope values for ETa for EEFlux vs. METRIC for different locations and scenes for agricultural and non-agricultural land uses.
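The R², slope and RMSE statistics reported for EEFlux vs. METRIC can be computed from paired pixel samples along the following lines. This is a minimal sketch under stated assumptions: R² is taken as the squared Pearson correlation, and slope as the least-squares slope of a regression through the origin with METRIC as the independent variable. The exact definitions used in the study are not given in this section, and the function name and sample values are hypothetical.

```python
import math


def agreement_stats(metric, eeflux):
    """Return (r2, rmse, slope) for paired METRIC and EEFlux samples.

    r2    : squared Pearson correlation (assumed definition)
    rmse  : root mean square difference, EEFlux minus METRIC
    slope : least-squares slope through the origin, METRIC as x (assumed)
    """
    n = len(metric)
    if n != len(eeflux) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")

    mean_x = sum(metric) / n
    mean_y = sum(eeflux) / n
    sxx = sum((x - mean_x) ** 2 for x in metric)
    syy = sum((y - mean_y) ** 2 for y in eeflux)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(metric, eeflux))
    if sxx == 0 or syy == 0:
        raise ValueError("samples must not be constant")

    r2 = sxy ** 2 / (sxx * syy)
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(metric, eeflux)) / n)
    slope = sum(x * y for x, y in zip(metric, eeflux)) / sum(x * x for x in metric)
    return r2, rmse, slope


# Hypothetical 24-hour ETa samples (mm/d) at the same pixels
metric_eta = [2.0, 4.0, 6.0, 8.0]
eeflux_eta = [1.8, 3.6, 5.4, 7.2]

r2, rmse, slope = agreement_stats(metric_eta, eeflux_eta)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f} mm/d, slope = {slope:.3f}")
```

A slope below 1 in this setup corresponds to EEFlux underestimating relative to METRIC, which matches how the slope values for non-agricultural land uses are interpreted in the text.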
GEE, including the automated internal calibration, has been relatively successful. EEFlux ETrF and ETa results matched those from manually applied METRIC applications for most of the agricultural areas evaluated. For some dates within central Nebraska, EEFlux performance was poorer than for the other locations for agricultural land uses. Some of the increased error is due to the smaller number of Landsat images processed for that region because of extensive cloud cover. In one location we were able to evaluate only 3 Landsat image dates (Path 29 Row 31), and for the other three Worldwide Reference System (WRS) scene areas we evaluated 4 image dates, whereas we evaluated 13 Landsat image dates in California and 15 image dates in Idaho. Having fewer image dates can result in more extreme means due to greater impacts of outliers and/or a smaller sample size. Another factor for central Nebraska, as noted, is the tendency for more frequent and substantial rainfall during the growing season, which increases the impact of background evaporation and complicates the image calibration.
[14] [...] ETrF and 0.98 mm/d for ETa values. A comparison of these average R², slope and RMSE values with the average values for EEFlux vs. METRIC summarized in Table 1 suggests that, for the locations evaluated, the EEFlux automated calibration algorithm is generally able to estimate ETrF and ETa values for agricultural land uses that are comparable in accuracy and reproducibility to the differences noted when METRIC is applied by different trained users. This finding is consistent with that of Medellín-Azuara et al. [14].
EEFlux application were evaluated by comparing EEFlux products to those from manually calibrated METRIC images for 58 Landsat images. Sets of Landsat images from five study locations distributed across the central and western USA included both agricultural and non-agricultural land uses. The agricultural areas sampled were typically irrigated. The comparison results show that EEFlux is able to calculate ETrF and ETa values in agricultural areas that are comparable to those produced by trained METRIC users and that are generally within accepted accuracy ranges. Differences between EEFlux and METRIC were larger for non-agricultural land uses, showing room for improvement to the EEFlux algorithms. The differences noted could, in part, be the result of EEFlux struggling to account for background evaporation at the hot pixel calibration end point. Bias in the ETrF assigned to the hot pixel tends to affect non-agricultural pixels more than agricultural pixels because the non-agricultural pixels tend to have lower ET and are therefore more impacted by error or bias in the overall surface energy balance. Another likely reason for the poorer performance for non-agricultural land uses is a bias introduced during the application of EF to extrapolate instantaneous ETrF to daily ETrF, as discussed earlier. The EF relies on the instantaneous and 24-hour ETr, Rn and G being accurate. We have established that both ETr and G estimates deviate between METRIC and EEFlux, so we would expect different results in the non-agricultural areas. In fact, we should expect larger differences between METRIC and EEFlux in non-agricultural areas than in agricultural areas, given that the instantaneous ETrF used in the agricultural areas is robust in the face of biased G and instantaneous ETr. While EEFlux is still a work in progress, it can be used to rapidly estimate ETa for areas of interest. However, it is important to be aware of biases in 24-hour ETa estimates due to aridity biases in the gridded weather data used by EEFlux. Results presented in this paper should provide a good overview of the general variability and error to be expected for ETrF and ETa estimates from EEFlux.