The China Meteorological Assimilation Driving Datasets for the SWAT Model (CMADS) and Its Application in the Heihe River Basin in China

Large-scale hydrological modeling in China is challenging given the sparse meteorological stations and large uncertainties associated with atmospheric forcing data. Here we introduce the development and use of the China Meteorological Assimilation Driving Datasets for the SWAT model (CMADS) in the Heihe River Basin (HRB) for improving hydrologic modeling, by leveraging data from the China Meteorological Administration Land Data Assimilation System (CLDAS), including climate data from nearly 40,000 regional automatic stations, 2,700 national automatic weather stations, the FengYun-2 (FY-2) satellite and radar stations. CMADS uses the Space Time Multiscale Analysis System (STMAS) to fuse these data with an ECMWF background field, which ensures data accuracy. In addition, compared with CLDAS, CMADS includes relative humidity and climate data at varied resolutions to drive hydrological models such as the Soil and Water Assessment Tool (SWAT). We compared climate data from CMADS, the Climate Forecast System Reanalysis (CFSR) and traditional weather stations (TWS) and evaluated their applicability for driving large-scale hydrologic modeling with SWAT. In general, CMADS is more accurate than CFSR when evaluated against TWS observations; CMADS also provides spatially continuous climate fields to drive distributed hydrologic models, an important advantage over TWS data, particularly in regions with sparse weather stations. As a result, SWAT simulations driven by CMADS and TWS achieved similar performance for monthly and daily streamflow, and both outperformed CFSR. For example, at the three hydrological stations in the HRB (Ying Luoxia, Qilian Mountain and ZhaMashenke), the monthly and daily Nash-Sutcliffe efficiencies of the CMADS-driven simulations ranged from 0.75 to 0.95 and from 0.58 to 0.78, respectively, much higher than those achieved with CFSR (monthly: 0.32-0.49; daily: 0.26-0.45).
The CMADS dataset is available free of charge and is expected to be a valuable addition to existing climate reanalysis datasets for driving distributed hydrologic modeling in China and other countries in East Asia.

Although the forecast models and assimilation systems used by these reanalysis datasets differ from each other, they all rely on mature operational numerical weather prediction systems. For example, NCEP-NCAR (NCEP1) uses the operational GSM (T62) model and the SSI assimilation system as implemented in January 1995; NCEP-DOE (NCEP2) uses essentially the same model and assimilation system as NCEP1, with some improvements; ERA-40 uses the ECMWF Integrated Forecasting System IFS (T159) with an improved 3DVar assimilation scheme (ERA-40 has not been extended beyond August 2002); ERA-Interim, the successor to ERA-40, uses IFS (T255) with a 4DVar assimilation system. Compared with ERA-40, ERA-Interim not only improves the horizontal resolution (T159 to T255) but also adopts the more advanced 4DVar technique. JRA-25 uses the T106 global spectral model (JMA, 2002) with an assimilation system based on 3DVar. In addition, NCEP built the real-time updated CFSR global reanalysis. Unlike earlier reanalyses, CFSR adopts a high-resolution global coupled atmosphere-ocean-land-ice system (atmosphere model GFS, ocean model MOM4, land model Noah), and its assimilation system likewise consists of separate components (GSI 3DVar for the atmosphere, GODAS for the ocean and sea ice, and GLDAS for land). NASA built the MERRA global reanalysis, which uses the GEOS-5 ADAS assimilation system based on GSI 3DVar (horizontal resolution 38 km (T382)) and improves considerably on the simulation of the water cycle. Furthermore, the JRA-55 reanalysis uses the TL319L60 model (about 60 km) operational at JMA as of December 2009, a 4DVar assimilation system (with a T106 inner model) and the offline SiB land model driven by 3-hourly atmospheric forcing.
Many studies have assessed the reliability of these reanalysis datasets and reached useful conclusions. However, each reanalysis has its own advantages and disadvantages, and no single dataset performs equally well across all regions and time periods. For example, Zhao et al. (2006) compared ERA-40 and NCEP-2 and found that ERA-40 had a higher confidence level than NCEP-2. Huang et al. (2006), using Chinese sounding data, pointed out that ERA-40 was better for studying interdecadal climate variation over East Asia before the 1970s, whereas after the 1970s NCEP/NCAR described tropospheric geopotential height and temperature over Inner Mongolia and North China better than ERA-40. Compared with these two reanalyses, the 6-hourly global precipitation distribution and amount produced by JRA-25 and JCDAS are the best in both time and space. However, because of its low resolution, JRA-25 is not suitable for mesoscale analysis (Simmons et al., 2000). Najafi et al. (2012) used the CFSR dataset to drive the Sacramento Soil Moisture Accounting model (SAC-SMA) and analyzed runoff in the Donghe River basin, where water supply comes from snowfall and snowmelt. Fuka et al. (2014) used precipitation and temperature data from the CFSR dataset (http://cfs.ncep.noaa.gov/cfsr/) to drive the SWAT model and found that SWAT simulations driven by CFSR were better than those driven by TWS. Smith et al. (2013) compared the land-atmosphere water balance in ERA-Interim, CFSR and MERRA and concluded that all three datasets reflect the seasonal changes of the water balance well. Lavers et al. (2012) used ERA-Interim, CFSR, NCEP-NCAR and MERRA to study the relationship between winter floods and large-scale climate, showing that all these datasets reproduce a consistent relationship between the two. Quadro et al. (2013) found that CFSR simulated the water balance of South America better than NCEP Reanalysis II (NCEP-2) and MERRA. Wei et al. (2013) simulated three cyclones crossing the Taiwan Strait using CFSR and TRMM. Although the CFSR dataset is widely used, previous studies have found large uncertainties in its precipitation frequency and intensity, even though the large-scale precipitation climatology is captured well (Higgins et al., 2010; Silva et al., 2011). Precipitation is one of the most important drivers of runoff generation. However, owing to the lack of reliable observations, the usability and accuracy of the CFSR dataset in China are not satisfactory.
Because of their coarse resolution, global climate models (GCMs) cannot fully resolve regional climate patterns, as also stated in the IPCC Fourth Assessment Report (Gerald et al., 2007). Studies show that GCM outputs cannot be applied directly to assess regional-scale future hydrological changes (Wood et al., 2004). Because regional climate models have higher spatial and temporal resolutions than GCMs, Lu et al. (2007) used precipitation output from the mesoscale model MC2 to drive the Xin'anjiang model, which improved the forecast accuracy and lead time of the coupled land-atmosphere model. However, Wang et al. (2010) obtained worse results when using the regional climate model RegCM3 to drive the semi-distributed hydrological model SWAT. To verify RegCM3 over China, Jeremy et al. (2003) used RegCM3 to simulate monthly precipitation and winter and summer seasonal precipitation in the East Asian monsoon region, and found that RegCM3 produced large precipitation errors, especially in winter.
Compared with southeastern China, western China has sparse meteorological stations, which has constrained large-scale modeling studies in this region. Given the poor performance of regional climate models and reanalysis datasets in China, it is necessary to develop a high-resolution dataset covering the whole country and to evaluate its performance in large-scale hydrological modeling. This study presents the newly developed CMADS dataset, which is based on the forcing fields of China's land data assimilation system (CLDAS) and can be used for large-scale hydrological modeling with the SWAT model. We compared the modeling performance obtained with inputs from CMADS, CFSR and TWS, and then evaluated the added value of CMADS in large-scale modeling, using the Heihe River Basin as the study region. The Heihe River (98°34′-101°09′E, 37°43′-39°06′N) is the second largest inland river in China; it originates in the Qilian Mountains in the south and leaves the mountains at the Ying Luoxia hydrological station. The terrain of the basin is higher in the south than in the north and higher in the west than in the east. The basin is characterized by scarce precipitation, abundant sunshine and a large diurnal temperature range. The total catchment area is 9973 km², with an average elevation between 1980.629 m and 4029.827 m (Figure 1). The basin has an average annual precipitation of 300-700 mm and an average annual temperature between -3 ℃ and 7 ℃. Mountain areas above 4500 m are covered with ice and snow, and the snow line rises from east to west. Owing to its large precipitation amounts, glaciers, mountainous terrain and good vegetation cover, the Qilian Mountain area forms the upstream runoff-generating area of the whole Heihe River Basin. The multi-annual average runoff at Ying Luoxia station is 1.58 billion m³. The inter-annual variability of runoff is small: the ratio of the maximum to the minimum annual runoff is usually smaller than 3. The intra-annual variability is large, with May and June accounting for 12%-25% of the annual runoff and July and September for 50%-55%. The local economy depends mainly on animal husbandry. The area has abundant water resources and well-developed irrigation facilities.

Material and methodology
This study used the SWAT model to illustrate the added value of CMADS data, with the Heihe River Basin as the study region. Observed streamflow at three hydrological stations in the basin was obtained from the Heihe River Basin Authority. Three simulations were then conducted with SWAT driven by CMADS, CFSR and TWS, respectively, and the simulated results were compared with the observations.

Digital elevation model (DEM)
The spatial inputs of the SWAT model include DEM, river network, soil and land use data. The DEM used in this study is the 90 m SRTM DEM, obtained from the CGIAR-CSI SRTM 90 m database (http://srtm.csi.cgiar.org/SELECTION/inputCoord.asp) (Jarvis et al., 2008).
To ensure consistency among the model inputs, the DEM, soil and land use data were all set to a spatial resolution of 1 km and projected to Beijing_1954_GK_Zone_17N.

Hydrological verification data
This study used daily streamflow observations from the ZhaMashenke, Qilian Mountain and Ying Luoxia hydrological stations. Details of each station are given in Table 1.

2.3 Atmospheric forcing input data
Three datasets were chosen as atmospheric forcing for the SWAT model (Table 2). The Heihe River Basin has four national basic meteorological stations: Tuo Le (T1), Zhang Ye (T2), Ye Niugou (T3) and Qilian (T4). These stations are regarded as the most authoritative point observations in the basin. To assess the accuracy of CFSR and CMADS in the basin, we analyze CFSR and CMADS values interpolated to the locations of TWS stations T1-T4 later in this paper. For TWS, we obtained daily mean air pressure, mean wind speed, mean temperature, mean relative humidity, daily maximum/minimum temperature, daily precipitation and sunshine duration. Missing observations are filled by the weather generator embedded in SWAT. Specifically, SWAT uses the observed data to calculate multi-annual climate statistics (Neitsch et al., 2009) and then uses the centroid method to interpolate the station data to the sub-basins (Andersson et al., 2012).
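The centroid rule mentioned above can be sketched as follows, assuming it follows SWAT's usual practice of assigning each sub-basin the record of the station nearest to the sub-basin centroid. The haversine distance, the function names and the example centroid are illustrative assumptions rather than SWAT's own code; the station coordinates are those given for T1-T4 in this paper, read as latitude and longitude.

```python
import math

# TWS station coordinates (latitude, longitude in degrees) as given in this paper.
TWS_STATIONS = {
    "T1": (38.82, 98.42),   # Tuo Le
    "T2": (39.09, 100.29),  # Zhang Ye
    "T3": (38.42, 99.59),   # Ye Niugou
    "T4": (38.18, 100.25),  # Qilian
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_station(centroid, stations):
    """Return the name of the station closest to a sub-basin centroid (lat, lon)."""
    lat, lon = centroid
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))


# Hypothetical sub-basin centroid near Qilian: under the centroid rule,
# this sub-basin would be driven entirely by the T4 record.
print(nearest_station((38.2, 100.2), TWS_STATIONS))  # T4
```

Because every HRU in a sub-basin inherits one station record, the rule is simple but cannot represent spatial gradients between stations, which is the limitation of sparse TWS networks discussed in this paper.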

Introduction of CFSR dataset
The CFSR dataset is provided by the US National Centers for Environmental Prediction (NCEP) (Saha et al., 2010). It is a high-resolution global reanalysis (atmospheric horizontal resolution T382, about 38 km, with 64 vertical levels); for this study, data covering 98°34′-101°09′E and 37°43′-39°06′N were used. There are 15 interpolation points (CF1-CF15) in the study region (their distribution is shown in Figure 4). The spatial resolution is 0.313° × 0.313° and the temporal resolution is daily, from 1 January 2008 to 31 December 2013, including precipitation, maximum/minimum temperature, wind speed, relative humidity and solar radiation. The official SWAT website also recommends using the CFSR dataset to drive and build models worldwide. However, the effectiveness of driving the SWAT model with CFSR has not been verified systematically in China.

Introduction of CMADS dataset
CMADS is a new dataset developed in this study based on the CLDAS data assimilation technique. The CLDAS assimilation system fuses multi-source data such as satellite observations, land surface observations and numerical model products (Meng et al., 2017; Shi et al., 2008; Shi et al., 2011; Zhang et al., 2013). This study built the CMADS dataset (…). To verify the applicability of CMADS in China, we bilinearly interpolated the CMADS dataset to the locations of the national automatic stations of the China Meteorological Administration (2421 stations in total) and compared it with their observations. During validation, quality control (including regional threshold, climatological threshold and spatio-temporal consistency checks) was applied to ensure a resolution ratio of 98.9%.
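Bilinear interpolation weights the four grid nodes surrounding a station by their proximity to it. The sketch below assumes a regular latitude-longitude grid stored as a nested list, with latitude increasing with row index; the function name and the 2 x 2 example grid are hypothetical and do not reflect the CMADS file format or the exact code used for the validation.

```python
import math


def bilinear(lat, lon, lat0, lon0, dlat, dlon, grid):
    """Bilinearly interpolate a regular lat/lon grid to one point.

    grid[j][i] holds the value at latitude lat0 + j*dlat and
    longitude lon0 + i*dlon. The point must lie inside the grid,
    excluding the last row and column.
    """
    y = (lat - lat0) / dlat
    x = (lon - lon0) / dlon
    j, i = math.floor(y), math.floor(x)  # indices of the lower-left grid node
    if not (0 <= j < len(grid) - 1 and 0 <= i < len(grid[0]) - 1):
        raise ValueError("point lies outside the grid interior")
    fy, fx = y - j, x - i  # fractional position inside the cell
    return (grid[j][i] * (1 - fx) * (1 - fy)
            + grid[j][i + 1] * fx * (1 - fy)
            + grid[j + 1][i] * (1 - fx) * fy
            + grid[j + 1][i + 1] * fx * fy)


# Hypothetical 2 x 2 grid of daily temperature (degC) with 0.25-degree spacing.
demo_grid = [[10.0, 12.0],
             [14.0, 16.0]]
print(bilinear(38.125, 98.125, 38.0, 98.0, 0.25, 0.25, demo_grid))  # 13.0
```

At the cell center each of the four nodes receives a weight of 0.25, so the result is their plain average.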

CMADS evaluation in China
Figure 2(a-f) shows the spatial distributions of bias and root-mean-square error (RMSE) of climatic variables between the CMADS dataset and the national automatic stations. CMADS reproduces the spatial distribution of land surface elements in China well. First, the temperature bias ranges from -0.5 K to 0.5 K across China, although a few individual stations show large errors, such as -4 K on the Qinghai-Tibet Plateau. The RMSE of CMADS land surface temperature ranges from 0.5 K to 2.0 K in western China (especially in Xinjiang and Tibet), while in South and Southeast China it is mostly smaller than 0.5 K. Second, Figure 2c shows that the air pressure bias is smaller in the east than in the west, ranging from 0 hPa to 5 hPa in eastern China and the Yangtze River and Huai River area, and from 0 hPa to 17 hPa in southwestern and northwestern China. In most areas of China the RMSE of air pressure is below 11 hPa (below 3 hPa in the northwestern region and below 5 hPa in northeastern, northern and eastern China). Third, the bias of CMADS relative humidity in China (Figure 2e) is between -2% and -6%, while in northwestern, southwestern and northern China the bias is mainly negative (between 1% and -4%). The RMSE distribution of relative humidity is displayed in Figure 3f: the error is around 3%-9% in most areas of China, while some stations in Xinjiang, the middle of the northwestern region, central China and the middle of the southwestern region have an RMSE of 7%-9%. Fourth, the bias and RMSE of CMADS surface wind speed are displayed in Figure 3g-h. The bias is between -1 m/s and 0.75 m/s, and it is positive (0-0.75 m/s) in the Jianghuai region, the middle of southwestern China and the southern part of North China. The RMSE of CMADS surface wind speed is small in most areas, between 0.5 m/s and 1.0 m/s, while at some stations in southwestern, southeastern and northern China it reaches 1.0-1.5 m/s. In conclusion, the evaluation against national automatic stations shows that the CMADS dataset matches the observations well.
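The two statistics mapped in Figure 2 can be computed per station from paired series of interpolated grid values and observations. Because the paper does not state its sign convention, the sketch below assumes bias = dataset minus observation, so negative values denote underestimation; the function names and sample values are made up for illustration.

```python
import math


def bias(sim, obs):
    """Mean error (dataset minus station); positive values mean overestimation."""
    if len(sim) != len(obs) or not obs:
        raise ValueError("need two non-empty series of equal length")
    return sum(s - o for s, o in zip(sim, obs)) / len(obs)


def rmse(sim, obs):
    """Root-mean-square error between paired dataset and station values."""
    if len(sim) != len(obs) or not obs:
        raise ValueError("need two non-empty series of equal length")
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


# Hypothetical daily 2 m temperature (K): interpolated grid value vs. station.
grid_t = [271.2, 272.0, 273.5]
station_t = [271.0, 272.5, 273.0]
print(round(bias(grid_t, station_t), 3), round(rmse(grid_t, station_t), 3))
```

A small bias with a larger RMSE, as in this toy series, means that positive and negative errors largely cancel on average while individual days still deviate noticeably.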
In this study, the SWAT model uses 11 interpolation points (CM1-CM11) from CMADS V1.0 (resolution: 0.333°). The distribution of multi-annual total precipitation and maximum/minimum temperature of CMADS in the Ying Luoxia River Basin is shown in Figure 4. This study focuses on verifying the utility of the CMADS dataset for driving hydrological models in China. Information on the three meteorological forcing datasets is given in Table 2. The analysis above shows that western China has few meteorological stations, which is insufficient for large-scale hydrological simulation. In terms of spatial coverage, CMADS and CFSR have obvious advantages over TWS: this study used 11 CMADS points and 15 CFSR points, whereas only 4 TWS stations (T1-T4) were available in the basin. Notably, every TWS station had missing values, with missing ratios of 3.395%, 8.762%, 4.654% and 7.448% at Tuo Le (T1), Zhang Ye (T2), Ye Niugou (T3) and Qilian (T4), respectively. In contrast, the CMADS and CFSR data used to drive SWAT contain no missing values, which is an advantage of assimilated datasets over TWS. To quantify the differences between the two gridded datasets (CFSR and CMADS) in the Heihe River Basin, we extracted both at the coordinates of the four traditional weather stations in the study area (Figure 4) and evaluated them against the TWS observations. The four points are national meteorological stations (latitude, longitude: T1 38.82, 98.42; T2 39.09, 100.29; T3 38.42, 99.59; T4 38.18, 100.25). Comparing the extracted CFSR and CMADS data with TWS (Figures 5 and 6), we found that CMADS agreed better with TWS than CFSR did, although CMADS underestimated precipitation at the four stations from May to September in 2009-2011. The maximum precipitation error of CMADS was 0.28 mm and the correlation coefficient exceeded 0.992, indicating close agreement between CMADS and TWS. CFSR performed considerably worse than CMADS. Precipitation at all four points was overestimated during the five years 2009-2013, with the largest error reaching 1.15 mm/month. In addition, maximum temperatures at all four stations were underestimated, with errors ranging from -5.93 ℃/month to -9.41 ℃/month (the largest at T4, Figure 5). The evaluation results are shown in Table 3. The SWAT model was then driven with the three datasets (TWS, CMADS and CFSR) to further investigate the hydrological performance of CMADS.
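The missing ratios, maximum error and correlation coefficient quoted above are simple per-station statistics. The sketch below assumes that missing days are stored as None and that correlation and error are computed only over time steps where both records have a value; the paper does not say how gaps were handled in this comparison, and all names and values are illustrative.

```python
import math


def missing_percent(series):
    """Share of missing entries (None) in a station record, in percent."""
    return 100.0 * sum(v is None for v in series) / len(series)


def paired_complete(a, b):
    """Keep only the time steps where both series have a value."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return [x for x, _ in pairs], [y for _, y in pairs]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_abs_error(sim, obs):
    """Largest absolute difference between paired values."""
    return max(abs(s - o) for s, o in zip(sim, obs))


# Hypothetical monthly precipitation (mm): station record with a gap vs. grid.
station = [5.0, None, 30.0, 80.0]
grid = [5.2, 12.0, 29.7, 80.1]
obs, sim = paired_complete(station, grid)
print(missing_percent(station), round(pearson_r(obs, sim), 4), round(max_abs_error(sim, obs), 2))
```

A correlation close to 1 says the two records rise and fall together; it says nothing about a systematic offset, which is why an error measure is reported alongside it.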

Introduction of SWAT model
The SWAT model is a semi-distributed model that can simulate basin-scale hydrology, sediment and non-point source pollution (Neitsch et al., 2009). SWAT divides a basin into sub-basins and further into hydrologic response units (HRUs), each of which groups the areas of a sub-basin that share the same land use, soil type and slope class. The SWAT model has been widely used throughout the world since its publication (Zhang et al., 2013).
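The HRU concept amounts to grouping sub-basin area by unique combinations of land use, soil and slope class, regardless of where the cells lie within the sub-basin. The sketch below illustrates only that grouping, assuming equal-area cells; it omits the area thresholds that SWAT interfaces typically use to drop minor combinations, and the class labels are hypothetical.

```python
from collections import Counter


def define_hrus(cells):
    """Group the grid cells of one sub-basin into HRUs.

    cells: iterable of (land_use, soil, slope_class) tuples, one per cell.
    Returns {combination: fraction of sub-basin area}; cells are assumed
    to have equal area.
    """
    counts = Counter(cells)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


# Hypothetical sub-basin with four 1 km cells; class labels are illustrative.
cells = [("GRAS", "S1", "0-15%")] * 3 + [("FRST", "S2", ">15%")]
print(define_hrus(cells))
```

Since HRUs are not spatially located, SWAT computes the water balance once per HRU and scales each result by its area fraction when summing to the sub-basin.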

3 SWAT model settings
The study area was divided into 24 sub-basins based on the DEM, and SWAT then divided each sub-basin into several HRUs. In SWAT, the water balance of each HRU is calculated from surface runoff, interflow, base flow, infiltration, channel transmission losses and evapotranspiration. The configurations driven by the three forcing datasets (CMADS, CFSR and TWS) are referred to hereafter as the CMADS+SWAT, CFSR+SWAT and TWS+SWAT modes.
All three modes use the Penman-Monteith method to calculate potential evapotranspiration, which requires solar radiation, temperature, relative humidity and wind speed. Because the TWS dataset contains no solar radiation data, solar radiation in the TWS+SWAT mode is generated by the weather generator embedded in SWAT (based on a Markov chain model). With daily input data, all three modes use the Soil Conservation Service (SCS) curve number method, a non-linear relationship between precipitation, initial abstraction and retention, to calculate surface runoff. Surface runoff is calculated for each HRU and routed to the main channel. Finally, the variable storage method, which is based on the continuity equation, was chosen to route flow in the main channel.
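In the metric form given in the SWAT theoretical documentation, the SCS curve number relation is $Q_{surf} = (R_{day} - I_a)^2 / (R_{day} - I_a + S)$ for $R_{day} > I_a$ (otherwise $Q_{surf} = 0$), with retention $S = 25.4\,(1000/CN - 10)$ and initial abstraction $I_a = 0.2S$, all in mm. The sketch below evaluates it for a fixed curve number; SWAT itself adjusts CN daily for soil moisture, which is not reproduced here, and the function name is ours.

```python
def scs_runoff_mm(rain_mm, cn):
    """Daily surface runoff (mm) from the SCS curve number equation.

    Uses a fixed curve number; SWAT's daily soil-moisture adjustment of CN
    is intentionally left out of this sketch.
    """
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    s = 25.4 * (1000.0 / cn - 10.0)  # retention parameter (mm)
    ia = 0.2 * s                      # initial abstraction (mm)
    if rain_mm <= ia:
        return 0.0                    # all rain is abstracted, no runoff
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)


# Runoff from a hypothetical 40 mm storm for a few curve numbers.
for cn in (60, 75, 90):
    print(cn, round(scs_runoff_mm(40.0, cn), 2))
```

The non-linearity is visible in the output: the same storm yields far more runoff as CN increases, and small storms below the initial abstraction produce none at all.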
Based on the centroid interpolation principle, SWAT interpolates discrete point meteorological data to the whole basin (Wood et al., 2004). To reduce errors caused by spatial discretization and interpolation (especially in mountain areas) and to improve the accuracy of precipitation in HRUs and natural sub-basins, this study used elevation information extracted for the Heihe River Basin to divide the sub-basins into several elevation bands. Applying a precipitation gradient allows the precipitation distribution across the different elevation bands to be represented well, because after the elevation adjustment the precipitation of each sub-basin is generated by the model.
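The elevation adjustment can be illustrated with the elevation-band equations in the SWAT theoretical documentation, which shift gauge temperature by a lapse rate TLAPS (°C/km) and, on days with more than 0.01 mm, shift gauge precipitation by a lapse rate PLAPS (mm/km) spread over the average number of wet days per year. The sketch assumes those equations; the gauge and band elevations and lapse rates in the example are hypothetical, since the values used for the Heihe River Basin are not reported here.

```python
def band_precip_mm(p_gage_mm, el_band_m, el_gage_m, plaps_mm_per_km, wet_days_per_yr):
    """Precipitation (mm) in an elevation band from the gauge value.

    Only days with more than 0.01 mm are adjusted; the result is clipped at
    zero as a safeguard added for this sketch.
    """
    if p_gage_mm <= 0.01:
        return p_gage_mm
    adj = (el_band_m - el_gage_m) * plaps_mm_per_km / (wet_days_per_yr * 1000.0)
    return max(0.0, p_gage_mm + adj)


def band_temp_c(t_gage_c, el_band_m, el_gage_m, tlaps_c_per_km):
    """Temperature (degC) in an elevation band from the gauge value."""
    return t_gage_c + (el_band_m - el_gage_m) * tlaps_c_per_km / 1000.0


# Hypothetical gauge at 2000 m and elevation band at 3000 m.
print(band_precip_mm(5.0, 3000.0, 2000.0, plaps_mm_per_km=100.0, wet_days_per_yr=50.0))  # 7.0
print(band_temp_c(10.0, 3000.0, 2000.0, tlaps_c_per_km=-6.0))  # 4.0
```

Dividing by the number of wet days spreads the annual precipitation gradient over rainy days only, so dry days remain dry in every band.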
The simulation period is 2008-2013, with 2008 used for model spin-up. The calibration period is 2009-2010 and the validation period is 2011-2013.

Sensitivity analysis
The study used the SWAT-CUP software developed by Eawag (Abbaspour et al., 2007b) to analyze and calibrate the parameters of the three modes. The SUFI-2 algorithm (Abbaspour et al., 2004; Abbaspour et al., 2007a) in SWAT-CUP (Abbaspour, 2011) was used for model calibration, validation, sensitivity analysis and uncertainty analysis. The algorithm accounts for all sources of uncertainty, such as parameters, conceptual model and input data, and aims to bracket the majority of the measured data within the 95% prediction uncertainty (95PPU) band. The 95PPU is calculated at the 2.5% and 97.5% levels of the cumulative distribution of an output variable obtained through Latin hypercube sampling. Sensitivity analysis was used to identify the most sensitive parameters or groups of parameters. In this study, we analyzed 26 runoff-related parameters and obtained the sensitivity ranking of the parameters for each of the three forcing datasets, as shown in Table 4.
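To make the 95PPU definition concrete, the sketch below draws parameter sets by Latin hypercube sampling, takes the 2.5% and 97.5% percentiles of the resulting simulations at each time step, and reports the p-factor (share of observations inside the band) and r-factor (mean band width over the standard deviation of the observations) that SUFI-2 uses to summarize the band. It is a simplified stand-in written for illustration, not SWAT-CUP's implementation: SUFI-2's iterative narrowing of parameter ranges is omitted, and the one-parameter toy model and its range are hypothetical.

```python
import random
import statistics


def latin_hypercube(n, bounds, rng):
    """Draw n parameter sets; each range is cut into n equal strata and
    every stratum is sampled exactly once per parameter."""
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n
        col = [lo + (k + rng.random()) * width for k in range(n)]
        rng.shuffle(col)  # decouple strata across parameters
        columns.append(col)
    return [list(row) for row in zip(*columns)]


def percentile(values, q):
    """Percentile (0-100) with linear interpolation between order statistics."""
    v = sorted(values)
    pos = (len(v) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)


def ppu95(ensemble):
    """Lower (2.5%) and upper (97.5%) bands at each time step.

    ensemble: list of simulated series, one per sampled parameter set.
    """
    lower, upper = [], []
    for t in range(len(ensemble[0])):
        values = [run[t] for run in ensemble]
        lower.append(percentile(values, 2.5))
        upper.append(percentile(values, 97.5))
    return lower, upper


def p_and_r_factor(obs, lower, upper):
    """p-factor: share of observations inside the band;
    r-factor: mean band width divided by the standard deviation of obs."""
    inside = sum(lo <= o <= hi for o, lo, hi in zip(obs, lower, upper))
    width = statistics.mean(hi - lo for lo, hi in zip(lower, upper))
    return inside / len(obs), width / statistics.stdev(obs)


# Toy "model": runoff = coefficient * rain, with one hypothetical parameter.
rng = random.Random(42)
params = latin_hypercube(100, [(0.2, 0.6)], rng)
rain = [10.0, 40.0, 25.0]
ensemble = [[p[0] * x for x in rain] for p in params]
lower, upper = ppu95(ensemble)
print(p_and_r_factor([4.0, 16.0, 10.0], lower, upper))
```

A good calibration pushes the p-factor toward 1 while keeping the r-factor small; widening the parameter ranges raises the first only at the cost of the second.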

Model calibration
Based on parameter significance and the simulated conditions (Abbaspour et al., 2015), the 14 most sensitive parameters were selected for calibration over 2009-2010, and model performance with the different datasets was validated over 2011-2013. After calibration at the monthly scale, the parameters were further calibrated with daily data and daily runoff was validated. During this process, we first considered the ratio between annual evaporation and runoff, and then ensured reasonable levels of simulated total evaporation, precipitation and runoff. When calibrating the three hydrological stations (Ying Luoxia, ZhaMashenke and Qilian Mountain), we calibrated Qilian Mountain first, then ZhaMashenke and finally Ying Luoxia. Because Ying Luoxia is located downstream of the other two stations, accurate calibration of the upstream parameters provides a good foundation for the downstream calibration.
The best-fit parameter values differed among the three modes.

3.3 Model assessment
The study uses two evaluation indices, the Nash-Sutcliffe efficiency (NSE) and the coefficient of determination (R²) (Nash et al., 1970), both of which are widely used to assess model performance.
The Nash-Sutcliffe efficiency is a normalized statistic that reflects the goodness of fit between observed and simulated data (Schaefli et al., 2007). NSE is calculated with equation (1):

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{m,i} - Q_{s,i}\right)^2}{\sum_{i=1}^{n}\left(Q_{m,i} - \bar{Q}_m\right)^2} \qquad (1)$$

where $Q$ is the runoff variable, $Q_{m,i}$ and $Q_{s,i}$ are the $i$-th observed and simulated runoff values, respectively, and $\bar{Q}_m$ is the mean observed runoff. NSE ranges from -∞ to 1. NSE = 1 indicates a perfect fit between observed and simulated data; NSE between 0.1 and 1 indicates that the simulation results can be accepted; NSE smaller than 0 indicates a poor simulation. The coefficient of determination reflects the degree of correlation between the observed and simulated values. R² is calculated with equation (2):

$$R^2 = \frac{\left[\sum_{i=1}^{n}\left(Q_{m,i} - \bar{Q}_m\right)\left(Q_{s,i} - \bar{Q}_s\right)\right]^2}{\sum_{i=1}^{n}\left(Q_{m,i} - \bar{Q}_m\right)^2 \sum_{i=1}^{n}\left(Q_{s,i} - \bar{Q}_s\right)^2} \qquad (2)$$

where $\bar{Q}_s$ is the mean simulated runoff and the other symbols are as in equation (1). Some studies use R² > 0.5 and NSE > 0.5 as the satisfactory criterion for SWAT (Santhi et al., 2001), while others consider NSE > 0.4 satisfactory (Ahmad et al., 2011). This study adopts the evaluation criterion of Moriasi et al. (2007): during the calibration period, results are acceptable if the monthly NSE ≥ 0.65 or the daily NSE ≥ 0.5 (Santhi et al., 2001).

According to the criteria of Moriasi et al. (2007) and Santhi et al. (2001), both the CMADS+SWAT and TWS+SWAT modes achieved satisfactory monthly performance at all three stations (Table 5). At the monthly scale (Figures 7, 8 and 9), the CMADS+SWAT mode (Figure 8A) performed better than the TWS+SWAT mode (Figure 8B) at ZhaMashenke station. Because there are no meteorological stations at ZhaMashenke, the CMADS dataset has a greater advantage there than the TWS dataset. However, at sub-basin No. 2 (Ying Luoxia), monthly runoff from the CMADS+SWAT mode was slightly overestimated compared with the TWS+SWAT mode, which might be caused by higher CMADS precipitation (Figure 16C) than TWS precipitation from May to October each year. This overestimation is attributed to SWAT's centroid interpolation and the secondary elevation adjustment of the meteorological data. Nevertheless, the slightly overestimated CMADS precipitation at sub-basin No. 2 (Ying Luoxia) did not cause large errors in the simulations (Table 5). In addition, the CFSR+SWAT mode performed unsatisfactorily at all three stations: runoff was overestimated relative to observations, and the highest NSE was only 0.49 (Figures 7C, 8C and 9C). Runoff was overestimated during the rising period from October to August of the following year at all three sub-basins, while in September of each year the CFSR+SWAT mode underestimated runoff. Because the intra-annual precipitation was overestimated, base flow was also overestimated every year (Figures 7C, 8C and 9C). This is because the CFSR data have not been corrected against observed meteorological stations in China, so precipitation is overestimated. Although runoff was simulated reasonably after parameter calibration, the CFSR+SWAT mode tended to overestimate evaporation (Figure 5), which
might also be related to the underestimation of maximum temperature (Figure 6). Because CFSR precipitation is overestimated, evaporation greatly exceeded the local annual evaporation when the CFSR+SWAT mode was calibrated (Figure 14). After monthly calibration at the three sub-basins (Figures 7 to 9), the optimal parameters were introduced into SWAT to continue calibrating the three modes at the daily scale. As at the monthly scale, both the CMADS+SWAT and TWS+SWAT modes performed well at the daily scale (Table 5; Figures 10, 11 and 12), and their daily hydrographs at the three stations were highly consistent. However, the TWS+SWAT mode underestimated peak flows at both Qilian Mountain (Figure 10B) and ZhaMashenke (Figure 11B), while slightly overestimating the peak at Ying Luoxia. The daily results of the CMADS+SWAT mode at Qilian Mountain (NSE = 0.58, R² = 0.66) are acceptable, and the mode performed satisfactorily at Ying Luoxia (NSE = 0.77, R² = 0.80) and ZhaMashenke (NSE = 0.75, R² = 0.78). At ZhaMashenke, daily runoff simulated by the CMADS+SWAT mode from March to April was higher than observed and had a larger amplitude, but in other periods it was better than that of the TWS+SWAT mode. In terms of peak flows, the CMADS+SWAT mode was more accurate than both the TWS+SWAT and CFSR+SWAT modes at Qilian Mountain and ZhaMashenke, and overall it agreed better with the observations than the other two modes, especially at these two control stations. These results indicate that, compared with CMADS, the limited number of traditional meteorological stations cannot capture spatial heterogeneity, which limits their usefulness for simulating the basin water balance.
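The NSE and R² statistics of equations (1) and (2), together with the calibration threshold adopted in this study, can be computed as in the sketch below. The function names are ours and the three-value series is a toy example; it also shows why both indices are reported, since a biased simulation can be perfectly correlated with the observations (R² = 1) yet have a negative NSE.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency, equation (1)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r_squared(obs, sim):
    """Coefficient of determination, equation (2)."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def acceptable(nse_value, time_step):
    """Calibration threshold adopted here: NSE >= 0.65 (monthly) or >= 0.5 (daily)."""
    threshold = {"monthly": 0.65, "daily": 0.5}[time_step]
    return nse_value >= threshold


# A simulation that is perfectly correlated but biased: R^2 = 1, NSE < 0.
obs = [1.0, 2.0, 3.0]
sim = [2.0, 4.0, 6.0]
print(r_squared(obs, sim), nse(obs, sim))  # 1.0 -6.0
```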

A) CMADS+SWAT mode; B) TWS+SWAT mode; C) CFSR+SWAT mode
By comparing the monthly and daily simulation results of SWAT driven by the three datasets (TWS, CFSR and CMADS), we found that the CMADS+SWAT mode reproduced the historical runoff of the Heihe River Basin well, whereas CFSR, despite being widely used around the world, performed poorly. Figures 5 and 6 and Table 3 provide supporting evidence.

Monthly-scale runoff simulation results of the three modes at three sub-basins over five years
After parameter calibration, the water yield (WYLD) of the CFSR+SWAT mode reached a level similar to that of the other modes (Figure 14). However, Figure 16 shows that CFSR precipitation captures only a few large-scale precipitation patterns. As shown in Figure 13A, although the CFSR+SWAT mode reproduced the July runoff peak, it did not agree well with the observations in other periods. In Figure 13, the CFSR+SWAT mode overestimates runoff during the rising period (January-June) and the declining period (October-December), and also between July and September each year. For TWS and CMADS, Figures 13A, C and D show that both the TWS+SWAT and CMADS+SWAT modes slightly underestimated runoff from March to May (rising period). Compared with the CMADS+SWAT mode, the TWS+SWAT mode slightly underestimated runoff in November (declining period). In general, both the TWS+SWAT and CMADS+SWAT modes reproduced the observed monthly average runoff peak well. The TWS+SWAT mode overestimated runoff in January, April-May and October-December, but substantially underestimated it from mid-May to September.
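The discussion of Figure 13 is framed in terms of calendar months (for example, overestimation in January-June), which suggests comparing multi-year mean runoff for each month. A minimal sketch of that bookkeeping follows, with hypothetical data and function names; positive values mark months in which a mode overestimates runoff.

```python
from collections import defaultdict


def monthly_climatology(series):
    """Multi-year mean runoff for each calendar month.

    series: dict mapping (year, month) -> monthly runoff.
    Returns {month: mean over the years present}.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for (_year, month), q in series.items():
        sums[month] += q
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sorted(sums)}


def monthly_bias(obs_series, sim_series):
    """Simulated minus observed climatological runoff for each calendar month."""
    obs_clim = monthly_climatology(obs_series)
    sim_clim = monthly_climatology(sim_series)
    return {m: sim_clim[m] - obs_clim[m] for m in obs_clim}


# Hypothetical two-year record (m3/s) for January and July only.
obs = {(2009, 1): 1.0, (2010, 1): 3.0, (2009, 7): 10.0, (2010, 7): 20.0}
sim = {(2009, 1): 2.0, (2010, 1): 4.0, (2009, 7): 8.0, (2010, 7): 18.0}
print(monthly_climatology(obs))  # {1: 2.0, 7: 15.0}
print(monthly_bias(obs, sim))    # {1: 1.0, 7: -2.0}
```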

Differences in water balance
Water balance analysis is an important tool for evaluating water resources and helps reveal quality differences among forcing datasets (Zhang et al., 2012; Silva et al., 2011). Analysis of the water balance components in the three sub-basins of the Heihe River Basin under the three modes showed that the overestimated CFSR precipitation used as SWAT input led to larger evaporation and larger water balance components than the other two datasets (Figure 14). Figure 14 indicates that CFSR precipitation in the three sub-basins was much higher than that of CMADS and TWS: annual average precipitation was 864.35 mm for CFSR, compared with 442.45 mm for CMADS and 458.48 mm for TWS. Annual precipitation in the main stream area of the Heihe River has been reported as 459.7 mm (Yin et al., 2013), which is close to the CMADS and TWS values and confirms that CFSR overestimates precipitation. The TWS+SWAT and CMADS+SWAT modes partitioned 42.6% and 43.3% of precipitation into runoff, respectively, whereas the CFSR+SWAT mode partitioned only 25.5%. The proportions of side flow, subsurface flow and lateral seepage flow during runoff generation were higher for the CFSR+SWAT mode than for the CMADS+SWAT and TWS+SWAT modes; for the CFSR+SWAT mode these components accounted for 44.2%, 39.9% and 44.17% of total runoff generation, respectively.
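As a rough check on the magnitudes involved, the relative bias of each dataset's basin-average annual precipitation against the 459.7 mm reported by Yin et al. (2013) can be computed from the values in this paragraph. This is an illustration only: the reference value applies to the main stream area rather than exactly to the modelled sub-basins, and the paper does not state which formula underlies the sub-basin biases in Figure 15.

```python
def relative_bias_pct(value, reference):
    """Relative bias of an estimate against a reference value, in percent."""
    return 100.0 * (value - reference) / reference


REFERENCE_P_MM = 459.7  # annual precipitation, Heihe main stream area (Yin et al., 2013)
ANNUAL_P_MM = {"CFSR": 864.35, "CMADS": 442.45, "TWS": 458.48}  # values from this section

for name, p in ANNUAL_P_MM.items():
    print(f"{name}: {relative_bias_pct(p, REFERENCE_P_MM):+.1f}%")
# CFSR: +88.0%
# CMADS: -3.8%
# TWS: -0.3%
```

On this crude measure CFSR precipitation is almost double the reference, while CMADS and TWS lie within a few percent of it.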
Results also indicated that the CFSR+SWAT mode, despite its overestimated precipitation, produced lower soil moisture than the TWS+SWAT and CMADS+SWAT modes. This might be related to the large evaporation of the CFSR+SWAT mode. Indeed, the actual evapotranspiration of the CFSR+SWAT mode was much larger than that of the other two modes (the annual average evapotranspiration was 498.27 mm for CFSR+SWAT, versus 245.18 mm and 253.09 mm for CMADS+SWAT and TWS+SWAT, respectively). However, statistics show that the annual average evapotranspiration in the mountain and main stream areas of the Heihe River is around 279.3-294.1 mm (Yin et al., 2013). When the CFSR+SWAT mode was calibrated to match the observed runoff, its surplus precipitation was converted into increased evaporation, which in turn lowered soil moisture. In conclusion, although the water balance of the CFSR+SWAT mode appears similar to that of the other two modes, its poor representation of evaporation and precipitation greatly reduces the reliability of CFSR in the Heihe River Basin.
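The evapotranspiration argument above can likewise be made explicit. This sketch checks each mode's simulated annual evapotranspiration against the 279.3-294.1 mm range reported by Yin et al. (2013) and computes the evapotranspiration-to-precipitation ratio. All inputs are the values quoted in this section; the helper name `deviation_from_range` is illustrative.

```python
# Annual average actual evapotranspiration and precipitation (mm) quoted above.
ET_MM = {"CFSR+SWAT": 498.27, "CMADS+SWAT": 245.18, "TWS+SWAT": 253.09}
PRECIP_MM = {"CFSR+SWAT": 864.35, "CMADS+SWAT": 442.45, "TWS+SWAT": 458.48}
# Observed range for the mountain and main stream areas (Yin et al., 2013).
ET_OBS_RANGE_MM = (279.3, 294.1)


def deviation_from_range(value: float, low: float, high: float) -> float:
    """Signed distance (mm) from the interval [low, high]; 0.0 if inside it."""
    if value < low:
        return value - low
    if value > high:
        return value - high
    return 0.0


if __name__ == "__main__":
    for mode, et in ET_MM.items():
        ratio = et / PRECIP_MM[mode]
        dev = deviation_from_range(et, *ET_OBS_RANGE_MM)
        print(f"{mode:11s} ET/P={ratio:.3f}  deviation from observed range: {dev:+7.2f} mm")
```

With these inputs, all three modes return roughly 55-58% of their precipitation as evapotranspiration. This suggests that the excess evapotranspiration of CFSR+SWAT (about 204 mm above the upper bound of the observed range) stems largely from its excess precipitation, while CMADS+SWAT and TWS+SWAT fall about 34 mm and 26 mm below the lower bound.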

Fig. 15. Bias distribution of annual average precipitation of the CMADS, CFSR and TWS datasets in different sub-basins
Precipitation is an important factor controlling the watershed runoff process. To examine whether the CMADS dataset reflects the real conditions of the Heihe River Basin when used to drive the SWAT model, this study calculated the bias of the precipitation distribution generated by the SWAT model in three sub-basins (Figure 15). Results showed that the annual average precipitation produced by the CMADS+SWAT mode was larger than that of the TWS+SWAT mode only in the Ying Luoxia basin, while in the other sub-basins it was smaller than that of both the TWS+SWAT and CFSR+SWAT modes. Precipitation for all three datasets was obtained through the SWAT model's elevation correction and barycenter (centroid-based) interpolation. Owing to a lack of observed data, it is difficult to judge which dataset's precipitation is more reliable, so reliability can only be assessed indirectly by other methods.
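The "barycenter interpolation" mentioned above is read here as SWAT's usual practice of assigning each sub-basin the record of the weather station (or CMADS grid point) nearest to the sub-basin centroid; that reading is an assumption. The sketch below uses great-circle (haversine) distance, which is our choice rather than a documented SWAT detail, and the station IDs and coordinates are hypothetical.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def assign_nearest_station(centroids: Dict[str, LatLon],
                           stations: Dict[str, LatLon]) -> Dict[str, str]:
    """Map each sub-basin ID to the ID of the station closest to its centroid."""
    return {
        basin: min(stations, key=lambda sid: haversine_km(centroid, stations[sid]))
        for basin, centroid in centroids.items()
    }


if __name__ == "__main__":
    # Hypothetical grid points roughly 1/3 degree apart and two sub-basin centroids.
    grid = {"G1": (38.0, 100.0), "G2": (38.0, 100.333), "G3": (38.333, 100.0)}
    subbasins = {"sub_A": (38.05, 100.02), "sub_B": (38.30, 100.05)}
    print(assign_nearest_station(subbasins, grid))
```

Because each sub-basin inherits a single record, a dense grid such as CMADS generally places a data point closer to each centroid than a sparse station network does.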
To quantify how the SWAT model's built-in elevation module influences the precipitation distribution, we analyzed the precipitation of the three sub-basins with and without the elevation module (Figure 16), where "-E" denotes precipitation after the SWAT elevation adjustment and "-NE" denotes precipitation without it. We found consistent relations between the precipitation distribution (Figure 16) and the previous water balance (Figure 14). CFSR precipitation in the three natural sub-basins exceeded that of the TWS and CMADS datasets: it was 526.42 mm, 1012.982 mm and 1053.66 mm, respectively, much larger than the local multi-annual average precipitation (459.7 mm) (Yin et al., 2013). Figure 12 shows that, compared with TWS, the precipitation peaks of CFSR and CMADS were more concentrated, especially in the Qilian Mountain basin (Figure 16a). After the elevation module was applied in the SWAT model, precipitation increased to some extent, and the increase grew toward July. In addition, precipitation of the CMADS+SWAT mode in Ying Luoxia between May and September was about 39.7% higher than that of the TWS+SWAT mode (Figure 16C), which caused a larger overestimation of monthly runoff by the CMADS+SWAT mode than by the TWS+SWAT mode in the Ying Luoxia sub-basin. However, R² reached 0.8 in the daily runoff simulation of the CMADS+SWAT mode, exceeding that of the TWS+SWAT mode (Table 5). We also found that the CMADS+SWAT mode achieved better results where weather stations are far from hydrological stations or where weather stations are lacking. Furthermore, Figure 15B showed that precipitation of the CMADS+SWAT mode was smaller than that of the TWS+SWAT mode in April-June and August-October, whereas its simulated peak values and base flow in the ZhaMashenke sub-basin (Figure 8a and Figure 11a) fitted the observations better than those of the TWS+SWAT mode (Figure 8b, Figure 11b, Table 5 and Figure 13b). Simulation results of both the CMADS+SWAT and TWS+SWAT modes were satisfactory in the Qilian Mountain sub-basin.
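The "-E"/"-NE" comparison above can be illustrated with the lapse-rate adjustment that SWAT applies when elevation bands are defined. This sketch assumes the formulation in the SWAT theoretical documentation, R_band = R_day + (EL_band - EL_gage) * PLAPS / (days_pcp,yr * 1000), applied only on days with R_day > 0.01 mm. It also assumes that the sub-basin value is the area-weighted mean of the band values. The gauge elevation, band elevations, area fractions, PLAPS and wet-day count in the example are hypothetical, not this study's calibrated values.

```python
from typing import Sequence

WET_DAY_THRESHOLD_MM = 0.01  # only days with R_day above this are adjusted


def band_precip_mm(r_day: float, el_band_m: float, el_gage_m: float,
                   plaps_mm_per_km: float, days_pcp_yr: float) -> float:
    """Precipitation (mm) in one elevation band under the lapse-rate formula."""
    if r_day <= WET_DAY_THRESHOLD_MM:
        return r_day
    adjustment = (el_band_m - el_gage_m) * plaps_mm_per_km / (days_pcp_yr * 1000.0)
    return max(0.0, r_day + adjustment)  # clipping at zero is our own safeguard


def subbasin_precip_mm(r_day: float, el_gage_m: float,
                       band_elevs_m: Sequence[float], band_fracs: Sequence[float],
                       plaps_mm_per_km: float, days_pcp_yr: float) -> float:
    """Area-weighted sub-basin precipitation over elevation bands (assumed)."""
    if abs(sum(band_fracs) - 1.0) > 1e-6:
        raise ValueError("band area fractions must sum to 1")
    return sum(
        frac * band_precip_mm(r_day, elev, el_gage_m, plaps_mm_per_km, days_pcp_yr)
        for elev, frac in zip(band_elevs_m, band_fracs)
    )


if __name__ == "__main__":
    # Hypothetical: 5 mm recorded at a 3000 m grid point; bands above it.
    p = subbasin_precip_mm(5.0, 3000.0, [3200.0, 3600.0, 4000.0], [0.5, 0.3, 0.2],
                           plaps_mm_per_km=200.0, days_pcp_yr=100.0)
    print(f"elevation-adjusted sub-basin precipitation: {p:.2f} mm")
```

Under this formula the adjustment is a fixed increment per wet day, so the monthly increase scales with the number of wet days in a month. That is one possible reading of why the increase grows toward July, not a result verified against the study's model setup.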

Analysis of related hydrological elements of the CMADS-driven SWAT model in the Heihe River Basin
The spring flood of the Heihe River Basin occurs from March to April, and the summer flood is … Figure 17(a-e) showed the spatial distribution of snowmelt in the Heihe River Basin on 2 April each year, and the line chart on the right shows the changing relations between snowmelt and soil moisture in three basins (Qilian Mountain, ZhaMashenke and Ying Luoxia). The increase in soil moisture in April was closely related to snowmelt, which is consistent with the spring flood of the Heihe River Basin. We also found that upstream snowmelt greatly changed soil moisture across the whole basin. The increase in soil moisture was more pronounced in areas with large snowmelt, and soil at downstream stations (such as Ying Luoxia) became wetter during the snowmelt period. Figure 17f also indicated that on 1 July 2013 the Qilian Mountain Basin experienced a relatively large amount of snowmelt, although this was not the main snowmelt period. We therefore carried out a correlation analysis of snowmelt and runoff generation between July and August over the five most recent years (2009-2013), as simulated by the CMADS+SWAT mode in the Heihe River Basin (Figure 18). Figure 18(a-f) showed the spatial distribution of WYLD simulated by the CMADS+SWAT mode at the end of July or the beginning of August each year, and the line chart on the right shows the changing relations between snowmelt and WYLD in three basins (Qilian Mountain, ZhaMashenke and Ying Luoxia). The analysis indicated that WYLD in the Heihe River Basin peaks between July and September. As shown in Figure 18a-f, snowmelt contributed little to WYLD between July and September. Comparing Figure 6 and Figure 16, we found that precipitation in the Heihe River Basin reached its maximum between July and September. Figure 17(a-f) also indicated that larger WYLD occurred more often at middle and high altitudes. In addition, WYLD differed considerably among sub-basins, with more WYLD at high altitudes than at low and middle altitudes (Figure 18). This might be caused by the distribution of precipitation in the mountains as well as by snowmelt in cold highland areas.
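The text does not state which correlation statistic was used for the snowmelt-runoff analysis. The sketch below assumes the Pearson coefficient, applied to co-located daily snowmelt and WYLD series such as those plotted for each sub-basin. The series in the example are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical daily snowmelt and WYLD (mm) for one sub-basin in July-August.
    snowmelt = [0.8, 0.5, 0.3, 0.2, 0.1, 0.0]
    wyld = [1.2, 2.5, 0.9, 3.1, 1.8, 2.2]
    print(f"r(snowmelt, WYLD) = {pearson_r(snowmelt, wyld):.2f}")
```

A coefficient near zero for July-August would be consistent with the finding that snowmelt contributed little to WYLD in that period, although with short series a significance test would also be needed.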

Discussion and Conclusions
This study used the CMADS, TWS and CFSR datasets to force the SWAT model and evaluated their performance for streamflow simulation in the Heihe River Basin. CFSR was found to overestimate precipitation, especially in summer, and to overestimate mean annual precipitation. In addition, CMADS performs better than CFSR in terms of both accuracy and spatial resolution, because CMADS incorporates advanced data assimilation techniques and is bias-corrected against China's national automatic observation stations. TWS data, for their part, are of limited use in China, especially in western China, where climate stations are sparse.
For a large river basin, quantitative analysis of water balance components is essential for supporting ecological and hydrological management. TWS data often cannot meet the needs of large-scale hydrological modeling in regions with sparse observations. Therefore, when there are few or even no weather stations in a basin, CMADS can be a valuable source of atmospheric forcing data for hydrological modeling. Another advantage of CMADS over TWS is that it provides complete climate forcing data over a given time period without missing values, which saves much of the time otherwise spent on data quality assurance. Although we only demonstrate the value of CMADS for the SWAT model, it can also be easily reformatted for other hydrological models.
(Z141100006014049), and State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin (2016CG05).
Note: YLX, QLM and ZMSK denote the correlation coefficients and bias at the Ying Luoxia, Qilian Mountain and ZhaMashenke stations, respectively.

Fig. 1. Distribution of meteorological stations and hydrological stations in the study area.
temporal resolution: day by day; spatial resolution: 1/3°; time scale: 2008-2013) by using data loop nesting, resampling and bilinear interpolation methods. The dataset was formatted to meet SWAT model requirements. Nevertheless, the CMADS dataset is provided in two formats (i.e., .dbf and .txt), which can be easily converted for use in other hydrological models. The first version of the CMADS dataset covers the whole of East Asia (0°N-65°N, 60°E-160°E). The spatial resolutions of CMADS V1.0, V1.1, V1.2 and V1.3 are 0.333°, 0.25°, 0.125° and 0.0625°, respectively, all at a daily time step from 2008 to 2014. Owing to a restriction of the SWAT model itself (the number of meteorological stations should not exceed 500), this study chose CMADS V1.0, which has the lowest resolution, as one of the forcing datasets for the SWAT model. The spatial domain of CMADS spans 0°N-65°N and 60°E-160°E and consists of 300 × 195 grid points. In total, 58,500 stations are used for analysis in East Asia, and each station includes daily average temperature, daily maximum/minimum temperature, daily accumulated precipitation, daily average solar radiation, daily average air pressure and daily average wind speed.
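As a quick consistency check on the figures above, the grid dimensions follow from the domain extent divided by the resolution: 100° / (1/3)° = 300 columns and 65° / (1/3)° = 195 rows, i.e. 58,500 points for V1.0. The sketch below repeats this for all four versions and, for a hypothetical rectangular catchment box, counts the grid points that would become SWAT weather stations, showing how quickly finer versions approach the 500-station limit. Treating each grid point as one station and using a rectangular box are simplifying assumptions.

```python
from typing import Tuple

# CMADS domain as stated above: 0N-65N, 60E-160E.
LAT_MIN, LAT_MAX = 0.0, 65.0
LON_MIN, LON_MAX = 60.0, 160.0

# Nominal resolutions (degrees) listed above for CMADS V1.0-V1.3.
RESOLUTIONS = {"V1.0": 1 / 3, "V1.1": 0.25, "V1.2": 0.125, "V1.3": 0.0625}

SWAT_STATION_LIMIT = 500  # maximum number of weather stations quoted above


def grid_shape(res_deg: float) -> Tuple[int, int]:
    """(rows, cols) over the CMADS domain, assuming one point per res_deg cell."""
    rows = round((LAT_MAX - LAT_MIN) / res_deg)
    cols = round((LON_MAX - LON_MIN) / res_deg)
    return rows, cols


def points_in_box(res_deg: float, lat_s: float, lat_n: float,
                  lon_w: float, lon_e: float) -> int:
    """Approximate grid-point count inside a lat/lon box (same assumption)."""
    return round((lat_n - lat_s) / res_deg) * round((lon_e - lon_w) / res_deg)


if __name__ == "__main__":
    # Hypothetical 6 deg x 7 deg catchment box; not the Heihe River Basin outline.
    box = (38.0, 44.0, 96.0, 103.0)
    for version, res in RESOLUTIONS.items():
        rows, cols = grid_shape(res)
        n_box = points_in_box(res, *box)
        status = "within" if n_box <= SWAT_STATION_LIMIT else "exceeds"
        print(f"{version}: {rows}x{cols}={rows * cols} points; box={n_box} ({status} limit)")
```

Halving the grid spacing roughly quadruples the number of points in a given area, which is why the station limit bites quickly for the finer versions.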

Fig. 2. Evaluation of the CMADS dataset in China in 2012: A) bias of surface temperature; B) root-mean-square error (RMSE) of land surface temperature; C) bias of air pressure; D) RMSE of air pressure; E) bias of relative humidity; F) RMSE of relative humidity; G) bias of surface wind speed; H) RMSE of surface wind speed.

Fig. 3. The range of the CMADS V1.0 dataset and the spatial position of the study area

Fig. 5. The cumulative average monthly (2009-2013) rainfall of TWS, CMADS and CFSR at four sites (T1-T4)
Fig. 6. Daily-scale and monthly-scale runoff simulation results of the three modes at three sub-basins
This study used three different modes (CMADS+SWAT mode, CFSR+SWAT mode and TWS+SWAT mode) to obtain monthly and daily runoff at three stations (Qilian Mountains, ZhaMashenke and Ying Luoxia). Based on the model evaluation index by Moriasi

Fig. 7. Simulation results of monthly average runoff of three different modes at Qilian Mountain control station (2009-2013): A) CMADS+SWAT mode, B) TWS+SWAT mode, C) CFSR+SWAT mode
Fig. 8. Simulation results of monthly average runoff of three different modes at ZhaMashenke control station (2009-2013): A) CMADS+SWAT mode, B) TWS+SWAT mode, C) CFSR+SWAT mode
Fig. 9. Simulation results of monthly average runoff of three different modes at Ying Luoxia control station (2009-2013)

Fig. 10. Daily runoff simulation results of three different modes at Qilian Mountain control station (2009-2013): A) CMADS+SWAT mode, B) TWS+SWAT mode, C) CFSR+SWAT mode
Fig. 11. Daily runoff simulation results of three different modes at ZhaMashenke control station (2009-2013)
Fig. 12. Simulation results of monthly average runoff of three different modes at Ying Luoxia control station (2009-2013)

Fig. 16. Precipitation distribution of the CMADS, CFSR and TWS datasets with or without the elevation module (A. Qilian Mountain, B. ZhaMashenke, C. Ying Luoxia)
Figure 17(a-e) showed the spatial distribution of snowmelt in the Heihe River Basin on 2 April each year. The two-dimensional line chart on the right showed the changing relations between snowmelt and soil moisture in three basins (Qilian Mountain,
The upper three curves in the right line chart are runoff generation lines and the lower three curves are snowmelt processing lines. Green: No. 2 sub-basin (Ying Luoxia); Blue: No. 13 sub-basin (ZhaMashenke); Red: No. 20 sub-basin (Qilian Mountain)
Fig. 18. Analysis graph of relationships between snowmelt and soil humidity of

Table 2. Information on the three atmospheric forcing datasets
Table 4 lists the final values of the model parameters.