Hydrological Modeling in Tropical Regions via TopModel . Study case : Central Sector of the Middle Magdalena Valley-Colombia 5

Hydrological modeling allows us to make a comprehensive assessment of the interaction between dynamics of the hydrological cycle, climate conditions, and land use. These modeling results are relevant in water resources management field. We use TopModel (TOPography based hydrological MODEL for the hydrological 15 modeling of an area of 17 000 km2 in the Middle Magdalena Valley (MMV), a tropical basin located in Colombia. This study is located in the intertropical convergence zone (ITCZ) which is characterized by special meteorological conditions and fast water fluxes over the year. This area has been subjected to significant land use changes, as a result of intense economic activities, e.g., agriculture, hydropower energy and oil & gas production (Avellaneda, 2003). The proposed model is based on a record of 12 years of: i.) daily precipitation data from observed gauges, ii.) 20 daily evapotranspiration data from temperature data and iii.) daily streamflow data as observed data. A calibration process was performed using data from 2000 to 2008, and a validation was performed with data from 2009 to 2012. The Nash-Sutcliffe coefficient was used as an objective function to assess the quality of these processes (values of this metric are between 0.74 and 0.73 respectively, for model calibration and validation). The results show us an adequate performance of the model in areas of the tropical region and allow us to analyze the relationship between 25 water storage capacity in the soils of the area with subsurface runoff. This conclusion is consistent with the characteristics of the region. The calibrated model provides an idea about the hydrological functioning of the basin and estimates an approximation of the groundwater recharge in the region. The estimation of the recharge is important to quantify the interaction of surface water and groundwater, especially during the dry season, due to its importance in the analysis of scenarios with climate variability. 30 Keywords—Hydrologic modeling, TopModel, Middle Magdalena Valley, Nash Sutcliffe efficiency, Tropical regions. Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 12 July 2018 doi:10.20944/preprints201807.0210.v1 © 2018 by the author(s). Distributed under a Creative Commons CC BY license.


INTRODUCTION
In the last decade, mathematical models are getting an important role for solving problems in water resources management (Dunne, 1983;Reynaud and Leenhardt, 2008;Bossa et al., 2012), giving rise to discussions about their use and application to evaluate and analyze complex hydrological systems (Bellin et al., 2016).
Hydrological models have been developed to understand different processes.This processes must be evaluated for different environmental conditions (Jenson, 1991;Loosvelt et al., 2014;Hollanda et al., 2015).However, they are generally focused on fulfilling two main objectives: i) to improve the understanding of the hydrological phenomena in the basins and how the changes generated in them affect the hydrological phenomena and, ii) the generation of synthetic sequences of hydrological data for the design of infrastructure or for its use in forecasting (Refsgaard, 1997;Kauffeldt et al., 2016;Ibarra-Zavaleta et al., 2017).
These models are classified into three types (Sieber et al., 2005;Hughes, 2016): i) Empirical: where the solution is based on empirical parameters, calculated by identifying statistically significant relationships between certain variables; ii) Theoretical: that are described by differential equations and follow the laws of physical and chemical processes; and iii) Conceptual: which are simplified representations of physical processes, in mathematical terms to simulate complex processes based on key parameters that describe the important functions of the system of interest.
In this research we implemented a rainfall-runoff model as TopModel (Lamb, Beven and Myrabø, 1998;Beven and Freer, 2001b)).TopModel us usually applied to assess the management of water resources at the regional scale, using conceptual models for detailed assessment of surface flow (Beven and Freer, 2001a;Mockler, O'Loughlin and Bruen, 2016;Teng et al., 2017).Likewise, distributed and semi-distributed models (which do not simulate the basin as a group, but as a set of divided parts) are necessary for the simulation of spatial patterns of hydrological response within a basin (Mazzoleni et al., 2015;Ibarra-Zavaleta et al., 2017) .Moreover, hydrological models also provide valuable information to study changes in land use or climate (Karlsson et al., 2016).Thus, changes in land use are directly related to water supply (and therefore to hydrological modeling), mainly related to human consumption, food production, and power generation.These activities have become a global priority in the economic and social sphere due to the growth of the population and the need to establish economic activities for the communities' empowerment (Harou et al., 2009;Buytaert, 2011;Crespo et al., 2012).Through models, it has been possible to represent dominant hydrological processes in the hydrological cycle of a particular ecosystem, mainly by calculating the water balances which allow exploring the validity of the representation, interactions and various levels of model behavior (Buytaert and Beven, 2011).
The more the basic hydrological information (precipitation, temperature, evaporation, evapotranspiration and flow), the more the objectives of the hydrological models achieved.However, the uncertainty due to scarcity of these data is quite common in several areas of Colombia, due to the complexity of the topography (Meerveld and Weiler, 2008).In addition, it is necessary to generate tools for calibration and validation of synthetic data to the hydrological modeling process (Blanco-Gutiérrez, Varela-Ortega and Purkey, 2013;da Silva et al., 2015).The calibration and validation of data through modeling, has enabled the verification of the assumptions that underlie the hydrological model used, which has contributed significantly to the generation of new knowledge (Ibarra-Zavaleta et al., 2017).
In consequence, basin-scale hydrological modeling lead us to an analysis of the spatial variability in a simplified way of the hydrological response (Beven and Freer, 2001a).Models based on topography in a region such as the MMV allows us to conclude the information of the topographic index (used to quantify topographic control on hydrological processes) is integrated into the general structure of the model, throughout the spatial distribution of moisture content and water table depth (Mukherjee et al., 2013).This document presents a simplified description of the TopModel applied in the Southern part of the MMV area and a general description of the methods used for its evaluation.This paper is organized as follows: Section 2 presents the theoretical framework.Section 3 shows the area of study.Section 4 presents the data sources.Section 5 illustrates the calibration and validation results and conclusions are indicated in Section 6.

Hydrologic Model: TopModel
TopModel is a semi-distributed model based on similarities of the topography which are expressed through a Topographic Index (TWI).It can be considered as a rainfall-runoff conceptual model based on the landscape characteristics which contains three dynamic storage regions: i) root zone, ii) gravity zone and iii) saturated zone (Hollanda et al., 2015;Metcalfe, Beven and Freer, 2015) (figure 1).The main assumptions of the model are: i.The dynamics of the saturated zone can be approached through a steady state simulation, ii.All the model parameters are spatially homogeneous, iii.The hydraulic gradient can be approximated via the surface terrain gradient, iv.The hydraulic conductivity decreases exponentially with depth.
The model theory assumes that the local hydraulic gradient is equal to the local surface slope and implies that all points with the same value of the TWI have the same hydraulic properties (Andersen, Refsgaard and Jensen, 2001;Mukherjee et al., 2013;Yi, Zhang and Yan, 2017;Jeziorska and Niedzielski, 2018).Its value is computed from the basin topography using [1].
where a i is the upslope contributing area of i-th basin and tan b i is the slope of the ground surface of the basin.
Upslope contributing area represents the area that can potentially produce runoff to the point of interest (Suliman et al., 2016;Jeziorska and Niedzielski, 2018).
The generation of runoff in TopModel is given by a relation between the topography and the transmissivity of the basin (Dewandel et al., 2017;Jeziorska and Niedzielski, 2018;Xue et al., 2018).Rain infiltrates the root zone (figure 1) until its maximum storage capacity is reached, and then it can be reduced to a linear velocity by the actual evapotranspiration of the surface, described by [2].
where E a is the real evapotranspiration, E p is the potential evapotranspiration, S r max is the maximum root zone deficit and S r z is the root zone deficit.
After the infiltration process in the root zone ends, the excess water fills the unsaturated zone and recharges the saturated zone generating a decrease in the water table depth.According to the first assumption of TopModel, the depth of the local water table is represented by the local storage deficit (D), which can be calculated for each TWI where D l is the catchment average water table depth, a is the average TWI and m is scaling parameter.
The water table is equal to the ground level when reaches zero.Therefore, the average value of the TWI for which constitutes the threshold for the maximum storage capacity.Moreover, each point having higher value of TWI is considered to be in a saturation condition.Additional rain on saturated surfaces cannot infiltrate the soil, therefore, excess water is transferred directly to saturated surface runoff.
The water storage deficit is reduced by the recharge water flux from the unsaturated zone to the groundwater, and its rate can be calculated using [4]: where D l is the local storage deficit, S u z is the local water storage in the unsaturated zone and T d is the mean 115 residence time in the S u z .Therefore, the total recharge rate q v is expressed as the sum of all values of q v i multiplied by the upslope area A i representing a set of hydrologically homogenous points, associated with topographic index class of the i-th location (Metcalfe, Beven and Freer, 2015).
where Q v the total flux, q v i the flux of water entering the water table locally (per unit area).
The base flow Q b is represented as the subsurface saturated zone flux , and can be defined by [6]: 120 Where Q 0 is the hydrological flux for the entire catchment area when D = 0.Both above described components: surface flow s q and base flow Q b account for the total discharge.
Consequently, the total runoff at the outlet of the basin is expressed as the sum of the surface flow and the base flow (van der Heijden and Haberlandt, 2015).The transmissivity -T in the low zones [7], according to the fourth where T 0 the local saturated transmissivity.

Calibration and Validation
Calibration is the procedure adopted to obtain a set of optimal parameters and initial values that are adjusted to the particular characteristics of each basin (Dakhlaoui et al., 2017).The Nash Sutcliffe objective function -NSE [8] (Wu et al., 2017) is considered a test for the goodness of fit of the hydrological prediction model, got by comparing between observed and simulated discharges (Dakhlaoui et al., 2017).Measured discharge was used as indicative variable, to be compared with model predictions of discharge, through the studied period.
, [8] where Q i represents the observed flow, S i is the simulated flows and the average of observed flows.
In addition, the percentage bias (PBIAS) is used as a complement in the analysis of the performance of the model to measure the average tendency of the simulated values to be larger or smaller than those observed (Wiant and Harner, 1979).The optimal value of the PBIAS is 0, the positive values indicate an overestimation bias and the Validation allows to determine how accurate the model reproduces or imitates independent information using the values of the parameters found during calibration (Refsgaard, 1997;Devia, Ganasri and Dwarakish, 2015;Y. Zhang et al., 2016).

STUDY AREA
The study area (figure 2A) covers 17,000 km 2 of the central area of the MMV and is geomorphologically located along the central part of the valley, crossed by the Magdalena River, between the eastern and central mountain ranges of the Colombian Andes (Avellaneda, 2003).The Magdalena River has a length of approximately 650 km along the studied area.This region is rich in natural ecosystem resources, such as fresh water, flora and fauna and mineral resources such as gold, oil and gas (Ingrain, 2012).The study area is hydrologically formed by five basins associated with the water potential of the Magdalena River and its most important tributaries (Cimitarra, Sogamoso, Opon, Carare and Negro Rivers) (Ideam, 2014).In this area, rainfall has a bimodal pattern: a first season between March and June and a second season between October and December; the rest of the year is dry (Ideam, 2014).The average amount of rain exceeds 2,000 mm/year and the river has an average annual flow of 2,361 m 3 /s and high flows (Q5) of 4,298 m 3 /s and low (Q95) of 1,578 m 3 /s.The average annual temperature is higher than 24 °C throughout the territory and the elevation of the terrain varies between 16 and 3110 meters above sea level (Ideam,

2014).
The most important economic activity in the study area is the extraction, production and transportation of oil and gas (Jaime Herrera et al., 2013).In addition, the study area is an important crossroads between the Andes, the capital and the most populous city of Colombia, Bogotá and the Caribbean ports in the north of the country (Asociacion Colombiana del Petroleo and Asociacion Latinoamericana de la Industria Petrolera, 2008).

DATA SOURCES
A digital elevation model (DEM) (figure 2A) was derived from SRTM satellite data with a 30 meters resolution between the coordinates 6° 7' 46'' N and 74° 39' 47'' W and an average vertical altitude error of 6.2 m (90% confidence level) and a geolocation error of 9 meters for South America (Sharma, Tiwari and Bhadoria, 2011).
For the consolidation of the TWI, it was necessary to export the DEM in ASCII format to TopModel using the Sp, Raster and Topidx packages (Metcalfe, Beven and Freer, 2015).The cells with similar hydrological characteristics were generated from the grouping of pixels into different categories based on the function of the topographic index (Dai et al., 2017).The TWI for the study area is presented in figure 3 to the location where the drainage network is generated.According to this, the high values of the topographic index are related to areas topographically convergent or smooth slopes which generate discharge flows (Beven and Freer, 2001a).These areas are characterized by low transmissivities, thus the level of water table reaching the surface (Tian, Zheng and Zheng, 2016).
The information of daily rainfall (available at IDEAM) was verified through an analysis of missing data, considering only those stations with less than 10% of missing data.Then, a 10 km buffer of 10 km was set to the study area, from which 37 optimal stations were obtained.The missing data from the chosen stations was completed using the triangulation method with the three closest stations and considering a period between 2000 and 2012.The average precipitation in the basin was estimated by the Thiessen method, as described in Ruelland et al., 2008 andWagner et al., 2012 (figure 2B) [10].
The calculation of evapotranspiration (ETo) was performed using the Hargreaves method (Hargreaves and Samani, 1982;Hargreaves and Allen, 2003).This method allows to determine the daily air temperature range ( TR = T max -T min ) and the extraterrestrial radiation R a to estimate the solar radiation R s , as shown in [10].
where R s is the solar radiation (mm/d), K RS is an empirical coefficient fitted to R s / R a versus TR data with a value of 0.16, R a is extraterrestrial radiation (mm/d), TR is daily air temperature range in degrees Celsius.By means of the previous equation we obtain [11] for the calculation of ET 0 .
Where C , a and m are coefficients with values of 0.0023, 17.8 and 0.5, respectively (Díaz, Esteller and Lopez- Vera, 2005), T mean is mean air temperature (•C).
For this process, the temperature data was obtained from IDEAM and verified with the procedure described in the previous paragraph for the precipitation data.The observed average temperature of 16 optimal climatologic stations was used in the area of study; its distribution in the basin was estimated by the Thiessen method with correction based on elevation (figure 2C) (Y.Zhang et al., 2016;Ayantobo et al., 2017).Flow data was obtained from IDEA databases as well through the selection of three limnimetric stations.The selection of stations considered two at the inlet (Puerto Berrio and Sogamoso) and one at the outlet of the basin (San Pablo).These stations were selected

Parameter estimation
TopModel uses, in addition to the TWI, ten parameters that represent the characteristics of the basin through hydrology, soils, and location of the study area (Table 1).The values of each of these parameters use reliable information that is available for the study area.In this work, the values of the model parameters were initially established from the reported literature for MMV area by the hydrological and geological institutions of Colombia (National Hydrological, Meteorological and environmental Institute -Ideam, and Colombian geological service -SGC) and they are shown in Table 1.
The drawbacks generated by the model in the calibration process are associated with the uncertainty of the parameters because the different sets of possible parameter values can have similar performance values (Loosvelt et al., 2014;Mazzoleni et al., 2015;Ballinas-González, Alcocer-Yamanaka and Pedrozo-Acuña, 2016).For this reason, a Monte Carlo analysis (He et al., 2012;Simmons et al., 2017) has been used to estimate the best sets of parameters that generate a better performance.For this purpose, a data sampling with uniform distribution based on the interval reported in Table 1 was used.The number of simulations varied between 10000 and 35000, since make new iterations did not improve the model performance.
This analysis showed that the parameters m and Srmax are the most significant (figure 4), so the change in their values influences the performance of the model and the shape of the simulated hydrograph.The m parameter represents the change in the saturated hydraulic conductivity with respect to depth.Small values of m imply quick flow and insignificant subsurface runoff, while large values indicate that more rainfall can infiltrate the soil, thus less water reaches the outlet via surface route (Sigdel et al., 2011).This parameter is related to subsurface flow control and the deficit of local storage, which is important in the case of percolation and recharge of an aquifer (Buytaert and Beven, 2011).Therefore, this parameter has a significant effect on the calculation of the local storage deficit, contributing areas and the shape of the curve in the hydrograph recession.
The value of the Srmax indicates the influence of evapotranspiration on the hydrological behavior of the catchment.
Low Srmax value allows less water to be stored in the root zone and hence available for evapotranspiration what can lead to the increased runoff (Hollanda et al., 2015).

Model performance
The simulations with iterations between 10.000 and 35.000 for data were made, and non-significant differences in values of efficiency were detected when comparing best parameter sets performance.
The results from calibration (figure 5) generated a set of parameters with an efficiency of 0.74 for the objective function and 2.6 for PBIAS (table 2).The efficiencies were classified according to the methodology reported by Boskidis et al., 2012.The differences in the efficiencies of the parameter sets were very low, so the overall performance of all simulations can be considered very similar, despite small variations in the values of all parameters that generate these efficiencies.This is associated with the sensitivity of the parameters and their impact on the representation of hydrological processes in the studied basin.
For the process of the model validation, results generated from the calibrated parameters with higher NSE values gave efficiency values near to 0.73 and its simulated flows.The NSE values found are close to the values obtained for a proven models in Ecuador, Colombia, Poland, Nepal and China with efficiency values near to 0.7 (Sigdel et al., 2011;Padrón et al., 2015;Gil Morales and Tobón Marín, 2016;Jeziorska and Niedzielski, 2018;Xue et al., 2018).
However, it is possible that longer periods of time could help ensure better simulations and adjustments, because when more data is added to model, the variability increases, and so the parameter values are adjusted in concordance.Additionally, it is widely known that TopModel has problems to accurately represent low flows during droughts (Hollanda et al., 2015).For periods where precipitation exceeds evapotranspiration, the wide range of parameters provide acceptable simulations for basin discharge, although base flow is less accurately simulated, as it happen in other sites (Z.Zhang et al., 2016).
In addition, the performance of the model has been analyzed through the flow duration curve between the observed and simulated data (Badjana et al., 2017)(figure 6).The simulated curve shows an increase of 28 cubic meters per day between 10% and 90% of the exceedance time with respect to observed curve, which indicates that its variability is low and therefore it is related to underground storage processes that dominate the flow of the stream and maintain a flow more stable in time (Salazar, 2016).This corroborates the analysis carried out with respect to the variability of the parameters and allows us to identify a coherent process between the behavior of the basin studied and the performance of the model.In particular, it seems that the model fits well the high-flow observations, but presents a slightly higher variance for the mid-flows, as observed in a line with a higher slope; as for the low-flows, the observations is featuring a step at the end of the FDC, indicating that the basin is not able to maintain the same conditions for base flow as for other flow sources, maybe due to anthropic or natural interferences on the percolation fluxes, and simulated flows roughly follows this patterns, with a smoother curve, due to its impossibility to represent these unknown interferences.

CONCLUSION
TopModel was able to reproduce the main pattern of the hydrograph with acceptable accuracy for the case-study.
A low performance to simulate some patterns (base flow) can be attributed to errors in input data and parameter . This map shows a correspondence between the highest values of the index and the drainage network.Areas with TWI values between 15 and 22.5 correspond used as a reference in the annual report presented by the Ideam -ENA (Ideam, 2014).Data from 2000 to 2008 was used for the model calibration and data from 2009 to 2012 was used to validate the model.
model was implemented for 2000-2012 period, with a daily timescale.Data between 2000 and 2008 was used for the calibration of the model, by verifying site conditions adjusted to the structure and model assumptions and the establishment of a set of initial parameters which adequately describes the hydrological behavior of the basin study.During the process of calibration, the values of model parameters were selected which corresponding to the parameter sets with the highest values of efficiencies obtained (i.e.NSE values) in the performed simulations.Initial Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted: 12 July 2018 doi:10.20944/preprints201807.0210.v1 uncertainty.The most probable cause of those results is linked to the uncertainty of the data series analyzed.Low accuracy of the model can also be an effect of the model inability to represent distributed rainfall pattern.The application of the hydrological model developed in this research contributes to national efforts and the availability of results for the development of comparative studies in Middle Magdalena Valley at basin scale.This Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted: 12 July 2018 doi:10.20944/preprints201807.0210.v1

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 12 July 2018 doi:10.20944/preprints201807.0210.v1 negative
values indicate a bias of underestimation of the model.The equation used for its calculation is shown in [9].