Introduction
There is growing concern in the scientific community about the impact of climate change on water resources. Projected changes in precipitation and temperature would affect the hydrological response of river basins around the world (Arnell and Gosling, 2013). Understanding the hydrological processes involved in floods, droughts and other extreme events requires good-quality meteorological data, particularly in regions with few stations. One option is to use spatially interpolated meteorological data. Spatial interpolation is a procedure for estimating the value of a variable of interest at unsampled sites within an area covered by existing observations; the estimation techniques are essentially applications of statistics (WMO, 2011). The output of spatial interpolation is an integrated data set in which meteorological data are arranged on an evenly spaced grid (hereafter called a processed gridded data set).
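As a minimal illustration of how scattered station values can be turned into an evenly spaced grid, the sketch below uses inverse distance weighting (IDW), one of the simplest interpolation techniques. It is not necessarily the algorithm behind the gridded product used in this study; the station coordinates, the values and the power parameter `power=2` are illustrative assumptions, and distances are computed in planar (x, y) coordinates rather than on the sphere.

```python
import numpy as np


def idw_grid(xs, ys, values, grid_x, grid_y, power=2.0):
    """Interpolate scattered station values onto a regular grid by inverse distance weighting.

    Coordinates are treated as planar (an illustrative simplification).
    """
    xs, ys, values = (np.asarray(a, dtype=float) for a in (xs, ys, values))
    gx, gy = np.meshgrid(grid_x, grid_y)  # grid nodes, shape (ny, nx)
    # Distance from every grid node to every station: shape (ny, nx, n_stations)
    d = np.hypot(gx[..., None] - xs, gy[..., None] - ys)
    out = np.empty(gx.shape)
    exact = d == 0.0
    on_station = exact.any(axis=-1)
    # Grid nodes that coincide with a station take that station's value directly
    out[on_station] = values[exact[on_station].argmax(axis=-1)]
    # Elsewhere, each station is weighted by 1 / distance**power
    w = 1.0 / d[~on_station] ** power
    out[~on_station] = (w * values).sum(axis=-1) / w.sum(axis=-1)
    return out


# Two hypothetical stations 10 units apart, interpolated onto a 1 x 3 grid
grid = idw_grid([0.0, 10.0], [0.0, 0.0], [2.0, 4.0], np.linspace(0.0, 10.0, 3), [0.0])
```

The grid node midway between the two stations receives equal weights and therefore their average, while nodes on top of a station keep the station value.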
The Intergovernmental Panel on Climate Change (IPCC) notes that the historical record is poor for many regions, especially those most vulnerable to climate change, and recommends further research to integrate raw observations into processed gridded products (IPCC, 2014). In Mexico, for example, meteorological station density is higher in the central and southern regions than in the north of the country. Gridded data are therefore a potential alternative to observed data for hydrological risk assessment and for evaluating the impact of climate change on water resources, particularly in Mexican regions with a low density of meteorological stations.
Several studies have compared gridded meteorological data sets, and their results show large uncertainty in precipitation estimates. For instance, Koutsouris et al. (2016) compared seven global precipitation data sets, derived from satellite, reanalysis and interpolated products, for a basin in Tanzania. The data sets differed in their long-term averages, mainly in the wet season. In particular, the interpolated data overestimated precipitation during the dry period and differed in the timing of the wet-season peak.
Other studies have evaluated how uncertainty in precipitation data propagates to discharge estimates. For example, Fekete et al. (2004) compared six global monthly precipitation data sets to assess their spatial and temporal differences and the uncertainty that arises when they are used as input to a global water balance model. They found large numerical differences in precipitation between data sets, especially in the wet tropics. Moreover, uncertainty in precipitation translated into much greater uncertainty in runoff in semi-dry regions, where the rainfall-runoff process is highly nonlinear. The authors also argue that comparing simulated runoff with observed river discharge can provide an objective evaluation of a given data set.
Biemans et al. (2009) assessed the uncertainty in simulated discharge arising from precipitation data over several large river basins around the world. Using seven gridded precipitation data sets, they found large uncertainty in mean annual precipitation (around 30%). In a second step, the precipitation data were used to drive an uncalibrated global water balance model to evaluate how this uncertainty propagates to simulated discharge; the uncertainty in discharge was about three times the estimated uncertainty in precipitation.
Getirana et al. (2010) evaluated six gridded daily precipitation data sets, including gauge-based, satellite and reanalysis data, for a South American basin. The precipitation data were used to drive a hydrological model to simulate discharge. Results were mixed: discharge simulated with gauge-based data agreed best with observed discharge, while the other data sets underestimated precipitation fields and discharge.
Using water balance models, the studies above show that uncertainty in meteorological data propagates to discharge over large basins around the world. This study has two main objectives: first, to compare observed and gridded meteorological data for two small Mexican basins; second, to drive a rainfall-runoff model with meteorological data from both data sets in order to evaluate the uncertainty that arises when gridded meteorological data are used to simulate discharge. The manuscript is organized as follows: first, the study basins, the meteorological data and the hydrological model are presented; second, the meteorological data sets and the simulated discharges are compared; finally, concluding remarks close the manuscript.
Data and methods
Study basins and meteorological data sets
The farthest headwaters of the Papagayo River are in the Sierra Madre del Sur mountain range, and the river discharges into the Pacific Ocean. The basin covers an area of 7067 km² and lies entirely within the State of Guerrero. It falls within a Tropical Savanna (Aw) climate region (Peel et al., 2007).
The Valles River Basin lies mostly within the State of San Luis Potosí and covers an area of 3521 km². The Valles River flows from north to south through the Sierra Madre Oriental mountain range and drains into the larger Pánuco River Basin. The Valles River Basin falls within the Tropical Savanna (Aw) and Tropical Monsoon (Am) climate regions. Figure 1 shows the location of the study basins.
Discharge data were obtained from the National Surface Water Data Bank (BANDAS; IMTA, 2016) for the La Parota and Santa Rosa gauging stations, in the Papagayo River Basin and the Valles River Basin respectively. Figure 2 and Table 1 show the mean monthly discharge for the study basins. The Papagayo River Basin has a single peak flow in September, while the Valles River Basin has two peaks, in July and September. Low flows occur from December to May in both basins.
Daily time series of precipitation and minimum and maximum temperature were obtained from two sources, hereafter referred to as the observed and the processed gridded data sets. The observed data come from the CLICOM climatological database (CICESE, 2016), built by the Servicio Meteorológico Nacional (SMN). Observed data were obtained from five meteorological stations in the Papagayo River Basin (1971-2000) and three stations in the Valles River Basin (1973-1996). The processed gridded data (1/16° resolution) were taken from the hydrometeorological data set for Mexico, the U.S. and southern Canada presented by Livneh et al. (2015), which covers the period 1950-2013; for Mexico, this data set was built from SMN meteorological data. The gridded data set used in this study includes a topographic adjustment to account for orographic precipitation, because interpolating between lower-elevation stations across complex topography would systematically misrepresent the precipitation fields (Livneh et al., 2015). The adjustment first computes, for each month, the ratio between the mean monthly interpolated precipitation and the mean monthly climatological precipitation, the latter taken from the data set of Vose et al. (2014), which accounts for topographic effects. The daily interpolated precipitation is then rescaled over the entire record with these monthly ratios, so that its monthly means match the climatology.
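The monthly rescaling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used by Livneh et al. (2015): the input names are hypothetical, the climatology is assumed to be given as a long-term mean in mm/day for each calendar month, the scaling factor is taken as climatology divided by the interpolated mean, and months with a zero interpolated mean are left unscaled (an arbitrary choice for this sketch).

```python
import numpy as np
import pandas as pd


def scale_to_climatology(daily_interp: pd.Series, clim_monthly_mean: dict) -> pd.Series:
    """Rescale daily interpolated precipitation so that its long-term mean for each
    calendar month matches a reference climatology.

    daily_interp: daily precipitation (mm/day) indexed by a DatetimeIndex.
    clim_monthly_mean: hypothetical {month: long-term mean precipitation in mm/day}.
    """
    months = daily_interp.index.month
    # Long-term mean of the interpolated series for each calendar month
    interp_mean = daily_interp.groupby(months).mean()
    # Monthly scaling factor: climatology divided by interpolated mean.
    # Months whose interpolated mean is zero are left unscaled (arbitrary choice).
    ratio = {
        m: clim_monthly_mean[m] / interp_mean.loc[m] if interp_mean.loc[m] > 0 else 1.0
        for m in interp_mean.index
    }
    # Every day of the record is multiplied by the factor of its calendar month
    factors = np.asarray(months.map(ratio), dtype=float)
    return daily_interp * factors
```

Because every day of a given calendar month is multiplied by the same factor, the day-to-day pattern of the interpolated series is preserved while its monthly means are shifted to the climatology.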
The hydrological model
The rainfall-runoff model used in this study is GR4J (Perrin, 2000), a conceptual lumped model that simulates streamflow at a daily time step. GR4J uses daily potential evapotranspiration (PE) and daily precipitation (P) as inputs. In this study, PE was computed with the formulation proposed by Oudin et al. (2005), which is based on mean air temperature and extraterrestrial radiation. GR4J has been widely used to simulate discharge in basins in France (Le Moine et al., 2007) and Australia (Coron et al., 2012), and it has recently been used to assess climate change impacts on water resources in Mexico (Velázquez et al., 2015), China (Tian et al., 2013) and Canada (Seiller and Anctil, 2014), among other countries.
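The Oudin et al. (2005) formula is commonly written as PE = Re / (λρ) · (Ta + 5) / 100 when Ta + 5 > 0, and PE = 0 otherwise, where Re is extraterrestrial radiation, λ the latent heat of vaporization, ρ the density of water and Ta the mean daily air temperature (°C). The sketch below assumes that Re is computed with the FAO-56 equations from latitude and day of year, and uses λ = 2.45 MJ kg⁻¹ and ρ = 1000 kg m⁻³; these are illustrative choices, not necessarily those of this study.

```python
import math


def extraterrestrial_radiation(lat_deg: float, doy: int) -> float:
    """Daily extraterrestrial radiation Re (MJ m-2 day-1), FAO-56 formulation."""
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)  # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)  # solar declination, rad
    # Sunset hour angle; the argument is clipped so polar day/night do not fail
    ws = math.acos(max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta))))
    return (24 * 60 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta) + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def pe_oudin(t_mean_c: float, lat_deg: float, doy: int) -> float:
    """Potential evapotranspiration (mm/day) following the Oudin et al. (2005) formula."""
    lam = 2.45    # latent heat of vaporization, MJ kg-1 (assumed constant)
    rho = 1000.0  # water density, kg m-3
    if t_mean_c + 5 <= 0:
        return 0.0
    re = extraterrestrial_radiation(lat_deg, doy)
    # Re / (lam * rho) is in m/day; multiply by 1000 to obtain mm/day
    return re / (lam * rho) * (t_mean_c + 5) / 100 * 1000
```

For example, `pe_oudin(25.0, 17.0, 180)` evaluates the formula for a hypothetical 25 °C day in late June at 17° N, roughly the latitude of the Papagayo River Basin.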
Figure 3 shows the GR4J diagram. The model operates as follows. If P ≥ PE, net rainfall (Pn) is computed as the difference between P and PE, and the net evapotranspiration capacity (En) is zero; otherwise, Pn is zero, En is the difference between PE and P, and the actual evapotranspiration from the production store (Es) is computed. Part of the net rainfall (Ps) fills the production store, which accounts for soil moisture conditions; the remainder (Pn − Ps) is added to the percolation (Perc) from the production store. This total quantity of water is split into two flow components: 90% is routed through a unit hydrograph (UH1) and then a non-linear routing store, while the remaining 10% is routed through a second unit hydrograph (UH2). The total streamflow (Q) is the sum of the two components. In addition, the model accounts for groundwater exchange (F). Four parameters are optimized in GR4J: the maximum capacity of the production store (x1, in mm), the groundwater exchange coefficient (x2, in mm), the one-day-ahead maximum capacity of the routing store (x3, in mm) and the time base of unit hydrograph UH1 (x4, in days). A detailed description of GR4J is given by Perrin et al. (2003).
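The daily cycle described above can be sketched in a few dozen lines. This is a minimal illustration written from the published GR4J equations as we understand them (Perrin et al., 2003), not the code used in this study; the initial filling ratios of the two stores (`s0`, `r0`) are assumptions, and in practice a maintained implementation such as the airGR R package would be preferred.

```python
import math


def gr4j(precip, pet, x1, x2, x3, x4, s0=0.5, r0=0.5):
    """Minimal daily GR4J sketch after the equations of Perrin et al. (2003).

    precip, pet: daily series (mm/day); x1, x3 in mm, x2 in mm/day, x4 in days.
    s0, r0: assumed initial filling ratios of the production and routing stores.
    Returns simulated streamflow (mm/day).
    """
    def sh1(t):  # S-curve of UH1 (time base x4)
        return 0.0 if t <= 0 else (1.0 if t >= x4 else (t / x4) ** 2.5)

    def sh2(t):  # S-curve of UH2 (time base 2 * x4)
        if t <= 0:
            return 0.0
        if t < x4:
            return 0.5 * (t / x4) ** 2.5
        if t < 2 * x4:
            return 1.0 - 0.5 * (2.0 - t / x4) ** 2.5
        return 1.0

    n1, n2 = math.ceil(x4), math.ceil(2 * x4)
    uh1 = [sh1(j) - sh1(j - 1) for j in range(1, n1 + 1)]  # ordinates sum to 1
    uh2 = [sh2(j) - sh2(j - 1) for j in range(1, n2 + 1)]
    buf1, buf2 = [0.0] * n1, [0.0] * n2  # water still travelling through each UH
    s, r = s0 * x1, r0 * x3
    q = []
    for p, e in zip(precip, pet):
        # Net rainfall or net evapotranspiration capacity
        pn, en = (p - e, 0.0) if p >= e else (0.0, e - p)
        # Production store: rainfall uptake (Ps) and actual evaporation (Es)
        ps = es = 0.0
        if pn > 0:
            tw = math.tanh(pn / x1)
            ps = x1 * (1 - (s / x1) ** 2) * tw / (1 + s / x1 * tw)
        if en > 0:
            tw = math.tanh(en / x1)
            es = s * (2 - s / x1) * tw / (1 + (1 - s / x1) * tw)
        s += ps - es
        # Percolation from the production store
        perc = s * (1 - (1 + (4.0 / 9.0 * s / x1) ** 4) ** -0.25)
        s -= perc
        pr = perc + (pn - ps)
        # Convolution: 90% through UH1, 10% through UH2
        buf1 = [b + 0.9 * pr * u for b, u in zip(buf1, uh1)]
        buf2 = [b + 0.1 * pr * u for b, u in zip(buf2, uh2)]
        q9, buf1 = buf1[0], buf1[1:] + [0.0]
        q1, buf2 = buf2[0], buf2[1:] + [0.0]
        # Groundwater exchange (positive x2 = gain, negative = loss)
        f = x2 * (r / x3) ** 3.5
        # Non-linear routing store
        r = max(0.0, r + q9 + f)
        qr = r * (1 - (1 + (r / x3) ** 4) ** -0.25)
        r -= qr
        # Direct flow from UH2, also affected by the exchange term
        qd = max(0.0, q1 + f)
        q.append(qr + qd)
    return q


# Hypothetical forcing: 30 wet days followed by 60 dry days
q = gr4j([10.0] * 30 + [0.0] * 60, [3.0] * 90, x1=300.0, x2=0.0, x3=80.0, x4=1.7)
```

The parameter values in the usage line are arbitrary; they only serve to exercise the routine.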
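The Nash-Sutcliffe (NS) coefficient (Nash and Sutcliffe, 1970) was used to evaluate the performance of the hydrological model:
$$NS = 1 - \frac{\sum_{i=1}^{n}\left(Q_{obs,i} - Q_{sim,i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{obs,i} - \overline{Q}_{obs}\right)^{2}} \qquad (1)$$
where Qobs,i and Qsim,i are the observed and simulated streamflows at time step i, Q̄obs is the mean observed streamflow, and n is the total number of daily observations. NS = 1 corresponds to a perfect match between observed and simulated discharge, NS = 0 indicates that the simulation is no more accurate than the mean of the observations, and negative values indicate worse performance.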
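The NS coefficient is one minus the ratio of the sum of squared errors to the sum of squared deviations of the observations from their mean, which translates directly into a few lines of NumPy. This is a generic sketch; the function name is our own.

```python
import numpy as np


def nash_sutcliffe(q_obs, q_sim) -> float:
    """Nash-Sutcliffe efficiency of simulated against observed streamflow."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    sse = np.sum((q_obs - q_sim) ** 2)            # squared simulation errors
    sst = np.sum((q_obs - q_obs.mean()) ** 2)     # squared deviations from the observed mean
    return 1.0 - sse / sst


# Toy example: a single error of 1 on the largest value
score = nash_sutcliffe([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

In the toy example the squared error is 1 and the squared deviations from the mean sum to 5, so NS = 1 − 1/5 = 0.8; a simulation equal to the observed mean everywhere would give NS = 0.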
Results and discussion
Figure 4 shows the meteorological stations and grid points used to compare the meteorological data. The mean areal observed precipitation was computed with the Thiessen polygon method (OMM, 1994). For the gridded data, the basin-average precipitation was computed from the grid points located inside each basin.
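The Thiessen polygon method weights each station by the fraction of the basin area that is closer to it than to any other station. A simple way to approximate those weights, sketched below, is to cover the basin with evenly spaced points (each representing an equal area) and assign every point to its nearest station. The point set, the planar coordinates and the function names are assumptions for illustration, not the procedure of OMM (1994).

```python
import numpy as np


def thiessen_weights(station_xy, basin_points_xy):
    """Approximate Thiessen-polygon weights by nearest-station assignment.

    basin_points_xy: hypothetical evenly spaced points covering the basin; each
    point stands for an equal area. Planar coordinates are assumed.
    """
    st = np.asarray(station_xy, dtype=float)         # (n_stations, 2)
    pts = np.asarray(basin_points_xy, dtype=float)   # (n_points, 2)
    d = np.linalg.norm(pts[:, None, :] - st[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)  # index of the closest station for each point
    counts = np.bincount(nearest, minlength=len(st))
    return counts / counts.sum()


def areal_mean(station_values, weights):
    """Area-weighted mean; station_values may be (n_stations,) or (n_days, n_stations)."""
    return np.asarray(station_values, dtype=float) @ np.asarray(weights, dtype=float)


# Two hypothetical stations and a line of ten equally spaced basin points
w = thiessen_weights([[0.0, 0.0], [6.0, 0.0]], [[x, 0.0] for x in np.arange(0.5, 10.0, 1.0)])
```

The approximation converges to the exact polygon areas as the point spacing decreases; passing a (days × stations) array to `areal_mean` returns the daily mean areal precipitation series.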
Figure 5 and Table 2 show the mean monthly precipitation (Precip.) and maximum (Tmax) and minimum (Tmin) temperatures computed with observed and gridded data. The gridded data underestimate mean precipitation in both basins, especially in the wet months (June to October). Similarly, Tmax and Tmin are underestimated by about 2°C; despite this bias, however, the annual cycles computed with both data sets are similar, particularly for the Papagayo River Basin.
This comparison used observations from five and three meteorological stations in the Papagayo River Basin and the Valles River Basin respectively (Figure 4), whereas the gridded meteorological data were computed from spatially comprehensive observations (Livneh et al., 2015). In addition, both basins have complex topography, with elevations ranging from 3317 m to sea level in the Papagayo River Basin and from 1918 m to 69 m in the Valles River Basin (Figure 1). To assess the effect of the limited number of stations and of the complex topography on the computation of mean precipitation, the data sets were also compared using only the grid points nearest to the meteorological stations (so-called virtual stations). Mean precipitation was computed with the Thiessen polygon method for both the meteorological and the virtual stations. The mean annual precipitation for the Papagayo River Basin is 4.20, 2.57 and 2.68 mm day⁻¹ for the observed data, the gridded data and the virtual stations respectively. Similarly, for the Valles River Basin it is 3.76, 2.26 and 2.52 mm day⁻¹ respectively. Thus, precipitation is similarly underestimated whether all grid points inside the basins or only the virtual stations are considered.
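To make the size of this underestimation explicit, the relative differences can be computed from the mean annual values reported above. This is plain arithmetic on the reported numbers, not an additional result; the dictionary and function names are our own.

```python
# Mean annual precipitation (mm/day) reported for each basin:
# (observed, all grid points inside the basin, virtual stations)
reported = {
    "Papagayo": (4.20, 2.57, 2.68),
    "Valles": (3.76, 2.26, 2.52),
}


def relative_difference(estimate: float, reference: float) -> float:
    """Relative difference (%) of an estimate with respect to a reference."""
    return 100.0 * (estimate - reference) / reference


for basin, (obs, grid, virtual) in reported.items():
    print(f"{basin}: gridded {relative_difference(grid, obs):+.1f}%, "
          f"virtual stations {relative_difference(virtual, obs):+.1f}%")
```

Running it gives about -38.8% (gridded) and -36.2% (virtual stations) for the Papagayo River Basin, and -39.9% and -33.0% for the Valles River Basin.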
Figure 6 shows the empirical cumulative distribution functions of the meteorological variables obtained from the observed and gridded data. Consistent with Figure 5, the gridded data generally underestimate precipitation and temperature. Table 3 shows selected percentiles computed for the observed and gridded data. For the Papagayo River Basin, the relative difference in precipitation is -31% (-43%) for the 75th (90th) percentile. For the Valles River Basin, the relative difference is +20% (-40%) for the 75th (90th) percentile. In addition, the gridded data contain more days with precipitation: the difference between data sets in the number of days with precipitation ≥ 1 mm is 100 and 323 days (over the analyzed period) for the Papagayo River Basin and the Valles River Basin respectively. Regarding Tmax and Tmin, Figure 6 shows that the difference between data sets is larger for high quantiles than for low quantiles. For instance, in the Papagayo River Basin (Figure 6b), the difference in Tmax is -2.6°C and -3.5°C for the 10th and 90th percentiles respectively.
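The percentile and wet-day statistics described above can be computed as in the following sketch. It assumes two daily areal precipitation series covering the same period and uses NumPy's default (linear) percentile estimator, which may differ from the estimator used to build Table 3; the function name and arguments are hypothetical.

```python
import numpy as np


def compare_distributions(obs, grid, percentiles=(75, 90), wet_threshold=1.0):
    """Percentile relative differences (%) and wet-day count difference.

    obs, grid: daily areal precipitation (mm/day) over the same period.
    Returns ({percentile: relative difference in %}, wet days(grid) - wet days(obs)).
    """
    obs = np.asarray(obs, dtype=float)
    grid = np.asarray(grid, dtype=float)
    p_obs = np.percentile(obs, percentiles)
    p_grid = np.percentile(grid, percentiles)
    rel_diff = 100.0 * (p_grid - p_obs) / p_obs
    # Number of days with precipitation >= threshold in each data set
    wet_obs = int(np.count_nonzero(obs >= wet_threshold))
    wet_grid = int(np.count_nonzero(grid >= wet_threshold))
    return dict(zip(percentiles, rel_diff)), wet_grid - wet_obs


# Synthetic check: a gridded series that is exactly half the observed one
rel, dwet = compare_distributions(np.arange(1, 101), 0.5 * np.arange(1, 101))
```

For temperature, an absolute difference in °C (as reported for Tmax above) is more meaningful than a relative one, so only the precipitation case is sketched here.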
The results above indicate important differences between the meteorological data sets. To evaluate the effect of these differences on simulated discharge, the rainfall-runoff model was driven with meteorological data from each data set. GR4J was calibrated and validated over 9-year periods, and Table 4 shows the resulting NS values. In general, simulated and observed discharges agree well.
The scatterplots in Figure 7 give further insight into the influence of meteorological data on model performance. Observed and simulated daily discharges agree well when observed meteorological data are used (Figure 7a and c). For the Papagayo River Basin, for instance, both low and high flows are well simulated, as the points cluster around the 1:1 line. For the Valles River Basin, high flows are well simulated, but low flows are slightly underestimated. In contrast, when gridded data are used, low flows are not correctly simulated, and only medium and high flows agree to some extent (Figure 7b and d). Similar results were obtained for the validation period (not shown). Table 4 shows that NS values are comparable for both data sets (NS is higher than 0.8 in the calibration period), which suggests that the model performs well when gridded data are used to simulate discharge. The comparable NS values can be explained by the fact that the coefficient is based on squared differences between simulated and observed discharge (see Eq. 1): NS is therefore dominated by large values, while low values have little influence (Legates and McCabe, 1999). This leads to an overestimation of model performance during high flows and an underestimation during low-flow conditions (Krause et al., 2005).
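The weight that squared errors give to high flows can be illustrated with a synthetic example. The flows below are hypothetical: a long low-flow season and a short wet season, with a "simulation" that reproduces the wet-season flows exactly but triples every low flow. NS is computed on the raw flows and on log-transformed flows, a transformation discussed by Krause et al. (2005) for giving low flows more weight.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


# Hypothetical daily flows (m3/s): 300 low-flow days and 65 wet-season days
obs = np.array([2.0] * 300 + [50.0, 200.0, 400.0, 250.0, 120.0] * 13)
# Simulation that matches the wet-season flows but triples the low flows
sim = np.where(obs < 10.0, 3.0 * obs, obs)

print(f"NS on Q:     {nse(obs, sim):.3f}")
print(f"NS on ln(Q): {nse(np.log(obs), np.log(sim)):.3f}")
```

Although every low-flow day is wrong by a factor of three, NS on the raw flows stays close to 1, whereas NS on the log-transformed flows drops to roughly 0.66.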
The hydrological model performs well for medium and high flows despite the systematic bias in the gridded data. This can be explained by comparing the model parameters, which take different values depending on the meteorological data used to drive GR4J. Although GR4J parameters have no direct physical meaning, Pagano et al. (2010) assume that each parameter controls a process in the model. Lower values of x1 (the maximum capacity of the production store) thus reduce soil moisture storage in the basin. For the Papagayo River Basin, x1 is 1357 mm with observed data and 181 mm with gridded data; for the Valles River Basin, x1 is 3040 mm and 294 mm respectively. A positive value of x2 (the groundwater exchange coefficient) indicates water imports into the basin (Perrin et al., 2003), so higher values of this parameter increase streamflow (Pagano et al., 2010). For the Papagayo River Basin, x2 is 1.44 mm with observed data and 5.04 mm with gridded data; for the Valles River Basin, it is 0.09 mm and 3.71 mm respectively.
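In GR4J the groundwater exchange term is F = x2 (R/x3)^{7/2} (Perrin et al., 2003), where R is the level of the routing store, so at a given store level a larger positive x2 means a larger water gain. The sketch below evaluates F for the calibrated x2 values reported above at a hypothetical filling ratio R/x3 = 0.5; this ratio is an assumption chosen only for illustration, since the routing-store levels are not reported here.

```python
# Calibrated x2 values (mm) reported in the text: (observed, gridded)
x2_values = {"Papagayo": (1.44, 5.04), "Valles": (0.09, 3.71)}

# Hypothetical routing-store filling ratio R / x3, for illustration only
filling_ratio = 0.5


def exchange(x2: float, r_over_x3: float) -> float:
    """GR4J groundwater exchange F = x2 * (R / x3) ** 3.5 (mm per time step); F > 0 is a gain."""
    return x2 * r_over_x3 ** 3.5


for basin, (x2_obs, x2_grid) in x2_values.items():
    print(f"{basin}: F(observed) = {exchange(x2_obs, filling_ratio):.2f}, "
          f"F(gridded) = {exchange(x2_grid, filling_ratio):.2f} mm/day")
```

At the same store level F scales linearly with x2, so the gridded-data calibration implies a gain about 3.5 times larger for the Papagayo River Basin and about 40 times larger for the Valles River Basin.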
The gridded data clearly underestimate precipitation (Figure 5). Therefore, to simulate medium and high flows with gridded data, the calibrated GR4J reduces soil moisture storage and increases the groundwater contribution to streamflow, at the expense of low-flow representation. In other words, the optimized GR4J parameters represent the hydrology of the study basins in very different ways depending on whether observed or gridded data are used to drive the model.
Conclusions
Hydrological modelling is subject to several sources of uncertainty that compromise its use for water management. Previous studies have shown that uncertainty in meteorological data propagates to simulated discharge, leading to errors in streamflow estimates. This study had two main objectives: to compare observed and processed gridded data sets, and to evaluate the uncertainty that the latter introduce into simulated discharge for two Mexican basins. Gridded data could be an alternative to observed data in Mexican regions with a low density of meteorological stations.
Our results show that the gridded data underestimate precipitation and minimum and maximum temperature in the study basins. For instance, precipitation is underestimated by about 40% for high quantiles. In addition, the gridded data contain more days with precipitation. Regarding temperature, the difference between observed and gridded data is about 2°C, despite good agreement in the annual cycle. Moreover, the difference in maximum temperature is larger for high quantiles than for low quantiles.
To assess the errors that the bias in gridded data introduces into simulated streamflow, both data sets were used to drive the rainfall-runoff model GR4J. The model parameters can be optimized (using the Nash-Sutcliffe coefficient) with both data sets, giving good model performance for medium and high discharges. However, when gridded data are used, low flows are strongly overestimated. The analysis of the optimized GR4J parameters shows that the model produces two different representations of the basins' response: when gridded meteorological data are used, GR4J decreases soil moisture storage and increases groundwater exchange, leading to a misrepresentation of the basins' hydrological behavior.
The use of biased meteorological data results in discharge errors. Our results show that such errors could be wrongly perceived as small when estimating medium and high flows, which are generally the main focus of evaluation in water management. However, errors in meteorological data lead to a poor understanding of the basin's hydrological system. In this respect, Beven (2016) suggests that we should take a much closer look at the data to be used in model calibration and evaluation before running a model.
This study used a single processed gridded data set. Future work should consider several data sets computed with different interpolation methods, in order to account for the uncertainty that arises when precipitation fields are estimated in regions with complex topography. For instance, Hofstra et al. (2008) compared six methods for interpolating meteorological data over Europe. They found small differences in skill between methods; however, the authors note that interpolation skill is influenced by station density and tends to be poor in areas with complex topography.