Environmental Modelling & Software: Yaoling Bai, Thorsten Wagener, Patrick Reed

Environmental Modelling & Software 24 (2009) 901–916
A top-down framework for watershed model evaluation and selection under uncertainty
Yaoling Bai, Thorsten Wagener*, Patrick Reed
Department of Civil and Environmental Engineering, The Pennsylvania State University, Sackett Building, University Park, PA 16802, USA
Article info
Article history: Received 12 August 2008; Received in revised form 11 December 2008; Accepted 11 December 2008; Available online 5 February 2009
Keywords: Model evaluation; Top-down approach; Uncertainty; Fuzzy rules; Dominant controls; Time scales; Signatures

Abstract
This study introduces a top-down strategy for model evaluation and selection under uncertainty in which watershed model structures with increasing complexity are applied to twelve watersheds across a hydro-climatic gradient within the United States (US). The models’ complexities and their related assumptions provide an indication of the dominant controls on the watershed response at the inter-annual, intra-annual, monthly, and daily time scales as captured in the water balance signatures (or metrics) used in this study. The ability of the models to capture the water balance signatures is evaluated in an ensemble framework with respect to their reliability (Is the model ensemble capturing the observed signature?) and with their shape (Is the model structure capable of representing an observed signature’s variability?). Model selection is automated by combining the reliability and shape performance measures in a fuzzy rule system. Our results suggest that the framework can be tuned to function as a screening tool that formalizes our model selection process. This fuzzy model selection framework enhances our ability to automatically select parsimonious model structures for large databases of watersheds and therefore provides an important step towards understanding how controls on the watershed response vary with landscape and climatic characteristics. This understanding further advances our ability for model-based watershed classification.
© 2008 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +1 814 865 5673; fax: +1 814 863 7304.
E-mail address: [email protected] (T. Wagener).
1364-8152/$ – see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.envsoft.2008.12.012

1. Introduction

Hydrologic models are important tools for testing hypotheses about watershed behavior and for predicting the response of hydrologic systems. Choosing the appropriate model structure is a crucial step in hydrologic modeling in order to accurately predict streamflow or other variables, and to understand the dominant physical controls on watersheds’ responses (Clark et al., 2008). In the context of hypothesis testing, we often aim at selecting the model structure with the minimum level of complexity that is capable of reproducing the observed watershed response as represented by historical streamflow records.

This work focuses on seeking to identify the appropriate complexity of watershed models using top-down analysis. Klemes (1983) first suggested top-down or downward modeling as a systematic approach to identify the appropriate model structure for a given case through a process of hypothesis testing. The basic idea is to capture the hydrologic watershed-scale response at a given temporal scale with the minimum level of model complexity necessary, and to introduce additional state variables and parameters as necessary when moving to finer scales. Alternatively, in a bottom-up approach, modelers formulate and combine the model components based on an a priori physically based formulation of continuum-scale processes. Bottom-up models generally require detailed information of the physical characteristics of the watershed under study, and, due to the need for spatially detailed modeling, often yield the potential for over-parameterization. This problem is particularly evident when the objective is to predict streamflow only (see discussion in Jakeman and Hornberger, 1993).

Recently, Sivapalan and colleagues have discussed and formalized the top-down strategy to develop a collection of lumped models with increasing complexity to simulate water balance signatures across temporal scales in a series of studies (Jothityangkoon et al., 2001; Atkinson et al., 2002, 2003; Farmer et al., 2003; Eder et al., 2003; Littlewood et al., 2003; Son and Sivapalan, 2007). The lumped model structures represent different hypotheses about how watersheds control the streamflow response. The work by Sivapalan and colleagues employs either visual examination or quantitative measures to evaluate the model performance, using a priori parameter estimates based on physical watershed characteristics to choose the most parsimonious structure capable
of representing the watershed behavior at each time scale. These studies contribute a qualitative assessment of the relationships between model complexity (and thus hypotheses of watershed function), climate conditions (the watersheds analyzed were located in different climatic regimes) and prediction time scales (annual, monthly and daily). Ultimately, their work has provided new insights into how climate, soil and landscape controls impact watershed response at different time scales. Alternatively, Young (2003, 1998) has presented a data-based mechanistic modeling methodology as an example of how the top-down modeling philosophy at the watershed scale can be implemented. The data-based mechanistic modeling strategy identifies the appropriate routing component, for a chosen soil moisture accounting component, using an instrumental variable strategy. While this limits the routing component to combinations of linear transfer functions, it avoids over-parameterization by trading off model performance and parameter uncertainty using Young’s information criterion (Young, 2003, 1998). As an additional constraint, a routing component is only acceptable if it is physically interpretable by the modeler. Sivapalan et al. (2003) provide a review of the top-down or downward approach, including its connection to the data-based mechanistic modeling strategy by Young.

Manual top-down approaches that require visual examination of simulation performance to select model structures are limited by their lack of transferability and their high time demands if large numbers of watersheds must be analyzed. Some studies have relied on statistical measures to compare simulated with observed response (Atkinson et al., 2002) or minimize the residuals between simulated and observed response (Son and Sivapalan, 2007) to evaluate the model’s ability to reproduce observations. However, these approaches typically ignore parametric (and other) uncertainties and instead focus on deterministic simulations to develop measures of model performance.

In this study we contribute a framework for top-down model evaluation and selection using a formalized quantitative classification of model performance that considers each model’s given parametric uncertainty (while it could easily be extended to consider data uncertainty). We separately evaluate each model’s ability to capture the range of observed streamflow and the temporal characteristics of a watershed’s response. These independent performance measures are combined in a multi-objective fuzzy classification of model performance to achieve consistent decisions when selecting the appropriate model complexity for a given time scale. This study also contributes an analysis of the impacts of a priori data on the two measures and elucidates the limitations associated with a range of model structures. We apply our proposed fuzzy top-down model evaluation and selection method to twelve US watersheds that compose a strong hydro-climatic gradient (Van Werkhoven et al., 2008). By sampling watersheds across a range of climatic conditions our results demonstrate the ability of fuzzy top-down modeling to clarify the dominant controls on hydrologic behavior for a range of time scales. This study is a first step towards a large scale effort to evaluate and select suitable model structures simultaneously for large numbers of watersheds. Ultimately this approach will allow model selection based on similarity in watershed physical and climatic conditions, even before observations of the watershed response are available.

2. Methodology

2.1. Monte Carlo framework

We propose a fuzzy top-down model evaluation and selection framework that explicitly accounts for parametric uncertainties via Monte Carlo ensemble simulation (Fig. 1). We use uniform prior distributions independently defined for each parameter. Samples are drawn over both the feasible parameter space and the a priori parameter space by implementing uniform random sampling (URS) to calculate ensemble simulations with respect to these two ranges. The feasible ranges of parameters are estimated either by using values derived from relations of watershed physical characteristics to model parameters, or by referring to empirical techniques from previous studies (Atkinson et al., 2002; Dingman, 2002; Farmer et al., 2003; Moore, 2007; Van Werkhoven et al., 2008). The a priori parameter space refers to reduced parameter ranges attained by using watershed specific information. A priori parameter ranges constrain feasible parameter ranges using the a priori parameter estimations to give a sampling space over which best parameter sets can be chosen with higher probability. Using Monte Carlo simulation, we obtain ensemble predictions of watershed responses using both feasible parameter ranges and a priori parameter ranges. Appendix B provides details on how the lower, θ1, and upper limit, θ2, of the feasible range of each parameter were determined according to assumptions about the physical meaning of the parameters.

A priori parameters of this study are estimated mainly from available vegetation and soils data using methods derived in previous studies (Linsley et al., 1958; Miller and White, 1998; Wittenberg, 1999; Dingman, 2002; Koren et al., 2003; Anderson et al., 2006). We use estimates of the a priori parameters θa and define ranges around them to account for uncertainty in these estimates. The ranges are estimated based on the difference between the upper, θ2, and lower limit, θ1, of the feasible range of each parameter, Δθf. The minimum and maximum for each a priori parameter are then calculated as θa − Δθf × 0.15 and θa + Δθf × 0.15, respectively. Some parameters are difficult to even conceptually relate to physical watershed characteristics, such as the deep recharge coefficient from upper saturated zone to deeper store kd and the shape parameter for spatial soil water storage distribution in the multi-bucket model b. These parameters’ values were estimated using manual calibration to observed streamflow since no appropriate a priori estimation strategy is available. While this approach is consistent with previous studies (e.g. Farmer et al., 2003), we are working on finding new a priori strategies for the remaining parameters for future studies.

2.2. Modular modeling structure

Previous studies (e.g. Atkinson et al., 2002; Farmer et al., 2003) using the classic bucket model (Manabe, 1969) take advantage of simple conceptualizations of the hydrologic system and require the specification of a minimal set of parameters derived from landscape properties. Similar models have been adopted in this study with only minimal changes. In a manner similar to prior studies we start with our simplest model to capture the water balance at the annual time scale. Failure to predict runoff for finer time scales (intra-annual, monthly, daily, etc.) using the simplest model is assumed to reflect model structure deficiencies and justify the introduction of additional parameters, state variables and processes. By progressively increasing the model complexity, the predictions for finer scales are satisfied. All of the model structures used in this study can be separated into three modules: (i) soil moisture accounting (SMA), (ii) actual evapotranspiration (ET) and (iii) routing (R). The resulting modules are shown schematically in Fig. 2 as a function of increasing complexity.

The SMA module estimates soil water storage in different soil layers based on water distribution driven by interception, evapotranspiration, infiltration, runoff and percolation. We assume a saturation excess runoff mechanism for runoff production for all modules. Starting with a single bucket as the representation of a watershed, the simplest SMA module S1 only produces surface runoff via the saturation excess mechanism. The total storage capacity controls the runoff generation. By increasing the complexity of the SMA module, we obtain 4 separate

Fig. 1. Flow chart of proposed model selection framework.
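The sampling step of Section 2.1 can be illustrated with a minimal sketch. It assumes the a priori range is the a priori estimate plus or minus 15% of the feasible range width, as stated in the text; the parameter names, limits and estimates below are hypothetical, and clipping the a priori range to the feasible limits is an added assumption rather than something the paper specifies.

```python
import random


def a_priori_range(theta_a, theta_1, theta_2, fraction=0.15):
    """Return (low, high) around the a priori estimate theta_a.

    The half-width is fraction * (theta_2 - theta_1), i.e. 15% of the
    feasible range width by default. Clipping to the feasible limits is
    an assumption of this sketch.
    """
    delta_f = theta_2 - theta_1
    low = max(theta_1, theta_a - fraction * delta_f)
    high = min(theta_2, theta_a + fraction * delta_f)
    return low, high


def uniform_random_sampling(ranges, n_samples, seed=0):
    """Draw n_samples parameter sets; each parameter is uniform and independent."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(n_samples)
    ]


# Hypothetical parameters and limits, chosen only for illustration.
feasible = {"s_max": (50.0, 1000.0), "f_c": (0.1, 0.6)}
estimates = {"s_max": 400.0, "f_c": 0.3}
a_priori = {
    name: a_priori_range(estimates[name], *feasible[name]) for name in feasible
}

# One ensemble per range type, 10,000 parameter sets each.
feasible_sets = uniform_random_sampling(feasible, 10_000, seed=1)
a_priori_sets = uniform_random_sampling(a_priori, 10_000, seed=2)
```

Each parameter set would then be passed to a model structure to produce one ensemble member; keeping the two ensembles separate allows the feasible and a priori results to be compared side by side.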
[Fig. 2 panels: Evapotranspiration modules ET1 (E = Ebs + Ev) and ET2 (E = Eus + Esat = (Ebs + Ev)us + (Ebs + Ev)sat); soil moisture accounting modules SMA_S1, SMA_S2, SMA_S3, SMA_S4 and SMA_M4; routing modules R1, R2 and R3.]
Fig. 2. Modules with increasing complexity used in the model structures investigated.
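As a rough illustration of the simplest structure in Fig. 2, the sketch below implements a single bucket that produces runoff only by saturation excess. It is not the Appendix A formulation, which is not reproduced here: the function name, the daily update order, and the linear scaling of evapotranspiration with relative storage are all assumptions made for this example.

```python
def simulate_single_bucket(precip, pet, s_max, storage=0.0):
    """Daily water balance of one bucket with saturation excess runoff.

    precip, pet: sequences of daily precipitation and potential ET (mm).
    s_max: total storage capacity (mm), the only parameter of this sketch.
    Returns (runoff, evap, final_storage).
    """
    runoff, evap = [], []
    for p, pe in zip(precip, pet):
        storage += p
        # Saturation excess: anything above capacity leaves as surface runoff.
        q_se = max(0.0, storage - s_max)
        storage -= q_se
        # Assumed ET rule: potential rate scaled by relative storage.
        et = min(storage, pe * storage / s_max)
        storage -= et
        runoff.append(q_se)
        evap.append(et)
    return runoff, evap, storage


# Illustrative forcing, not data from the study.
precip = [10.0, 0.0, 100.0, 0.0]
pet = [2.0, 2.0, 2.0, 2.0]
runoff, evap, final_storage = simulate_single_bucket(precip, pet, s_max=50.0)
```

With no subsurface store, runoff is zero on every day without saturation, which is consistent with the text's later observation that S1 cannot capture low flows.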
SMA modules. Module S2 adds subsurface flow produced from the excess water generated after field capacity has been satisfied. The field capacity threshold parameter fc and the total storage capacity control runoff generation. In module S3, the soil profile is divided into the upper unsaturated zone and the lower saturated zone and includes subsurface flow from the saturated zone as well as surface runoff resulting from saturation excess. No runoff is generated from the unsaturated zone in module S3. The storage condition of the unsaturated zone is determined by water availability and its field capacity controls the recharge to the saturated zone. Module S4 extends S3 by incorporating a deep storage recharged by percolation from the saturated store. The percolation rate determines runoff generation from the deep zone.

Building on modules S1–S4, we have also formulated four multiple-bucket models with the same modular components as the single-bucket SMA modules. Modules M1, M2, M3, and M4 are the multi-bucket representations of S1, S2, S3, and S4, respectively. We use 10 buckets with different soil water storage capacity to consider the spatial variability of the watershed response as another control factor for runoff generation. The soil moisture distribution of the ten buckets fits the Xinanjiang distribution (Zhao et al., 1980; Son and Sivapalan, 2007). The formulation of the multi-bucket models is provided in Appendix A.

The ET module estimates actual evapotranspiration by modeling bare soil evaporation and vegetation transpiration separately. ET is estimated from different soil zones as defined in the different SMA modules. Module ET1 describes evapotranspiration from the moisture storage as one zone. Module ET2 calculates the evapotranspiration from the unsaturated zone and shallow saturated zone, while assuming that there is no ET loss from deep storage. ET from the saturated zone is controlled by energy supply through potential evapotranspiration (PE). Water availability determines the ET flux from the unsaturated zone. If the soil water storage satisfies the threshold water storage (field capacity), transpiration reaches the potential rate. When soil water storage falls below field capacity, transpiration becomes a linear function of soil water storage. Bare soil evaporation is also a function of soil water storage content.

The routing module describes how the flow is released from different storages. The subsurface flow is delayed through a linear storage–discharge relationship. Quick subsurface flow is produced from the saturated zone and slow subsurface flow is produced from deep storage. The runoff from the deep storage is delayed by deep percolation as well as the deep soil layer that causes a slower recession than the saturated zone. There is no runoff generated from the unsaturated zone until its storage exceeds field capacity and no routing is considered for the saturation excess runoff produced in this way.

We therefore consider a total of eight models with increasing complexity by combining these 3 components (Fig. 3). For descriptions of all parameters see the list of Notations and the model equations provided in Appendix A.

2.3. Signatures

A signature is defined here as an index or time-series of the response behavior of a watershed at a given time-scale, which is reflective of the watershed functional behavior (Wagener et al., 2007). This study uses signatures similar to those used in some of the aforementioned previous studies (Jothityangkoon et al., 2001; Atkinson et al., 2002; Farmer et al., 2003). Signatures are obtained by aggregating daily streamflow simulations to the appropriate coarser time scale. The four signatures measure runoff variability with decreasing time scales of interest: inter-annual variability, intra-annual variability, flow duration curve as well as daily streamflow time-series. Inter-annual variability predicts variation of the annual runoff yield that is indicated by long term climate variability. Intra-annual variability predicts seasonality of runoff. The flow duration curve represents the flow regime and its steepness reflects the speed of watershed drainage. Daily streamflow captures the timing and magnitude of daily runoff.

2.4. Measures of acceptability

To address the limitations of visually examining hydrograph fits for the large number of combinations of watersheds and model types considered in this study, we measure model performance in an uncertainty framework. We analyze model performance considering two measures jointly evaluated using fuzzy analysis to classify model acceptability.

We use a reliability measure (Yadav et al., 2007; Zhang et al., 2008) to test a model’s capability to reproduce the observed magnitude of the four signatures. Reliability is formulated as the ratio of the number of observations captured by the

Fig. 3. Eight models were derived using different combinations of the available SMA, ET and routing modules.
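The aggregation of daily streamflow into the signatures of Section 2.3 can be sketched as follows. The normalizations follow the axis descriptions in the figure captions (annual runoff over average annual runoff, average monthly runoff over average annual runoff); the function names and the plotting position used for the flow duration curve are assumptions of this sketch, and the flow series is synthetic.

```python
from collections import defaultdict
from datetime import date, timedelta


def annual_totals(dates, flows):
    """Sum daily flows per calendar year."""
    totals = defaultdict(float)
    for d, q in zip(dates, flows):
        totals[d.year] += q
    return dict(sorted(totals.items()))


def inter_annual_variability(dates, flows):
    """Annual runoff divided by average annual runoff."""
    totals = annual_totals(dates, flows)
    mean_annual = sum(totals.values()) / len(totals)
    return {year: total / mean_annual for year, total in totals.items()}


def intra_annual_variability(dates, flows):
    """Average monthly runoff divided by average annual runoff."""
    totals = annual_totals(dates, flows)
    n_years = len(totals)
    mean_annual = sum(totals.values()) / n_years
    monthly = defaultdict(float)
    for d, q in zip(dates, flows):
        monthly[d.month] += q
    return {m: (monthly[m] / n_years) / mean_annual for m in sorted(monthly)}


def flow_duration_curve(flows):
    """Pairs of (exceedance probability, flow), highest flow first.

    The i / (n + 1) plotting position is an assumption of this sketch.
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [((i + 1) / (n + 1), q) for i, q in enumerate(ranked)]


# Synthetic two-year record: 1 mm/day in 1961, 3 mm/day in 1962.
start = date(1961, 1, 1)
dates = [start + timedelta(days=i) for i in range(730)]
flows = [1.0 if d.year == 1961 else 3.0 for d in dates]

inter = inter_annual_variability(dates, flows)
intra = intra_annual_variability(dates, flows)
fdc = flow_duration_curve(flows)
```

The fourth signature, the daily streamflow time-series, needs no aggregation and is simply the `flows` sequence itself.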
ensemble simulations of a single model structure to the total number of observations:

\[ \mathrm{Rel} = \frac{\mathrm{Num}_{in}}{\mathrm{Num}} \tag{1} \]

where Rel is the reliability measure, Num_in is the number of observations falling inside the ensemble, and Num is the total number of observations. The reliability value ranges from 0 to 1. We determine a certain value of Rel as a threshold of model acceptability. The model is assumed incapable of capturing the magnitude range of the signature when reliability is below this value. This measure does not assess whether the model is capable of reproducing the observed system behavior with a single parameter set though, or whether parameters might have to vary in time since the current measure evaluates each time step (or frequency) independently.

Additionally, we introduce a shape measure to evaluate a model’s capability to reproduce the dynamics of a catchment. The shape measure is calculated as the magnitude of the difference of slope between observations and simulations of signatures. The slopes are calculated in-between two time-steps. Its value is normalized by the variance of the observed slope as follows:

\[ \mathrm{Slope}_{diff} = 1 - \sqrt{\frac{\sum_{i=1}^{T} (g_{o,i} - g_{s,i})^2}{\sum_{i=1}^{T} (g_{o,i} - \bar{g}_o)^2}} \tag{2} \]

where Slope_diff is the shape measure, g_o is the slope of observations, g_s is the slope of simulations, and \bar{g}_o is the mean slope of observations during the period T. The shape value lies between negative infinity and 1. When simulations are perfectly parallel to the observation, Slope_diff equals 1. When simulations and observations are orthogonal, the shape value is negative infinity. Using a threshold value of Slope_diff, which varies with time scale, we can assess whether the model is capable of capturing the signature dynamics by estimating the highest Slope_diff value a model can produce.

The approach to define model acceptability proposed in this paper of course bears some similarity with previously introduced approaches. Hornberger and Spear (1981) introduced the idea of behavioral/non-behavioral classification in which models are grouped based on acceptable or unacceptable behavior. This is similar to our idea of reliability, a definition that has some resemblance with the acceptable ‘corridor’ of behavior by Van Straten and Keesman (1991). The Generalized Likelihood Uncertainty Estimation approach by Beven and Binley (1992) extends the binary strategy by Spear and Hornberger to include any measure of performance and an associated subjective threshold for this classification. This approach is similar to our use of the shape measure in the sense that we also define a threshold of acceptability. A significant difference in our framework is that we propose it as a screening tool in the context of hypothesis testing after which further analysis is required, rather than as a tool to identify all acceptable models to make predictions. Norton (1996) discusses the idea of parameter bounding. This method provides a strategy in the opposite direction from what we are proposing, i.e. he asks the question of how wide parameter ranges have to be for the model to capture the observations. We, on the other hand, ask whether a model is capable of capturing certain behavior given our assumptions about feasible or a priori parameter ranges.

2.5. Fuzzy rules

Fuzzy classification enables the grouping of entities if the degree of membership to a particular class is uncertain (Bardossy, 2005). In classical mathematics, a set is deterministic and its membership function is binary. For instance, the membership of a variable only has two values, 0 and 1: a value of 0 indicates that the variable does not belong to the set and a value of 1 indicates that the variable belongs to the set. In contrast, a fuzzy set membership function describes the degree to which a certain element is a member of a fuzzy set (e.g. the group of acceptable models). The range of the membership function is [0, 1]. Fuzzy membership in this study is defined based on the two measures just introduced. For example, for the measure of Reliability (R) in this study, the membership function is defined as M(R) (Fig. 4). When M(R) = μ1, the model is accepted with a membership degree of μ1 with respect to R. The membership function of the shape measure (S) is defined in a similar way.

We use a fuzzy multi-objective function (FMOF) to obtain a consistent measure considering both reliability and shape. Previous studies (Yu and Yang, 2000; Cheng et al., 2002; Freer et al., 2004; Yang and Yu, 2006; Nasir and Huang, 2007; Shrestha and Rode, 2008; etc.) used different formulations of FMOF. We simply define FMOF as

\[ \mathrm{FMOF} = \min(\mu_1, \mu_2) \tag{3} \]

where μ1 and μ2 are the membership degrees of reliability and shape, respectively. The value of FMOF varies between 0 and 1 and quantifies the degree of acceptability of the model.

3. Watersheds and data

Twelve US watersheds were included in this study. Previous studies presented some of the physical properties and hydro-climatic information for the 12 watersheds (Duan et al., 2006; Gan and Burges, 2006; Van Werkhoven et al., 2008). Daily data of potential evapotranspiration (PE), precipitation and observed streamflow for 15 years (1961–1975) are used for this study. The year 1960 is used as a warm-up period.

These watersheds, ranging from dry to wet, represent quite different hydro-climatic conditions. The locations as well as relevant climatic and topographic information of the watersheds are presented in Table 1 and Fig. 5. We use the numbers in Table 1 as identifiers for the watersheds. As illustrated in Fig. 5 and Table 1, the sizes of these basins range from about 1000 km² to 4500 km² and the wetness index (P/PE) ranges from 0.50 to 1.68. A summary of the watershed hydrologic characteristics is presented in Fig. 6.

4. Results

4.1. Model performance with respect to reliability

Model reliability is estimated by calculating how many of the observations are contained by the model ensemble derived from uniform random sampling of the parameter space using 10,000 samples. The analysis of reliability is supported by visualizing the simulations of the four different signatures. The question is, how does reliability vary from models S1 to M4, from annual to daily time scales, and from dry watersheds to wet watersheds? Figs. 7–10 show the ensemble predictions of models S1–S4 (plots for models M1–M4 are omitted for brevity) for the four signatures over both feasible ranges (left column) and a priori ranges (right column) for the three test watersheds shown in the figures from top to bottom [Guadalupe (dry), East Fork White (medium) and French Broad (wet)]. Table 2 shows the corresponding reliability values.

Fig. 7 visualizes ensemble predictions of models S1, S2, S3 and S4 for the inter-annual variability using feasible parameter ranges
Fig. 4. Membership functions (schematic) for reliability and shape measures.
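The membership functions of Fig. 4 are only shown schematically, so the sketch below assumes a simple piecewise-linear ramp for both measures and combines the two membership degrees with the minimum operator of Eq. (3). The ramp thresholds, the shape values, the acceptance level and the rule of picking the least complex acceptable model are all hypothetical choices for illustration; only the reliability values are taken from Table 2 (French Broad, a priori range, daily signature).

```python
def ramp_membership(value, zero_below, one_above):
    """Piecewise-linear membership: 0 below zero_below, 1 above one_above."""
    if one_above <= zero_below:
        raise ValueError("one_above must exceed zero_below")
    if value <= zero_below:
        return 0.0
    if value >= one_above:
        return 1.0
    return (value - zero_below) / (one_above - zero_below)


def fmof(mu_reliability, mu_shape):
    """Fuzzy multi-objective function of Eq. (3): the smaller membership wins."""
    return min(mu_reliability, mu_shape)


def select_simplest(models_by_complexity, scores, acceptance=0.5):
    """Return the first (least complex) model whose score reaches acceptance."""
    for name in models_by_complexity:
        if scores[name] >= acceptance:
            return name
    return None


models = ["S1", "S2", "S3", "S4"]  # ordered from simplest to most complex
reliability = {"S1": 0.04, "S2": 0.38, "S3": 0.44, "S4": 0.85}  # from Table 2
shape = {"S1": 0.10, "S2": 0.45, "S3": 0.55, "S4": 0.70}  # hypothetical values

scores = {
    name: fmof(
        ramp_membership(reliability[name], zero_below=0.5, one_above=0.9),
        ramp_membership(shape[name], zero_below=0.3, one_above=0.6),
    )
    for name in models
}
selected = select_simplest(models, scores, acceptance=0.5)
```

Using the minimum means a model cannot compensate for poor reliability with a good shape value, or the reverse, which matches the intent of evaluating the two measures jointly.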

Table 1
Characteristics of the twelve MOPEX watersheds used.
ID | Basin name | Area (km²) | Average elevation (m) | Mean annual P (mm) | Mean annual Q (mm) | Mean annual PE (mm)
1 | Guadalupe River near Spring Branch, TX | 3406 | 289 | 765 | 116 | 1528
2 | San Marcos River near Luling, TX | 2170 | 98 | 827 | 179 | 1449
3 | English River near Kalona, IA | 1484 | 193 | 893 | 270 | 994
4 | Spring River near Waco, MO | 3015 | 254 | 1076 | 299 | 1094
5 | Rappahannock River near Fredericksburg, VA | 4134 | 17 | 1030 | 378 | 920
6 | Monocacy River near Frederick, MD | 2116 | 71 | 1041 | 421 | 896
7 | East Fork White River near Columbus, IN | 4421 | 184 | 1015 | 378 | 855
8 | South Branch Potomac River, Springville, WV | 3810 | 171 | 1042 | 341 | 761
9 | Bluestone River near Pipestem, WV | 1021 | 465 | 1018 | 417 | 741
10 | Amite River near Denham Springs, LA | 3315 | 56 | 1564 | 610 | 1073
11 | Tygart Valley River near Pipestem, WV | 2372 | 390 | 1166 | 736 | 711
12 | French Broad River near Asheville, NC | 2448 | 594 | 1383 | 800 | 819
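The wetness index P/PE quoted in Section 3 can be recomputed from the mean annual values in Table 1. The short sketch below does so; the variable names are arbitrary, and because the table values are rounded to whole millimetres the recomputed extremes (about 0.50 and 1.69) may differ slightly in the last digit from the range reported in the text.

```python
# Mean annual precipitation P and potential evapotranspiration PE (mm),
# keyed by watershed ID, copied from Table 1.
p_and_pe = {
    1: (765, 1528),
    2: (827, 1449),
    3: (893, 994),
    4: (1076, 1094),
    5: (1030, 920),
    6: (1041, 896),
    7: (1015, 855),
    8: (1042, 761),
    9: (1018, 741),
    10: (1564, 1073),
    11: (1166, 711),
    12: (1383, 819),
}

# Wetness index: values below 1 indicate water-limited (dry) conditions.
wetness = {wid: p / pe for wid, (p, pe) in p_and_pe.items()}

driest = min(wetness, key=wetness.get)
wettest = max(wetness, key=wetness.get)
ordered_ids = sorted(wetness, key=wetness.get)
```

Sorting by the index reproduces the ID order of Table 1, so the watershed identifiers double as a rank along the dry-to-wet gradient.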
(left column) and a priori ranges (right column). For the feasible parameter range all of the observations fall inside the ensembles of all models and reliability values of all models are therefore equal to 1. The uncertainty ranges are quite wide though, in particular for the dry watershed (Guadalupe). Plots of the ensembles of the inter-annual variability signature using a priori parameter ranges in the right column show that the use of site specific information significantly narrows the uncertainty bands relative to the results for the feasible ranges. The a priori ensembles of S1 do not capture any observation for all 3 watersheds, since the runoff produced is too low in all cases. Apart from S1, the other 3 models can capture all the observations for the Guadalupe and East Fork White watersheds. For the French Broad watershed, the increased complexity of models S3 and S4 is required to achieve 100% reliability.

The intra-annual variability results in Fig. 8 show perfect reliability for the feasible parameter ranges for the Guadalupe and East Fork White watersheds. For the French Broad watershed (the wettest of the three), a couple of observations fall just outside of the ensemble of S1 (reliability of 83%), while the other 3 models capture all the observations. For the Guadalupe watershed, the ensemble of S1 for intra-annual variability for the a priori parameter ranges does not include any observation whereas the other 3 models show 100% reliability. When constrained to the a priori parameter ranges model S1 performs very poorly for both the East Fork White and the French Broad watersheds, missing 75% of the observations for the intra-annual signature. For the East Fork White watershed, using a priori ranges, S3 and S4 can capture all observations and only a couple of observations are located just outside of the ensemble of S2 (reliability of 83%). The French Broad watershed required the increased complexity of the S4 model to attain perfect reliability. Models S2 and S3 failed to capture the spring and summer observations yielding reduced reliability values of 58% and 75%, respectively.

Fig. 9 shows the ensemble simulations of the 4 models for the flow duration curve over both the feasible and a priori parameter ranges. Ensemble predictions using the feasible parameter ranges for all the 3 watersheds show that S1 can only capture the high flows and therefore suffers from low reliability values ranging between 11 and 20%. For the Guadalupe watershed, the increased complexity of models S3 and S4 is required to capture the full range of the flow regime for the feasible range results in the left column of Fig. 9. The wetter East Fork White and French Broad watersheds required less model complexity to capture the full range of the flow regime, where model S2 was sufficient given the feasible parameter range. The a priori parameter ranges significantly differentiate model performances for all 3 watersheds. For all these watersheds, the models cover an increasingly wider range of the flow regime as model complexity increases when moving from models S1 to S4. With increasing model complexity, the models enhance their ability to capture medium and low flow conditions by providing additional routing mechanisms.

Fig. 10 visualizes the necessity of increasing model complexity when simulating daily streamflow particularly well. Plots are shown at the log-transformed scale to emphasize recession periods. One year of daily streamflow (1961) is selected for visualization purposes, while all 10 years are used to calculate the reliability values (Table 2). For the dry Guadalupe watershed, none of the models can reproduce the continuous low flow recession during long dry periods since the models drain too quickly. This is
Fig. 5. Locations of the 12 MOPEX watersheds in the US (a), and on the Budyko curve (b). AE is actual evapotranspiration, P is precipitation and PE is potential evapotranspiration.
Fig. 6. (a) and (b) Daily streamflow hydrographs of 12 watersheds in 1961. (c) Flow duration curves of 12 watersheds using daily streamflow data for the period 1960–1998.
different for the two wetter watersheds in which more frequent rainfall events prevent the lower stores from drying out most of the time. Model S1 cannot capture low flows because of a lack of subsurface flow components. The additional slow flow routing enables models S2–S4 to reproduce the low flows. Their reliability values therefore all lie close to 1 for all 3 watersheds. Plots in the right column show that the a priori parameter ranges make it even more difficult for the models to capture the daily streamflow
Fig. 7. Ensemble prediction ranges of inter-annual variability. Ensembles produced by parameter sets drawn from feasible and a priori ranges for the (a) and (b) Guadalupe
watershed, (c) and (d) East Fork White watershed, and (e) and (f) French Broad watershed respectively. The vertical axis represents annual runoff over average annual runoff.
Fig. 8. Ensemble prediction ranges of intra-annual variability of streamflow. Ensembles produced by parameter sets drawn from feasible and a priori ranges for the (a) and (b)
Guadalupe watershed, (c) and (d) East Fork White watershed, and (e) and (f) French Broad watershed respectively. The vertical axis represents average monthly runoff over average
annual runoff.
Fig. 9. Ensemble prediction ranges for the flow duration curve. Ensembles produced by parameter sets drawn from feasible and a priori ranges for the (a) and (b) Guadalupe
watershed, (c) and (d) East Fork White watershed, and (e) and (f) French Broad watershed respectively.
Fig. 10. Ensemble prediction ranges for daily streamflow in the year 1961 on logarithmic scale. Ensembles produced by parameter sets drawn from feasible and a priori ranges for
the (a) and (b) Guadalupe watershed, (c) and (d) East Fork White watershed, and (e) and (f) French Broad watershed respectively.
Table 2
Reliability of four signatures over both feasible ranges and a priori ranges for the Guadalupe, East Fork White and French Broad watersheds. All reliability values shown as
percent (%).
Watershed         Signature       S1   M1   S2   M2   S3   M3   S4   M4
Feasible range
Guadalupe         Inter-annual   100  100  100  100  100  100  100  100
                  Intra-annual   100  100  100  100  100  100  100  100
                  FDC             11   11   98   98  100  100  100  100
                  Daily           12   12   93   92   96   96  100  100
East Fork White   Inter-annual   100  100  100  100  100  100  100  100
                  Intra-annual   100  100  100  100  100  100  100  100
                  FDC             20   20  100  100  100  100  100  100
                  Daily           22   22   98   98   99   99   99   99
French Broad      Inter-annual   100  100  100  100  100  100  100  100
                  Intra-annual    83   83  100  100  100  100  100  100
                  FDC             20   20  100  100  100  100  100  100
                  Daily           21   21   97   97   98   97  100  100
A priori range
Guadalupe         Inter-annual     0    0  100  100  100  100  100  100
                  Intra-annual     0   25  100  100  100  100  100  100
                  FDC              2    4   39   39   65   66  100   99
                  Daily            2    5   37   38   60   60   98   98
East Fork White   Inter-annual     0    0  100  100  100  100  100  100
                  Intra-annual    17   25   83   75   92   83  100  100
                  FDC              4    4   79   85   97   98   99   99
                  Daily            5    5   70   67   75   72   86   82
French Broad      Inter-annual     0    0   73   87  100  100  100  100
                  Intra-annual    25   25   58   67   75   75  100  100
                  FDC              4    4   48   46   55   52   98   98
                  Daily            4    4   38   39   44   44   85   85
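The reliability values in Table 2 can be read as the percentage of observed signature points that fall inside the range spanned by the model ensemble. The Python sketch below illustrates that reading under stated assumptions: the function name `reliability`, the use of the ensemble minimum and maximum as the envelope, and the list-of-lists layout are illustrative choices and are not taken from the paper.

```python
def reliability(observed, ensemble):
    """Percent of observed points lying inside the ensemble envelope.

    observed: list of observed signature values (one per point).
    ensemble: list of simulations, each a list with the same length as observed.
    The envelope is assumed to be the min/max over all ensemble members.
    """
    inside = 0
    for j, obs in enumerate(observed):
        values = [member[j] for member in ensemble]
        if min(values) <= obs <= max(values):
            inside += 1
    return 100.0 * inside / len(observed)


# Hypothetical example: 12 monthly values, 2 of which fall outside the envelope.
observed = [0.5] * 10 + [2.0] * 2
ensemble = [[0.0] * 12, [1.0] * 12]
print(round(reliability(observed, ensemble)))  # 83
```

With 12 monthly points, two observations outside the envelope give 10/12, or about 83%, which is the order of magnitude reported for the intra-annual signature above.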
recessions. Only S4 is capable of doing so even in the wet watersheds.
Table 2 shows the values of reliability of all 8 models for the 3 watersheds (Guadalupe, East Fork White and French Broad). Reliability values using feasible ranges are very close between models at inter-annual and intra-annual scales for all 3 watersheds. For the FDC and daily streamflow, reliability values of models S2–S4 are significantly higher compared to model S1. In general, reliability values using a priori parameter ranges increase significantly with increasing model complexity for the FDC and daily streamflow signatures.
4.2. Model performance with respect to shape
The shape measure is used to evaluate the models’ ability to capture the variability of the water balance signatures rather than their magnitude as was the case with the reliability measure. We test the ability of models S1 to M4 for reproducing the observed system dynamics by selecting the best parameter set from 10,000 uniformly sampled draws from feasible and a priori parameter ranges respectively. Visually, only the results for the FDC signature are shown, while all shape values are listed in Table 3.
Plots in the left column of Fig. 11 show the best simulations of models S1–S4 with parameter sets drawn from their feasible range with respect to the shape measure. Watersheds are from top to bottom moving from driest to wettest (Guadalupe, East Fork White and French Broad). For all 3 watersheds, model S1 cannot reproduce the observed shape for medium and low flow percentiles. For the Guadalupe watershed, only model S4 can reproduce the shape of the flow regime. For the East Fork White and French Broad watersheds all models except S1 can reproduce the shape of the FDC quite well. All models except S4 have problems with the very low flows, though. Plots in the right column show the best simulations with parameter sets drawn from a priori ranges for the 3 watersheds. With increasing model complexity, the models improve their ability to reproduce the shape of the FDC. For all 3 watersheds, only model S4 is capable of reproducing the full flow regime.
Table 3 lists the shape values of all 8 models using both the feasible ranges and a priori parameter ranges for the 3 watersheds. Shape values at inter-annual scale do not always increase as model complexity increases. In some cases, single-bucket models have higher shape values than the corresponding multi-bucket models for inter-annual variability. For intra-annual variability, FDC and daily streamflow shape values increase in general as the model complexity increases.
4.3. Fuzzy evaluation of model performance
We use the fuzzy rules discussed in Section 2.5 to combine reliability and shape criteria with the aim to obtain a consistent measure to evaluate model suitability across the twelve watersheds analyzed in this study. The results shown in Figs. 7–11 as well as in Tables 2 and 3 guided our selection of the membership function thresholds for the reliability and shape measures. The chosen thresholds reflect our subjective opinion on model performance based on the visual examination of the model ensembles. The fuzzy membership functions and their thresholds model our subjective preference related to judging a model’s performance and therefore allow us to use this approach as a screening tool for large numbers of watersheds and model choices. In this way, infeasible models are eliminated early and we can focus a more detailed analysis on the remaining more promising system representations. After determining the threshold values, we can calculate the fuzzy values of model performance using the fuzzy measure we defined in equation (3). Table 4 provides the threshold values of membership function for the shape measure
Table 3
The shape measures of four signatures over both feasible ranges and a priori ranges for Guadalupe, East Fork White and French Broad. The ideal shape value is 1.00, which indicates that simulations are perfectly parallel to observations. The shape value of the best simulation with respect to the shape measure is chosen to represent the model’s ability to reproduce the dynamics of signatures.
Watershed         Signature       S1     M1     S2     M2     S3     M3     S4     M4
Feasible range
Guadalupe         Inter-annual   0.07   0.07   0.29   0.19   0.14   0.14   0.21   0.27
                  Intra-annual   0.43   0.45   0.55   0.56   0.59   0.61   0.73   0.70
                  FDC            –(a)   –(a)   0.22   0.26   0.33   0.44   0.50   0.50
                  Daily          0.00   0.04   0.07   0.12   0.07   0.13   0.12   0.13
East Fork White   Inter-annual   0.21   0.15   0.12   0.14   0.35   0.38   0.27   0.30
                  Intra-annual   0.41   0.39   0.55   0.55   0.64   0.63   0.65   0.63
                  FDC            0.62   0.12   0.49   0.48   0.49   0.49   0.50   0.50
                  Daily          0.00   0.00   0.16   0.16   0.19   0.22   0.24   0.23
French Broad      Inter-annual   0.09   0.06   0.01   0.02   0.36   0.23   0.26   0.27
                  Intra-annual   0.51   0.51   0.57   0.58   0.58   0.58   0.78   0.75
                  FDC            1.46   0.18   0.49   0.49   0.49   0.50   0.50   0.50
                  Daily          0.00   0.00   0.34   0.33   0.33   0.33   0.41   0.41
A priori range
Guadalupe         Inter-annual   0.16   0.08   0.38   0.35   0.14   0.11   0.29   0.26
                  Intra-annual   0.22   0.36   0.44   0.43   0.25   0.38   0.70   0.73
                  FDC            –(a)   –(a)   0.22   0.28   0.28   0.41   0.49   0.50
                  Daily          0.00   0.02   0.02   0.06   0.01   0.03   0.12   0.12
East Fork White   Inter-annual   0.20   0.13   0.27   0.24   0.04   0.13   0.07   0.13
                  Intra-annual   0.33   0.35   0.55   0.55   0.61   0.60   0.68   0.66
                  FDC            –(a)   –(a)   0.45   0.46   0.47   0.47   0.50   0.49
                  Daily          2.21   4.40   1.52   2.57   0.99   1.54   0.52   0.55
French Broad      Inter-annual   0.09   0.05   0.40   0.39   0.32   0.18   0.12   0.10
                  Intra-annual   0.12   0.12   0.36   0.36   0.37   0.38   0.67   0.69
                  FDC            –(a)   –(a)   0.39   0.39   0.45   0.46   0.50   0.50
                  Daily          4.35   4.39   0.13   0.10   0.05   0.01   0.41   0.41
(a) The full range of simulated flow duration curve does not exceed 25% time.
Fig. 11. Simulations of the best shape reproduction by models S1–S4 with respect to the observed flow duration curve. Ensembles produced by parameter sets drawn from feasible
and a priori ranges for the (a) and (b) Guadalupe watershed, (c) and (d) East Fork White watershed, and (e) and (f) French Broad watershed respectively.
at each time scale. Our visual analysis did make clear that different thresholds are required depending on the observed variability of the signature. The threshold values of the membership function for the reliability measure are 95% (a model whose reliability is not less than 95% will be 100% accepted) and 75% (a model whose reliability is no greater than 75% will be rejected) at all time scales.
Fig. 12 shows model acceptance values based on the fuzzy evaluation according to FMOF derived from an assessment of the model ensembles of all signatures using the feasible parameter ranges. Watersheds are sorted from the driest (Guadalupe, watershed 1) to the wettest (French Broad, watershed 12). At the inter-annual time scale model S1 already satisfies our thresholds for the simulation of inter-annual variability in the medium and wet watersheds, while model S2 satisfies the thresholds in the dry watersheds. For the intra-annual variability, model S2 and the more complex models are acceptable for the Guadalupe (1), San Marcos (2), Spring (4), East Fork White (7), Tygart (11) and French Broad (12) watersheds. Model S1 has a higher acceptability than S2 for the Monocacy (6) watershed. For the Rappahannock (5), Monocacy (6) and Bluestone (9) watersheds, we can again see that the single-bucket models have a slightly higher acceptability than the corresponding multi-bucket models. When simulating the FDC, only model S4 is acceptable in reproducing the observed curve with respect to both reliability and shape for the dry watersheds (1, 2, 3 and 4). For the other watersheds, model S2 already satisfies the simulation criteria for the FDC. At the daily time scale, none of the models provide acceptable simulations of the observed streamflow except models S4 and M4, which can be accepted for the French Broad (12), Rappahannock (5) and Monocacy (6) watersheds. For the FDC and daily streamflow signatures, single-bucket models have slightly lower acceptability than the corresponding multi-bucket models.
Fig. 13 visualizes the model acceptance based on the fuzzy evaluation with respect to reliability and shape derived from an assessment of ensemble signature simulations using a priori parameter ranges. At the inter-annual time scale, models S2 and M2 can already satisfy the simulation for most of the watersheds. For the other watersheds, models S3 and M3 can capture the inter-annual variability for the Tygart (11) and French Broad (12) watersheds. Models S4 and M4 can capture this signature for the Amite (10) watershed. For the East Fork White (7) watershed, none of the models but S3 can capture the inter-annual variability. At the intra-annual scale, models S4 and M4 can capture intra-annual signature variability in the Guadalupe (1), San Marcos (2), Amite (10) and French Broad (12) watersheds. Model S2 also captures the intra-annual variability for the Spring (4) and East Fork White (7) watersheds. None of the models can reproduce the intra-annual variability for the remaining watersheds. For the FDC signature, models S4 and M4 are acceptable representations for all watersheds. When simulating daily streamflow, all of the models are basically unacceptable except that models S4 and M4 work in the French Broad (12) watershed.
5. Discussion
The results derived through the fuzzy evaluation of model performance combine the reliability and shape measures to quantify the degree of acceptability of each model for each watershed and provide a screening tool for the necessary model complexity at a given time scale.
At the inter-annual time scale, the simplest model, S1, can capture the inter-annual variability for medium and wet
Table 4
Membership function thresholds for the shape measure at each time scale.
Membership function thresholds   Unacceptable   100% acceptable
Inter-annual                     0.2            0.2
Intra-annual                     0.35           0.55
FDC                              0.45           0.5
Daily                            0.2            0.4
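The thresholds quoted for the reliability measure (75% and 95%) and those in Table 4 for the shape measure can be turned into membership values as sketched below. This is a minimal Python illustration, not the paper's implementation: the linear ramp between the two thresholds and the use of `min` to combine the two memberships are assumptions made here (equation (3), which defines the FMOF, is not reproduced in this section), and the names `membership`, `acceptance`, `SHAPE_THRESHOLDS` and `RELIABILITY_THRESHOLDS` are hypothetical.

```python
# Thresholds as (reject, accept): Table 4 for shape, 75%/95% for reliability.
SHAPE_THRESHOLDS = {
    "inter-annual": (0.2, 0.2),
    "intra-annual": (0.35, 0.55),
    "fdc": (0.45, 0.5),
    "daily": (0.2, 0.4),
}
RELIABILITY_THRESHOLDS = (75.0, 95.0)


def membership(value, reject, accept):
    """1 at or above `accept`, 0 at or below `reject`, linear in between.

    The linear ramp is an assumption. Checking `accept` first also covers
    the case reject == accept (a step function) without dividing by zero.
    """
    if value >= accept:
        return 1.0
    if value <= reject:
        return 0.0
    return (value - reject) / (accept - reject)


def acceptance(reliability_pct, shape, signature):
    """Combine both memberships with a fuzzy AND (min); assumed, see lead-in."""
    mu_rel = membership(reliability_pct, *RELIABILITY_THRESHOLDS)
    mu_shape = membership(shape, *SHAPE_THRESHOLDS[signature])
    return min(mu_rel, mu_shape)


# Values from Tables 2 and 3 (Guadalupe, feasible range, intra-annual):
print(acceptance(100, 0.73, "intra-annual"))  # model S4
print(acceptance(100, 0.43, "intra-annual"))  # model S1
```

Under these assumptions, a model with 100% reliability is ranked by its shape membership alone, which matches the later observation that the shape measure controls acceptability when feasible parameter ranges are used.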
watersheds. For dry watersheds, model S2 is needed to capture the inter-annual variability, which suggests that the addition of subsurface flow and the delay of runoff are required even at the annual scale. S1 only produces saturation excess flow and lacks any routing. When using S1 to model dry watersheds, since it lacks subsurface drainage, all the stored water evapotranspires between rainfall events due to the water limited conditions in the dry watershed, which leads to an underestimation of runoff. Introduction of subsurface drainage in model S2 reduces this problem and yields acceptable simulations of the inter-annual variability for the drier watersheds.
At the intra-annual time scale, the required model complexity varies with watershed. Basically, model S4, the most complex one, is always acceptable. But for the English watershed (3), which is likely the most impacted by snow since it is located furthest north, none of the models seem to be capable of reproducing an observed double-peak in March and May due to the lack of a snow-accumulation-melt component. The models can only simulate the peak in May, which degrades the intra-annual shape metric’s results for watershed 3. Watersheds 1, 2, 4, 10 and 12, which are located further south, are less likely to be impacted by snow. Model S1 can reproduce intra-annual runoff variability for watersheds 4 and 10. The fact that a simple-bucket model is sufficient suggests that for these two watersheds, climate is the dominant control for the seasonal runoff dynamics. Model S2 can satisfy the simulation of inter-annual variability for watersheds 1, 2 and 12. For the dry watersheds (1 and 2), explicit simulation of the recessions is needed for sustaining runoff during long dry periods. For the wettest watershed (12), the carryover effect between sequential months is significant because of frequent precipitation events and thus both antecedent soil moisture and subsurface contributions to runoff are important for the simulation of monthly flows. For the other watersheds, model structural limitations complicate the classification of model performance because the influence of snowmelt on runoff production (particularly since the peak of intra-annual variability in March cannot be simulated well) makes
[Fig. 12 image: four panels (Inter-annual Variability, Intra-annual Variability, Flow Duration Curve, Daily Streamflow); horizontal axis: Model (S1, M1, S2, M2, S3, M3, S4, M4); vertical axis: Watershed (1–12); shading scale from 0 to 1.]
Fig. 12. Fuzzy evaluation of model performance for the 12 watersheds based on model ensembles of signatures derived from their feasible parameter ranges. The model is 100%
accepted when the value of FMOF equals 1. The model is rejected when the value of FMOF equals 0.
it difficult to distinguish models S2, S3 or S4 at the intra-annual time scale.
Regarding the flow duration curve (FDC), model S2 is acceptable for medium and wet watersheds. Because the FDC reflects the watershed regime and flow magnitudes without consideration of timing, this signature can be captured with a simple model for wetter systems. For the dry watersheds, model S4 is needed to simulate the full FDC regime since slow subsurface flow in the deep storage is important for low flow generation. Low flows are necessary to sustain the streamflow in the dry season when evapotranspiration is high and rainfall events are infrequent.
The recession part of the daily streamflow simulation is difficult to reproduce for the drier watersheds, for which consequently none of the models is acceptable. Models S4 and M4 are acceptable for a few medium and wet watersheds. The fuzzy-performance classification at the daily time scale did successfully detect a known structural limitation in all of the models. The study watersheds are relatively large (all larger than 1000 km²), which requires more extensive routing of the quick flow at the daily prediction scale. This structural deficiency was purposefully used in the design of this study to ensure that known model limitations could be detected using the fuzzy model performance criteria.
Overall, the inter-annual variability and FDC signatures are more easily reproduced since they do not require the model to simulate timing correctly. For intra-annual variability and daily streamflow simulations, however, both correct timing and magnitude are needed. The lack of adequate routing was correctly identified in the rejection of the full suite of model structures.
The shape measure controls model selection when model performance is evaluated based on sampling from the feasible parameter range, i.e. not constraining parameters using local physical characteristics. The reliability values of most models across all signatures are generally greater than 0.9 and the membership with respect to reliability is subsequently close to 1. Therefore, according to the FMOF we define, the shape measure always determines the degree of acceptability of a model structure when using feasible parameter ranges. When using a priori parameter ranges, however, both reliability and shape measures
[Fig. 13 image: four panels (Inter-annual Variability, Intra-annual Variability, Flow Duration Curve, Daily Streamflow); horizontal axis: Model; vertical axis: Watershed (1–12); shading scale from 0 to 1.]
Fig. 13. Fuzzy evaluation of model performance for the 12 watersheds based on model ensembles of signatures derived from their a priori parameter ranges. The model is 100%
accepted when the value of FMOF equals 1. The model is rejected when the value of FMOF equals 0.
impact the classifications of model acceptability during the fuzzy evaluation.
The analysis using feasible parameter ranges predominantly focuses on evaluating the model structures, without strong constraints on the actual parameter ranges suitable to reflect each individual watershed’s characteristics. To also test our assumptions about the physical interpretation of the model parameters with respect to physical watershed characteristics, we further constrained them for each watershed to a priori ranges using available soils and vegetation data. If the model is still acceptable with these narrower parameter ranges, then this would allow some interpretation of the dominant watershed characteristics and their control on certain response signatures.
For the FDC and daily streamflow signatures, the multi-bucket models have a slightly higher degree of acceptability than the corresponding single-bucket models. The single-bucket models often simulate a peak that is too low and recessions that fall too quickly. The spatial variability included in the multi-bucket models provides a more realistic fit of the variability of the watershed response observed for daily dynamics and results in a better fit of peaks and recessions. However, for watersheds 6 and 8, the multi-bucket model M1 shows a slightly lower degree of acceptability than the corresponding simple-bucket model S1 at the intra-annual time scale. Based on an examination of the simulations using multi-bucket models, we find that model M1 produces much higher runoff in winter than the observed intra-annual yield because the snowfall produces runoff directly due to the lack of a snow-accumulation-melt component.
Thresholds for the shape measure membership function at the inter-annual scale are relatively small compared to the other signatures. These thresholds are 0.2 for model rejection and 0.2 for full membership of a model. The reason for these low values is that the shape measure is normalized by the variance of the observations. The variance of inter-annual variability is relatively small compared with that of the other signatures. The difference between the slope of the simulations and the slope of the observations is considerable compared with the variance of the observation, which results in smaller shape values. Therefore, the range of the shape values is wider. When the shape values between models differ by only a small amount, the best simulations with respect to the shape measure between models will appear nearly equally acceptable.
6. Conclusions
This study introduces a novel top-down methodology for screening candidate model structures for streamflow simulation across time scales and across a (potentially) large number of watersheds. The model structures that are acceptable after the screening stage can subsequently be analyzed in greater detail regarding their suitability to represent the hydrology at a specific place. Implementing the methodology in 12 US watersheds with very different physical and climatic characteristics, we identify the necessary model complexity across four different watershed response signatures. This study therefore provides a tool in which model selection can be formalized and made consistent through formalizing the selection process regarding simulated shape and magnitude in a fuzzy rule system that provides a screening tool for assessing large numbers of models across a wide range of watersheds. The system can be adapted by other modelers through different definitions of selection measures and selection thresholds with respect to the fuzzy rules used.
The resulting acceptable models provide insight into what processes control the watershed responses at different time scales under different climate conditions. We have shown, within the context of our study watersheds, that a subsurface flow routing component, controlled by the field capacity, is required for the dry watersheds and that climate is the dominant control of the water balance at the inter-annual time scale. At the daily time scale, the models presented in our study are correctly rejected by our multi-objective fuzzy evaluation framework due to a lack of appropriate routing and therefore insufficient delay in streamflow transfer to the watershed outlet.
In the future, other measures and signatures need to be tested that make it increasingly difficult for a model to pass the acceptability test. The model structures currently included in this study should be enhanced by including a snow-accumulation-melt routine and more extensive routing. In general, the number of model structures considered needs to be expanded to obtain a better picture of required model complexity across an even more diverse set of watersheds, including semi-arid systems. The impacts of uncertainty in the input and output data need to be investigated. Applying this methodology to a large number of watersheds can also contribute to the search for watershed classification strategies by grouping watersheds with similar hydrologic behavior as suggested by their similarity in required model structures.
Acknowledgement
The first author was supported as part of a USDA National Needs Graduate Fellowship in Integrated Soil and Water Sciences with additional support by the College of Agriculture Science of the Pennsylvania State University. Partial support for this project was provided by the National Science Foundation (EAR-0635998). We thank Murugesu Sivapalan and three anonymous reviewers for their constructive comments that helped to improve the paper.
Appendix A. Model structures and model equations
The model structures of the single-bucket models are taken from Atkinson et al. (2002) and Farmer et al. (2003). The multiple-bucket formulation is taken from Son and Sivapalan (2007). In the sections below we provide the equations associated with the five models.
A1. Model S1
Model S1 is a single-bucket model with a single store. It only includes runoff generation by saturation excess controlled by the maximum soil water storage capacity.
(1) Threshold storage parameter and threshold storage
\[ f_c = \frac{\theta_{fc} - \theta_{wlt}}{\phi - \theta_{wlt}} \]
\[ S_{fc} = S_b f_c \]
(2) Interception and evapotranspiration; here $S_{t-1}$ is the soil water storage of the last time step.
\[ E_i = \alpha_{ei} P \]
\[ S_t = \min(S_b,\; S_{t-1} + P - E_i) \]
\[ E_v = \begin{cases} M E_p, & S_t \ge S_{fc} \\ M \dfrac{S_t}{S_{fc}} E_p, & S_t < S_{fc} \end{cases} \]
\[ E_{bs} = (1 - M) \frac{S_t}{S_b} E_p \]
\[ E = E_i + E_v + E_{bs} \]
(3) Saturated excess runoff and soil water storage at the current step
\[ S_t = S_{t-1} + P - E \]
\[ Q_{se} = \begin{cases} S_t - S_b, & S_t \ge S_b \\ 0, & S_t < S_b \end{cases} \]
\[ S_t = S_t - Q_{se} \]
\[ Q = Q_{se} \]
A2. Model S2
Model S2 is a single-bucket model with only a single store. It contains saturation excess runoff and subsurface flow that is controlled by the threshold storage $S_{fc}$.
Steps (1) and (2) are the same as Model S1.
(3) Saturated excess runoff, subsurface runoff and soil water storage at the current time step
\[ S_t = S_{t-1} + P - E \]
\[ Q_{se} = \begin{cases} S_t - S_b, & S_t \ge S_b \\ 0, & S_t < S_b \end{cases} \]
\[ S_t = S_t - Q_{se} \]
\[ Q_{ss} = \begin{cases} \alpha_{ss} (S_t - S_b), & S_t \ge S_{fc} \\ 0, & S_t < S_{fc} \end{cases} \]
\[ S_t = S_t - Q_{ss} \]
\[ Q = Q_{se} + Q_{ss} \]
A3. Model S3
Model S3 is a single-bucket model with two stores, i.e. unsaturated zone and saturated zone. Evaporation and transpiration occur from both the unsaturated and saturated zones. Flow generation is by saturation excess runoff and subsurface flow from the saturated zone.
(1) Threshold storage parameter and threshold storage in the unsaturated zone
\[ f_c = \frac{\theta_{fc} - \theta_{wlt}}{\phi - \theta_{wlt}} \]
\[ S_{usfc} = (S_b - S_{sat,t-1}) f_c \]
(2) Interception and evapotranspiration. Depletion from the unsaturated and saturated zones is allocated proportionally according to the water storage contents in these zones.
\[ E_i = \alpha_{ei} P \]
\[ S_{us} = S_{t-1} - S_{sat,t-1} \]
\[ S'_{us} = S_{us} + P - E_i \]
\[ r_p = \begin{cases} S'_{us} - S_{usfc}, & S'_{us} \ge S_{usfc} \\ 0, & S'_{us} < S_{usfc} \end{cases} \]
where $r_p$, recharge to the saturated zone, occurs when the field capacity in the unsaturated zone is satisfied.
\[ S_t = \min(S_b,\; S'_{us} + S_{sat,t-1}) \]
\[ S_{sat} = \min(S_b,\; S_{sat,t-1} + r_p) \]
\[ S_{us} = S_t - Q_{sat} \]
\[ S_{usfc} = f_c (S_b - S_{sat}) \]
\[ E_{v,sat} = \frac{S_{sat}}{S_t} M E_p \]
\[ E_{bs,sat} = \frac{S_{sat}}{S_t} (1 - M) E_p \]
\[ E_{bs,us} = \frac{S_{us}}{S_t} (1 - M) \frac{S_{us}}{S_b - S_{sat}} E_p \]
\[ E_{v,us} = \begin{cases} \dfrac{S_{us}}{S_t} M E_p, & S_{us} > S_{usfc} \\ 0, & S_{us} = 0 \\ \dfrac{S_{us}}{S_t} M \dfrac{S_{us}}{S_{usfc}} E_p, & S_{us} < S_{usfc} \end{cases} \]
\[ E_{bs} = E_{bs,us} + E_{bs,sat} \]
\[ E_v = E_{v,us} + E_{v,sat} \]
\[ E = E_i + E_v + E_{bs} \]
(3) Saturated excess runoff, subsurface runoff and soil water storage at the current time step
\[ S_t = S_{t-1} + P - E \]
\[ Q_{se} = \begin{cases} S_t - S_b, & S_t \ge S_b \\ 0, & S_t < S_b \end{cases} \]
\[ S_t = S_t - Q_{se} \]
\[ Q_{ss} = \alpha_{ss} S_{sat} \]
\[ Q = Q_{se} + Q_{ss} \]
\[ S_t = S_t - Q \]
\[ S_{sat} = S_{sat} - Q_{ss} \]
A4. Model S4
Model S4 is a single-bucket model with three stores. The model structure of S4 is basically the same as that of S3 except that it has an additional deep store that is recharged by the deep percolation from the unsaturated and saturated zones. The deep store only loses water through base flow; no evapotranspiration losses occur in this store.
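The behaviour of such a deep store can be illustrated with a short Python sketch. It assumes linear storage-outflow relations, with base flow proportional to the deep storage and deep recharge proportional to the saturated-zone storage, using the coefficient names `a_bf` and `k_d` from the Notation. The function name `deep_store_step` and the numbers in the example are hypothetical, and the sketch covers only the deep-store bookkeeping, not the full S4 time step.

```python
def deep_store_step(s_sat, s_deep, k_d, a_bf):
    """One daily update of the deep store (all storages in mm).

    s_sat:  storage in the saturated zone before the update
    s_deep: storage in the deep store before the update
    k_d:    deep recharge coefficient (1/d)
    a_bf:   base flow recession coefficient (1/d)
    Returns (new saturated storage, new deep storage, base flow).
    """
    q_bf = a_bf * s_deep                 # base flow leaving the deep store
    r_g = k_d * s_sat                    # recharge from the saturated zone
    s_deep_new = s_deep - q_bf + r_g     # deep store loses water only as base flow
    s_sat_new = s_sat - r_g              # recharge is removed from the saturated zone
    return s_sat_new, s_deep_new, q_bf


# Hypothetical values chosen only to show the bookkeeping.
s_sat, s_deep, q_bf = deep_store_step(s_sat=100.0, s_deep=50.0, k_d=0.1, a_bf=0.02)
print(s_sat, s_deep, q_bf)
```

Because the deep store has no evapotranspiration term, water is conserved between the two stores apart from the base flow, which is what allows slow recessions to be sustained through long dry periods.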
Steps (1) and (2) are the same as Model S3.
(3) Base flow from the deep store and deep storage.
\[ Q_{bf} = \alpha_{bf} S_{deep} \]
\[ r_g = k_d S_{sat} \]
\[ S_{deep} = S_{deep} - Q_{bf} + r_g \]
(4) Saturated excess runoff, subsurface runoff and soil water storage at the current time step. Here, $S_t$ does not contain soil water storage of the deep store.
\[ S_t = S_{t-1} + P - E \]
\[ Q_{se} = \begin{cases} S_t - S_b, & S_t \ge S_b \\ 0, & S_t < S_b \end{cases} \]
\[ S_{sat} = S_{sat} - r_g \]
\[ Q_{ss} = \alpha_{ss} S_{sat} \]
\[ S_{sat} = S_{sat} - Q_{ss} \]
\[ Q = Q_{se} + Q_{ss} + Q_{bf} \]
\[ S_t = S_t - Q_{se} - Q_{ss} - r_g \]
A5. Multiple-bucket
The multiple-bucket model uses 10 buckets to represent a variable soil moisture distribution that fits the Xinanjiang model distribution. The 10 buckets are combined in parallel. The multiple-bucket models M1–M4 follow the same mechanisms of hydrologic processes as models S1–S4, respectively.
\[ S_{max} = (1 + b) S_b \]
\[ F = [0.05,\ 0.15,\ 0.25,\ 0.35,\ 0.45,\ 0.55,\ 0.65,\ 0.75,\ 0.85,\ 0.95] \]
\[ S_{b\_f} = S_{max} \left[ 1 - (1 - F)^{1/b} \right] \]
Appendix B. Feasible parameter ranges
Parameter   Unit   Feasible range      Description
M           –      0–1                 Fraction of catchment area covered by deep rooted vegetation
α_ei        –      0–0.49 (a)          Interception coefficient
S_b         mm     0–1200              Maximum soil water storage
f_c         –      0–1                 Field capacity
α_ss        d⁻¹    0.05–0.5 (b)        Recession coefficient for subsurface flow from saturated zone
α_bf        d⁻¹    0.001–0.05 (b)      Recession coefficient for subsurface flow (base flow) from deep store
k_d         d⁻¹    0–0.5 (c)           Deep recharge coefficient, regulates recharge of deeper store from upper perched zone
b           –      0.1–2.5 (d, e)      Shape parameter for spatial soil water storage distribution
(a) Dingman (2002, p. 307).
(b) Van Werkhoven et al. (2008).
(c) Farmer et al. (2003).
(d) Yadav et al. (2007).
(e) Moore (2007).
Notation
θ_fc       field capacity (dimensionless)
θ_wlt      permanent wilting point (dimensionless)
φ          porosity (dimensionless)
f_c        threshold storage parameter (dimensionless) (0 < f_c < 1)
S_b        maximum storage of the bucket model (mm)
S_fc       threshold storage (mm)
P          precipitation (mm d⁻¹)
E_p        potential evapotranspiration (mm d⁻¹)
E          actual evapotranspiration (mm d⁻¹)
E_i        interception (mm d⁻¹)
E_v        vegetation transpiration (mm d⁻¹)
E_bs       bare soil evaporation (mm d⁻¹)
E_v,us     transpiration from unsaturated zone (mm d⁻¹)
E_v,sat    transpiration from saturated zone (mm d⁻¹)
E_bs,us    evaporation from unsaturated zone (mm d⁻¹)
E_bs,sat   evaporation from saturated zone (mm d⁻¹)
S_sat      soil water storage in saturated zone (mm)
S_us       soil water storage in unsaturated zone (mm)
S_usfc     field capacity of current unsaturated zone (mm)
S_deep     soil water storage in deep store (mm)
S_t        total soil water storage at current time t (mm)
S_t−1      total soil water storage at last time step t − 1 (mm)
S_sat,t−1  soil water storage of saturated zone at last time step t − 1 (mm)
r_p        daily recharge to saturated zone from unsaturated zone in which water storage exceeds field capacity (mm)
r_g        daily recharge from upper saturated zone to deeper store (mm)
Q          total runoff (mm d⁻¹)
Q_se       surface runoff generated by saturation excess (mm d⁻¹)
Q_ss       subsurface flow originating from saturated zone (mm d⁻¹)
Q_bf       base flow originating from deep store (mm d⁻¹)
M          fraction of catchment area covered by deep rooted vegetation (dimensionless)
K_v        vegetation transpiration efficiency (dimensionless)
α_ss       recession coefficient for subsurface flow from saturated zone store in the linear storage-outflow model (d⁻¹)
α_bf       recession coefficient for subsurface flow from deep store in the linear storage-outflow model (d⁻¹)
k_d        deep recharge coefficient from the upper saturated zone to the deep store (d⁻¹)
b          shape parameter for spatial soil water storage distribution in the multi-bucket model (dimensionless)
F          cumulative probabilities at which soil water storages of 10 buckets fit spatial soil water storage distribution (dimensionless)
S_max      maximum soil water storage in the watershed (mm)
S_b_f      soil water storage capacities in the 10 buckets (mm)
References
Anderson, R., Koren, V., Reed, C., 2006. Using SSURGO data to improve Sacramento Model a priori parameter estimates. Journal of Hydrology 320 (1–2), 103–116.
Atkinson, S., Woods, R.A., Sivapalan, M., 2002. Climate and landscape controls on water balance model complexity over changing timescales. Water Resources Research 38 (12), 1314, doi:10.1029/2002WR001487.
Atkinson, S.E., Sivapalan, M., Woods, R.A., Viney, N.R., 2003. Dominant physical controls on hourly flow predictions and the role of spatial variability: Mahurangi catchment, New Zealand. Advances in Water Resources 26, 219–235.
Bardossy, A., 2005. Fuzzy sets in rainfall/runoff modeling. In: Anderson, M.G., McDonnell, J.J. (Eds.), Encyclopedia of Hydrologic Sciences. John Wiley & Sons, Ltd.
Beven, K.J., Binley, A.M., 1992. The future of distributed models: model calibration and uncertainty prediction. Hydrological Processes 6, 279–298.
Cheng, C.T., Ou, C.P., Chau, K.W., 2002. Combining a fuzzy optimal model with a genetic algorithm to solve multi-objective rainfall–runoff model calibration. Journal of Hydrology 268 (1–4), 72–86.
Clark, M.P., Slater, A.G., Rupp, D.E., Woods, R.A., Vrugt, J.A., Gupta, H.V., Wagener, T., Hay, L.E., 2008. Fuse: a modular framework to diagnose differences between hydrological models. Water Resources Research 44, W00B02, doi:10.1029/2007WR006735.
Dingman, S.L., 2002. Physical Hydrology. Prentice-Hall, Inc., p. 307.
Duan, Q., Schaake, J., Andreassian, V., Franks, S., Gupta, H.V., Gusev, Y.M., Habets, F., Hall, A., Hay, L., Hogue, T.S., Huang, M., Leavesley, G., Liang, X., Nasonova, O.N., Noilhan, J., Oudin, L., Sorooshian, S., Wagener, T., Wood, E.F., 2006. Model Parameter Estimation Experiment (MOPEX): overview and summary of the second and third workshop results. Journal of Hydrology 320 (1–2), 3–17.
Eder, G., Sivapalan, M., Nachtnebel, H.P., 2003. Modelling of water balances in an Alpine catchment through exploitation of emergent properties over changing time scales. Hydrological Processes 17, 2125–2149.
Farmer, D., Sivapalan, M., Farmer, D., 2003. Climate, soil, and vegetation controls upon the variability of water balance in temperate and semiarid landscapes: downward approach to water balance analysis. Water Resources Research 39 (2), 1035, doi:10.1029/2001WR000328.
Freer, J.E., McMillan, H., McDonnell, J.J., Beven, K.J., 2004. Constraining dynamic TOPMODEL responses for imprecise water table information using fuzzy rule based performance measures. Journal of Hydrology 291 (3–4), 254–277.
Gan, T.Y., Burges, S.J., 2006. Assessment of soil-based and calibrated parameters of the Sacramento model and parameter transferability. Journal of Hydrology 320 (1–2), 117–131.
Hornberger, G.M., Spear, R.C., 1981. An approach to the preliminary analysis of environmental systems. Journal of Environmental Management 12, 7–18.
Miller, D.A., White, R.A., 1998. A Conterminous United States multi-layer soil characteristics data set for regional climate and hydrology modeling. Earth Interactions 2, 1–26.
Moore, R.J., 2007. The PDM rainfall–runoff Model. Hydrology and Earth System Sciences 11 (1), 483–499.
Nasir, F., Huang, G., 2007. A fuzzy decision aid model for environmental performance assessment in waste recycling. Environmental Modelling and Software 23 (6), 677–689.
Norton, J.P., 1996. Roles for deterministic bounding in environmental modelling. Ecological Modelling 86, 157–161.
Shrestha, R.R., Rode, M., 2008. Multi-objective calibration and fuzzy preference selection of a distributed hydrological model. Environmental Modeling and Software 23 (12), 1384–1395.
Sivapalan, M., Bloeschl, G., Zhang, L., Vertessy, R., 2003. Downward approach to hydrological prediction. Hydrological Processes 17, 2101–2111.
Son, K., Sivapalan, M., 2007. Improving model structure and reducing parameter uncertainty in conceptual water balance models through the use of auxiliary data. Water Resources Research 43, W01415, doi:10.1029/2006WR005032.
Van Straten, G., Keesman, K.J., 1991. Uncertainty propagation and speculation in projective forecasts of environmental change: a lake-eutrophication example. Journal of Forecasting 10, 163–190.
Van Werkhoven, K., Wagener, T., Reed, P., Tang, Y., 2008. Characterization of watershed model behavior across a hydroclimatic gradient. Water Resources Research 44, W01429, doi:10.1029/2007WR006271.
Wagener, T., Sivapalan, M., Troch, P., Woods, R., 2007. Catchment classification and hydrologic similarity. Geography Compass 1, doi:10.1111/j.1749-8198.2007.00039.x.
Wittenberg, H., 1999. Baseflow recession and recharge as nonlinear storage processes. Hydrological Processes 13 (5), 715–726.
Jakeman, A., Hornberger, G.M., 1993. How much complexity is warranted in
a rainfall–runoff model? Water Resources Research 29 (8), 2637–2649. Yadav, M., Wagener, T., Gupta, H.V., 2007. Regionalization of constraints on expected
Jothityangkoon, C., Sivapalan, M., Farmer, D., 2001. Process controls of water balance watershed response behavior. Advances in Water Resources 30, 1756–1774.
variability in a large semi-arid catchment: downward approach to hydrological Yang, T., Yu, P., 2006. Application of fuzzy multi-objective function on reducing
modeling. Journal of Hydrology 254, 174–198. groundwater demand for aquaculture in land-subsidence areas. Water
Klemes, V., 1983. Conceptualization and scale in hydrology. Journal of Hydrology 65, Resources Management 20, 377–390.
1–23. Young, P., 2003. Top-down and data-based mechanistic modelling of rainfall-flow
Koren, V., Smith, M., Duan, Q., 2003. Use of a priori parameters in the derivation of dynamics at the catchment scale. Hydrological Processes 17, 2195–2217.
spatially consistent parameter sets of rainfall–runoff models. In: Calibration of Young, P., 1998. Data-based mechanistic modelling of environmental, ecological,
Watershed Models, Water Science and Application, vol. 6. The American economic and engineering systems. Environmental Modelling and Software 13,
Geophysical Union. 105–122.
Linsley, R.K., Kohler, M.A., Paulhus, J.L.H., Wallace, J.S., 1958. Hydrology for Engi- Yu, P.S., Yang, T.C., 2000. Fuzzy multi-objective function for rainfall–runoff model
neering. McGraw Hill, New York. calibration. Journal of Hydrology 238, 1–14.
Littlewood, I.G., Croke, B.F.W., Jakeman, A.J., Sivapalan, M., 2003. The role of ‘top- Zhang, Z., Wagener, T., Reed, P., Bushan, R., 2008. Ensemble streamflow predictions
down’ modelling for Prediction in Ungauged Basins (PUB). Hydrological in ungauged basins combining hydrologic indices regionalization and multi-
Processes 17 (8), 1673–1679. objective optimization. Water Resources Research 44, W00B04, doi:10.1029/
Manabe, S., 1969. Climate and the ocean circulation, 1. The atmospheric circulation 2008WR006833.
and the hydrology of the Earth’s surface. Monthly Weather Review 97 (11), Zhao, R.J., Zhang, Y.L., Fang, L.R., Liu, X.R., Zhang, Q.S., 1980. The Xinanjiang model.
739–774. In: Hydrological Forecasting, IASH Publ., 129. pp. 351–356.
