
Physics and Chemistry of the Earth 42–44 (2012) 70–76

Validation of hydrological models: Conceptual basis, methodological approaches and a proposal for a code of practice
Daniela Biondi a, Gabriele Freni b, Vito Iacobellis c, Giuseppe Mascaro d, Alberto Montanari e

a Dipartimento Difesa del Suolo "V. Marone", Università della Calabria, Italy
b Facoltà di Ingegneria ed Architettura, Università di Enna "Kore", Italy
c Dipartimento di Ingegneria delle Acque e di Chimica, Politecnico di Bari, Italy
d Dipartimento di Ingegneria del Territorio, Università di Cagliari, Italy
e Dipartimento DICAM, Università di Bologna, Italy

Article history: Available online 5 August 2011

Keywords: Hydrological model; Validation; Performance indexes; Model diagnostic; Calibration

Abstract

In this paper, we discuss the validation of hydrological models, namely the process of evaluating the performance of a simulation and/or prediction model. We briefly review the validation procedures that are frequently used in hydrology, making a distinction between scientific validation and performance validation. Finally, we propose guidelines for carrying out model validation with the aim of providing agreed methodologies to efficiently assess model peculiarities and limitations, and to quantify simulation performance.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

The term validation is well known in hydrology and environmental modelling and is commonly used to indicate a procedure aimed at analysing the performance of simulation and/or forecasting models. In the scientific context, the term validation has a broader meaning, including any process that has the goal of verifying the ability of a procedure to accomplish a given scope. As an example, it can indicate the verification of a preliminary hypothesis or the security assessment of a computer network.

The need for agreed and standardised validation protocols in hydrological modelling has become progressively more urgent. In fact, in the last 30 years, hydrologic modelling has greatly improved thanks to the increasing availability of computational resources, the advancement in process understanding, as well as the availability of spatially distributed data, mainly provided by remote sensors (Smith et al., 2004). The scientific literature continuously proposes new, sophisticated modelling solutions aimed at reproducing the hydrological cycle at multiple scales (e.g., field, watershed and even global scale) and for several goals, including research-oriented objectives, such as advancing the knowledge of the physics of water movement, and more practical scopes, like water resources evaluation, flood protection and the design of civil infrastructures. These numerical models adopt approaches and computational schemes that may be widely different. For this reason, validation protocols are required to (i) facilitate model inter-comparison, (ii) improve the development of superior models, as well as their coupling and integration with data assimilation schemes, and (iii) help forecast users optimise their decision making.

Another important reason for developing standard validation criteria is the progressive mismatch between the complexity of modelling tools and the capacity of modellers and practitioners to rigorously assess the reliability of modelling applications (Hug et al., 2009). This difficulty is exacerbated by the lack of sufficiently informative data. As an example, measured variables are often point values, while simulated variables are frequently averaged in time and/or in space. Moreover, measured variables are affected by uncertainty due to the monitoring technology (e.g., Di Baldassarre and Montanari, 2009). The problem of data availability and uncertainty has been highlighted since the first approaches to environmental modelling validation, and it has often limited the possibility to adopt techniques successfully used in other scientific disciplines (Santhi et al., 2001).

A number of notable efforts have recently been devoted to the development of shared modelling methodologies and verification standards in hydrology and close disciplines. The US National Weather Service (NWS) created a team of researchers, named the Hydrologic Verification System Requirements Team, which had, among its tasks, the goal of establishing requirements for a
comprehensive national system to verify hydrologic forecasts (see the website http://www.weather.gov/oh/rfcdev/projects/hvsrt_charter_05.htm). Moreover, Theme 3 "Advance the learning from the application of existing models, towards uncertainty analyses and model diagnostics" of the Predictions in Ungaged Basins (PUB) initiative also promotes the harmonisation of model evaluation techniques. Similar standardisation processes have been started in other research fields in which mathematical modelling has become a common practice, such as environmental quality assessment and water resources management (Belia et al., 2009; Muschalla et al., 2009). In particular, Belia et al. (2009) proposed a road map for the definition of standard modelling approaches and model evaluation protocols, starting from the creation of a common knowledge base to which water quality modellers can refer for their applications. Muschalla et al. (2009) defined an open protocol for wastewater process modelling, highlighting the importance of model validation and uncertainty analysis, especially in those cases where model complexity is not supported by sufficient data availability. In addition, a relevant effort was provided by the EU research project HarmoniQuA (Harmonizing Quality Assurance in model based catchments and river basin management), aimed at the development of modelling support tools that investigate the reliability of modelling responses at the catchment scale (Refsgaard et al., 2005; Scholten et al., 2007).

Even if the need for common validation criteria is widely accepted in hydrology and in the environmental sciences, only piecemeal contributions to this aim have been presented in the specialised literature (Klemeš, 1986; Andréassian et al., 2009; Krause et al., 2005; Schaefli and Gupta, 2007; Gupta et al., 2009). In addition, validation protocols have so far rarely been applied in practical cases. For example, the NWS recently conducted two experiments, named DMIP (Distributed Model Intercomparison Project) and DMIP2, where several distributed hydrological models were applied to common benchmark cases, consisting of well-instrumented basins located in diverse regions of the US with contrasting climatic and landscape conditions (Smith et al., 2004). These initiatives were very successful, with the contribution of a notable number of different models. However, no standardised validation protocols were utilised to analyse the results. Verification tools would instead have been extremely important to intercompare the models, by quantifying their capability to reproduce specific hydrological processes, or to assess their robustness, i.e., whether they may be applied in a broad range of conditions and climates, or only in specific regimes.

The considerations outlined above indicate that, despite recent efforts, an agreed and rigorous validation approach has not yet been proposed, due to the strong difficulty of identifying a unique and general protocol applicable to the large number of existing models and kinds of applications proposed in hydrology. As a matter of fact, in the majority of cases, validation is limited to analysing one or two events (e.g., intense floods), by simply comparing time series of simulated versus observed variables and computing a few lumped metrics that are able to capture only some attributes characterising model performance. The present paper aims at overcoming these limitations by delineating a first proposal for a validation protocol in hydrology that explicitly distinguishes two phases: (i) the quantitative evaluation of model performance, and (ii) the qualitative evaluation of model structure and scientific foundation. The guidelines proposed here are intended to aid the work of researchers and practitioners/engineers while developing and applying numerical modelling tools in hydrology. After clarifying some definitions that are used in the paper (Section 2), we provide a summary of the state of the art of validation techniques for surface hydrological modelling, including metrics and graphical tools, whose combined use is suggested in the proposed protocol (Section 3). The validation protocol with the relative guidelines is presented in Section 4, while conclusions are drawn in Section 5.

2. Definitions and principles of evaluation theory

Prior to defining some basic concepts that are used throughout this paper, we underline that high uncertainty exists in the terminology adopted in the literature focused on the general process of evaluating the usefulness of a model for a given purpose. This implies that, in diverse contexts (e.g., environmental sciences, economics, meteorology, and medicine), the same word or expression is used to indicate different activities. For example, the word verification is currently utilised in atmospheric science in the expression forecast verification to indicate the procedures aimed at measuring the ability of a meteorological model to predict the future weather (e.g., Jolliffe and Stephenson, 2003). Alternative expressions in this field are forecast evaluation, validation or accuracy. In the broader field of environmental modelling, some authors used the expression model verification to define a procedure for establishing that the model code correctly solves the set of mathematical equations adopted to simulate the real world (Matott et al., 2009). Since a discussion on the uncertainty in terminology and taxonomy would be too long and out of the scope of this paper, here we underline the existence of this problem and refer the reader to, e.g., Anderson and Bates (2001) and Matott et al. (2009) for more details.

The definitions adopted in this work are based on the consideration that the most frequent validation procedures used in hydrologic and environmental modelling propose to split model evaluation into three complementary phases (Gupta et al., 2008): (a) quantitative evaluation of model performance; (b) qualitative evaluation of model performance; (c) qualitative evaluation of model structure and scientific basis. In the following, we alternatively use the expressions model validation or performance validation to indicate concepts (a) and (b). This would be equivalent to the definition of model validation proposed by Matott et al. (2009). We instead adopt the expression scientific validation to refer to the activities described in point (c).

In what follows, the term model will be used to indicate a numerical tool for simulating the input, state and output variables of a specific process. The user confidence in model results is strictly connected with the simulation reliability, which is assessed by comparing modelling output with data observed in the real world. The comparison is in turn made by means of user-defined criteria depending on the aim of the specific application. In the subsequent sections, the main performance and scientific validation procedures presented in the literature are briefly reviewed.

3. Overview of techniques and methodological practice

3.1. Performance validation: graphical techniques and performance metrics

The typical approach adopted to evaluate model performance requires the comparison of simulated outputs with a set of observations that were not used for model calibration. This procedure coincides with the so-called split sample test in the classic hierarchical validation scheme proposed by Klemeš (1986), as well as with the first level of the theoretical scheme of Gupta et al. (2008).
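To make the split-sample idea concrete, the following minimal sketch (our own illustration, not a procedure prescribed by the paper; the toy linear-reservoir model, parameter names and synthetic data are assumptions) calibrates a single parameter on the first half of a record and reports the error metric only on the second, independent half.

```python
# Sketch of a split-sample test: calibrate on one block of the record,
# evaluate on the remaining, independent block. Entirely synthetic example.
import numpy as np

def run_model(rain, k, s0=0.0):
    """Toy rainfall-runoff model: a single linear reservoir with recession k."""
    s, q = s0, np.empty_like(rain)
    for t, p in enumerate(rain):
        s = s + p - k * s        # state update (water balance)
        q[t] = k * s             # output equation
    return q

def calibrate(rain_cal, q_obs_cal):
    """Pick the k that minimises RMSE on the calibration block (brute force)."""
    candidates = np.linspace(0.01, 0.5, 50)
    errors = [np.sqrt(np.mean((run_model(rain_cal, k) - q_obs_cal) ** 2))
              for k in candidates]
    return candidates[int(np.argmin(errors))]

rng = np.random.default_rng(0)
rain = rng.gamma(2.0, 1.5, size=730)
q_obs = run_model(rain, k=0.12) + rng.normal(0, 0.2, size=730)  # synthetic "observations"

split = len(rain) // 2                      # first half: calibration; second half: validation
k_hat = calibrate(rain[:split], q_obs[:split])
q_val = run_model(rain[split:], k_hat)      # simulate the validation block only
rmse_val = np.sqrt(np.mean((q_val - q_obs[split:]) ** 2))
print(f"calibrated k = {k_hat:.3f}, validation RMSE = {rmse_val:.3f}")
```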
Model performance can be addressed by means of qualitative and quantitative criteria. The former essentially rely on the graphical comparison between observed and simulated data, whereas the latter are based on numerical performance metrics. Both approaches are fundamental tools to be used in a complementary fashion, since they capture distinct aspects of model performance.

The choice of the validation criteria is guided by several factors. It depends on the nature of the simulated variables and the main model purpose. It is also affected by the fact that model simulations can be either deterministic or probabilistic. In the traditional deterministic approach, a unique best output is produced. More recently, a number of techniques, such as ensemble forecasting (Schaake et al., 2007), have been proposed to account for the different sources of uncertainty associated with input data, model structure and parameterization. Through these approaches, probabilistic hydrological forecasts are produced that attempt to explicitly quantify uncertainty.

Many criticisms have been addressed to traditional lumped metrics for their lack of diagnostic power, or their inability to capture differences between different models or parameter sets, leading to ambiguous situations characterised by equifinality. As a result, more powerful evaluation tools, like multi-objective methods that combine different (weighted) performance metrics into one overall objective function (e.g., Gupta et al., 1998), have been proposed. Another notable issue is that metric interpretation is not always straightforward. A common approach to address this problem consists of evaluating the metrics computed from model outputs against a benchmark value or a reference forecast, which is generally an unskilled forecast (such as random chance or persistence).
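One common way to make such benchmarking concrete (a sketch under our own assumptions, not a prescription from the paper) is a skill score measuring the fractional improvement of the model error over the error of an unskilled reference, e.g. persistence (the previous observation) or climatology (the observed mean):

```python
# Sketch: skill of a simulation relative to unskilled reference forecasts.
# Skill score SS = 1 - MSE_model / MSE_reference (SS > 0: better than the reference).
import numpy as np

def mse(y_sim, y_obs):
    return np.mean((np.asarray(y_sim) - np.asarray(y_obs)) ** 2)

def skill_score(y_sim, y_obs, y_ref):
    return 1.0 - mse(y_sim, y_obs) / mse(y_ref, y_obs)

rng = np.random.default_rng(3)
q_obs = 10 + np.cumsum(rng.normal(0, 1, 200))     # synthetic observed discharge
q_sim = q_obs + rng.normal(0, 1.5, 200)           # a model with moderate errors

persistence = np.r_[q_obs[0], q_obs[:-1]]         # reference 1: previous observation
climatology = np.full_like(q_obs, q_obs.mean())   # reference 2: observed mean

print("skill vs persistence:", skill_score(q_sim, q_obs, persistence))
print("skill vs climatology:", skill_score(q_sim, q_obs, climatology))
```

Note that with the climatological mean as reference and the mean square error as the base metric, this skill score coincides with the Nash-Sutcliffe efficiency discussed below.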
In the review presented in this section, we mainly consider a hydrological model simulating the streamflow in a river basin, thus focusing on metrics and graphical techniques primarily applied to time series. In particular, we present a review of robust performance validation methods that can be utilised both for long-term multi-seasonal simulations and for periods characterised by specific dominant processes (e.g., extremes or snow-melting periods). In each of the following subsections, we first introduce the techniques useful for validating the performance of deterministic simulations and, then, we discuss the methods utilised for assessing the accuracy of probabilistic hydrological predictions. We highlight that, in this last case, most of the verification techniques are based on tools developed in applied meteorology, a discipline that has historically devoted consistent efforts to model validation and forecast verification.

3.1.1. Graphical techniques

Graphical techniques allow a subjective and qualitative validation. Despite the plethora of existing goodness-of-fit metrics, visual inspection still represents a fundamental step in model validation, as it allows the study of the temporal dynamics of model performance and facilitates the identification of patterns in error occurrence. In most cases, graphical methods are based on a comparison of simulated and measured time series (Fig. 1a). This kind of plot can be difficult to read, especially when the observation period is long. Scatterplots of simulated versus observed discharge are more easily interpretable and provide an objective reference given by the 1:1 line of perfect fit (Fig. 1b). Other common graphical representations are residual plots (Fig. 1c) and the comparison of streamflow duration curves as well as flood frequency distributions. Recently, the use of ensemble forecast techniques in hydrological models has led to the adoption of graphical methods developed and typically used in applied meteorology to evaluate probabilistic forecasts, like the reliability diagram (Fig. 1d) and the verification rank histogram (Fig. 1e) (Wilks, 2006; Mascaro et al., 2010).

Fig. 1. Graphical methods used to evaluate model performance. Deterministic forecast: (a) observed and simulated time series; (b) scatter plot; (c) residual plot. Probabilistic forecast: (d) reliability diagram; (e) verification rank histogram.
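The three deterministic panels of Fig. 1 are straightforward to produce. The sketch below (matplotlib, with made-up series; all names are ours) shows the time-series comparison, the scatter plot with its 1:1 line, and the residual plot:

```python
# Sketch of the deterministic graphics of Fig. 1 (a: time series, b: scatter, c: residuals).
# Synthetic data; in practice q_obs and q_sim come from gauge records and model output.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
t = np.arange(365)
q_obs = 5 + 4 * np.sin(2 * np.pi * t / 365) ** 2 + rng.gamma(2, 0.5, t.size)
q_sim = q_obs * rng.normal(1.0, 0.15, t.size)

fig, (ax_a, ax_b, ax_c) = plt.subplots(1, 3, figsize=(12, 3.5))

ax_a.plot(t, q_obs, label="observed")
ax_a.plot(t, q_sim, label="simulated")
ax_a.set(xlabel="time (d)", ylabel="discharge (m3/s)")
ax_a.legend()

ax_b.scatter(q_obs, q_sim, s=8)
lim = [0, max(q_obs.max(), q_sim.max())]
ax_b.plot(lim, lim, "k--", label="1:1 line")     # objective reference of perfect fit
ax_b.set(xlabel="observed", ylabel="simulated")
ax_b.legend()

ax_c.scatter(t, q_sim - q_obs, s=8)
ax_c.axhline(0.0, color="k", linestyle="--")
ax_c.set(xlabel="time (d)", ylabel="residual")   # look for structure in the errors

plt.tight_layout()
plt.show()
```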
3.1.2. Performance metrics

The performance metrics (or indexes) provide a quantitative and aggregate estimate of model reliability and are generally expressed as a function of the simulation errors. Some metrics have a statistical foundation, such as the likelihood functions (Beven et al., 2001; Romanowicz and Beven, 2006), the AIC (Akaike Information Criterion), the BIC (Bayesian Information Criterion) and the KIC (Kashyap Information Criterion). The last three statistical criteria account for the mathematical complexity of the model by including the number of model parameters in the metric computation.

A wide number of metrics is derived from the general expression (Van der Molen and Pintér, 1993):

F = \left[ \frac{1}{N} \sum_{t=1}^{N} \left| y_{s,t} - y_{o,t} \right|^{s} \right]^{1/b}, \quad s \ge 1, \; b \ge 1    (1)

or from the analogous relation based on the relative deviations,

F = \left[ \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_{s,t} - y_{o,t}}{y_{o,t}} \right|^{s} \right]^{1/b}, \quad y_{o,t} \ne 0, \; s \ge 1, \; b \ge 1    (2)

where F is the performance metric, N is the number of observations, and y_{s,t} and y_{o,t} are the simulated and observed values at time t, respectively.

In particular, the metrics related to (2) are dimensionless and, thus, provide a more balanced evaluation of model performance over the entire study period. Metrics derived from both expressions (1) and (2) do not have an upper bound, while a null value indicates a perfect fit.

According to the values assumed by the s and b parameters, the two expressions provide different metrics, some of which are listed in Table 1. For higher s, the metric is more sensitive to large differences between simulated and observed values. Several performance metrics adopt s = 2 and are therefore based on squared deviations.

Table 1
Numerical metrics used to evaluate model performance.

Performance metric: Expression
Mean Absolute Error (MAE): F_1 = \frac{1}{N} \sum_{t=1}^{N} |y_{s,t} - y_{o,t}|
Mean Square Error (MSE): F_2 = \frac{1}{N} \sum_{t=1}^{N} |y_{s,t} - y_{o,t}|^2
Root Mean Square Error (RMSE): F_3 = \left[ \frac{1}{N} \sum_{t=1}^{N} |y_{s,t} - y_{o,t}|^2 \right]^{1/2}
Minimax objective function: F_4 = \max_t |y_{s,t} - y_{o,t}|
Average Absolute Percentage Error (AAPE): F_5 = 100 \, \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_{s,t} - y_{o,t}}{y_{o,t}} \right|
Mean Square Relative Error (MSRE): F_6 = 100 \, \frac{1}{N} \sum_{t=1}^{N} \left( \frac{y_{s,t} - y_{o,t}}{y_{o,t}} \right)^2
Coefficient of determination (R^2): F_7 = \left\{ \frac{\sum_{t=1}^{N} (y_{o,t} - \bar{y}_o)(y_{s,t} - \bar{y}_s)}{\left[ \sum_{t=1}^{N} (y_{o,t} - \bar{y}_o)^2 \right]^{0.5} \left[ \sum_{t=1}^{N} (y_{s,t} - \bar{y}_s)^2 \right]^{0.5}} \right\}^2
Index of agreement (D): F_8 = 1 - \frac{\sum_{t=1}^{N} (y_{s,t} - y_{o,t})^2}{\sum_{t=1}^{N} (|y_{s,t} - \bar{y}_o| + |y_{o,t} - \bar{y}_o|)^2}
Nash-Sutcliffe Efficiency coefficient (NSE): F_9 = 1 - \frac{\sum_{t=1}^{N} (y_{s,t} - y_{o,t})^2}{\sum_{t=1}^{N} (y_{o,t} - \bar{y}_o)^2}
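The whole family defined by expressions (1) and (2), and hence several of the Table 1 metrics, can be coded once. The sketch below is a direct transcription of the formulas; the function names and example data are our own:

```python
# Sketch: the metric family of Eqs. (1)-(2) and some Table 1 special cases.
import numpy as np

def general_metric(y_sim, y_obs, s=2, b=1, relative=False):
    """F = [ (1/N) * sum |e_t|^s ]^(1/b), with e_t absolute or relative deviations."""
    y_sim, y_obs = np.asarray(y_sim, float), np.asarray(y_obs, float)
    dev = y_sim - y_obs
    if relative:
        dev = dev / y_obs              # requires y_obs != 0, as stated for Eq. (2)
    return (np.mean(np.abs(dev) ** s)) ** (1.0 / b)

def mae(y_sim, y_obs):                 # F1: s = 1, b = 1
    return general_metric(y_sim, y_obs, s=1, b=1)

def mse(y_sim, y_obs):                 # F2: s = 2, b = 1
    return general_metric(y_sim, y_obs, s=2, b=1)

def rmse(y_sim, y_obs):                # F3: s = 2, b = 2
    return general_metric(y_sim, y_obs, s=2, b=2)

def aape(y_sim, y_obs):                # F5: relative deviations, s = 1, b = 1 (in %)
    return 100.0 * general_metric(y_sim, y_obs, s=1, b=1, relative=True)

q_obs = np.array([3.1, 4.7, 8.2, 6.0, 5.1])
q_sim = np.array([2.8, 5.0, 7.5, 6.4, 4.9])
print(mae(q_sim, q_obs), rmse(q_sim, q_obs), aape(q_sim, q_obs))
```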
To assess the quality of the model fit, other indexes, such as the Janus coefficient (Power, 1993), compare the model errors in the validation and calibration periods. Other goodness-of-fit metrics are based on regression operations between simulated and observed data (Legates and McCabe, 1999). In this category, we include (Table 1): the coefficient of determination R^2, the index of agreement D (Wilmott et al., 1985), and the coefficient of efficiency NSE introduced by Nash and Sutcliffe (1970), which is by far the most utilised index in hydrological applications. Differently from the metrics previously described, perfect agreement is achieved when R^2, D and NSE are equal to unity.

Due to its large popularity, it is worth focusing on the Nash-Sutcliffe coefficient, whose main characteristics are as follows: (i) it measures the departure from unity of the ratio between the mean squared error and the variance of the observations; (ii) it varies between -∞ and 1; (iii) a null value is obtained when the simulation is identically equal to the mean value of the observed series.

The diagnostic properties of the Nash-Sutcliffe efficiency have recently been investigated in detail by Gupta et al. (2009) through its decomposition into more meaningful components. These authors show that using NSE is equivalent to checking the model capability to reproduce the following statistics: (i) the mean value and (ii) the variance of the discharge time series, and (iii) the coefficient of correlation between the simulated and observed time series. The weight attributed to each of the above components depends on the magnitude of the observed data, but is mainly concentrated on correlation. Based on this evidence, Gupta et al. (2009) proposed an innovative index, called KGE (Kling-Gupta Efficiency), expressed as an explicit function of the three statistics mentioned above.
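Following this decomposition, NSE and KGE can be computed from the same three statistics. The sketch below uses the Euclidean-distance form of the KGE commonly attributed to Gupta et al. (2009); variable names and the synthetic data are our own:

```python
# Sketch: NSE and the three-component KGE of Gupta et al. (2009).
import numpy as np

def nse(y_sim, y_obs):
    """Nash-Sutcliffe efficiency: 1 - MSE / variance of the observations."""
    y_sim, y_obs = np.asarray(y_sim, float), np.asarray(y_obs, float)
    return 1.0 - np.sum((y_sim - y_obs) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)

def kge(y_sim, y_obs):
    """Kling-Gupta efficiency from correlation (r), variability ratio (alpha)
    and bias ratio (beta); perfect agreement gives r = alpha = beta = 1."""
    y_sim, y_obs = np.asarray(y_sim, float), np.asarray(y_obs, float)
    r = np.corrcoef(y_sim, y_obs)[0, 1]
    alpha = y_sim.std() / y_obs.std()
    beta = y_sim.mean() / y_obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

rng = np.random.default_rng(5)
q_obs = rng.gamma(2.0, 3.0, 1000)
q_sim = 0.9 * q_obs + rng.normal(0, 1.0, 1000)   # biased, noisy simulation
print(f"NSE = {nse(q_sim, q_obs):.3f}, KGE = {kge(q_sim, q_obs):.3f}")
```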
Additional modifications of the NSE have been proposed in the literature, including those based on transformed variables, others using relative instead of absolute errors, and those adopting a reference value different from the mean (Krause et al., 2005; Chiew and McMahon, 1994; Romanowicz et al., 1994; Freer et al., 1996). Other frequently used indexes include those based on rank correlation criteria, such as the Spearman and Kendall coefficients. Recently, new validation criteria based on fuzzy measures, or on mechanisms to account for expert knowledge and "soft data", have become popular, even if they introduce a larger degree of subjectivity in performance evaluation (Seibert and McDonnell, 2002; Beven, 2006). The review presented so far has been focused on metrics mainly applicable to deterministic simulations and is not intended to be exhaustive. The reader is referred to Jachner and van den Boogaart (2007) and Dawson et al. (2007) for a detailed survey.

When dealing with probabilistic forecasts, traditional goodness-of-fit metrics (like those mentioned for deterministic simulations) do not allow a complete and fair evaluation of the forecast performance. In his essay on the nature of goodness in weather forecasting, Murphy (1993), considering a distribution-based approach, distinguishes nine attributes that contribute to the fullest description of the multi-faceted nature of probabilistic forecast quality: Bias, Association, Accuracy, Skill, Reliability, Resolution, Sharpness, Discrimination and Uncertainty. Each of these attributes carries fundamental information about the forecast performance and, only recently, some techniques have been specifically designed to quantify their weight in the process of verifying hydrologic probabilistic forecasts. Contributions in this field include the work of Welles (2005), Welles et al. (2007), Laio and Tamea (2007), Engeland et al. (2010), Mascaro et al. (2010) and the technical report released by the NWS, downloadable at http://www.nws.noaa.gov/oh/rfcdev/docs/Final_Verification_Report.pdf.

Moreover, taking again inspiration from applied meteorology, both deterministic and probabilistic hydrological forecasts can be transformed into categorical yes/no forecasts according to some critical value or probability threshold (e.g., the probability that the streamflow accumulated in a given duration will exceed a certain threshold). The contingency table, which shows the frequency of yes and no forecasts and occurrences and the frequency of their combinations (hit, miss, false alarm, correct negative), is a common way to analyse what types of errors are being made. A large variety of categorical statistics can be computed from the elements of the contingency table to describe particular aspects of forecast performance, including the probability of detection, the false alarm rate, the critical success index, the Gilbert skill score, the Peirce skill score and the Heidke skill score, among others. A commonly adopted verification tool in the case of probabilistic forecasts for binary (yes/no) events is the Brier score, which takes the form of the more general Ranked Probability Score (RPS) when it is intended to be applicable to multi-category forecasts. A further generalisation of the RPS to an infinite number of classes led to the definition of the Continuous Ranked Probability Score, a metric particularly useful to verify the reliability, resolution and uncertainty attributes of ensemble streamflow forecasts.
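The sketch below (our own illustration with synthetic data; the 0.5 decision threshold and all names are assumptions) builds a contingency table and a few categorical scores, and adds the Brier score and a standard sample-based CRPS estimator for a single ensemble:

```python
# Sketch: categorical verification from a contingency table, plus the Brier score
# and a sample-based CRPS for ensemble forecasts. Names and data are illustrative.
import numpy as np

def contingency(forecast_yes, observed_yes):
    """Hits, misses, false alarms and correct negatives for binary events."""
    f, o = np.asarray(forecast_yes, bool), np.asarray(observed_yes, bool)
    hits = np.sum(f & o); misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o); correct_neg = np.sum(~f & ~o)
    return hits, misses, false_alarms, correct_neg

def pod(h, m, fa, cn):  return h / (h + m)        # probability of detection
def far(h, m, fa, cn):  return fa / (h + fa)      # false alarm ratio
def csi(h, m, fa, cn):  return h / (h + m + fa)   # critical success index

def brier(p_forecast, observed_yes):
    """Brier score: mean squared difference between forecast probability and outcome."""
    return np.mean((np.asarray(p_forecast) - np.asarray(observed_yes, float)) ** 2)

def crps_ensemble(members, obs):
    """Sample-based CRPS for one ensemble: E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(members, float)
    return np.mean(np.abs(x - obs)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))

rng = np.random.default_rng(6)
obs_event = rng.random(200) < 0.3                 # observed exceedances
p_fcst = np.clip(obs_event + rng.normal(0, 0.3, 200), 0, 1)
h, m, fa, cn = contingency(p_fcst > 0.5, obs_event)
print("POD", pod(h, m, fa, cn), "FAR", far(h, m, fa, cn), "CSI", csi(h, m, fa, cn))
print("Brier", brier(p_fcst, obs_event))
print("CRPS", crps_ensemble(rng.normal(10, 2, 50), obs=11.0))
```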
3.2. Scientific validation

The concept of scientific validation originated from the idea that verifying model performance by simply comparing outputs and observations does not assure that the model is correct from a scientific point of view. In other words, this limitation does not allow us to assess whether the model structure and parameterization are consistent with the physics of the simulated processes (Oreskes et al., 2003). It is well known that every model provides a simplified representation of reality, which depends on the availability of observations, the knowledge of phenomena, the computational capability, and the final purposes of the application. Given these limitations, scientific validation aims at evaluating the consistency, and the coherence with the real world, of the model thought of as an ISO (input-state-output) system. In this framework, the quantification of the different sources of uncertainty (e.g., observations, process parameterization, model structure) is crucial (Todini, 2007). Synthesizing, scientific validation has the goal of verifying that the right outputs are produced for the right reasons (Kirchner et al., 2006; Aumann, 2007).

The scientific validation may include and extend the performance validation and is specifically required in particular cases, including: (a) when the quality and quantity of the observations used for comparison with model outputs are not sufficient to allow an adequate performance validation; (b) when the model is utilised with the goal of advancing the knowledge of physical processes, rather than to make predictions; (c) when the hydrological model is not the "focus" of a given application, but just a "means" to characterise the initial conditions or quantify variables needed to study other physical, chemical, and/or biological processes.

So far, no agreed methodological approach has been proposed by the international scientific community to deal with scientific validation. However, beyond the general principles stated above, a large number of techniques have already been developed in hydrology, as well as in other disciplines, which can be considered as applications in the field of scientific validation.

One goal of scientific validation is the assessment of model hypotheses. This task often relies on the identification of the main processes that affect the real world and that the model should carefully account for. With this aim, Aumann (2007), in the field of ecology, suggests conducting a system analysis aimed at detecting the processes, occurring at different scales, that can be considered as the dominant processes or emergent properties across the different hierarchical levels of the model. In general, the observation of the same natural processes at different scales can lead to useful insights in process knowledge. In this framework, techniques based on the upscaling/downscaling of state variables, model parameters, input variables and model conceptualizations (Bierkens et al., 2000) provide a recognised, model-oriented approach for coping with the scale transfer problem (e.g., Bloschl and Sivapalan, 1995). An upscaling/downscaling technique is also used for model diagnostics in meteorology (Hantel and Acs, 1998). Coming back to hydrology, strong enhancements in recognising the main processes controlling the water balance and rainfall-runoff processes may be achieved by jointly exploiting (i) lumped, semi-distributed and distributed models; (ii) parcel, hillslope and catchment models and observations; (iii) regional and at-site estimates of hydrological random variables.

Besides the assessment of model hypotheses, scientific validation aims at providing proof of model adequacy for the representation of the real world, beyond (or together with) the result of validation tests. In fact, a model could be right for the wrong reasons, for example, by compensating errors in model structure with errors in parameter values (Refsgaard and Henriksen, 2004). This argument may lead to recognising the equifinality problem posed by Beven et al. (2001) and Beven (2006); rather, scientific validation points to the identification of model selection and parameter estimation pursuing the inequifinality concept proposed by Todini (2007). Cast in the Bayesian framework, the inequifinality concept is expressed by stating that model structure, parameter vector and predictions can be chosen as those more likely than others, i.e. with posterior densities characterised by more pronounced peaks and smaller predictive uncertainties.

In many research fields, a multi-criteria approach has been proposed, where the behaviour of different state variables, internal to the model, is analysed and exploited in order to verify and diagnose the model. This kind of approach can take advantage of information and data obtained from remote sensing and/or field-based observations of physical quantities related to vegetation states, air temperature, soil moisture, etc. (Castelli, 2008). On the other hand, this philosophy leads to increasing model complexity. In fact, multi-site validation is possible only if simulations of spatial patterns are accounted for. Also, multi-variable checks are a source of precious information if predictions of the behaviour of individual subsystems within a catchment are performed (Refsgaard and Henriksen, 2004). Thus, another important checkpoint of scientific validation lies in the assessment of the equilibrium between model purpose, model complexity and the availability of data sources and information. In this perspective, scientific and performance validation may be merged. This happens, for example, in cases when the performance metrics combine various measures aimed at making a diagnosis or providing information to correct the model at the appropriate level.
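A minimal way to operationalise such multi-site, multi-variable checks (our own sketch, not a formalised procedure from the paper; sites, variables and weights are hypothetical) is to compute the same dimensionless metric for each variable at each site and inspect the full matrix rather than a single aggregate number:

```python
# Sketch: multi-site, multi-variable validation summarised as a metric matrix.
# Variables, sites, weights and data are hypothetical stand-ins.
import numpy as np

def nse(y_sim, y_obs):
    y_sim, y_obs = np.asarray(y_sim, float), np.asarray(y_obs, float)
    return 1.0 - np.sum((y_sim - y_obs) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)

rng = np.random.default_rng(7)
sites = ["outlet", "upstream_gauge"]
variables = ["discharge", "soil_moisture"]

scores = {}
for site in sites:
    for var in variables:
        obs = rng.gamma(2.0, 2.0, 365)                 # stand-in for observations
        sim = obs + rng.normal(0, 0.8, 365)            # stand-in for model output
        scores[(site, var)] = nse(sim, obs)

for key, val in scores.items():
    print(key, round(val, 3))

# Optional single diagnostic: an explicitly weighted average of the matrix,
# keeping the individual entries available for diagnosis.
weights = {("outlet", "discharge"): 0.5, ("outlet", "soil_moisture"): 0.2,
           ("upstream_gauge", "discharge"): 0.2, ("upstream_gauge", "soil_moisture"): 0.1}
overall = sum(weights[k] * scores[k] for k in scores)
print("weighted overall score:", round(overall, 3))
```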
4. Proposal of a validation protocol: general guidelines

The scientific literature continually suggests new advances in model validation. However, most of the new proposals are dedicated to specific technical details, like, for example, the optimal combination of performance metrics (Reusser et al., 2009) and model diagnostic issues. Less consideration has been devoted to the ethical and philosophical principles that should guide the development of innovative validation techniques. The principal reason for this limited attention is probably the potential subjectivity of these guiding principles. We instead believe that the latter should become the main subject of the discussion. Model validation should be intended as a modeller self-training tool more than as a way to objectively show the performance of the model.

The leading principle that we would like to emphasise is that the value of a simulation study should be associated not only with the quality of the results, but also (and perhaps more importantly) with their scientific interest. We would like to re-elaborate the idea that a model performs well only if it returns satisfactory results. In fact, it is widely recognised that a good score returned by a performance metric only provides a limited view of the practical utility and scientific value of a model. It is also recognised that providing examples of poor model performance is very useful to highlight model weaknesses. These principles are valid both for scientific model development and for practical applications. Especially in the latter case, the knowledge of model limitations is even more important than the investigation of its best performance, because it should affect design strategies and safety factors. Model validation in engineering practice is often perceived only as a quality assurance issue, while it may have a broader pro-active impact on modelling choices, guiding monitoring campaigns, specific investigations or simply a wiser choice of the modelling tools. This misunderstanding, along with the obvious costs of the procedure, has led model validation to be neglected in most practical applications.

We believe an important guideline for the validation process could be given by the so-called SWOT analysis (Hill and Westbrook, 1997), namely, a tool for strategic planning that can be used to evaluate the Strengths, Weaknesses, Opportunities and Threats associated with a model and its application. A possible schematic of the SWOT analysis applied to hydrological validation is presented in Table 2. This approach allows assessing which opportunities can be gained and which risks can be avoided through the strengths of the model, as well as which risks can be caused by the model weaknesses and how they can be mitigated.

Table 2
Proposal of a SWOT analysis applied to a hydrological model.

SWOT analysis | Internal factors: Strengths | Internal factors: Weaknesses
External factors: Opportunities | Highlight model strengths and related opportunities | Highlight model weaknesses and how they can be mitigated
External factors: Risks | Highlight how model strengths allow avoiding risks | Highlight which risks are caused by model weaknesses
The underlying philosophy is that model limitations should be discussed with the same detail that is dedicated to model strengths. When a particular model is chosen, strengths are usually discussed in view of the scope of the analysis (therefore highlighting the opportunities). However, any model only provides an approximation of reality, and therefore weaknesses are unavoidably present. We believe that these limitations should be discussed as well, along with the related risks. Actually, model weaknesses are rarely mentioned in scientific studies. The systematic use of the SWOT approach in hydrology would stimulate a more insightful scientific evaluation.

4.1. Guidelines for performance validation

The basic idea in performance validation is to provide several elements that can be used by researchers and practitioners/engineers to clarify different and complementary issues related to model performance. The guidelines are summarised by the following points.

1. Provide clear and unequivocal indications about model performance in real-world applications.
2. Apply the validation procedure by using independent information with respect to what was used for model calibration.
3. Perform validation and discussion of data reliability, and possibly implement a combined validation of models and data.
4. Use graphical techniques and several numerical performance metrics to evaluate different aspects of model performance. Among the available graphical techniques, we suggest the use of scatter plots of observed versus simulated values for their immediate readability. The use of the logarithmic scale should be properly justified. The selected metrics should be justified.
5. When dealing with probabilistic simulations, use rigorous techniques that test several attributes of forecast quality.
6. When presenting results, do not focus only on a few cases (e.g., a single intense flood event), but consider a statistically significant number of cases, including those where the model did not return satisfactory results. Indications about the worst performance should be provided, discussing the possible reasons that are responsible for the obtained performance level.
7. If possible, extend the validation to model input and state variables.
8. If possible, validate the model over different temporal and spatial scales.
9. Evaluate the opportunity to apply jack-knife techniques to create confidence intervals (Shao and Tu, 1995; Castellarin et al., 2004; Brath et al., 2003).

The above list is meant to be the basis for a code of practice which is based on the principle of integrating different validation methods for comprehensively evaluating model strengths and limitations.

4.2. Guidelines for scientific validation

Guidelines for the scientific validation protocol, which are mainly useful for research and model development, can be summarised as follows:

1. Clearly identify the model purpose(s) and check if the adopted model addresses it (them).
2. List and discuss all the assumptions; describe the validation procedure and the relative hypotheses.
3. Analyse the reliability of the theoretical foundations; justify the degree of complexity and the computational burden.
4. Evaluate and discuss possible alternative modelling hypotheses.
5. Use all the possible knowledge (physical processes, observations) to support model development and application. Underline the coherence of the solution with the physical basis of the simulated processes.
6. Analyse the entire ISO system, pointing out the uncertainty associated with input and output components.
7. Make data and numerical codes publicly available, enabling the scientific community to reproduce results. If this is not possible (data ownership and so on), provide a detailed description of the study to support repeatability.
8. Identify strengths and weaknesses of the model and highlight their interactions with risks and opportunities, as proposed by the SWOT analysis. Provide support to scientific review with a detailed discussion of the critical points.

As previously highlighted, the scientific validation is contiguous to performance validation and can be enhanced by the points listed in the previous subsection, whenever they are applicable in relation to data availability and model goals.

5. Conclusions

This paper intends to provide a contribution towards the identification of agreed principles for model validation. The basic idea is that validation should provide an exhaustive evaluation of both the model scientific basis and its performance. For this purpose, it is necessary to highlight not only the model strengths but also the weaknesses, according, for example, to the principles suggested by the SWOT analysis. A first suggestion for a model validation protocol is presented, by providing recommendations to structure the validation process and to produce a comprehensive and comprehensible validation. We believe the on-going development of new modelling tools and applications requires focusing on the criteria for a transparent presentation of models and results.

Acknowledgements

The authors thank three anonymous reviewers whose comments helped to improve the quality of the manuscript. The authors thank Prof. Pasquale Versace and Prof. Riccardo Rigon, whose suggestions and encouragements have been very helpful.

References

Anderson, M.G., Bates, P.D., 2001. Model Validation: Perspectives in Hydrological Science. John Wiley & Sons, Inc.
Andréassian, V., Perrin, C., Berthet, L., Le Moine, N., Lerat, J., Loumagne, C., Oudin, L., Mathevet, T., Ramos, M.-H., Valéry, A., 2009. HESS opinions crash tests for a standardized evaluation of hydrological models. Hydrol. Earth Syst. Sci. 13, 1757–1764.
Aumann, C.A., 2007. A methodology for developing simulation models of complex systems. Ecol. Model. 202, 385–396. doi:10.1016/j.ecolmodel.2006.11.005.
Belia, E., Amerlinck, Y., Benedetti, L., Johnson, B., Sin, G., Vanrolleghem, P.A., Gernaey, K.V., Gillot, S., Neumann, M.B., Rieger, L., Shaw, A., Villez, K., 2009.
Wastewater treatment modelling: dealing with uncertainties. Water Sci. Technol. 60, 1929–1941.
Beven, K.J., 2006. A manifesto for the equifinality thesis. J. Hydrol. 320, 18–36.
Beven, K.J., Freer, J., 2001. Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology. J. Hydrol. 249, 11–29.
Bierkens, M.F.P., Finke, P.A., de Willigen, P., 2000. Upscaling and Downscaling Methods for Environmental Research. Kluwer Academic Publishers, Dordrecht, 190 pp.
Bloschl, G., Sivapalan, M., 1995. Scale issues in hydrological modelling: a review. Hydrol. Proc. 9, 251–290.
Brath, A., Castellarin, A., Montanari, A., 2003. Assessing the reliability of regional depth-duration-frequency equations for gaged and ungaged sites. Water Resour. Res. 39. doi:10.1029/2003WR002399.
Castellarin, A., Galeati, G., Brandimarte, L., Montanari, A., Brath, A., 2004. Regional flow-duration curves: reliability for ungauged basins. Adv. Water Resour. 27, 953–965.
Castelli, F., 2008. Sinergia fra modelli idrologici e osservazioni satellitari: la dinamica e il paradigma bayesiano. In: Proceedings of the 31st National Conference on Hydraulics and Hydraulic Works, Perugia, Italy, 2008.
Chiew, F.H., McMahon, T.A., 1994. Application of the daily rainfall-runoff model MODHYDROLOG to 28 Australian catchments. J. Hydrol. 153, 383–416.
Dawson, C.W., Abrahart, R.J., See, L.M., 2007. HydroTest: a web-based toolbox of evaluation metrics for the standardized assessment of hydrological forecasts. Environ. Model. Softw. 22, 1034–1052.
Di Baldassarre, G., Montanari, A., 2009. Uncertainty in river discharge observations: a quantitative analysis. Hydrol. Earth Syst. Sci. 13, 913–921.
Engeland, K., Renard, B., Steinsland, I., Kolberg, S., 2010. Evaluation of statistical models for forecast errors from the HBV model. J. Hydrol. 384, 142–155.
Freer, J., Beven, K.J., Ambroise, B., 1996. Bayesian estimation of uncertainty in runoff prediction and the value of data: an application of the GLUE approach. Water Resour. Res. 32, 2161–2173.
Gupta, H.V., Sorooshian, S., Yapo, P.O., 1998. Toward improved calibration of hydrological models: multiple and noncommensurable measures of information. Water Resour. Res. 34, 751–763.
Gupta, H.V., Wagener, T., Liu, Y., 2008. Reconciling theory with observations: elements of a diagnostic approach to model evaluation. Hydrol. Proc. 22, 3802–3813.
Gupta, H.V., Kling, H., Yilmaz, K.K., Martinez, F.G., 2009. Decomposition of the mean square error and NSE performance criteria: implications for improving hydrological modelling. J. Hydrol. 377, 80–91.
Hill, T., Westbrook, R., 1997. SWOT analysis: it's time for a product recall. Long Range Plan. 30, 46–52. doi:10.1016/S0024-6301(96)00095-7.
Hantel, M., Acs, F., 1998. Physical aspects of the weather generator. J. Hydrol. 212–213, 393–411.
Hug, T., Benedetti, L., Hall, E.R., Johnson, B.R., Morgenroth, E.F., Nopens, I., Rieger, L., Shaw, A.R., Vanrolleghem, P.A., 2009. Mathematical models in teaching and training: mismatch between education and requirements for jobs. Water Sci. Technol. 59, 745–753.
Jachner, S., van den Boogaart, K.G., 2007. Statistical methods for the qualitative assessment of dynamic models with time delay (R package qualV). J. Stat. Softw. 22.
Jolliffe, I.T., Stephenson, D.B., 2003. Forecast Verification: A Practitioner's Guide in Atmospheric Science. John Wiley and Sons, Chichester, ISBN 0-471-49759-2.
Kirchner, J.W., 2006. Getting the right answers for the right reasons: linking measurements, analyses, and models to advance the science of hydrology. Water Resour. Res. 42, W03S04.
Klemeš, V., 1986. Operational testing of hydrological simulation models. Hydrol. Sci. J. 31, 13–24.
Krause, P., Boyle, D.P., Bäse, F., 2005. Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci. 5, 89–97.
Laio, F., Tamea, S., 2007. Verification tools for probabilistic forecasts of continuous hydrological variables. Hydrol. Earth Syst. Sci. 11, 1267–1277.
Legates, D.R., McCabe, G.J., 1999. Evaluating the use of goodness-of-fit measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 35, 233–241.
Mascaro, G., Vivoni, E.R., Deidda, R., 2010. Implications of ensemble quantitative precipitation forecast errors on distributed streamflow forecasting. J. Hydrometeorol. 11, 69–86.
Matott, L.S., Babendreier, J.E., Purucker, S.T., 2009. Evaluating uncertainty in integrated environmental models: a review of concepts and tools. Water Resour. Res. 45. doi:10.1029/2008WR007301.
Murphy, A.H., 1993. What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather Forecast. 8, 281–293.
Muschalla, D., Schuetze, M., Schroeder, K., Bach, M., Blumensaat, F., Gruber, G., Klepiszewski, K., Pabst, M., Pressl, A., Schindler, N., Solvi, A.M., Wiese, J., 2009. The HSG procedure for modelling integrated urban wastewater systems. Water Sci. Technol. 60, 2065–2075.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models. J. Hydrol. 10, 282–290.
Oreskes, N., 2003. The role of quantitative models in science. In: Canham, C.D. et al. (Eds.), The Role of Models in Ecosystem Science. Princeton Univ. Press, pp. 13–31.
Power, M., 1993. The predictive validation of ecological and environmental models. Ecol. Model. 68, 33–50.
Refsgaard, J.C., Henriksen, H.J., 2004. Modelling guidelines: terminology and guiding principles. Adv. Water Resour. 27, 71–82.
Refsgaard, J.C., Henriksen, H.J., Harrar, W.G., Scholten, H., Kassahun, A., 2005. Quality assurance in model based water management: review of existing practice and outline of new approaches. Environ. Model. Softw. 20, 1201–1215.
Reusser, D.E., Blume, T., Schaefli, B., Zehe, E., 2009. Analysing the temporal dynamics of model performance for hydrological models. Hydrol. Earth Syst. Sci. 13, 999–1018.
Romanowicz, R., Beven, K.J., Tawn, J.A., 1994. Evaluation of predictive uncertainty in nonlinear hydrological models using a Bayesian approach. In: Barnett, V., Turkman, F. (Eds.), Statistics for the Environment 2: Water Related Issues. Wiley.
Romanowicz, R.J., Beven, K.J., 2006. Comments on generalised likelihood uncertainty estimation. Reliab. Eng. Syst. Safe. 91, 1315–1321.
Santhi, C., Arnold, J.G., Williams, J.R., Dugas, W.A., Srinivasan, R., Hauck, L.M., 2001. Validation of the SWAT model on a large river basin with point and nonpoint sources. J. Am. Water Resour. Assoc. 37, 1169–1188.
Schaake, J.C., Hamill, T.M., Buizza, R., Clark, M., 2007. The hydrological ensemble prediction experiment. Bull. Am. Meteorol. Soc. 88, 1541–1547.
Schaefli, B., Gupta, H.V., 2007. Do Nash values have value? Hydrol. Proc. 21, 2075–2080.
Scholten, H., Kassahun, A., Refsgaard, J.C., Kargas, T., Gavardinas, C., Beulens, A.J.M., 2007. A methodology to support multidisciplinary model-based water management. Environ. Model. Softw. 22, 743–759.
Seibert, J., McDonnell, J.J., 2002. On the dialog between experimentalist and modeler in catchment hydrology: use of soft data for multicriteria model calibration. Water Resour. Res. 38. doi:10.1029/2001WR000978.
Shao, J., Tu, D., 1995. The Jackknife and Bootstrap. Springer-Verlag, New York.
Smith, M.B., Georgakakos, K.P., Liang, X., 2004. The distributed model intercomparison project (DMIP). J. Hydrol. 298, 1–32. doi:10.1016/j.jhydrol.2004.05.001.
Todini, E., 2007. Hydrological catchment modelling: past, present and future. Hydrol. Earth Syst. Sci. 11, 468–482.
Van der Molen, D.T., Pintér, J., 1993. Environmental model calibration under different specifications: an application to the model SED. Ecol. Model. 68, 1–19.
Welles, E., 2005. Verification of River Stage Forecasts. PhD Dissertation, University of Arizona.
Welles, E., Sorooshian, S., Carter, G., Olsen, B., 2007. Hydrologic verification, a call for action and collaboration. Bull. Am. Meteorol. Soc. 88, 503–511. doi:10.1175/BAMS-88-4-503.
Wilks, D.S., 2006. Statistical Methods in the Atmospheric Sciences, 2nd ed. Academic Press, 627 pp.
Wilmott, C.J., Ackleson, S.G., Davis, R.E., Feddema, J.J., Klink, K.M., Legates, D.R., O'Donnell, J., Rowe, M.C., 1985. Statistics for the evaluation and comparison of models. J. Geophys. Res. 90, 8995–9005.
