
Environmental Modelling & Software 79 (2016) 214–232


Sensitivity analysis of environmental models: A systematic review with practical workflow

Francesca Pianosi a,*, Keith Beven f, Jim Freer c, Jim W. Hall d, Jonathan Rougier b, David B. Stephenson e, Thorsten Wagener a,g

a Department of Civil Engineering, University of Bristol, UK
b Department of Mathematics, University of Bristol, UK
c School of Geographical Sciences, University of Bristol, UK
d Environmental Change Institute, University of Oxford, UK
e Department of Mathematics and Computer Science, University of Exeter, UK
f Lancaster Environment Centre, Lancaster University, UK
g Cabot Institute, University of Bristol, UK

Article history: Received 31 December 2014; Received in revised form 15 January 2016; Accepted 2 February 2016; Available online 18 February 2016.

Keywords: Sensitivity Analysis; Uncertainty Analysis; Calibration; Evaluation; Robust decision-making

Abstract: Sensitivity Analysis (SA) investigates how the variation in the output of a numerical model can be attributed to variations of its input factors. SA is increasingly being used in environmental modelling for a variety of purposes, including uncertainty assessment, model calibration and diagnostic evaluation, dominant control analysis and robust decision-making. In this paper we review the SA literature with the goal of providing: (i) a comprehensive view of SA approaches, also in relation to other methodologies for model identification and application; (ii) a systematic classification of the most commonly used SA methods; (iii) practical guidelines for the application of SA. The paper aims at delivering an introduction to SA for non-specialist readers, as well as practical advice with best practice examples from the literature; and at stimulating the discussion within the community of SA developers and users regarding the setting of good practices and the definition of priorities for future research.

© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

* Corresponding author. E-mail address: [email protected] (F. Pianosi).
http://dx.doi.org/10.1016/j.envsoft.2016.02.008

1. Introduction

Sensitivity Analysis (SA) investigates how the variation in the output of a numerical model can be attributed to variations of its input factors. Within this broad definition, the type of approach, level of complexity and purposes of SA vary quite significantly depending on the modelling domain and the specific application aims.

In contexts where very complex simulation models are used, for instance climate or atmospheric sciences, the term SA often refers to a 'what-if' analysis where the input factors of the simulation procedure, e.g. the model parameterization or the forcing scenario, are varied one at a time. Typically, the induced variations are assessed by visual comparison of model predictions. The goal is to verify the consistency of the model behaviour (e.g. Devenish et al., 2012) or to assess the robustness of the simulation results to uncertain inputs or model assumptions (e.g. Paton et al., 2013). The increasingly common practice in weather and climate science of producing sets (ensembles) of forecasts and simulations (e.g. Stephenson and Doblas-Reyes, 2000; Collins et al., 2012 and references therein) can be regarded as a type of SA exercise. Here, forecast uncertainty due to the imperfect knowledge of initial conditions is addressed via ensembles of weather forecasts starting from perturbed initial model states, while the sensitivity of climate simulations to model parameters is addressed using perturbed physics ensembles, where simulations are made with different choices of model parameter values.

When simulation results can be associated with a summary scalar variable, for instance a measure of model performance like the sum of squared errors or some aggregate statistic of simulated variables, e.g. the mean streamflow, a more formal approach is to measure sensitivity as the variability induced in such a scalar variable via a set of quantitative sensitivity indices. Depending on whether output variability is obtained by varying the inputs around a reference (nominal) value, or across their entire feasible space, SA is referred to as either local or global. Local SA applications typically consider model parameters as varying inputs, and aim at assessing how their uncertainty impacts model performance, i.e. how model performance changes when moving away from some optimal or reference parameter set. Partial derivatives or finite differences are used as sensitivity indices in the context of local approaches (e.g. Hill and Tiedeman, 2007). The spatio-temporal evolution of local sensitivity can also be investigated by adjoint methods (e.g. Vautard et al., 2000) or algebraic SA (Norton, 2008).

Global SA applications may consider model parameters but also other input factors of the simulation procedure, for instance the model's forcing data (e.g. Hamm et al., 2006) or its spatial resolution (e.g. Baroni and Tarantola, 2014), varied simultaneously. Different types of sensitivity indices can be used, ranging from correlation measures between inputs and output to statistical properties of the output distribution, e.g. its variance, and many others. Since analytical computation of these indices is impossible for most models, sensitivity indices are usually approximated from a sample of inputs and output evaluations. Global SA is used for a range of very diverse purposes, including: to support model calibration, verification, diagnostic evaluation or simplification (e.g. Sieber and Uhlenbrook, 2005; Harper et al., 2011; Nossent et al., 2011; Kelleher et al., 2013; Shin et al., 2013; Butler et al., 2014); to prioritize efforts for uncertainty reduction (e.g. Hamm et al., 2006); to analyse the dominant controls of a system (e.g. Pastres et al., 1999); and to support robust decision-making (e.g. Nguyen and de Kok, 2007; Singh et al., 2014; Anderson et al., 2014).

In this paper we provide a systematic review and structuring of the SA literature across different environmental modelling domains with three specific objectives:

1. To provide a comprehensive view of SA purposes and approaches by clarifying terminology (e.g. quantitative versus qualitative, local versus global, one-at-a-time versus all-at-a-time) and by discussing the connections between SA and other methodologies for model identification and application (e.g. uncertainty analysis, model calibration and diagnostic evaluation, model-based decision-making, emulation modelling). The goal is to illustrate the broad spectrum of aims for which SA can be used, and thus stimulate its effective use in the environmental modelling community.

2. To provide a systematic review of the SA approaches most widely used in environmental modelling. The goal here is twofold: to provide non-expert readers with a broad enough background to engage with the SA literature while suggesting references for further reading; and to propose a classification system to support SA users in the choice of the most appropriate SA method depending on the characteristics of their case study.

3. To provide practical guidelines for the application of SA. To this end, we develop a workflow for the application of SA and discuss the key choices that SA users must make at each step of this workflow. We also provide practical suggestions on how to make these choices, how to assess their impacts on SA results and how to revise them, with good practice examples from the literature.

The paper is intended for a broad audience including researchers and practitioners who want to gain a general introduction to SA purposes and approaches, and to obtain practical advice on SA applications with best practice examples from the literature. The paper also aims at stimulating the discussion within the community of SA developers and users on good practice in SA application and on setting priorities for future research.

The paper is divided into three main sections that reflect the three objectives discussed above. Section 2 introduces common definitions and concepts used in the SA literature and clarifies the link between SA and related topics. Section 3 illustrates our classification system of SA methods with a short description of the underlying key assumptions, scope of application, advantages and limitations of each class of methods. Finally, Section 4 illustrates and discusses our proposed workflow for the application of SA. Sections 3 and 4 build on some initial thoughts presented in the conference paper by Pianosi et al. (2014); however, both the classification system and the workflow have been significantly expanded and improved with respect to the earlier version discussed in that conference paper.

2. Conceptualization

2.1. Definition of model, input factors and outputs

In this paper we use the term model to refer to a numerical procedure (often implemented in a computer program) that simulates the behaviour of an environmental system, for instance by solving a set of algebraic equations (static model) or integrating differential equations over a spatial-temporal domain (dynamic model). We call input factor any element that can be changed before model execution, and output a variable that is obtained after the model execution. Examples of input factors are the parameters appearing in the model equations, the initial states, the boundary conditions or the input forcing data of a dynamic model, as well as non-numerical factors like the model equations themselves or, in the case of dynamic models, the time/spatial grid resolution for numerical integration. For dynamic models, the term 'output' usually does not refer to the entire range of temporal and spatial variables produced by the model simulation, but to a summary variable that is obtained by a scalar function of the simulated time series. Using the terminology proposed by Shin et al. (2013), we can distinguish two types of scalar functions:

- objective functions (also called loss or cost functions), which are measures of model performance calculated by comparison of modelled and observed variables (for instance, the Root Mean Squared Error);
- prediction functions, which are scalar values that are provided to the model user for their practical use (for instance, the value of a variable at a given time in a given location, or its average over a spatial and temporal domain), and that can be computed even in the absence of observations.

Fig. 1 gives a practical example of possible inputs and outputs of SA in the case of a dynamic simulation model. While the aggregation of temporally and/or spatially distributed variables into a scalar output can induce a significant loss of information, such a loss can be recovered by considering multiple definitions of the summary output or analysing the temporal or spatial patterns of the output sensitivity. This issue will be further discussed in Section 4.1.

Given the above definitions, we can assume for the purposes of this paper that one can always resort to the general formulation:

y = g(x) = g(x_1, x_2, …, x_M)    (1)

where y is the output, x = [x_1, x_2, …, x_M] is the vector of input factors, which belongs to the input variability space X, and g is the function that maps the input factors into the output. This input–output relation is sometimes referred to as the response surface or model's response, rather than 'model', to avoid confusion with the underlying simulation model which, as stated earlier, might have more inputs and outputs than x and y (see again Fig. 1). As the model's response function g is hardly ever available in analytic form, we will assume hereafter that a numerical procedure is available to evaluate it for any given combination of input factor values.

Fig. 1. Example of input factors and output definition for the SA of a (dynamic) flood inundation model.
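The general formulation of Eq. (1), and the distinction between objective and prediction functions, can be sketched in code. The toy rainfall-runoff model below, its data and its two parameters are all hypothetical, chosen only to illustrate how a dynamic simulation is wrapped into scalar outputs y = g(x):

```python
import numpy as np

# Hypothetical forcing and observations for a toy dynamic model
# (a single linear reservoir); illustrative only.
rain = np.array([5.0, 0.0, 3.0, 8.0, 0.0, 0.0, 2.0])
obs  = np.array([1.2, 1.0, 1.1, 2.0, 1.6, 1.2, 1.0])

def simulate(x):
    """Run the toy model for input factors x = [k, s0]:
    k = recession rate, s0 = initial storage (both hypothetical)."""
    k, s0 = x
    s, flows = s0, []
    for r in rain:
        s = s + r        # rainfall enters storage
        q = k * s        # outflow proportional to storage
        s = s - q
        flows.append(q)
    return np.array(flows)

# Objective function: a scalar measure of fit (here RMSE), y = g(x)
def g_objective(x):
    return float(np.sqrt(np.mean((simulate(x) - obs) ** 2)))

# Prediction function: a scalar summary that needs no observations
def g_prediction(x):
    return float(simulate(x).mean())
```

Either scalar function can serve as the output y subjected to SA; which one is appropriate depends on whether the purpose is performance-oriented (calibration support) or prediction-oriented.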
2.2. Types of Sensitivity Analysis

Sensitivity analysis investigates how the variation in the output y can be attributed to variations in the different input factors x_1, x_2, …, x_M. Typical questions addressed by SA are: What input factors cause the largest variation in the output? Is there any factor whose variability has a negligible effect on the output? Are there interactions that amplify or dampen the variability induced by individual factors? We can distinguish different types of sensitivity analysis depending on how these questions are formulated and addressed. Specifically:

2.2.1. Local and Global SA

Local sensitivity analysis considers the output variability against variations of an input factor around a specific value x̄, while global sensitivity analysis (or GSA) considers variations within the entire space of variability of the input factors. The application of local SA obviously requires the user to specify a nominal value x̄ for the input factors. While GSA overcomes this possible limitation, it still requires specifying the input variability space X. When the latter is poorly known, the conclusions drawn from GSA should be taken with care.

2.2.2. Quantitative and Qualitative SA

We use the term quantitative SA to refer to methods where each input factor is associated with a quantitative and reproducible evaluation of its relative influence, normally through a set of sensitivity indices (or 'importance measures'). In qualitative SA, instead, sensitivity is assessed qualitatively by visual inspection of model predictions or by specific visualization tools like, for instance, tornado plots (e.g. Howard, 1988; Powell and Baker, 1992), scatter (or dotty) plots (e.g. Beven, 1993; Kleijnen and Helton, 1999a) or representations of the posterior distributions of the input factors (e.g. Freer et al., 1996; see also Section 3.4 and Appendix A). Often such visual tools are used as a complement to a more quantitative analysis.

2.2.3. One-At-a-Time (OAT) and All-At-a-Time (AAT)

Another distinction often made is between 'One-[factor]-At-a-Time' (OAT) methods and what we propose to call 'All-[factors]-At-a-Time' (AAT) methods. This distinction refers to the sampling strategy used to estimate the sensitivity indices. In fact, in general, sensitivity indices cannot be computed analytically due to the complexity of the input–output relationship of Eq. (1), and thus they are numerically approximated from a sample of input factors and associated output evaluations (sampling-based SA from now on; see also Fig. 2). The distinction between OAT and AAT methods is based on the approach adopted to select input samples. Specifically:

- In OAT methods, output variations are induced by varying one input factor at a time, while keeping all others fixed.
- In AAT methods, output variations are induced by varying all the input factors simultaneously, and therefore the sensitivity to each factor reflects the direct influence of that factor as well as the joint influence due to interactions.

While local SA typically uses OAT sampling, global SA can use either OAT or AAT strategies. In general, AAT methods provide a better characterization of interactions between input factors, and some of them (for instance, the variance-based methods described in Section 3.5) allow the user to analyse interactions between specific combinations (pairs, triples, etc.) of factors. OAT methods do not provide such detailed insights, although some methods, for instance the EET described in Section 3.2, can give an indication of whether interactions matter or not. The drawback of AAT methods is that they typically require more extensive sampling and therefore a higher number of model evaluations (see further discussion in Sections 4.5 and 4.6).

2.2.4. Purposes (settings) of SA

Following Saltelli et al. (2008), we distinguish the following three purposes (or 'settings' in their terminology):

Fig. 2. The three basic steps in sampling-based Sensitivity Analysis, with an example of qualitative or quantitative results produced by the post-processing step.
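The three steps of Fig. 2 can be sketched as follows. The response surface g, the uniform input ranges and the deliberately non-influential third factor are all hypothetical, and the correlation-based index is just one simple choice of post-processing among the many discussed later in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical response surface standing in for a model, y = g(x),
# with M = 3 input factors; x_3 enters with zero coefficient, so it
# should be flagged as non-influential by the analysis.
def g(x):
    return np.sin(x[0]) + 5.0 * x[1] ** 2 + 0.0 * x[2]

M, N = 3, 1000

# Step 1: sample the input variability space (AAT Monte Carlo
# sampling: all factors varied simultaneously).
X = rng.uniform(0.0, 1.0, size=(N, M))

# Step 2: evaluate the model once per input sample.
y = np.array([g(x) for x in X])

# Step 3: post-process. A simple quantitative index is the absolute
# correlation between each factor and the output; a qualitative
# alternative would be scatter plots of y against each x_i.
S = [abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(M)]
ranking = np.argsort(S)[::-1]  # most influential factor first
```

With this set-up the dominant quadratic term makes x_2 (index 1) top the ranking, while the correlation for the inert x_3 stays near zero.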

- Ranking (or Factor Prioritization) aims at generating the ranking of the input factors x_1, x_2, …, x_M according to their relative contribution to the output variability.
- Screening (or Factor Fixing) aims at identifying the input factors, if any, which have a negligible influence on the output variability.
- Mapping aims at determining the region of the input variability space that produces significant, e.g. extreme, output values.

The purpose of SA defines the ultimate goal of the analysis. It therefore guides the choice of the appropriate SA method, since different methods are better suited to address different questions. Although SA is most commonly used for the three purposes above, our list is not exhaustive and other SA settings have been proposed. For instance, the direction (or sign) of change is a question that can be addressed by SA (e.g. Anderson et al. (2014)). Another question is the presence of interactions between input factors. These aspects will be further discussed in Section 3. In the remainder of this Section, instead, we will discuss the links between SA and other related methods that can support the identification and assessment of environmental models.

2.3. SA and uncertainty analysis

When used for uncertainty assessment of numerical models, Sensitivity Analysis, and in particular global SA (GSA), is closely related to Uncertainty Analysis. Some authors (e.g. Saltelli et al., 2008) suggest that the distinction is that UA focuses on quantifying the uncertainty in the output of the model, while GSA focuses on apportioning output uncertainty to the different sources of uncertainty (input factors). While different in focus and objectives, UA and GSA often use similar mathematical techniques. The 'forward' propagation of uncertainty by Monte Carlo simulation, which is commonly employed in many UA methodologies (e.g. Vrugt et al., 2009 or Beven and Freer, 2001), is also used to perform the initial steps of sampling-based GSA (Fig. 2). Some UA and GSA methods have been developed in close relation to each other: for instance, the GLUE strategy for uncertainty analysis (Beven and Freer, 2001) was derived from the basic idea of Regional Sensitivity Analysis (see Section 3.4). In practice, GSA and UA often offer a valuable complement to each other: when performing GSA, UA should be used to verify that the output variability captured by sensitivity indices falls within the range of 'acceptable' model behaviour (see further discussion in Section 4.3); conversely, during UA, the estimation of sensitivity indices adds little computing effort while offering potentially valuable extra insights.

2.4. SA and model calibration

Sensitivity Analysis is also closely connected to the process of model calibration. By 'model calibration' we mean here the process of estimating the model parameters by maximizing the model fit to (or at least consistency with) observations. SA can be used to support and complement a model calibration exercise by providing insights on how variations in the uncertain parameters (the input factors x) map onto variations of the performance metric (the output y) that measures the model fit. When an 'optimal' parameter estimate x̄ has been found, local SA can be used to investigate the uncertainty of such a parameterization: high local sensitivity to a parameter indicates high accuracy of its optimal estimate, while low sensitivity suggests that the parameter is poorly identified and its uncertainty is large (an example is given by Sorooshian and Farid, 1982). A rigorous mathematical interpretation is available for the case when the output y is the mean squared error and gradient-based local sensitivity (see Section 3.1) is an approximation of the curvature (Hessian matrix) of y evaluated at x̄ (for practical examples see for instance Sorooshian and Gupta (1985) or the PEST approach by Moore and Doherty (2005)). Most established analytical parameter-estimation methods for linear-in-parameters models (e.g. the prediction-error method or generalized least squares and its variations) provide such local sensitivity information jointly with optimal parameter estimates (Ljung, 1999). SA is closely related to Identifiability Analysis (IA), which asks if parameters of a given model can be (uniquely or adequately) estimated from the available set of inputs and outputs.

While local SA usually follows the model calibration exercise,

global SA and model calibration are interlaced in a more complex, often iterative way. A model calibration based on the equifinality principle (Beven and Freer, 2001) can be used prior to GSA in order to constrain the input variability space X, e.g. by finding parameter ranges that produce an acceptable level of model performance (e.g. Freer et al., 1996). On the other hand, GSA can be used before calibration of a computationally intensive model in order to: (i) identify parameters that have no influence on the model fit to observations and therefore can be ignored during refined calibration (e.g. van Werkhoven et al., 2009); (ii) investigate the parameters' influence and interactions in the regions of the parameter space associated with higher model performance, and thus provide the knowledge base for a more efficient local-search calibration in those regions (e.g. Spear et al., 1994); (iii) assess the potential for and limitations of model calibration given other uncertainty sources besides parameters, e.g. measurement errors in the observations or in the model forcing data (e.g. Baroni and Tarantola, 2014). In the latter case, the insights provided by GSA can help to set priorities for future efforts, for instance by investing in more sophisticated and computationally demanding calibration techniques or by first improving the quality of the data.

2.5. SA and model diagnostic evaluation

In cases where observations are affected by large uncertainties, due to observational errors, pre-processing errors, spatial averaging, etc., it might be hard to corroborate or reject a model based on some performance metric alone. Then, the modeller may also want to verify the consistency of the model behaviour with his/her perception of the real-world system (Wagener and Gupta, 2005). A model would be considered consistent if, for example, the parameters that control its response at a particular time or place are representative of physical processes that are also expected to dominate in reality. Being confident that the modelled controls are in line with our perceptions is particularly important if the model will be applied outside the range of variability of the calibration data (e.g. at different sites or for long-term projections under nonstationary conditions). It is often difficult to predict when and where a specific parameter will have a significant influence on the simulation results when dealing with complex environmental models with many interacting components. Modified SA techniques have been used to formally address this question in what has in recent years been referred to as 'diagnostic model evaluation' (Gupta et al., 2008). For instance, Sieber and Uhlenbrook (2005), Reusser and Zehe (2011) and Herman et al. (2013) used time-varying and spatially-varying SA (see also Section 4.1) to quantify the temporal or spatial patterns of the output sensitivity to model parameters and therefore verify the model structure, i.e. assuming that different model components should be active during different system states. Similarly, the parameter screening provided by SA indicates whether there are 'unnecessarily' represented processes in the model, i.e. processes that are never activated in the model, and thus identifies potential for model simplification (e.g. Demaria et al., 2007). The modeller has to decide, though, whether this problem could be caused by limited calibration data variability and whether there is a potential for future, maybe more extreme, conditions to still trigger these processes (Gupta et al., 2008; Yilmaz et al., 2008).

2.6. SA, dominant controls analysis and robust decision-making

So far, we have discussed SA as a tool to investigate the propagation of uncertainty through a numerical model and to understand the model's intrinsic behaviour. Along the same lines, when simulation models are applied to anticipate the effects of management actions and thus support decision-making, SA is a recommended practice to assess the robustness of the assessment (and thus of the final decision) with respect to uncertain model inputs or assumptions (e.g. EC, 2009; EPA, 2009), meaning that we can "ascertain if the inference of a model-based study is robust or fragile in light of the uncertainty in the underlying assumptions" (Saltelli and D'Hombres, 2010). However, SA can be applied to learn not only about models but also about systems. If the model reasonably reflects real-world processes, the application of SA to the model can provide insights into the dominant controls of the system. These insights can be used in turn to support decision-making by addressing questions like: what is the relative influence of different drivers (those that can be altered by the system managers and those that cannot) on the system response? What are critical values of the system drivers that induce threshold effects in the decision objectives? An early application of this type is reported in Pastres et al. (1999), who apply SA to a shallow water system to estimate the interactions between controllable system drivers (e.g. nitrogen load) and uncontrollable ones (e.g. dispersion or reaeration coefficients) in determining dramatic events such as anoxic crises. More recently, SA has been proposed as a tool for 'bottom-up' or 'vulnerability-based' approaches for dealing with decision-making problems under large (and often unknown levels of) uncertainties (Wilby and Dessai, 2010), like for example climate projection uncertainties. In such instances, Sensitivity Analysis, and in particular mapping methods of input factors, can be used to explore the space of possible variability of the system drivers, for instance climate or socio-economic drivers like land use, demand for natural resources, etc., and isolate combinations that would exceed vulnerability thresholds (Lempert et al., 2003); or to quantify links between the vulnerability of a system (e.g. a catchment) and its properties (e.g. climate, hydrology; see for instance Prudhomme et al. (2013)). More widely employed mapping methods include the Patient Rule Induction Method (PRIM) and Classification And Regression Trees (CART) (Lempert et al., 2008). Applications of SA for this purpose are far less numerous than those for uncertainty investigation and model calibration. However, they are increasingly investigated; see for example Brown et al. (2011), Singh et al. (2014) and references therein.

2.7. SA and emulators

An emulator (or emulation model, or surrogate model) is a computationally efficient model, e.g. a polynomial or some other algebraic relation, that is calibrated over a (small) dataset obtained by simulation of a computationally demanding model, and that can be used in its place during computationally expensive tasks. In the context of SA, emulators can be used to obtain faster evaluations of the model's response (Eq. (1)) and therefore allow for applying computationally demanding SA methods to complex simulation models. For specific choices of the emulator structure and the SA method, emulators can provide analytical solutions to compute sensitivity indices. For example, Sudret (2008) presents an approach where generalized polynomial chaos expansions (PCE) are used as emulators and variance-based sensitivity indices (see Section 3.5) are computed analytically as a post-processing of the PCE coefficients. On the other hand, the use of emulators poses a number of numerical challenges related to their calibration and validation. In fact, the validity of an emulator relies on the assumption that the samples used for its identification are sufficiently representative of the behaviour of the original simulation model and of the intended model application, an assumption that is difficult if not impossible to verify. The identification and use of emulators for SA is the topic of a wide literature, whose review falls outside the scope of this paper. The interested reader is referred to Forrester et al. (2008) for a general introduction to emulation modelling, and Ratto et al.

(2012) for a review of its application in SA. rough idea of the number of model evaluations required by each
class of methods. More discussion of the computational complexity
issue is given in Section 4.5. The remainder of this Section is
3. Systematic review of SA methods
dedicated to a short description of the five classes of methods, their
working principles, and their advantages and limitations. The
In this section we propose a systematic classification of SA
mathematical notation used throughout the Section is summarised
methods. This review does not aim at providing an exhaustive list
in Table 1. We intentionally do not provide excessive mathematical details on the mechanics of the various SA methods, and refer the reader to the cited literature. Good complements to this review in this regard are the introduction to sensitivity assessment of simulation models by Norton (2015), the literature reviews (with a focus on the chemical modelling literature) by Saltelli et al. (2005, 2012), and the review of recent methodological advances by Borgonovo and Plischke (2016).

of all the available SA methods, which would be hardly feasible and would likely become obsolete in a short while. Rather, we group the methods most widely used in the environmental modelling domain into five broad classes, based on their underlying concepts, which reflect different assumptions, working principles and objectives. In this sense our review is 'systematic' and hopefully open to encompassing methods that we do not cite here explicitly, as well as future developments within each class. The reviewed SA methods are then placed within this classification system (shown in Fig. 3), which can be used as an operational tool to guide the choice of the most appropriate SA method for the problem at hand, depending on:

- the specific SA purpose (screening, ranking or mapping, as described in Section 2.2) that each method can address; and
- the method's computational complexity, measured by the number of model evaluations required in its application.

We emphasize the role of computational complexity because sampling-based methods requiring large sample sizes can be impossible to apply to models with long run times and/or those producing large input/output data files. In Fig. 3, we provide a

Fig. 3. Classification system of Sensitivity Analysis methods based on computational complexity (vertical axis; M is the number of input factors subject to SA) and purposes of the analysis. Some of the most widely used methods are reported (acronyms are defined in the corresponding paragraphs of Sec. 3). Types of SA and sampling approaches are defined in Sec. 2.2. Figures about computational complexity are indicative; for further discussion see Sec. 4.5.

3.1. Perturbation and derivatives methods

The simplest type of SA varies (perturbs) the input factors of the simulation model from their nominal values one at a time (OAT) and assesses the impacts on the simulation results via visual inspection, for instance by pair-comparison of the time series (or spatial patterns) of simulated variables under nominal and perturbed inputs (e.g. Devenish et al., 2012 and Paton et al., 2013). If a scalar output variable y can be defined, a more formal approach is to measure the output sensitivity to the i-th input factor by the partial derivative ∂g/∂xi evaluated at the nominal value of the factors x̄, or by the finite-difference gradient if the input–output
220 F. Pianosi et al. / Environmental Modelling & Software 79 (2016) 214–232

Table 1
List of symbols used and their meaning.

E: Expected value
f: Probability Density Function (PDF)
F: Empirical Cumulative Distribution Function (CDF)
g: Relationship between the model's inputs and output investigated by SA (or model's response), as defined by Eq. (1)
M: Number of input factors subject to SA
N: Sample size (and thus number of model evaluations) in sampling-based SA
n: Base sample size for variance-based sensitivity estimators (Saltelli et al., 2010)
r: Number of local derivatives in multiple-starts perturbation methods
SD: Standard deviation
Si: Sensitivity index of the i-th input factor
V: Variance
x: Vector of M input factors subject to SA
x̄: Nominal value of x for local SA
X: Variability space of x for global SA
xi: i-th input factor subject to SA
y: (Scalar) model output
Yb (Ynb): Set of behavioural (non-behavioural) output samples in Regional Sensitivity Analysis
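The one-at-a-time perturbation measure discussed in Section 3.1 can be sketched in a few lines of code. The snippet below is our illustration and is not part of the original paper: the toy model g, the nominal point and the perturbation size are all hypothetical, and the scaling factors ci are set to 1.

```python
def g(x):
    # hypothetical toy model: y responds strongly to x[0], weakly to x[2]
    return 3.0 * x[0] + x[1] ** 2 + 0.1 * x[2]

def local_sensitivity(g, x_nom, delta=1e-6):
    """One-at-a-time finite-difference approximation of the local
    sensitivity of g to each input factor at the nominal point x_nom.
    Requires M + 1 model evaluations for M input factors."""
    y_nom = g(x_nom)
    S = []
    for i in range(len(x_nom)):
        x_pert = list(x_nom)   # perturb one factor, keep the others fixed
        x_pert[i] += delta
        S.append((g(x_pert) - y_nom) / delta)
    return S

S = local_sensitivity(g, x_nom=[1.0, 1.0, 1.0])
# S approximates the partial derivatives (3, 2, 0.1) at the nominal point
```

Because the measure is evaluated at a single nominal point, it says nothing about the model's behaviour elsewhere in the input space, which is the motivation for the multiple-starts extensions of Section 3.2.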

function g of Eq. (1) is not differentiable at x̄. Derivative-based SA finds its rationale in the Taylor series expansion. This is well explained in Helton (1993) and generalized later on in Borgonovo (2008). In order to facilitate a comparison of sensitivities across input factors that may have different units of measurement, the partial derivatives are usually rescaled (e.g. Hill and Tiedeman, 2007). The sensitivity measure for the i-th input factor thus takes the form

Si(x̄) = (∂g/∂xi)|x=x̄ · ci   (2)

where ci is the scaling factor. Given that the functional relation of Eq. (1) is rarely known in analytic form, partial derivatives are usually approximated by finite differences, i.e.

Ŝi(x̄) = [g(x̄1, …, x̄i + Δi, …, x̄M) − g(x̄1, …, x̄i, …, x̄M)] / Δi · ci   (3)

Using the finite-difference approximation of Eq. (3), the computation of the sensitivity measures for M factors requires M + 1 model evaluations. Derivative-based sensitivity measures are therefore computationally very cheap, with the drawback that they provide information about local sensitivity only. Second derivatives can be estimated with a relatively small number of additional model evaluations, thus providing information about local interactions between input factors. For more details on this issue see Norton (2015).

3.2. Multiple-starts perturbation methods

A global extension of the perturbation approach is to compute output perturbations from multiple points x̄^j within the feasible input space, and to measure the global sensitivity by aggregating these individual sensitivities. Methods falling under this category differ from each other in one or more of the following aspects: (i) whether they use finite differences directly, or some transformation such as their absolute or squared values; (ii) how they select the fixed points x̄^j and the length of the finite variation Δi to perturb the i-th input factor (design strategy); and (iii) how they aggregate individual sensitivities.

The most established method of this type is the method of Morris (Morris, 1991), also called the Elementary Effects Test (EET; Saltelli et al., 2008). Here, the mean of r finite differences (also called 'Elementary Effects' or EEs) is taken as a measure of global sensitivity, i.e.

Si = (1/r) Σ_{j=1,…,r} EEi^j = (1/r) Σ_{j=1,…,r} [g(x̄1^j, …, x̄i^j + Δi^j, …, x̄M^j) − g(x̄1^j, …, x̄i^j, …, x̄M^j)] / Δi^j · ci   (4)

Besides the above sensitivity measure, it is common practice to also compute the standard deviation of the EEs, which provides information on the degree of interaction of the i-th input factor with the others. A high standard deviation indicates that a factor is interacting with others, because its sensitivity changes across the variability space. An alternative measure proposed by Campolongo and Saltelli (1997) takes the absolute value of the finite differences, to avoid differences of opposite signs cancelling out. Borgonovo (2010) presents a method where, at the additional cost of M + 1 model evaluations per EE, one can estimate whether the response of the model is predominantly additive or governed by interactions.

As for the sampling strategy used to select the points x̄^j (j = 1, …, r) and the input variations Δi, different approaches have been proposed. The sampling strategy originally proposed by Morris (1991) builds r trajectories in the input space, each composed of M + 1 points. The starting point of each trajectory is randomly selected over a uniform grid and the subsequent M points are obtained by moving one factor at a time by a fixed amount Δ, so that each trajectory allows for evaluating one EE per factor. The user has to specify the "number of levels" L, which determines the grid size (equal to 1/(L − 1) of the range of variability of the input factor) and the size of the variation Δ (equal to L/(2(L − 1))). Typical values for L range from 4 to 8, which means that Δ ranges from 4/6 ≈ 0.67 to 8/14 ≈ 0.57 of the range of variability. Therefore, with this setup the EEs capture finite and rather large perturbations. On the one hand, this avoids the risk of focussing only on very local behaviours of the model's response g (Eq. (1)). On the other hand, it can produce misleading results if g is highly non-smooth and the characteristic length of its variations is much smaller than Δ.

Several variants of the sampling strategy by Morris have been proposed, including the LH-OAT approach of van Griensven et al. (2006), where the starting points of each trajectory are generated by Latin-Hypercube sampling rather than random sampling over a grid; and the approach of Campolongo et al. (2007), where a high number of trajectories are generated and a subset of r trajectories is selected so as to maximise the overall spread over the input space. A different approach to OAT sampling is the radial-based design, where the variations Δi are all taken

starting from the same (randomly selected) point in the input space. Campolongo et al. (2011) show that radial-based design provides several advantages in terms of efficiency and integration with subsequent AAT sensitivity analysis. The interested reader is referred to that paper and the references therein for a discussion of different OAT sampling strategies.

For all of these sampling strategies, the computation of the mean (and standard deviation) of the EEs of M input factors requires r(M + 1) model evaluations, a requirement that is far lower than that of the majority of AAT global approaches. Therefore, the EET is often used when the computing time of a single model run is high, or when the number of factors is very large. The EET is particularly suitable for screening, i.e. to detect non-influential factors that can be discarded from a subsequent, more time-consuming global SA (see for instance Nguyen and de Kok, 2007), and for ranking.

Other multiple-starts perturbation approaches use squared finite differences, which allow a link to be established with the variance-based SA approach discussed in Section 3.5. For instance, Sobol' and Kucherenko (2009) suggest using the mean of the squared finite differences and demonstrate that it provides an upper bound on the total-order variance-based sensitivity index (see Section 3.5). This sensitivity measure is especially suitable for screening, since a small value of the measure implies that the input factor is non-influential, while the same authors show that it may give false conclusions if used for ranking. Along a similar line of reasoning is the DELSA approach (Distributed Evaluation of Local Sensitivity Analysis) by Rakovec et al. (2014), which also uses the squared finite differences as a measure of sensitivity (scaled by the ratio between the a priori input variance and the total output variance). Here, local sensitivities computed at different sampling points are not aggregated; rather, their full frequency distribution is analysed and, if aggregated, the median value is used rather than the mean. Another difference worth mentioning with respect to the EET is that in the DELSA approach the input variation Δ is set to 0.01 of the fixed value x̄i^j, so that finite differences can be regarded as approximating local derivatives.

3.3. Correlation and regression analysis methods

The underlying idea of these methods is to derive information about output sensitivity from the statistical analysis of the input/output dataset generated by Monte Carlo simulation. Early works in the field are Iman and Helton (1988) (mainly on regression analysis) and Saltelli and Marivoet (1990) (on correlation methods). An introduction and review of these approaches are given e.g. in Kleijnen and Helton (1999a), Helton and Davis (2002) and Storlie et al. (2009).

Correlation methods use the correlation coefficient between the input factor xi and the output y as a sensitivity measure, i.e.

Si = correlation(xi, y)   (5)

Several different definitions of correlation can be used, including the Pearson correlation coefficient (CC) and the partial correlation coefficient (PCC), which apply when a linear relationship exists between the input factors x and the output y, and the Spearman rank correlation coefficient (SRCC) or the partial rank correlation coefficient (PRCC), which can be used for nonlinear but monotonic relationships (e.g. Pastres et al., 1999). The choice among these alternatives depends on the degree of acceptability of the linearity and/or monotonicity assumption between inputs and output. An informal though effective way to assess this is through visual inspection of the input/output sample, for instance using scatter plots. More sophisticated correlation methods can be used to address specific needs. For example, Minunno et al. (2013) demonstrate the use of Canonical Correlation Analysis (CCA) for GSA in an application where multiple model outputs need to be accounted for simultaneously.

Regression analysis methods instead derive the sensitivity measure as a 'byproduct' of regression analysis applied to the input/output sample. The simplest and most widely used method is linear regression. Here, a linear relationship y = ai + bi·xi is assumed and the linear least-squares estimate of the regression coefficient bi is the sensitivity measure. The Standardised Regression Coefficients (SRC) are used when input factors have different units of measurement, i.e.

Si = bi · SD(xi) / SD(y)   (6)

where SD stands for standard deviation. Multiple linear regression can be used to obtain the sensitivities to all the individual input factors at once. The advantage of linear regression is that it can be easily applied to small datasets; however, it can be inadequate if the input–output relationship is non-monotonic or strongly nonlinear (e.g. Hall et al., 2009).

A particularly interesting class of nonlinear regression methods in the context of Sensitivity Analysis is that of Classification And Regression Trees (CART; for application examples see e.g. Harper et al. (2011) and Singh et al. (2014)). CART has several advantages, including that trees can easily handle non-numerical inputs and outputs, and that they can be used for both ranking and mapping.

3.4. Regional Sensitivity Analysis (or Monte Carlo filtering)

Regional Sensitivity Analysis (RSA), also called Monte Carlo filtering, is a family of methods mainly aimed at identifying regions in the input space corresponding to particular values (e.g. high or low) of the output, and it can be used for mapping and for dominant controls analysis. The idea was first proposed and investigated in Young et al. (1978) and Spear and Hornberger (1980). Here, the input samples (typically parameters) are divided into two sets, 'behavioural' and 'non-behavioural', depending on whether the associated model simulation exhibits the expected pattern of state variable response or not. Another way to apply RSA is by splitting input samples depending on whether the associated output is above or below a prescribed threshold. Then, the two input sets are compared to gain insight into the model behaviour and for mapping. For example, Q–Q plots can be used to compare behavioural versus non-behavioural samples. Another common analysis is to over-plot the marginal empirical cumulative distribution functions (CDFs) of the behavioural and non-behavioural sets. Visual inspection of these distributions provides information on factor mapping, for instance by highlighting a reduction in the variability range for behavioural inputs. The divergence between the two distributions, for example measured by the Kolmogorov–Smirnov statistic, can be used as a sensitivity index, i.e.

Si = max_xi | Fxi|yb(xi | y ∈ Yb) − Fxi|ynb(xi | y ∈ Ynb) |   (7)

where Fxi|yb and Fxi|ynb are the empirical cumulative distribution functions of xi when considering input samples associated with behavioural and non-behavioural outputs respectively (i.e. falling in the subsample Yb/Ynb of behavioural/non-behavioural outputs). The advantage of using empirical distribution functions is that they usually provide a robust approximation of the underlying distribution even if computed over small samples. This may happen for instance with overparameterised models, where behavioural

parameterisations are confined to small sub-regions of the parameter space, and therefore the size of the behavioural set might be very small even when starting from a large number of model simulations (Norton, 2015). However, while useful for ranking, the sensitivity measure of Eq. (7) cannot be used for screening. In fact, a value of zero of the above index is a necessary but not sufficient condition for insensitivity, because input factors contributing to output variability only through interactions may have the same behavioural and non-behavioural distribution functions (see for instance the example given in Section 5.2.3 of Saltelli et al. (2008)).

One advantage of this approach is that it can be applied to any type of model output, including non-numerical ones, as long as a splitting condition can be defined and verified, possibly also by qualitative evaluation. On the other hand, the use of a splitting criterion can be a limitation whenever the discrimination between behavioural and non-behavioural outputs is not clear-cut. For instance, RSA has been widely used in applications where the model output is an objective function (i.e. a measure of the model accuracy against observations) and the splitting criterion reflects the achievement of a minimum requirement of model performance (e.g. Freer et al., 1996; Sieber and Uhlenbrook, 2005). The definition of the threshold value at which the model performance is deemed acceptable is usually a subjective choice by the modeller. The problem can be especially difficult when the scalar model output is a predictive function, unless there exists a threshold value that has a specific meaning for the model users (for instance a regulatory threshold value for an environmental variable). To overcome this issue and apply RSA without specifying thresholds, one option is to group the ranked output samples into a prescribed number, e.g. 10, of equally spaced intervals, and compare the 10 resulting distribution functions of the input factors (Freer et al., 1996; Wagener et al., 2001). For an application example and discussion see also Tang et al. (2007b).

3.5. Variance-based methods

Variance-based SA relies on three basic principles: (i) input factors are regarded as stochastic variables, so that the model induces a distribution in the output space; (ii) the variance of the output distribution is a good proxy of output uncertainty; and (iii) the contribution to the output variance from a given input factor is a measure of sensitivity.

Several variance-based indices can be defined. First-order indices (or 'main effects') measure the direct contribution to the output variance from individual input factors or, equivalently, the expected reduction in output variance that can be obtained when fixing a specific input, i.e.

SiF = Vxi[ Ex~i(y|xi) ] / V(y) = ( V(y) − Exi[ Vx~i(y|xi) ] ) / V(y)   (8)

where E denotes the expected value, V denotes the variance, and x~i denotes "all input factors but the i-th". The total-order indices (or 'total effects') introduced by Homma and Saltelli (1996) measure the overall contribution from an input factor considering its direct effect and its interactions with all the other factors, which might amplify the individual effects, i.e.

SiT = Ex~i[ Vxi(y|x~i) ] / V(y) = 1 − Vx~i[ Exi(y|x~i) ] / V(y)   (9)

Total-order indices are particularly suitable for screening, because a value of zero of the total-order index is a necessary and sufficient condition for a factor to be non-influential. First-order indices are often used for ranking, especially if interactions are not significant contributors to output variance. Variance-based sensitivity indices of intermediate order can also be defined: for instance, second-order indices measure the contribution to output variance from pairs of factors; third-order indices from factor triples; etc. These indices can be used to analyse interactions between specific groups of input factors. An effective account of the development of variance-based indices and their connections to earlier works on 'importance measures' (e.g. Iman and Hora (1990)) can be found in Borgonovo (2007).

An interesting property of first-order and higher-order indices is that they are related to the terms in the variance decomposition of the model output (Sobol', 1993), which "reflects the structure of the model itself" (Oakley and O'Hagan, 2004) and holds under relatively broad assumptions, the strongest one being that the input factors are independent. In the presence of correlations among the input factors, instead, the tidy correspondence between variance-based indices and model structure is lost (see e.g. the discussion in Oakley and O'Hagan (2004)) and counterintuitive results may be obtained. For example, one might observe total-order indices smaller than first-order ones for negative correlations, or total-order indices tending to zero as correlation grows to unity (Kucherenko et al., 2012). The mechanism of output variance decomposition and the link to variance-based sensitivity indices are also discussed in Norton (2015).

Another reason for the popularity of the first-order and total-order indices is that they are relatively easy to implement, since several closed-form algebraic equations exist for their approximation. For a review of these estimators in the case of independent input factors, see Saltelli et al. (2010); for an extension to the case of dependent inputs, see Kucherenko et al. (2012). However, the sample size required to achieve reasonably accurate approximations can be rather large (as further discussed in Section 4.5), which severely affects the applicability of this approach to time-consuming models. Several methods have been proposed to reduce the number of model evaluations in the approximation of variance-based indices. These include: (i) methods using the Fourier series expansion of the model output y, like the Fourier Amplitude Sensitivity Test (FAST; Cukier et al., 1973) for the approximation of the first-order indices, and the extended FAST (Saltelli et al., 1999) for the total-order indices (for an introduction to these techniques, see Norton (2015)); and (ii) methods using an emulator, like the approach of Oakley and O'Hagan (2004).

Besides computational aspects, another limitation of variance-based indices is that, by relying on the implicit assumption that variance can fully capture uncertainty, they can be inappropriate when the output distribution is multi-modal or highly skewed and the variance is therefore not a meaningful indicator. This issue is discussed in the next section.

3.6. Density-based methods

The limitations of the variance-based approach have stimulated a number of studies on 'moment-independent' sensitivity indices that do not use a specific moment of the output distribution to characterize uncertainty and are therefore applicable independently of the shape of the output distribution. These methods are sometimes referred to as 'density-based' methods because they look at the Probability Density Function (PDF) of the model output, rather than its variance alone.

The key idea is to measure sensitivity through the variations in the output PDF that are induced when removing the uncertainty in one input factor. In practice this is done by computing the divergence between the unconditional output PDF, which is generated by varying all factors, and the conditional PDFs that are obtained when fixing individual input factors in turn to a prescribed value. If multiple conditioning points are considered, some type of statistic

is applied to aggregate individual results. The general form of a density-based sensitivity index is

Si = stat_xi divergence[ fy, fy|xi(·|xi) ]   (10)

where fy and fy|xi are the unconditional and conditional output PDFs, and 'stat' and 'divergence' denote some statistic and some divergence measure. For example, in entropy-based methods the divergence between the conditional and unconditional PDFs is measured by the Shannon entropy (Krykacz-Hausmann, 2001) or by the Kullback-Leibler entropy (Park and Ahn, 1994; Liu et al., 2006), while the δ-sensitivity approach (Borgonovo, 2007) uses the area enclosed between the two PDFs. In the δ-sensitivity approach, different conditioning values are used for xi and individual results are averaged, i.e. 'stat' in Eq. (10) is the mean, while in entropy-based methods only one conditioning value is typically used. Other density-based approaches, e.g. Borgonovo (2014) and the novel density-based PAWN method by Pianosi and Wagener (2015), use cumulative distribution functions in place of PDFs. The advantage is that unconditional and conditional CDFs can be efficiently approximated by the empirical CDFs of output samples, which makes these density-based sensitivity indices very simple to compute.

One advantage of density-based sensitivity indices is that they can easily be tailored to measure sensitivity over the entire range of output variability as well as over a specific sub-range, for instance extreme values (the so-called Regional Response Probabilistic Sensitivity Analysis discussed in Liu et al. (2006)). This may be very interesting in those applications, e.g. hazard assessment, where the tail of the output distribution is of particular interest. Other interesting properties of density-based methods are that they allow for using statistics that are invariant to monotonic transformations, and that they can be estimated from a given sample, i.e. without requiring a tailored sampling strategy (Borgonovo (2014) and references therein).

Application examples in the environmental domain are Pappenberger et al. (2008) for the entropy-based indices; Castaings et al. (2012), Anderson et al. (2014) and Peeters et al. (2014) for the δ-sensitivity measure; and Pianosi and Wagener (2015) for PAWN.

4. Workflow for the application of SA

Despite the differences between the individual SA methods described in the previous section, their application requires performing a sequence of steps that, to some extent, can be discussed in general terms. Here we refer to these steps as the 'workflow'. The workflow for the application of SA is illustrated in Fig. 4. In this section, we discuss this workflow and the choices that users have to make at each step, with the goal of providing practical guidelines to support users in their SA applications.

4.1. Experimental set-up: define input factors and output

Any SA exercise starts from three basic choices that together form what we call the 'experimental setup': (i) choosing which input factors will be subject to SA; (ii) setting the values of the other input factors that will be kept constant throughout the SA; and (iii) defining the model output.

When the model execution produces a temporally or spatially varying set of outputs, the application of SA typically requires aggregating the outputs into a scalar function, as described in Section 2.1. An exception is the case when the input factors are the model parameters and the mathematical form of the model allows the derivation of algebraic solutions for the model state's sensitivity in time (Norton (2015) and references therein). When a scalar output function must be used, its definition obviously affects the SA results, because different scalar outputs may have different sensitivities to the input factors. For instance, Pappenberger et al. (2008) show how the ranking of the input factors (the parameters of a flood inundation model) varies when considering the mean of the squared errors or the mean of the absolute errors as the scalar output. Often, it is convenient to define multiple scalar outputs that summarise different aspects of the model behaviour. Their sensitivity can then be analysed separately (e.g. Baroni and Tarantola, 2014) or jointly (e.g. Minunno et al., 2013), or reframed as a multi-criteria analysis using for example Pareto ranking (e.g. Rosolem et al., 2012).

Another option that is becoming more and more accessible with growing computing power is that of reducing the level of aggregation so as to preserve more of the temporal or spatial variability of the model. Sensitivity indices can be computed at different temporal resolutions, therefore obtaining their temporal evolution over the simulation horizon (Wagener and Harvey, 1997; Wagener et al., 2003; Cloke et al., 2008; Reusser and Zehe, 2011; Kelleher et al., 2013; Guse et al., 2014). A similar approach can be applied to the aggregation of spatial patterns into a single output function, whose resolution can be varied in order to capture the spatial variability of sensitivities across the model domain (Tang et al., 2007a; van Werkhoven et al., 2008). Time-varying or spatially-varying SA is especially useful to provide new insights about the dynamics of the model (e.g. when and where a given parameter is more influential) and/or the underlying system (e.g. which processes are most influential, when and where). However, its application poses a number of practical difficulties, for example regarding the choice of the averaging window size and of appropriate methods for complex models (Massmann et al., 2014), which constitute an opportunity for further research.

4.2. Choose the SA method

As discussed in Section 3, the choice of the most appropriate SA method for a given problem is largely driven by the purpose of the analysis (screening, ranking or mapping: see horizontal axis in Fig. 3) and by the available computing resources (and therefore the maximum number of model evaluations that can be used to approximate sensitivity indices: see vertical axis in Fig. 3). Typically, the number of model evaluations N increases with the number of input factors M subject to SA. However, the ratio between N and M varies significantly from one method to another, and often also from one application to another. This is illustrated in Fig. 5, which reports some examples of combinations (M, N) taken from the literature. The choice of the appropriate sample size will be further discussed in Section 4.5.

The choice of the method can also be driven by other specific features of the problem at hand, like the linearity of the input-output relationship, the statistical characteristics of the output distribution (e.g. its skew), etc., which are handled more or less effectively by different methods, as discussed in Section 3.

When multiple options are available, it may be advisable to apply more than one method and to compare individual results so as to reinforce the general conclusions drawn from SA. Often, this can be done at almost no extra computing cost, because different methods can be applied to the same input-output sample without re-running the model. This topic will be further discussed in Section 4.8. When the number of input factors is high, another option is to apply methods sequentially, beginning with computationally efficient screening methods like the EET and then applying more computer-intensive methods to a reduced number of input factors. In such a case, a careful design of the OAT sampling strategy applied during the screening step could help to reduce the number of new

Fig. 4. Workflow for the application of Sensitivity Analysis, choices to be made and recommended practice for their revision.

Fig. 5. Number of model evaluations (N) used in SA against the number of input factors (M) from the applications referenced in this paper. Green markers denote that the convergence of the sensitivity indices was reached, red markers that it was not reached, and grey markers that convergence assessment was not reported in the paper. For density-based and variance-based methods (bottom-right panel), squares refer to first-order and total-order estimators via the resampling technique (Saltelli et al., 2010), diamonds denote applications of FAST/eFAST, and stars are applications of the density-based δ-sensitivity method. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
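To give a feel for the sample sizes behind Fig. 5, the following sketch (ours, not from the paper; the additive toy function, the uniform input ranges and the binning estimator are all illustrative assumptions) estimates the first-order variance-based index of Eq. (8) by plain Monte Carlo, stratifying each input into bins and taking the variance of the conditional bin means:

```python
import random
import statistics

def g(x):
    # illustrative additive toy model with analytic first-order
    # indices of 0.2, 0.8 and 0.0 for the three input factors
    return x[0] + 2.0 * x[1] + 0.0 * x[2]

def first_order_indices(g, M, N, bins=20, seed=42):
    """Brute-force estimate of Si = V[E(y|xi)]/V(y) for independent
    inputs uniform on [0, 1]: stratify each input into equal-width
    bins and take the variance of the bin means of the output."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(M)] for _ in range(N)]
    Y = [g(x) for x in X]
    v_y = statistics.pvariance(Y)
    S = []
    for i in range(M):
        sums, counts = [0.0] * bins, [0] * bins
        for x, y in zip(X, Y):
            b = min(int(x[i] * bins), bins - 1)
            sums[b] += y
            counts[b] += 1
        means = [s / c for s, c in zip(sums, counts) if c > 0]
        S.append(statistics.pvariance(means) / v_y)
    return S

S = first_order_indices(g, M=3, N=20000)
```

With N = 20,000 the estimates come close to the analytic values, while with N in the low hundreds they become visibly noisy, which is why the convergence checks discussed in Section 4.5 matter. In practice, the dedicated estimators reviewed in Saltelli et al. (2010) are preferable to this brute-force scheme.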

model evaluations required in the second step, as discussed for instance by Campolongo et al. (2011).

4.3. Define the input variability space

Whatever the chosen SA method, the first step of SA is the definition of the variability space of the input factors, i.e. the 'neighbourhood' of the nominal value x in local SA, and the input variability space X in global SA. When using global methods where inputs are regarded as stochastic variables, like variance-based and density-based methods (Sections 3.5 and 3.6), their PDFs over the support X must also be defined. In the absence of specific information regarding this choice, a common approach is to assume independent, uniformly distributed inputs, so that the problem reverts to the definition of X only.

When the input factors x are the model parameters, feasible ranges can often be defined based on their physical meaning or from the existing literature, and further constrained using a priori information about the specific characteristics of the case study site (e.g. Bai et al., 2009). If observations of the simulated variables are available, another option is to first apply a preliminary Regional Sensitivity Analysis to assess whether literature ranges can be narrowed down by excluding sub-ranges producing a model performance below a prescribed acceptability threshold (e.g. Freer et al., 1996).

When the input factors x are the model forcing inputs, feasible ranges should account for the observational errors that can be expected from measuring devices, data pre-processing, interpolation, etc. Approaches to quantify data uncertainty vary depending on the type of variable under study and are gaining increasing consideration in the environmental modelling community. For an example of meteorological and water quality and quantity variables and their uncertainties see for instance McMillan et al. (2012). When suitable data are either unavailable or sparse, ranges or probability distributions can be elicited from experts. Several techniques and practical tools are discussed e.g. in O'Hagan et al. (2006) and in Morris et al. (2014).

While a review of the available data-based or expert-based methods to define the input variability space falls outside the scope of this paper, we want to point out that the definition of X (and possibly of the associated probability distribution) is often one of the most delicate steps in the application of GSA. A number of studies demonstrate how different definitions of X, each considered equally plausible by the analyst, can dramatically change the values of sensitivity measures and therefore the conclusions drawn from GSA (e.g. Shin et al., 2013). This is especially true for variance-based and density-based methods, where the sensitivity measures are directly related to the output probability distribution, which is induced by the combination of the model structure (Eq. (1)) and the assumed input distributions.

In the following paragraphs we discuss two other specific issues that, in our opinion, deserve special attention when applying GSA to environmental models.

4.3.1. Handling unacceptable model behaviour
When dealing with complex environmental models, it may happen that a combination of input factors that a priori may seem feasible generates a model response that the analyst would reject as unacceptable (for instance, unacceptably large deviations from observations), or even causes the simulation to fail (for instance due to numerical instability). These simulations may be excluded from further analysis by adding a 'filtering' step before the post-processing step (see Fig. 2). An example is given in Pappenberger et al. (2008), where output samples associated with a model performance below a prescribed threshold are discarded before the computation of the sensitivity indices. Kelleher et al. (2013) also compare the sensitivity estimates that are obtained before and after applying a performance criterion to screen out unacceptable parameter sets. Such a critical look at the results of individual samples, or subsets of samples, is a practice we recommend, since it may yield useful insights into the model behaviour and give directions to revise the experimental setup of the SA exercise (for instance, to reduce or enlarge the input variability space).

4.3.2. Handling non-scalar or non-numerical input factors
GSA methods described in Section 3 are usually illustrated assuming that all the input factors are numerical scalar quantities (for instance the model parameters), so that a given combination of inputs can be represented by a vector x = [x1, x2, …, xM]. However, in environmental modelling applications, candidate input factors may include entities that are not immediately represented by a scalar number, like for example the time series of forcing inputs (e.g. the input hydrograph in Fig. 1) or the model's spatial resolution. In order to include such input factors in SA, a link must be established between possible realizations of the non-numerical input factor and the values of a numerical quantity xi. Ad hoc procedures can be used for specific types of factors: for instance, a time series of forcing inputs can be associated with a scalar characteristic used to design it (e.g. the intensity or the duration of a design storm event as in Hamm et al. (2006)) or with the scalar multiplier used to obtain it by perturbation of a reference time series (e.g. Singh et al., 2014). A more flexible procedure is the one described in Baroni and Tarantola (2014) (and references therein). Here, the variability space of each input factor is represented by a list of its possible realizations. Then, the index of each element in the list is the desired scalar quantity xi, which is associated with a discrete uniform probability distribution. Following these definitions, sampling is performed with respect to the scalar indices x1, …, xM, while the model is evaluated against the original input factors defined by the sampled indices. This procedure can be applied to any type of input, including non-numerical ones. However, it requires that post-processing use output samples only, as for instance in variance-based or density-based methods, while it cannot be applied within the Elementary Effects Test or Regional Sensitivity Analysis, which by construction require that the input variability space be a metric space (see Eqs. (4) and (7)).

4.4. Choose the sampling strategy

When sensitivity indices cannot be computed analytically, sampling-based sensitivity analysis (Fig. 2) must be used.

For OAT methods like the EET, several alternative strategies are available for sampling (see discussion in Section 3.2). For AAT methods like Correlation and Regression Analysis, Regional Sensitivity Analysis and density-based methods, in principle any random or quasi-random sampling technique can be used. Among these, the most commonly used in the GSA literature are Latin-Hypercube sampling and Sobol' quasi-random sampling. A practice-oriented introduction to these techniques can be found for example in Forrester et al. (2008) (Section 1.4) and Press et al. (1992) (Section 7.7).

Some other GSA methods may require a tailored sampling strategy. For example, the approximation of the first-order and total-order variance-based indices by the estimators discussed in Saltelli et al. (2010) (see Section 3.5) is based on a tailored two-stage procedure. First, 2n random samples (the so-called base sample) are generated using Sobol' quasi-random or Latin-Hypercube sampling; then, another Mn input samples are built by recombination of the vectors in the base sample. The FAST and eFAST approaches also require a tailored sampling strategy. In fact, the use of an efficient sampling strategy is what differentiates them from other estimators of variance-based indices, as described in Section 3.5.

We suggest that the implications of the sampling choice should be tested similarly to the other choices made in the application of GSA. If computationally feasible, different strategies can be compared. However, it is likely that the definition of the input variability space or of the output has a larger impact on the GSA outcome. Furthermore, independently of the chosen sampling strategy, the robustness of the sensitivity indices can be checked through confidence intervals, as discussed in the following sections.
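The two-stage procedure just described can be sketched in a few lines. Below is a minimal Python/numpy illustration (our own sketch, not the actual estimators of Saltelli et al. (2010)); for simplicity the base sample is drawn with plain uniform random numbers, where Sobol' quasi-random or Latin-Hypercube sampling would normally be used:

```python
import numpy as np

def saltelli_style_sample(n, M, seed=0):
    """Two-stage sampling for first/total-order variance-based indices.

    Draw a base sample of 2n points, split it into matrices A and B,
    then build M extra matrices AB_i by replacing the i-th column of A
    with that of B. Total cost: n * (M + 2) model evaluations.
    """
    rng = np.random.default_rng(seed)
    base = rng.random((2 * n, M))   # plain uniform here; Sobol' or LHS in practice
    A, B = base[:n, :], base[n:, :]
    AB = []
    for i in range(M):
        ABi = A.copy()
        ABi[:, i] = B[:, i]         # recombination: column i taken from B
        AB.append(ABi)
    return A, B, AB

A, B, AB = saltelli_style_sample(n=500, M=3)
N_total = len(A) + len(B) + sum(len(m) for m in AB)   # = n * (M + 2)
```

With n = 500 and M = 3 the total cost is n(M + 2) = 2500 model evaluations, matching the count given above.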
4.5. Choose the sample size

The second choice to be made in sampling-based GSA is that of the sample size N. This choice has a dramatic impact on the overall computational burden, given that the execution of the model is usually far more computationally expensive than the post-processing step of estimating sensitivity indices. Therefore GSA users are typically confronted with the problem of finding a compromise between the need for keeping the sample size small and that of obtaining reliable estimates of the sensitivity indices. The solution to this problem is not unique and may significantly vary depending on the complexity of the model's response (Eq. (1)), which, however, is generally difficult to know before running the model.

Some suggestions for the choice of the sample size for the most widely used methods are reported in the literature. For instance, for the Elementary Effects Test, a common indication is to use r = 10 EEs, which results in a total number of N = r(M + 1) model evaluations. However, to the authors' knowledge this choice seems to be motivated mainly by the need of keeping the total number of model evaluations limited, rather than by a formal assessment of the reliability of the results. For example, Campolongo and Saltelli (1997) show that, with r = 10, the confidence bounds of the sensitivity indices obtained by bootstrapping are so large that factor ranking is essentially meaningless; Vanuytrecht et al. (2014) compute the EET sensitivity indices using an increasing number of samples and conclude that r = 25 is sufficient to discriminate between influential and non-influential factors (screening), while it is still not sufficient to stabilize factor ranking.

For variance-based indices computed using the efficient estimators discussed in Saltelli et al. (2010), the application of the resampling strategy to a base random sample of size n leads to a total of N = n(M + 2) model evaluations. Common indications for n range from 500 to 1000 (Saltelli et al., 2008). However, application examples reported in the literature seem to suggest that the base sample size may significantly vary from one application to the other, and that a much larger base sample might be needed to achieve reliable results (see datapoints in the bottom right panel of Fig. 5). Furthermore, the number of samples needed to reach stable sensitivity estimates can vary from one input factor to another, with low-sensitivity inputs usually converging faster than high-sensitivity ones (e.g. Nossent et al. (2011)).

The use of distribution functions in RSA usually provides quite robust sensitivity estimates even for relatively small sample sizes (see discussion in Section 3.4), a feature that made RSA particularly attractive when it was introduced in the early 1980s, given that the computing resource for Monte Carlo sampling was very limited at the time. Correlation and regression methods are also generally applied to relatively limited datasets, typically around or less than 1000M model evaluations (see again Fig. 5 for some examples). However, it is difficult to provide general rules for these classes of methods, especially because applications of RSA and Correlation and Regression methods rarely report a discussion of the appropriateness of the selected sample size (an exception is Kleijnen and Helton (1999b)).

To summarise, we can conclude that, roughly speaking, the number of model evaluations N increases with the number of inputs M by a factor of between 10 and 100 for multiple-start derivatives, between 100 and 1000 for Regional Sensitivity Analysis and Correlation and Regression methods, and around 1000 or even more for density-based and variance-based methods (though significant reductions are obtained when using FAST or eFAST). However, these proportionality coefficients are expected to increase with M, and they can vary greatly from one application to another. Therefore, rather than providing specific indications on how to properly choose the sample size a priori, in the next subsection we discuss some techniques to verify a posteriori the appropriateness of the choice made.

4.6. Assess robustness and convergence

When applying sampling-based SA, sensitivity indices are not computed exactly but are approximated from the available samples. The robustness and convergence of such sensitivity estimates should therefore be assessed, especially when obtained from samples of small/medium size.

Convergence analysis assesses whether sensitivity estimates are independent of the size of the input–output sample, i.e. if they would take similar values when using an (independent) sample of larger size. A simple and generic technique to address this question is to re-compute the sensitivity indices using sub-samples of decreasing size extracted from the original sample. The advantage of this approach is that it does not require running new model simulations; however, it might overestimate the convergence rate because the sub-samples are not independent. Results of convergence analysis can be displayed in a 'convergence plot' like the one in Appendix A. Examples are given in Nossent et al. (2011) and Wang et al. (2013).

Robustness analysis assesses whether sensitivity indices are independent of the specific input–output sample, i.e. if they would take similar values if estimated over a different sample of the same size. Techniques to address this question without running new model evaluations are subsampling and bootstrapping (Efron and Tibshirani, 1993; Romano and Shaikh, 2012). A discussion of the quality of bootstrapping-based confidence limits of some widely used sensitivity indices can be found in Yang (2011).

If convergence has not been reached and/or the confidence bounds are large, additional model simulations may be run and the sensitivity indices re-estimated over the increased sample. If this is not possible because of limited computing resources, some conclusions may still be drawn from the available results. In fact, even if the estimates of the sensitivity indices have not reached convergence, the screening of the non-influential input factors or the factor ranking might have stabilised (see for instance the discussion in Ziliani et al. (2013)).

While the evaluation of convergence and/or robustness is increasingly common in applications of variance-based methods, it is not equally common for other methods, for instance the Elementary Effects Test, although there is no technical reason not to extend the above-described techniques to this approach (see for example the visualization of the EET results with bootstrapping in Appendix A). We suggest that the assessment of convergence and robustness of the estimated indices and of the associated screening, ranking and mapping should be standard practice in any sampling-based SA exercise.

4.7. Visualize results

When dealing with large sets of sensitivity indices, the interpretation of SA results can be significantly enhanced by effective visualization tools that: (i) facilitate the identification of outliers and counterintuitive behaviours; (ii) help comparing results obtained by varying some of the underlying choices, e.g. different definitions of the input variability space or different sampling strategies; (iii) support the identification of temporal or spatial patterns in the output sensitivity; etc. Furthermore, effective visualization is key to improving the communication of SA results and conclusions.

General suggestions for visualizing scientific data effectively are presented in Kelleher and Wagener (2011). In Appendix A of this
paper we provide several examples of plots that have been employed in SA applications and that we found helpful. Some of these plots have been proposed for specific SA methods (e.g. the Elementary Effects Test or Regional Sensitivity Analysis) while others are meant to handle specific challenges. For instance, pattern plots (e.g. van Werkhoven et al., 2008) can be very effective to visualize large sets of sensitivity indices, e.g. when the number of input factors is large or when analysing the variations of output sensitivity across a wide temporal or spatial domain. They help highlight patterns and trends, although they do not allow for a detailed comparison between the exact index values.

Another challenge is to visualize multiple sensitivity attributes simultaneously, for instance first-order sensitivity, total-order sensitivity and interactions, in such a way that much information is conveyed without overloading the reader. Two types of plots that have been recently suggested to this end are Circos (Kelleher et al., 2013) and radial convergence diagrams (Butler et al., 2014). Our (subjective) experience is that viewers find radial convergence diagrams somewhat easier to interpret, though both contain the same information.

Besides visualising sensitivity indices, it is often convenient to visualise the input and output samples for additional insights. For example, variance-based methods do not provide any mapping of the results into the input factor space; however, some information about this mapping can be obtained by applying RSA or other visualization tools (e.g. scatter plots or parallel coordinate plots, see Appendix A) to the base sample generated for VBSA, at no additional computing cost.

4.8. Assess credibility

The robustness and convergence analyses discussed in Section 4.6 aim at assessing the uncertainty in the results of a specific SA method. Therefore they tell us about the reliability of the results within the context of that method. A different and equally relevant question is how much the method itself can be trusted, i.e. how suitable it is to address the questions it is expected to answer when applied to the problem at hand. For instance, variance-based methods rely on the assumption that variance is a sensible proxy for uncertainty, which may not be true for a highly-skewed output distribution. In this case, even if one were able to derive almost exact estimates of the variance-based sensitivity indices, they would not provide the correct ranking (a numerical example is given in Liu et al. (2006)). In other words, SA results may be very robust and yet not credible, and vice versa.

A way to assess credibility is by verifying that the underlying assumptions of a method are satisfied, for instance checking the linearity, monotonicity or smoothness of the input–output relationship of Eq. (1) or the characteristics of the output distribution. Another way is to compare SA results produced by different methods. As discussed in Section 4.2, the application of different GSA methods does not necessarily increase the computational burden, since multiple approaches can be applied to the same input–output sample. If the screening/ranking results remain the same across different methods, the comparison reinforces the conclusions of SA. If instead there are contradictory results, it stimulates further investigations that may lead to understanding different aspects of the model's behaviour that are captured by different SA methods (see for instance the discussion in Pappenberger et al. (2008)). Also, specific techniques can be applied to validate SA conclusions, e.g. the visual test proposed by Andres (1997) to validate factor screening or the quantitative test based on the Kolmogorov–Smirnov statistic presented in Pianosi and Wagener (2015). Here conditional (on either sensitive or insensitive parameters) and unconditional output distributions are compared to check whether all insensitive factors have been identified. The limitation of these validation tests is that they require additional model runs.

Credibility assessment also involves the interpretation and explanation of the SA results. If unexpected results are obtained, for instance the output is highly sensitive to an input factor that was supposed to be hardly influential, the interpretation of the result could lead either to learning new aspects of the model behaviour or to revising some of the choices made in the experimental setup, for example the output definition or the definition of the input variability space.

5. Conclusions

In this paper we have provided a systematic classification of Sensitivity Analysis (SA) methods and discussed a workflow for its application, with the aim of providing the reader with the background needed:

• to further engage with the SA literature;
• to recognise the type of questions that could be addressed through SA;
• to choose the most suitable SA approach depending on the questions to be addressed, the available computing resources, and the characteristics of the problem at hand;
• to be aware of the key assumptions underlying each approach, its scope and limitations;
• to understand the typical workflow for applying SA;
• to be aware of the most sensitive choices that are made in the workflow and how to assess their impacts.

In doing so, we also highlighted some emerging trends in the SA literature that we consider of particular interest to the environmental modelling community. In particular:

(i) the application of SA to analyse the impact of non-numerical uncertain factors like model resolution or structure;
(ii) the application of time-varying and space-varying SA, which is made possible by increasing computing power and storage space, and which is a means to overcome the limitations of defining an 'aggregated' scalar output when dealing with dynamic models;
(iii) the application of SA for dominant-control analysis and robust decision-making, i.e. as a means to learn about the behaviour of models or systems.

We think that, among the topics for further research in the field, the following are of particular relevance for environmental modellers:

• developing multi-method approaches to overcome the limitations of individual SA methods;
• providing guidance and advice on convergence and robustness of different SA approaches;
• integrating the evaluation of model behaviour/performance in the estimation of sensitivity indices;
• improving techniques to analyse interactions between input factors: in fact, while information about factor interactions can be gathered as a byproduct of several SA techniques (for
instance, by looking at the standard deviation of the EEs or at the difference between total-order and first-order indices in variance-based SA), to our knowledge there is no SA method that has been specifically proposed to effectively investigate factor interactions;
• improving tools for visualisation and effective communication of SA results;
• reducing computing requirements for applications to complex environmental models, including the use of emulators.

Acknowledgements

The authors thank three anonymous referees for very useful comments and suggestions that have greatly contributed to improving the manuscript. This work was supported by the Natural Environment Research Council (NERC) [Consortium on Risk in the Environment: Diagnostics, Integration, Benchmarking, Learning and Elicitation (CREDIBLE); grant number NE/J017450/1].

Appendix A. Examples of helpful visualization tools for global SA
Visualize input/output samples


1. Scatter plots (or dotty plots): output samples against samples of the i-th input factor. One point (x_i^j, y^j) per input/output sample (j = 1, …, N). Uniformly scattered points like in the left panel indicate low sensitivity (to x1 here); emergence of patterns like in the right panel denotes high sensitivity. Useful for screening and ranking.

2. Coloured scatter plots: samples of the i-th input factor against samples of the k-th, with marker colour proportional to the associated output value. One point (x_i^j, x_k^j) per input/output sample (j = 1, …, N). Useful to detect interactions, which are highlighted by the emergence of colour patterns (as for instance in the right panel).

3. Parallel coordinate plots: distribution of input factors within their variability ranges. One line per sample x^j of input factors (j = 1, …, N). Ranges are standardised to allow for comparison across factors. Lines highlighted in different colours correspond to 'particular' output values, for instance above a threshold. If highlighted lines cover the entire range of a factor (for instance the black lines on factor number 2) sensitivity is low. If they concentrate in a sub-range (as for instance for factor 5) then sensitivity is high. Useful for mapping.
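The data preparation behind plot 3 can be sketched as follows (a hypothetical numpy setup of our own; the drawing of the lines themselves is omitted):

```python
import numpy as np

def parallel_coords_data(X, y, threshold):
    """Prepare data for a parallel coordinate plot.

    Standardises each input factor to [0, 1] (so all axes are comparable)
    and flags the samples whose output exceeds a threshold, which would
    be drawn as highlighted lines.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    X01 = (X - lo) / (hi - lo)        # per-factor min-max scaling
    highlight = y > threshold
    return X01, highlight

rng = np.random.default_rng(2)
X = rng.uniform([0, 10], [1, 20], size=(100, 2))   # two factors with different ranges
y = X[:, 1]                                        # toy output driven by factor 2 only
X01, mask = parallel_coords_data(X, y, threshold=18.0)
# highlighted lines concentrate in the upper sub-range of factor 2
```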

Elementary Effects Test


4. Average of Elementary Effects (EEs) versus their standard deviation. One point per input
factor. The more to the right a point along the horizontal axis, the more influential the
factor. The higher up a point along the vertical axis, the larger its degree of interactions
with other factors. Useful for screening and ranking.

5. Same as before but with confidence bounds derived via bootstrapping around the mean
and standard deviation of the EEs.
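The quantities shown in plots 4 and 5 can be computed as sketched below. This is our own simplified one-at-a-time design on a toy model, not the trajectory-based sampling usually adopted for the EET; the bootstrap bounds correspond to plot 5:

```python
import numpy as np

def elementary_effects(f, M, r, delta=0.1, seed=0):
    """One-at-a-time perturbation of r random base points: r*(M+1) runs.

    Simplified design for illustration only; trajectory- or radial-based
    sampling is normally used in the Elementary Effects Test.
    """
    rng = np.random.default_rng(seed)
    EE = np.empty((r, M))
    for j in range(r):
        x = rng.random(M) * (1 - delta)   # keep x + delta inside [0, 1]
        y0 = f(x)
        for i in range(M):
            xp = x.copy()
            xp[i] += delta
            EE[j, i] = (f(xp) - y0) / delta
    return EE

def bootstrap_mean_bounds(EE, n_boot=1000, seed=0):
    """90% bootstrap confidence bounds on the mean EE of each factor."""
    rng = np.random.default_rng(seed)
    r = EE.shape[0]
    means = np.array([EE[rng.integers(0, r, r)].mean(axis=0)
                      for _ in range(n_boot)])
    return np.percentile(means, [5, 95], axis=0)

f = lambda x: 5 * x[0] + x[1] ** 2   # toy model: the third factor is inactive
EE = elementary_effects(f, M=3, r=20)
mu, sigma = EE.mean(axis=0), EE.std(axis=0)
lo, hi = bootstrap_mean_bounds(EE)
# mu[0] ~ 5 (linear term), mu[2] = 0 flags the third factor as non-influential
```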
Regional Sensitivity Analysis


6. Empirical cumulative distribution function of the input samples associated with output values above/below a given threshold. One plot per input factor. The larger the distance between the two distribution functions, the more influential the factor. This plot can also be used to determine sub-ranges of the input factor that have no influence on the output above/below the threshold: these are the sub-ranges where the distribution functions are either zero or one (e.g. x5 > 0.8 for y > 10 in this example). Useful for ranking and mapping.

7. Empirical cumulative distribution functions of input factors associated with output values within ten different ranges. Same as before, without the need for specifying a threshold value.
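The distance between the two distribution functions in plots 6 and 7 can be quantified with a Kolmogorov–Smirnov-type statistic. A self-contained numpy sketch on a hypothetical toy example (the model and all names are our own):

```python
import numpy as np

def ecdf(values, grid):
    """Empirical cumulative distribution function of `values` on `grid`."""
    return np.searchsorted(np.sort(values), grid, side="right") / len(values)

def rsa_distance(xi, y, threshold):
    """Maximum vertical distance (KS-type statistic) between the CDFs of
    an input factor conditional on output above vs. below a threshold."""
    behav, non_behav = xi[y > threshold], xi[y <= threshold]
    grid = np.sort(xi)
    return np.max(np.abs(ecdf(behav, grid) - ecdf(non_behav, grid)))

rng = np.random.default_rng(3)
x1, x2 = rng.random(500), rng.random(500)
y = 10 * x2                       # toy output controlled by x2 only
d1 = rsa_distance(x1, y, threshold=5.0)
d2 = rsa_distance(x2, y, threshold=5.0)
# d2 close to 1 (influential factor), d1 close to 0 (non-influential)
```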

Visualize sensitivity indices


8. Bar plot. Value of sensitivity index for different input factors.

9. Box plot. Average value of sensitivity index over bootstrap resamples for different input
factors (black line) and 90% confidence intervals.

10. Convergence plot. Sensitivity indices estimated using an increasing sample size (one
line per factor). Dashed lines represent confidence bounds obtained at each sample
size, for instance by bootstrapping.
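The sub-sampling technique behind a convergence plot (Section 4.6) can be sketched as follows; absolute correlation is used here as a stand-in sensitivity index on a toy input–output sample (our own illustrative choices):

```python
import numpy as np

def convergence_estimates(x, y, index_fn, sizes):
    """Re-compute a sensitivity index on nested sub-samples of increasing
    size, as in a convergence plot. Note that the sub-samples are not
    independent, so convergence may be overestimated (see Section 4.6);
    no new model runs are required."""
    return [index_fn(x[:n], y[:n]) for n in sizes]

corr = lambda x, y: abs(np.corrcoef(x, y)[0, 1])   # stand-in sensitivity index

rng = np.random.default_rng(4)
x = rng.random(2000)
y = x + rng.normal(0, 0.1, size=2000)   # toy input-output sample
sizes = [125, 250, 500, 1000, 2000]
est = convergence_estimates(x, y, corr, sizes)
# successive estimates stabilise as the sub-sample size grows
```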

11. Radial convergence diagrams. For each input factor, the diagram shows: its direct
(first-order) influence (proportional to the size of the inner circle); the total
influence including interactions (size of the outer circle); the existence and extent of
interactions between pairs of factors (lines and their width). Taken from Butler et al.
(2014).

12. Circos. For each input factor, the diagram shows: the total influence including
interactions (proportional to the size of the pie slice on the outside of the circle);
existence and extent of interactions between pairs of factors (inner connecting lines
and their width). The direct (first-order) influence of each factor can be inferred as the
portion of the total influence that is not connected to any other pie slice (i.e. the white
space highlighted by the black arrows). Taken from Kelleher et al. (2013).

13. Pattern plots. For each study site (i.e. the watersheds listed on the horizontal axis) and
each input factor (i.e. the model parameters on the vertical axis), the picture shows the
output sensitivity via a colour scale (white denotes no sensitivity, red is maximum
sensitivity). Study sites are ordered along the horizontal axis depending on their
climate conditions, which facilitates visual investigation of trends and patterns linking
parameter sensitivity to climate properties. Taken from van Werkhoven et al. (2008).

References

Anderson, B., Borgonovo, E., Galeotti, M., Roson, R., 2014. Uncertainty in climate change modeling: can global sensitivity analysis be of help? Risk Anal. 34 (2), 271–293.
Andres, T., 1997. Sampling methods and sensitivity analysis for large parameter sets. J. Stat. Comput. Simul. 57 (1–4), 77–110.
Bai, Y., Wagener, T., Reed, P., 2009. A top-down framework for watershed model evaluation and selection under uncertainty. Environ. Model. Softw. 24 (8), 901–916.
Baroni, G., Tarantola, S., 2014. A general probabilistic framework for uncertainty and global sensitivity analysis of deterministic models: a hydrological case study. Environ. Model. Softw. 51, 26–34.
Beven, K., 1993. Prophecy, reality and uncertainty in distributed hydrological modelling. Adv. Water Resour. 16, 41–51.
Beven, K., Freer, J., 2001. Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology. J. Hydrol. 249 (1–4), 11–29.
Borgonovo, E., 2007. A new uncertainty importance measure. Reliab. Eng. Syst. Saf. 92, 771–784.
Borgonovo, E., 2008. Sensitivity analysis of model output with input constraints: a generalized rationale for local methods. Risk Anal. 28, 667–680.
Borgonovo, E., 2010. A methodology for determining interactions in probabilistic safety assessment models by varying one parameter at a time. Risk Anal. 30 (3), 385–399.
Borgonovo, E., 2014. Transformation and invariance in the sensitivity analysis of computer experiments. J. R. Stat. Soc. Ser. B 76 (5), 925–947.
Borgonovo, E., Plischke, E., 2016. Sensitivity analysis: a review of recent advances. Eur. J. Oper. Res. 248 (3), 869–887.
Brown, C., Werick, W., Leger, W., Fay, D., 2011. A decision-analytic approach to managing climate risks: application to the upper great lakes. J. Am. Water Resour. Assoc. 47 (3), 524–534.
Butler, M.P., Reed, P.M., Fisher-Vanden, K., Keller, K., Wagener, T., 2014. Identifying parametric controls and dependencies in integrated assessment models using global sensitivity analysis. Environ. Model. Softw. 59, 10–29.
Campolongo, F., Cariboni, J., Saltelli, A., 2007. An effective screening design for sensitivity analysis of large models. Environ. Model. Softw. 22 (10), 1509–1518.
Campolongo, F., Saltelli, A., 1997. Sensitivity analysis of an environmental model: an application of different analysis methods. Reliab. Eng. Syst. Saf. 57 (1), 49–69.
Campolongo, F., Saltelli, A., Cariboni, J., 2011. From screening to quantitative sensitivity analysis. A unified approach. Comput. Phys. Commun. 182 (4), 978–988.
Castaings, W., Borgonovo, E., Morris, M., Tarantola, S., 2012. Sampling strategies in density-based sensitivity analysis. Environ. Model. Softw. 38 (0), 13–26.
Cloke, H., Pappenberger, F., Renaud, J., 2008. Multi-method global sensitivity analysis (mmgsa) for modelling floodplain hydrological processes. Hydrol. Process. 22 (11), 1660–1674.
Collins, M., Chandler, R., Cox, P., Huthnance, J., Rougier, J., Stephenson, D., 2012. Quantifying future climate change. Nat. Clim. Change 2, 403–409.
Cukier, R.I., Fortuin, C.M., Shuler, K.E., Petschek, A.G., Schaibly, J.H., 1973. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I Theory. J. Chem. Phys. 59 (8), 3873–3878.
Demaria, E.M., Nijssen, B., Wagener, T., 2007. Monte Carlo sensitivity analysis of land surface parameters using the Variable Infiltration Capacity model. J. Geophys. Res. Atmos. 112 (D11).
Devenish, B., Francis, P.N., Johnson, B.T., Sparks, R.S.J., Thomson, D.J., 2012. Sensitivity analysis of dispersion modeling of volcanic ash from Eyjafjallajökull in May 2010. J. Geophys. Res. 117.
EC, 2009. Impact Assessment Guidelines. European Commission. Technical Report 92. http://ec.europa.eu/governance/impact/docs/key_docs/iag_2009_en.pdf.
Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. Chapman & Hall/CRC.
EPA, 2009. Guidance on the Development, Evaluation, and Application of Environmental Models. Technical Report EPA/100/K-09/003. Environmental Protection Agency. http://www.epa.gov/crem/library/cred_guidance_0309.pdf.
Forrester, A., Sobester, A., Keane, A., 2008. Engineering Design via Surrogate Modelling: a Practical Guide. John Wiley & Sons.
Freer, J., Beven, K., Ambroise, B., 1996. Bayesian estimation of uncertainty in runoff prediction and the value of data: an application of the GLUE approach. Water Resour. Res. 32 (7), 2161–2173.
Gupta, H.V., Wagener, T., Liu, Y., 2008. Reconciling theory with observations: elements of a diagnostic approach to model evaluation. Hydrol. Process. 22 (18), 3802–3813.
Guse, B., Reusser, D.E., Fohrer, N., 2014. How to improve the representation of hydrological processes in SWAT for a lowland catchment – temporal analysis of parameter sensitivity and model performance. Hydrol. Process. 28 (4), 2651–2670.
Hall, J., Boyce, S., Wang, Y., Dawson, R., Tarantola, S., Saltelli, A., 2009. Sensitivity analysis of hydraulic models. ASCE J. Hydraul. Eng. 135 (11), 959–969.
Hamm, N., Hall, J., Anderson, M., 2006. Variance-based sensitivity analysis of the probability of hydrologically induced slope instability. Comput. Geosci. 32 (6), 803–817.
Harper, E., Stella, J.C., Fremier, A., 2011. Global sensitivity analysis for complex ecological models: a case study of riparian cottonwood population dynamics. Ecol. Appl. 21 (4), 1225–1240.
Helton, J., 1993. Uncertainty and sensitivity analysis techniques for use in performance assessment for radioactive waste disposal. Reliab. Eng. Syst. Saf. 42 (2–3), 327–367.
Helton, J., Davis, F., 2002. Illustration of sampling-based methods for uncertainty and sensitivity analysis. Risk Anal. 22 (3), 591–622.
Herman, J.D., Kollat, J.B., Reed, P.M., Wagener, T., 2013. From maps to movies: high-resolution time-varying sensitivity analysis for spatially distributed watershed models. Hydrol. Earth Syst. Sci. 17 (12), 5109–5125.
Hill, M., Tiedeman, C., 2007. Effective Groundwater Model Calibration: with Analysis of Data, Sensitivities, Predictions, and Uncertainty. John Wiley & Sons.
Homma, T., Saltelli, A., 1996. Importance measures in global sensitivity analysis of nonlinear models. Reliab. Eng. Syst. Saf. 52 (1), 1–17.
Howard, R., 1988. Decision analysis: practice and promise. Manag. Sci. 34 (6), 679–695.
Iman, R., Helton, J., 1988. An investigation of uncertainty and sensitivity analysis techniques for computer models. Risk Anal. 8, 71–90.
Iman, R., Hora, S., 1990. A robust measure of uncertainty importance for use in fault tree system analysis. Risk Anal. 10, 401–406.
Kelleher, C., Wagener, T., 2011. Ten guidelines for effective data visualization in scientific publications. Environ. Model. Softw. 26 (6), 822–827.
Kelleher, C., Wagener, T., McGlynn, B., Ward, A.S., Gooseff, M.N., Payn, R.A., 2013. Identifiability of transient storage model parameters along a mountain stream. Water Resour. Res. 49 (9), 5290–5306.
Kleijnen, J., Helton, J., 1999a. Statistical analyses of scatterplots to identify important factors in large-scale simulations, 1: review and comparison of techniques. Reliab. Eng. Syst. Saf. 65 (2), 147–185.
Kleijnen, J., Helton, J., 1999b. Statistical analyses of scatterplots to identify important factors in large-scale simulations, 2: robustness of techniques. Reliab. Eng. Syst.