Applied Quantitative Finance
Wolfgang Härdle
Torsten Kleinow
Gerhard Stahl
In cooperation with
Gökhan Aydınlı, Oliver Jim Blaskowitz, Song Xi Chen,
Matthias Fengler, Jürgen Franke, Christoph Frisch,
Helmut Herwartz, Harriet Holzberger, Steffi Höse,
Stefan Huschens, Kim Huynh, Stefan R. Jaschke, Yuze Jiang
Pierre Kervella, Rüdiger Kiesel, Germar Knöchlein,
Sven Knoth, Jens Lüssem, Danilo Mercurio,
Marlene Müller, Jörn Rank, Peter Schmidt,
Rainer Schulz, Jürgen Schumacher, Thomas Siegl,
Robert Wania, Axel Werwatz, Jun Zheng
Preface
This book is designed for students and researchers who want to develop professional skills in modern quantitative applications in finance. The Center for Applied Statistics and Economics (CASE) course at Humboldt-Universität zu Berlin that forms the basis for this book is offered to interested students who have had some experience with probability, statistics and software applications but have not had advanced courses in mathematical finance. Although the course assumes only a modest background, it moves quickly between different fields of application, and in the end the reader can expect to have theoretical and computational tools that are deep enough and rich enough to be relied on throughout future professional careers.
The text is readable for the graduate student in financial engineering as well as for the inexperienced newcomer to quantitative finance who wants to get a grip on modern statistical tools in financial data analysis. The experienced reader with a good knowledge of mathematical finance will probably skip some sections but will hopefully enjoy the various computational tools of the presented techniques. A graduate student might think that some of the econometric techniques are well known. The mathematics of risk management and volatility dynamics will certainly introduce them to the rich realm of quantitative financial data analysis.
The computer-inexperienced user of this e-book is gently introduced to the interactive book concept and will certainly enjoy the various practical examples. The e-book is designed as an interactive document: a stream of text and information with various hints and links to additional tools and features. Our e-book design also offers a complete PDF and HTML file with links to world-wide computing servers. The reader of this book may therefore use all the presented examples and methods, without downloading or purchasing software, via the enclosed license code number with a local XploRe Quantlet Server (XQS). Such XQ servers may also be installed in a department or accessed freely on the web at www.xplore-stat.de and www.quantlet.com.
We would like to thank Jörg Feuerhake, Zdeněk Hlávka, Sigbert Klinke, Heiko Lehmann and Rodrigo Witzel.
W. Härdle, T. Kleinow and G. Stahl
Berlin and Bonn, June 2002
Frequently Used Notation
$x \stackrel{\text{def}}{=} \ldots$  $x$ is defined as ...
$\mathbb{R}$  real numbers
$\overline{\mathbb{R}} \stackrel{\text{def}}{=} \mathbb{R} \cup \{\infty, -\infty\}$
$A^{\top}$  transpose of matrix $A$
$X \sim D$  the random variable $X$ has distribution $D$
$\mathrm{E}[X]$  expected value of random variable $X$
$\mathrm{Var}(X)$  variance of random variable $X$
$\mathrm{Std}(X)$  standard deviation of random variable $X$
$\mathrm{Cov}(X, Y)$  covariance of two random variables $X$ and $Y$
$\mathrm{N}(\mu, \Sigma)$  normal distribution with expectation $\mu$ and covariance matrix $\Sigma$; a similar notation is used if $\Sigma$ is the correlation matrix
cdf  denotes the cumulative distribution function
pdf  denotes the probability density function
$\mathrm{P}[A]$ or $\mathrm{P}(A)$  probability of a set $A$
$\mathbf{1}$  indicator function
$(F \circ G)(x) \stackrel{\text{def}}{=} F\{G(x)\}$ for functions $F$ and $G$
$\alpha_n = O(\beta_n)$  iff $\alpha_n/\beta_n \longrightarrow \text{constant}$, as $n \longrightarrow \infty$
$\alpha_n = o(\beta_n)$  iff $\alpha_n/\beta_n \longrightarrow 0$, as $n \longrightarrow \infty$
$\mathcal{F}_t$  is the information set generated by all information available at time $t$
Value at Risk
1 Approximating Value at Risk in
Conditional Gaussian Models
Stefan R. Jaschke and Yuze Jiang
1.1 Introduction
Financial institutions are facing the important task of estimating and control-
ling their exposure to market risk, which is caused by changes in prices of
equities, commodities, exchange rates and interest rates. A new chapter of risk
management was opened when the Basel Committee on Banking Supervision
proposed that banks may use internal models for estimating their market risk
(Basel Committee on Banking Supervision, 1995). Its implementation into na-
tional laws around 1998 allowed banks to not only compete in the innovation
of financial products but also in the innovation of risk management methodol-
ogy. Measurement of market risk has focused on a metric called Value at Risk
(VaR). VaR quantifies the maximal amount that may be lost in a portfolio over
a given period of time, at a certain confidence level. Statistically speaking, the
VaR of a portfolio is the quantile of the distribution of that portfolio’s loss over
a specified time interval, at a given probability level.
control risk and to build an atmosphere where the risk management system
is accepted by all participants. This is an organizational and social problem.
The methodological question of how risk should be modeled and approximated is – in terms of the cost of implementation – a smaller one. In terms of importance, however, it is a crucial question. An inadequate VaR methodology can jeopardize all the other efforts to build a risk management system. See Jorion (2000) for more on the general aspects of risk management in financial institutions.
where $S_t^i$ denotes the price of stock $i$ at time $t$. Bonds are first decomposed
into a portfolio of zero bonds. Zero bonds are assumed to depend on
the two key interest rates with the closest maturities. How to do the
interpolation is actually not as trivial as it may seem, as demonstrated
(with $\mu \stackrel{\text{def}}{=} \mathrm{E}[X_t]$). An alternative route is to first specify an estimator and then look for a model in which that estimator has certain optimality properties. The exponential moving average
$$\hat{\Sigma}_T = (e^{\lambda} - 1) \sum_{t=-\infty}^{T-1} e^{-\lambda(T-t)} (X_t - \mu)(X_t - \mu)^{\top}$$
RiskMetrics in 1994 considered only the first derivative of the value function, the "delta". (Without loss of generality, we assume that the constant term in the Taylor expansion (1.1), the "theta", is zero.)
$$CC^{\top} = \Sigma, \qquad C^{\top}\Gamma C = \Lambda.$$
This implies
$$\Delta V = \sum_{i=1}^{m}\left(\delta_i Y_i + \frac{1}{2}\lambda_i Y_i^2\right) = \sum_{i=1}^{m}\left\{\frac{1}{2}\lambda_i\left(Y_i + \frac{\delta_i}{\lambda_i}\right)^2 - \frac{\delta_i^2}{2\lambda_i}\right\} \tag{1.2}$$
npar= VaRDGdecomp(par)
uses a generalized eigenvalue decomposition to do a suitable coordinate change. par is a list containing Delta, Gamma, Sigma on
input. npar is the same list, containing additionally B, delta,
and lambda on output.
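The decomposition itself can be sketched in a few lines. The following Python fragment is a minimal illustration of one standard way to obtain such a coordinate change (a Cholesky factorization of Σ followed by an eigendecomposition); it is not the XploRe quantlet VaRDGdecomp, and the function name dg_decomp and its arguments are our own.

    import numpy as np

    def dg_decomp(delta, gamma, sigma):
        # Cholesky factor of the covariance matrix: sigma = L @ L.T
        L = np.linalg.cholesky(sigma)
        # eigendecomposition of the symmetric matrix L.T @ gamma @ L
        lam, U = np.linalg.eigh(L.T @ gamma @ L)
        C = L @ U              # then C @ C.T = sigma and C.T @ gamma @ C = diag(lam)
        d = C.T @ delta        # linear coefficients in the transformed coordinates
        return C, d, lam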
The characteristic function of a non-central $\chi_1^2$ variate ($(Z+a)^2$, with standard normal $Z$) is known analytically:
$$\mathrm{E}\, e^{it(Z+a)^2} = (1-2it)^{-1/2}\exp\left(\frac{a^2 it}{1-2it}\right).$$
This implies the characteristic function for $\Delta V$
$$\mathrm{E}\, e^{it\Delta V} = \prod_j \frac{1}{\sqrt{1-i\lambda_j t}}\,\exp\left\{-\frac{1}{2}\,\delta_j^2 t^2/(1-i\lambda_j t)\right\}, \tag{1.3}$$
z= VaRcgfDG(t,par)
Computes the cumulant generating function (cgf) for the class of
quadratic forms of Gaussian vectors.
z= VaRcharfDG(t,par)
Computes the characteristic function for the class of quadratic
forms of Gaussian vectors.
vec= VaRcumulantsDG(n,par)
Computes the first n cumulants for the class of quadratic forms
of Gaussian vectors. The list par contains at least Gamma and
Sigma.
z= VaRcumulantDG(n,par)
Computes the n-th cumulant for the class of quadratic forms of
Gaussian vectors. The parameter list par is to be generated with
VaRDGdecomp.
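As a rough Python analogue of VaRcharfDG (not the quantlet itself), the characteristic function (1.3) can be evaluated directly from the transformed coefficients; the optional theta term is our own addition and is zero under the convention adopted above.

    import numpy as np

    def charf_dV(t, delta, lam, theta=0.0):
        # characteristic function of Delta V, cf. (1.3); t may be a scalar or an array
        t = np.asarray(t, dtype=complex)
        phi = np.exp(1j * theta * t)
        for d, l in zip(delta, lam):
            phi *= np.exp(-0.5 * d**2 * t**2 / (1 - 1j * l * t)) / np.sqrt(1 - 1j * l * t)
        return phi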
1.3.1 Derivation
The Cornish-Fisher expansion can be derived in two steps. Let Φ denote some
base distribution and φ its density function. The generalized Cornish-Fisher
expansion (Hill and Davis, 1968) aims to approximate an α-quantile of F in
terms of the α-quantile of Φ, i.e., the concatenated function F −1 ◦ Φ. The key
to a series expansion of F −1 ◦Φ in terms of derivatives of F and Φ is Lagrange’s
inversion theorem. It states that if a function $s \mapsto t$ is implicitly defined by
t = c + s · h(t) (1.6)
and h is analytic in c, then an analytic function f (t) can be developed into a
power series in a neighborhood of s = 0 (t = c):
$$f(t) = f(c) + \sum_{r=1}^{\infty} \frac{s^r}{r!}\, D^{r-1}\left[f' \cdot h^r\right](c), \tag{1.7}$$
Setting $s = 1$ in (1.6) implies $\Phi^{-1}(t) = F^{-1}(\alpha)$ and with the notations $x = F^{-1}(\alpha)$, $z = \Phi^{-1}(\alpha)$ (1.8) becomes the formal expansion
$$x = z + \sum_{r=1}^{\infty} \frac{(-1)^r}{r!}\, D^{r-1}\left[\left((F-\Phi)^r/\phi\right)\circ\Phi^{-1}\right](\Phi(z)).$$
($c_k$ are the Gram-Charlier coefficients. They can be derived from the moments by multiplying the power series for the two terms on the left hand side.) Componentwise Fourier inversion yields the corresponding series for the probability density
$$f(x) = \sum_{k=0}^{\infty} c_k (-1)^k \phi^{(k)}(x) \tag{1.11}$$
and for the cumulative distribution function (cdf)
$$F(x) = \Phi(x) - \sum_{k=1}^{\infty} c_k (-1)^{k-1} \phi^{(k-1)}(x). \tag{1.12}$$
($\phi$ and $\Phi$ are now the standard normal density and cdf. The derivatives of the standard normal density are $(-1)^k\phi^{(k)}(x) = \phi(x)H_k(x)$, where the Hermite polynomials $H_k$ form an orthogonal basis in the Hilbert space $L^2(\mathbb{R}, \phi)$ of the square integrable functions on $\mathbb{R}$ w.r.t. the weight function $\phi$. The Gram-Charlier coefficients can thus be interpreted as the Fourier coefficients of the function $f(x)/\phi(x)$ in the Hilbert space $L^2(\mathbb{R}, \phi)$ with the basis $\{H_k\}$: $f(x)/\phi(x) = \sum_{k=0}^{\infty} c_k H_k(x)$.) Plugging (1.12) into (1.9) gives the formal Cornish-Fisher expansion, which is re-grouped as motivated by the central limit theorem.
Multiplying out the last term shows that the k-th Gram-Charlier coefficient
ck (n) of Sn is a polynomial expression in n−1/2 , involving the coefficients ci up
to i = k. If the terms in the formal Cornish-Fisher expansion
$$x = z + \sum_{r=1}^{\infty} \frac{(-1)^r}{r!}\, D^{r-1}\left[\left(-\sum_{k=1}^{\infty} c_k(n) H_{k-1}(z)\right)^{r}\right] \tag{1.13}$$
are sorted and grouped with respect to powers of $n^{-1/2}$, the classical Cornish-Fisher series
$$x = z + \sum_{k=1}^{\infty} n^{-k/2}\,\xi_k(z) \tag{1.14}$$
with $a_k = \frac{\kappa_{k+2}}{(k+2)!}$. $\xi_k(H)$ is a formal polynomial expression in $H$ with the usual
algebraic relations between the summation “+” and the “multiplication” “∗”.
Once ξk (H) is multiplied out in ∗-powers of H, each H ∗k is to be interpreted
as the Hermite polynomial Hk and then the whole term becomes a polynomial
in z with the “normal” multiplication “·”. ξk denotes the scalar that results
when the “normal” polynomial ξk (H) is evaluated at the fixed quantile z, while
ξk (H) denotes the expression in the (+, ∗)-algebra.
Contents of r
[1,] 2 4.2527
[2,] 3 5.3252
[3,] 4 5.0684
[4,] 5 5.2169
[5,] 6 5.1299
[6,] 7 5.1415
[7,] 8 5.255
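A minimal Python sketch of the classical fourth-moment Cornish-Fisher correction (the leading terms of (1.14), expressed through skewness and excess kurtosis) may help fix ideas; it is not the XploRe routine that produced the listing above, and the function name is ours.

    from scipy.stats import norm

    def cornish_fisher_quantile(alpha, mean, std, skew, ex_kurt):
        # alpha-quantile approximated by the Cornish-Fisher expansion, truncated
        # after the terms involving skewness (skew) and excess kurtosis (ex_kurt)
        z = norm.ppf(alpha)
        z_cf = (z
                + (z**2 - 1.0) * skew / 6.0
                + (z**3 - 3.0 * z) * ex_kurt / 24.0
                - (2.0 * z**3 - 5.0 * z) * skew**2 / 36.0)
        return mean + std * z_cf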
1.3.2 Properties
+ The error for the 99%-VaR on the real-world examples - which turned out to be remarkably close to normal - was about $10^{-6}\sigma$, which is more than sufficient. (The error was normalized with respect to the portfolio's standard deviation, $\sigma$.)
− The (lower bound on the) worst-case error for the one- and two-dimensional
problems was about 1.0σ, which corresponds to a relative error of up to
100%.
Let $f$ denote a continuous, absolutely integrable function and $\phi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx$ its Fourier transform. Then, the inversion formula
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi(t)\,dt \tag{1.16}$$
holds.
The key to an error analysis of trapezoidal, equidistant approximations to the integral (1.16)
$$\tilde{f}(x, \Delta_t, t) \stackrel{\text{def}}{=} \frac{\Delta_t}{2\pi}\sum_{k=-\infty}^{\infty}\phi(t + k\Delta_t)\, e^{-i(t+k\Delta_t)x} \tag{1.17}$$
see (Abate and Whitt, 1992, p.22). If $f(x)$ is approximated by $\tilde{f}(x, \Delta_t, 0)$, the residual
$$e_a(x, \Delta_t, 0) = \sum_{j \neq 0} f\!\left(x + \frac{2\pi}{\Delta_t}j\right) \tag{1.19}$$
is called the aliasing error, since different "pieces" of $f$ are aliased into the window $(-\pi/\Delta_t, \pi/\Delta_t)$. Another suitable choice is $t = \Delta_t/2$:
$$\tilde{f}(x, \Delta_t, \Delta_t/2) = \sum_{j=-\infty}^{\infty} f\!\left(x + \frac{2\pi}{\Delta_t}j\right)(-1)^j. \tag{1.20}$$
Call $e_t(x, T, \Delta_t, t) \stackrel{\text{def}}{=} \tilde{f}(x, T, \Delta_t, t) - \tilde{f}(x, \Delta_t, t)$ the truncation error.
For practical purposes, the truncation error et (x, T, ∆t , t) essentially depends
only on (x, T ) and the decision on how to choose T and ∆t can be decoupled.
$e_t(x, T, \Delta_t, t)$ converges to
$$e_t(x, T) \stackrel{\text{def}}{=} \frac{1}{2\pi}\int_{-T}^{T} e^{-itx}\phi(t)\,dt - f(x) \tag{1.22}$$
for $\Delta_t \downarrow 0$. Using $\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itx}\,dt = \frac{\sin(\pi x)}{\pi x} \stackrel{\text{def}}{=} \operatorname{sinc}(x)$ and the convolution theorem, one gets
$$\frac{1}{2\pi}\int_{-\pi/\Delta_x}^{\pi/\Delta_x} e^{-itx}\phi(t)\,dt = \int_{-\infty}^{\infty} f(y\Delta_x)\,\operatorname{sinc}(x/\Delta_x - y)\,dy, \tag{1.23}$$
which provides an explicit expression for the truncation error et (x, T ) in terms
of f . It decreases only slowly with T ↑ ∞ (∆x ↓ 0) if f does not have infinitely
many derivatives, or equivalently, φ has “power tails”. The following lemma
leads to the asymptotics of the truncation error in this case.
LEMMA 1.1 If $\lim_{t\to\infty}\alpha(t) = 1$, $\nu > 0$, and $\int_T^{\infty}\alpha(t)\,t^{-\nu}e^{it}\,dt$ exists and is finite for some $T$, then
$$\int_T^{\infty}\alpha(t)\,t^{-\nu}e^{itx}\,dt \sim \begin{cases} \dfrac{1}{\nu-1}\,T^{-\nu+1} & \text{if } x = 0 \\[1mm] \dfrac{i}{x}\,T^{-\nu}e^{ixT} & \text{if } x \neq 0 \end{cases} \tag{1.24}$$
for $T \to \infty$.
PROOF:
Under the given conditions, both the left and the right hand side converge to 0,
so l’Hospital’s rule is applicable to the ratio of the left and right hand sides.
where $\Re$ denotes the real part, has the asymptotic behavior
$$\sim \begin{cases} \dfrac{w\,T^{-\nu+1}}{\pi(1-\nu)}\cos(b) & \text{if } x = x^* \\[1mm] -\dfrac{w\,T^{-\nu}}{\pi(x^*-x)}\cos\!\left(b + \frac{\pi}{2} + (x^*-x)T\right) & \text{if } x \neq x^* \end{cases} \tag{1.26}$$
for $T \to \infty$ at all points $x$ where $\frac{1}{2\pi}\int_{-T}^{T}\phi(t)e^{-itx}\,dt$ converges to $f(x)$. (If in the first case $\cos(b) = 0$, this shall mean that $\lim_{T\to\infty} e_t(x; T)\,T^{\nu-1} = 0$.)
PROOF:
The previous lemma is applicable for all points x where the Fourier inversion
integral converges.
The theorem completely characterizes the truncation error for those cases,
where f has a “critical point of non-smoothness” and has a higher degree of
smoothness everywhere else. The truncation error decreases one power faster
away from the critical point than at the critical point. Its amplitude is inversely
proportional to the distance from the critical point.
Let $\tilde{F}$ be a (continuous) approximation to a (differentiable) cdf $F$ with $f = F' > 0$. Denote by $\epsilon \geq |\tilde{F}(x) - F(x)|$ a known error bound for the cdf. Any solution $\tilde{q}(x)$ to $\tilde{F}(\tilde{q}(x)) = F(x)$ may be considered an approximation to the true $F(x)$-quantile $x$. Call $e_q(x) = \tilde{q}(x) - x$ the quantile error. Obviously, the quantile error can be bounded by
$$|e_q(x)| \leq \frac{\epsilon}{\inf_{y \in U} f(y)}, \tag{1.27}$$
where $U$ is a suitable neighborhood of the true quantile on which the error bound holds.
FFT-based Fourier inversion yields approximations for the cdf F on equidistant
∆x -spaced grids. Depending on the smoothness of F , linear or higher-order
interpolations may be used. Any monotone interpolation of {F (x0 + ∆x j)}j
yields a quantile approximation whose interpolation error can be bounded by
∆x . This bound can be improved if an upper bound on the density f in a
suitable neighborhood of the true quantile is known.
with
$$w^* \stackrel{\text{def}}{=} \prod_{i=1}^{m}|\lambda_i|^{-1/2}\exp\left\{-\frac{1}{2}(\delta_i/\lambda_i)^2\right\}. \tag{1.31}$$
The argument $\arg\phi(t)$ has the form
$$\arg\phi(t) = \theta t + \sum_{i=1}^{m}\left\{\frac{1}{2}\arctan(\lambda_i t) - \frac{1}{2}\,\delta_i^2 t^2\,\frac{\lambda_i t}{1+\lambda_i^2 t^2}\right\}, \tag{1.32}$$
$$\arg\phi(t) \sim \theta t + \sum_{i=1}^{m}\left(\frac{\pi}{4}\operatorname{sign}(\lambda_i t) - \frac{\delta_i^2 t}{2\lambda_i}\right) \tag{1.33}$$
Let $\psi(t) \stackrel{\text{def}}{=} \frac{i}{t}\left(\phi(t) - e^{i\mu t - \sigma^2 t^2/2}\right)$. Since $\psi(-t) = \overline{\psi(t)}$, the truncated sum (1.21) can, for $t = \Delta_t/2$ and $T = (K - \frac{1}{2})\Delta_t$, be written as
$$\tilde{F}(x_j; T, \Delta_t, t) - \Phi(x_j) = \frac{\Delta_t}{\pi}\,\Re\left(\sum_{k=0}^{K-1}\psi\!\left((k+\tfrac{1}{2})\Delta_t\right) e^{-i((k+\frac{1}{2})\Delta_t)x_j}\right),$$
with $\Delta_x\Delta_t = \frac{2\pi}{N}$, and the last $N - K$ components of the input vector to the FFT are padded with zeros.
The aliasing error of the approximation (1.20) applied to $F - N$ is
$$e_a(x, \Delta_t, \Delta_t/2) = \sum_{j \neq 0}\left\{F\!\left(x + \frac{2\pi}{\Delta_t}j\right) - \Phi\!\left(x + \frac{2\pi}{\Delta_t}j\right)\right\}(-1)^j. \tag{1.40}$$
The cases $(\lambda, \delta, \theta) = (\pm\sqrt{2}, 0, \mp\sqrt{2}/2)$ are the ones with the fattest tails and are thus candidates for the worst case for (1.40), asymptotically for $\Delta_t \to 0$. In these cases, (1.40) is eventually an alternating sequence of decreasing absolute value and thus
$$F(-\pi/\Delta_t) + 1 - F(\pi/\Delta_t) \leq \sqrt{\frac{2}{\pi e}}\; e^{-\frac{1}{2}\sqrt{2}\,\pi/\Delta_t} \tag{1.41}$$
is an asymptotic bound for the aliasing error.
The truncation error (1.22) applied to $F - N$ is
$$e_t(x; T) = -\frac{1}{\pi}\,\Re\left(\int_T^{\infty}\frac{i}{t}\left\{\phi(t) - e^{i\mu t - \sigma^2 t^2/2}\right\} e^{-itx}\,dt\right). \tag{1.42}$$
The Gaussian part plays no role asymptotically for T → ∞ and Theorem 1.1
applies with ν = m/2 + 1.
The quantile error for a given parameter ϑ is
with $w = 2^{-1/4}$, $x^* = \sqrt{2}/2$, and the 1%-quantile $x \approx -3.98$. (Note that this
is suitable only for intermediate K, leading to accuracies of 1 to 4 digits in the
quantile. For higher K, other cases become the worst case for the ratio of the
truncation error over the density at the quantile.)
Since $F - N$ has a kink in the case $m = 1$, $\lambda \neq 0$, higher-order interpolations are futile in non-adaptive methods and $\Delta_x = \frac{2\pi}{N\Delta_t}$ is a suitable upper bound for the interpolation error. By experimentation, $N \approx 4K$ suffices to keep the interpolation error comparatively small.
$K = 2^6$ evaluations of $\phi$ ($N = 2^8$) suffice to ensure an accuracy of 1 digit in the approximation of the 1%-quantile over a sample of one- and two-factor cases. $K = 2^9$ function evaluations are needed for two digits accuracy. The XploRe implementation of the Fourier inversion is split up as follows:
z= VaRcharfDGF2(t,par)
implements the function $\psi(t) \stackrel{\text{def}}{=} \frac{i}{t}\left(\phi(t) - e^{i\mu t - \sigma^2 t^2/2}\right)$ for the complex argument t and the parameter list par.
z= VaRcorrfDGF2(x,par)
implements the correction term Φ(x, µ, σ 2 ) for the argument x
and the parameter list par.
vec= gFourierInversion(N,K,dt,t0,x0,charf,par)
implements a generic Fourier inversion like in (1.39). charf is a
string naming the function to be substituted for ψ in (1.39). par
is the parameter list passed to charf.
l= VaRcdfDG(par,N,K,dt)
to approximate the cumulative distribution function (cdf) of the
distribution from the class of quadratic forms of Gaussian vectors
with parameter list par. The output is a list of two vectors x and
y, containing the cdf-approximation on a grid given by x.
q= cdf2quant(a,l)
approximates the a-quantile from the list l, as returned from
VaRcdfDG.
q= VaRqDG(a,par,N,K,dt)
calls VaRcdfDG and cdf2quant to approximate an a-quantile for
the distribution of the class of quadratic forms of Gaussian vectors
that is defined by the parameter list par.
The following example plots the 1%-quantile for a one-parametric family of the
class of quadratic forms of one- and two-dimensional Gaussian vectors:
XFGqDGtest.xpl
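A compressed Python sketch of the same FFT-based inversion may clarify the roles of N, K and Δt; it mirrors the structure of gFourierInversion/VaRcdfDG only loosely (the grid centering, the quantile interpolation and all function names are our own choices), and it assumes a callable psi that implements ψ(t) for array arguments.

    import numpy as np
    from scipy.stats import norm

    def cdf_by_fft(psi, mu, sigma, N=2**8, K=2**6, dt=0.05):
        # F(x) = Phi(x; mu, sigma^2) + Fourier inversion of psi on an equidistant
        # x-grid with dx*dt = 2*pi/N; the last N-K input entries are zero-padded
        dx = 2 * np.pi / (N * dt)
        x0 = mu - 0.5 * N * dx                 # grid centered at mu (our choice)
        t = (np.arange(K) + 0.5) * dt          # shift t = dt/2 as in (1.20)
        a = np.zeros(N, dtype=complex)
        a[:K] = psi(t) * np.exp(-1j * t * x0)
        y = np.fft.fft(a)
        j = np.arange(N)
        x = x0 + j * dx
        F = norm.cdf(x, loc=mu, scale=sigma) + dt / np.pi * np.real(
            y * np.exp(-0.5j * dt * dx * j))
        return x, F

    def quantile_from_cdf(a, x, F):
        # linear interpolation of the a-quantile from the tabulated cdf
        return np.interp(a, F, x)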
Equation (1.1) defines the class of Delta-Gamma normal methods. The detailed procedures to implement the partial Monte-Carlo method are as follows.
1. Antithetic Method
We assume $W_i = f(z_i)$, where $z_i \in \mathbb{R}^m$ are independent samples from the standard normal distribution. In our case, the function $f$ is defined as
$$f(z_i) = I(L_i > l) = I\left[-\sum_{i=1}^{m}\left(\delta_i z_i + \frac{1}{2}\lambda_i z_i^2\right) > l\right]. \tag{1.46}$$
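As a hedged illustration of the antithetic idea (not the VaRestMC implementation), the exceedance probability under the Delta-Gamma approximation can be estimated from antithetic pairs $(z, -z)$ as follows; the function name, defaults and random number generator are our own choices.

    import numpy as np

    def pmc_exceedance_antithetic(delta, lam, l, n_pairs=5000, seed=0):
        # partial Monte-Carlo estimate of P(L > l) with
        # L = -sum_i (delta_i z_i + 0.5 * lam_i * z_i**2), cf. (1.46)
        rng = np.random.default_rng(seed)
        delta, lam = np.asarray(delta), np.asarray(lam)
        z = rng.standard_normal((n_pairs, len(delta)))
        loss = lambda x: -(x @ delta + 0.5 * (x**2) @ lam)
        # average the indicator over each antithetic pair before the grand mean
        ind = 0.5 * ((loss(z) > l).astype(float) + (loss(-z) > l).astype(float))
        return ind.mean()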
Since the term in parentheses has expectation zero, equation (1.52) pro-
vides an unbiased estimator of µ as long as β is independent. In practice,
where $\Phi^{-1}$ is the inverse of the standard normal cdf. (In order to achieve satisfactory sampling results, we need a good numerical procedure to calculate $\Phi^{-1}$.) An alternative is to apply the stratification only to the most
Expectation is taken with $z$ sampled from $N(\mu, \Sigma)$ rather than its original distribution $N(0, I)$. To correct for this change of distribution, we weight the loss indicator $I(L > l)$ by the likelihood ratio
$$\theta(z) = |\Sigma|^{1/2}\, e^{-\frac{1}{2}\mu^{\top}\Sigma^{-1}\mu}\, e^{-\frac{1}{2}\left[z^{\top}(I-\Sigma^{-1})z - 2\mu^{\top}\Sigma^{-1}z\right]}, \tag{1.56}$$
1. Decomposition Process
We follow the decomposition steps described in Section 1.2 and find the cumulant generating function of $L$ given by
$$\kappa(\omega) = \sum_{i=1}^{m}\left[\frac{1}{2}\,\frac{(\omega\delta_i)^2}{1-\omega\lambda_i} - \frac{1}{2}\log(1-\omega\lambda_i)\right] \tag{1.57}$$
The function VaRestMC uses the different types of variance reduction to calcu-
late the VaR by the partial Monte-Carlo simulation. We employ the variance
reduction techniques of moment matching, Latin Hypercube Sampling and im-
portance sampling. The output is the estimated VaR. In order to test the
efficiency of different Monte-Carlo sampling methods, we collect data from the
MD*BASE and construct a portfolio consisting of three German stocks (Bayer,
Deutsche Bank, Deutsche Telekom) and corresponding 156 options on these un-
derlying stocks with maturity ranging from 18 to 211 days on May 29, 1999.
The total portfolio value is 62,476 EUR. The covariance matrix for the stocks
is provided as well. Using the Black-Scholes model, we also construct the ag-
gregate delta and aggregate gamma as the input to the Quantlet. By choosing
the importance sampling method, a 0.01 confidence level, a 1-day forecast horizon and 1,000 simulations, the result of the estimation is as follows.
XFGVaRMC.xpl
Contents of VaRMC
[1,] 771.73
It tells us that we expect the loss to exceed 771.73 EUR, or 1.24% of the portfolio value, with less than 1% probability in 1 day. However, the key question of the empirical example is how much variance reduction is achieved by the different sampling methods. We run each of the four sampling methods 1,000 times and estimate the standard error of the estimated VaR for each sampling method. Table 1.1 summarizes the results.
As we see from Table 1.1, the standard error of importance sampling is 84.68% less than that of plain-vanilla sampling, which demonstrates that approximately 42 times more scenarios would have to be generated using the plain-vanilla method to achieve the same precision obtained by importance sampling based on the Delta-Gamma approximation. These results clearly indicate the great potential speed-up in estimating the VaR by using the importance sampling method. This is why we set importance sampling as the default sampling method in the function VaRestMC. However, the Latin Hypercube sampling method also achieved a 42.31% variance reduction. One advantage of the Latin Hypercube sampling method is that the decomposition process is not necessary. Especially when the number of risk factors ($m$) is large, the decomposition ($O(m^3)$) dominates the sampling ($O(m)$) and summation ($O(1)$) in terms of computational time. In this case, Latin Hypercube sampling may offer better performance in terms of precision for a given computational time.
Bibliography
Abate, J. and Whitt, W. (1992). The Fourier-series method for inverting trans-
forms of probability distributions, Queuing Systems Theory and Applica-
tions 10: 5–88.
Albanese, C., Jackson, K. and Wiberg, P. (2000). Fast convolution method for
VaR and VaR gradients, http://www.math-point.com/fconv.ps.
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra,
J., Croz, J. D., Greenbaum, A., Hammarling, S., McKenney, A. and
Jorion, P. (2000). Value at Risk: The New Benchmark for Managing Financial
Risk, McGraw-Hill, New York.
Lee, Y. S. and Lin, T. K. (1992). Higher-order Cornish Fisher expansion,
Applied Statistics 41: 233–240.
Lee, Y. S. and Lin, T. K. (1993). Correction to algorithm AS269 : Higher-order
Cornish Fisher expansion, Applied Statistics 42: 268–269.
Li, D. (1999). Value at Risk based on the volatility, skewness and kurtosis,
http://www.riskmetrics.com/research/working/var4mm.pdf. Risk-
Metrics Group.
Longerstaey, J. (1996). RiskMetrics technical document, Technical Report
fourth edition, J.P.Morgan. originally from http://www.jpmorgan.com/
RiskManagement/RiskMetrics/, now http://www.riskmetrics.com.
McKay, M. D., Beckman, R. J. and Conover, W. J. (1979). A comparison
of three methods for selecting values of input variables in the analysis of
output from a computer code, Technometrics 21(2): 239–245.
Mina, J. and Ulmer, A. (1999). Delta-gamma four ways, http://www.
riskmetrics.com.
Pichler, S. and Selitsch, K. (1999). A comparison of analytical VaR method-
ologies for portfolios that include options, http://www.tuwien.ac.at/
E330/Research/paper-var.pdf. Working Paper TU Wien.
Pritsker, M. (1996). Evaluating Value at Risk methodologies: Accuracy versus
computational time, http://wrdsenet.wharton.upenn.edu/fic/wfic/
papers/96/9648.pdf. Wharton Financial Institutions Center Working
Paper 96-48.
Rogers, L. and Zane, O. (1999). Saddle-point approximations to option prices,
Annals of Applied Probability 9(2): 493–503. http://www.bath.ac.uk/
~maslcgr/papers/.
Rouvinez, C. (1997). Going greek with VaR, Risk 10(2): 57–65.
Zangari, P. (1996a). How accurate is the delta-gamma methodology?, Risk-
Metrics Monitor 1996(third quarter): 12–29.
Zangari, P. (1996b). A VaR methodology for portfolios that include options,
RiskMetrics Monitor 1996(first quarter): 4–12.
2 Applications of Copulas for the
Calculation of Value-at-Risk
Jörn Rank and Thomas Siegl
We will focus on the computation of the Value-at-Risk (VaR) from the perspec-
tive of the dependency structure between the risk factors. Apart from historical
simulation, most VaR methods assume a multivariate normal distribution of
the risk factors. Therefore, the dependence structure between different risk
factors is defined by the correlation between those factors. It is shown in Em-
brechts, McNeil and Straumann (1999) that the concept of correlation entails
several pitfalls. The authors therefore propose the use of copulas to quantify
dependence.
For a good overview of copula techniques we refer to Nelsen (1999). Copulas
can be used to describe the dependence between two or more random variables
with arbitrary marginal distributions. In rough terms, a copula is a function
C : [0, 1]n → [0, 1] with certain special properties. The joint multidimensional
cumulative distribution can be written as
P(X1 ≤ x1 , . . . , Xn ≤ xn ) = C (P(X1 ≤ x1 ), . . . , P(Xn ≤ xn ))
= C (F1 (x1 ), . . . , Fn (xn )) ,
where F1 , . . . , Fn denote the cumulative distribution functions of the n random
variables X1 , . . . , Xn . In general, a copula C depends on one or more cop-
ula parameters p1 , . . . , pk that determine the dependence between the random
variables X1 , . . . , Xn . In this sense, the correlation ρ(Xi , Xj ) can be seen as a
parameter of the so-called Gaussian copula.
Here we demonstrate the process of deriving the VaR of a portfolio using the copula method with XploRe, beginning with the selection of the copula itself, followed by the estimation of the copula parameters and the computation
of the VaR. Backtesting of the results is performed to show the validity and
relative quality of the results. We will focus on the case of a portfolio containing
two market risk factors only, the FX rates USD/EUR and GBP/EUR. Copulas
in more dimensions exist, but the selection of suitable n-dimensional copulas
is still quite limited. While the case of two risk factors is still important for
applications, e.g. spread trading, it is also the case that can be best described.
As we want to concentrate our attention on the modelling of the dependency
structure, rather than on the modelling of the marginal distributions, we re-
strict our analysis to normal marginal densities. On the basis of our backtesting
results, we find that the copula method produces more accurate results than
“correlation dependence”.
2.1 Copulas
In this section we summarize the basic results without proof that are necessary
to understand the concept of copulas. Then, we present the most important
properties of copulas that are needed for applications in finance. In doing so,
we will follow the notation used in Nelsen (1999).
2.1.1 Definition
It is shown in Nelsen (1999) that H has margins F1 and F2 that are given by
def def
F1 (x1 ) = H(x1 , +∞) and F2 (x2 ) = H(+∞, x2 ), respectively. Furthermore,
F1 and F2 themselves are distribution functions. With Sklar’s Theorem, the
use of the name “copula” becomes obvious. It was chosen by Sklar (1996)
to describe “a function that links a multidimensional distribution to its one-
dimensional margins” and appeared in mathematical literature for the first
time in Sklar (1959).
From Sklar’s Theorem we know that there exists a unique copula C with
P(R1 ≤ r1 , R2 ≤ r2 ) = H(r1 , r2 ) = C(F1 (r1 ), F2 (r2 )) . (2.6)
Independence can be seen using Equation (2.4) for the joint distribution func-
tion H and the definition of Π,
H(r1 , r2 ) = C(F1 (r1 ), F2 (r2 )) = F1 (r1 ) · F2 (r2 ) . (2.7)
see Embrechts, McNeil and Straumann (1999). In (2.8), fρ denotes the bivariate
normal density function with correlation $\rho$ for $n = 2$. The functions $\Phi_1$, $\Phi_2$ in (2.8) refer to the corresponding one-dimensional cumulative normal distribution functions of the margins.
In the case of vanishing correlation, ρ = 0, the Gaussian copula becomes
$$C_0^{\text{Gauss}}(u, v) = \int_{-\infty}^{\Phi_1^{-1}(u)} f_1(r_1)\,dr_1 \int_{-\infty}^{\Phi_2^{-1}(v)} f_2(r_2)\,dr_2 = uv = \Pi(u, v) \quad \text{if } \rho = 0. \tag{2.9}$$
Result (2.9) is a direct consequence of Theorem 2.2.
As Φ1 (r1 ), Φ2 (r2 ) ∈ [0, 1], one can replace u, v in (2.8) by Φ1 (r1 ), Φ2 (r2 ). If
one considers r1 , r2 in a probabilistic sense, i.e. r1 and r2 being values of two
random variables R1 and R2 , one obtains from (2.8)
$$C_\rho^{\text{Gauss}}(\Phi_1(r_1), \Phi_2(r_2)) = P(R_1 \leq r_1, R_2 \leq r_2). \tag{2.10}$$
In other words: $C_\rho^{\text{Gauss}}(\Phi_1(r_1), \Phi_2(r_2))$ is the bivariate normal cumulative distribution function.
From (2.12) it follows that every copula C is uniformly continuous on its do-
main. A further important property of copulas concerns the partial derivatives
of a copula with respect to its variables:
THEOREM 2.4 Let C be a copula. For every u ∈ [0, 1], the partial derivative
∂ C/∂ v exists for almost every v ∈ [0, 1]. For such u and v one has
$$0 \leq \frac{\partial}{\partial v}C(u, v) \leq 1. \tag{2.13}$$
The analogous statement is true for the partial derivative ∂ C/∂ u.
In addition, the functions $u \mapsto C_v(u) \stackrel{\text{def}}{=} \partial C(u, v)/\partial v$ and $v \mapsto C_u(v) \stackrel{\text{def}}{=} \partial C(u, v)/\partial u$ are defined and nondecreasing almost everywhere on $[0, 1]$.
$$C_{\theta,u}(v) = \frac{\partial}{\partial u}C_\theta(u, v) = \exp\left\{-\left[(-\ln u)^\theta + (-\ln v)^\theta\right]^{1/\theta}\right\}\times\left[(-\ln u)^\theta + (-\ln v)^\theta\right]^{\frac{1}{\theta}-1}\,\frac{(-\ln u)^{\theta-1}}{u}. \tag{2.14}$$
Note that for $u \in (0, 1)$ and for all $\theta > 1$, $C_{\theta,u}$ is a strictly increasing function of $v$. Therefore the inverse function $C_{\theta,u}^{-1}$ is well defined. However, as one might guess from (2.14), $C_{\theta,u}^{-1}$ cannot be calculated analytically, so that some kind of numerical algorithm has to be used for this task. As $C_\theta$ is symmetric in $u$ and $v$, the partial derivative of $C_\theta$ with respect to $v$ shows an identical behaviour for the same set of parameters.
We will end this section with a statement on the behaviour of copulas under
strictly monotone transformations of random variables.
The copula method works with any given marginal distribution, i.e. it does
not restrict the choice of margins. However, we will use normal margins for
simplicity and in order to allow a comparison with standard VaR methods.
A wide variety of copulas exists, mainly for the two-dimensional case (Nelsen, 1999). In our numerical tests, we will use for comparison some of the copulas presented in Table 4.1 of Nelsen (1999), which are implemented in the function
C = VaRcopula(uv,theta,0,copula)
returns Cθ (u, v) for copula copula with parameter θ = theta. uv
is a n × 2 vector of coordinates, where the copula is calculated.
For easy reference the implemented copulas are given in Table 2.1.
for t ∈ 1, . . . , T . For simplicity we assume that the s(t) are realizations of i.i.d.
random variables S (t) . The first step will be to determine the parameters of
the marginal distributions. In the numerical example we will use the normal
distribution $N(0, \sigma_i^2)$, and estimate the volatility $\sigma_i$ using an equally weighted volatility estimator $\hat{\sigma}_i^2 = \frac{1}{T-1}\sum_{t=2}^{T}(r_i^{(t)})^2$ of the returns $r_i^{(t)} = \log(s_i^{(t)}/s_i^{(t-1)})$ for simplicity. The marginal distributions of the risk factors are then log-
normal. The remaining task is to estimate the copula parameters. In the
XploRe VaR quantlib this is done by the function
res = VaRfitcopula(history,copula,method)
fits the copula to the history using fitting function method.
The result res is a list containing the estimates of the copula
parameter together with their standard deviations.
Least Square Fit The main idea of the least square fit is that the cumulative distribution function $F_\theta^{(C)}(x)$ defined by the copula $C$ should fit the sample
 #   $C_\theta(u, v)$   $\theta \in$
 1   $\max\left\{[u^{-\theta} + v^{-\theta} - 1]^{-1/\theta},\, 0\right\}$   $[-1, \infty)\setminus\{0\}$
 2   $\max\left\{1 - [(1-u)^\theta + (1-v)^\theta - 1]^{1/\theta},\, 0\right\}$   $[1, \infty)$
 3   $\dfrac{uv}{1 - \theta(1-u)(1-v)}$   $[-1, 1)$
 4   $\exp\left\{-[(-\ln u)^\theta + (-\ln v)^\theta]^{1/\theta}\right\}$   $[1, \infty)$
 5   $-\dfrac{1}{\theta}\ln\left(1 + \dfrac{(e^{-\theta u}-1)(e^{-\theta v}-1)}{e^{-\theta}-1}\right)$   $(-\infty, \infty)\setminus\{0\}$
 6   $1 - \left[(1-u)^\theta + (1-v)^\theta - (1-u)^\theta(1-v)^\theta\right]^{1/\theta}$   $[1, \infty)$
 7   $\max\left\{\theta uv + (1-\theta)(u+v-1),\, 0\right\}$   $(0, 1]$
 8   $\max\left\{\dfrac{\theta^2 uv - (1-u)(1-v)}{\theta^2 - (\theta-1)^2(1-u)(1-v)},\, 0\right\}$   $(0, 1]$
 9   $uv\exp(-\theta\ln u\ln v)$   $(0, 1]$
10   $uv\,/\left[1 + (1-u^\theta)(1-v^\theta)\right]^{1/\theta}$   $(0, 1]$
11   $\max\left\{\left[u^\theta v^\theta - 2(1-u^\theta)(1-v^\theta)\right]^{1/\theta},\, 0\right\}$   $(0, 1/2]$
12   $\left(1 + \left[(u^{-1}-1)^\theta + (v^{-1}-1)^\theta\right]^{1/\theta}\right)^{-1}$   $[1, \infty)$
13   $\exp\left\{1 - \left[(1-\ln u)^\theta + (1-\ln v)^\theta - 1\right]^{1/\theta}\right\}$   $(0, \infty)$
14   $\left(1 + \left[(u^{-1/\theta}-1)^\theta + (v^{-1/\theta}-1)^\theta\right]^{1/\theta}\right)^{-\theta}$   $[1, \infty)$
15   $\max\left\{\left(1 - \left[(1-u^{1/\theta})^\theta + (1-v^{1/\theta})^\theta\right]^{1/\theta}\right)^\theta,\, 0\right\}$   $[1, \infty)$
16   $\dfrac{1}{2}\left(S + \sqrt{S^2 + 4\theta}\right)$ with $S = u + v - 1 - \theta\left(\dfrac{1}{u} + \dfrac{1}{v} - 1\right)$   $[0, \infty)$
21   $1 - \left(1 - \left[\max(S(u) + S(v) - 1,\, 0)\right]^\theta\right)^{1/\theta}$ with $S(u) = \left[1 - (1-u)^\theta\right]^{1/\theta}$   $[1, \infty)$
Table 2.1. Copulas $C_\theta(u, v)$ implemented in VaRcopula and their parameter ranges.
distribution function $S(x) = \frac{1}{T}\sum_{t=1}^{T}\mathbf{1}(s_1^{(t)} \leq x_1, \ldots, s_n^{(t)} \leq x_n)$ as close as possible in the mean square sense. The function $\mathbf{1}(A)$ is the indicator function of the event $A$. In order to solve the least square problem on a computer, a discretization of the support of $F_\theta^{(C)}$ is needed, for which the sample set $s^{(t)}$
seems to be well suited. The copula parameter estimators are therefore the
solution of the following minimization problem:
$$\min \sum_{t=1}^{T}\left(F_\theta^{(C)}(s^{(t)}) - S(s^{(t)}) + \frac{1}{2T}\right)^2 \quad \text{subject to } \theta \in D_C.$$
using the Newton method on the first derivative (method = 1). The addition of $\frac{1}{2T}$ avoids problems that result from the $\frac{1}{T}$ jumps at the sample points. While
this method is inherently numerically stable, it will produce unsatisfactory
results when applied to risk management problems, because the minimization
will fit the copula best where there are the most datapoints, and not necessarily
at the extreme ends of the distribution. While this can be somewhat rectified
by weighting schemes, the maximum likelihood method does this directly.
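For concreteness, a minimal Python sketch of the least-squares idea is given below, using the Gumbel-Hougaard copula (copula 4) and normal margins; it is only a stand-in for VaRfitcopula, and the function names, the optimizer and the search bounds are our own assumptions.

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import norm

    def gumbel_copula(u, v, theta):
        # Gumbel-Hougaard copula, number 4 in Table 2.1
        return np.exp(-((-np.log(u))**theta + (-np.log(v))**theta)**(1.0 / theta))

    def fit_theta_least_squares(returns):
        # returns: (T x 2) array of log-returns of the two risk factors
        T = len(returns)
        u = norm.cdf(returns[:, 0], scale=returns[:, 0].std(ddof=1))
        v = norm.cdf(returns[:, 1], scale=returns[:, 1].std(ddof=1))
        # empirical joint distribution function S evaluated at the sample points
        S = np.array([np.mean((returns[:, 0] <= x) & (returns[:, 1] <= y))
                      for x, y in returns])
        obj = lambda th: np.sum((gumbel_copula(u, v, th) - S + 0.5 / T)**2)
        return minimize_scalar(obj, bounds=(1.0, 10.0), method="bounded").x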
Assume now that the copula C has been selected. For risk management pur-
poses, we are interested in the Value-at-Risk of a position. While analytical
methods for the computation of the Value-at-Risk exist for the multivariate
normal distribution (i.e. for the Gaussian copula), we will in general have
to use numerical simulations for the computation of the VaR. To that end,
we need to generate pairs of random variables (X1 , X2 ) ∼ F (C) , which form
scenarios of possible changes of the risk factor. The Monte Carlo method gen-
erates a number N of such scenarios, and evaluates the present value change of
a portfolio under each scenario. The sample α−quantile is then the one period
Value-at-Risk with confidence α.
Our first task is to generate pairs (u, v) of observations of U (0, 1) distributed
random variables U and V whose joint distribution function is C(u, v). To
reach this goal we use the method of conditional distributions. Let cu denote
the conditional distribution function for the random variable V at a given value
u of U ,
$$c_u(v) \stackrel{\text{def}}{=} P(V \leq v \mid U = u). \tag{2.15}$$
From (2.6) we have
v = VaRcopula(uv,theta,-1,copula)
returns the inverse $v = c_u^{-1}$ such that res $= c_u(v)$ for copula copula with parameter θ = theta. uv is an n × 2 vector of coordinates where the copula is calculated.
VaR = VaRestMCcopula(history,a,copula,opt)
fits the copula copula to the history history and returns the
N-sample Monte Carlo Value-at-Risk with confidence level α =
alpha for position a. N and alpha are contained in list opt.
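The following Python sketch illustrates the conditional-distribution sampling and the Monte Carlo VaR for the Gumbel-Hougaard copula with normal margins; it is not the VaRestMCcopula quantlet, the inversion of $c_u$ is done here by a generic root finder, and all names, bounds and tolerances are our own.

    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import norm

    def gumbel_cond_cdf(v, u, theta):
        # c_u(v) = dC_theta(u, v)/du for the Gumbel-Hougaard copula, cf. (2.14)
        A = (-np.log(u))**theta + (-np.log(v))**theta
        return np.exp(-A**(1/theta)) * A**(1/theta - 1) * (-np.log(u))**(theta - 1) / u

    def sample_copula_pair(theta, rng):
        # method of conditional distributions: draw u, w ~ U(0,1), set v = c_u^{-1}(w)
        u, w = rng.uniform(1e-10, 1 - 1e-10, size=2)
        v = brentq(lambda v: gumbel_cond_cdf(v, u, theta) - w, 1e-10, 1 - 1e-10)
        return u, v

    def mc_var_copula(position, sigmas, theta, alpha=0.01, N=10_000, seed=0):
        # one-period Monte Carlo VaR of a linear position in two risk factors whose
        # log-returns have N(0, sigma_i^2) margins coupled by a Gumbel copula
        rng = np.random.default_rng(seed)
        pnl = np.empty(N)
        for k in range(N):
            u, v = sample_copula_pair(theta, rng)
            r1 = norm.ppf(u, scale=sigmas[0])
            r2 = norm.ppf(v, scale=sigmas[1])
            pnl[k] = position[0] * (np.exp(r1) - 1) + position[1] * (np.exp(r2) - 1)
        return -np.quantile(pnl, alpha)   # loss not exceeded with probability 1 - alpha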
2.3 Examples
In this section we show possible applications for the Gumbel-Hougaard copula,
i.e. for copula = 4. First we try to visualize C4 (u, v) in Figure 2.1.
XFGaccvar1.xpl
Figure 2.1. Surface plot of the Gumbel-Hougaard copula $C(u, v)$ over the unit square.
In the next Figure 2.2 we show an example of copula sampling for fixed pa-
rameters σ1 = 1, σ2 = 1, θ = 3 for copulas numbered 4, 5, 6, and 12, see Table
2.1.
XFGaccvar2.xpl
In order to investigate the connection between the Gaussian and Copula based
dependency structure we plot θ against correlation ρ in Figure 2.3. We assume
that tmin and tmax hold the minimum respectively maximum possible θ val-
ues. Those can also be obtained by tmin=VaRcopula(0,0,0,8,copula) and
tmax=VaRcopula(0,0,0,9,copula). Care has to be taken that the values are
finite, so we have set the maximum absolute θ bound to 10.
XFGaccvar3.xpl
Figure 2.2. Scatter plots of samples from copulas 4, 5, 6 and 12 with parameters σ1 = 1, σ2 = 1, θ = 3.
2.4 Results
To judge the effectiveness of a Value-at-Risk model, it is common to use back-
testing. A simple approach is to compare the predicted and empirical number
of outliers, where the actual loss exceeds the VaR. We implement this test in
a two risk factor model using real life time series, the FX rates USD/EUR
and GBP/EUR, respectively their DEM counterparts before the introduction
of the Euro. Our backtesting investigation is based on a time series ranging
from 2 Jan. 1991 until 9 Mar. 2000 and simple linear portfolios i = 1, . . . , 4:
Figure 2.3. Correlation as a function of the copula parameter θ.
The Value-at-Risk is computed with confidence level 1−αi (α1 = 0.1, α2 = 0.05,
and α3 = 0.01) based on a time series for the statistical estimators of length
T = 250 business days. The actual next day value change of the portfolio is
compared to the VaR estimate. If the portfolio loss exceeds the VaR estimate,
an outlier has occurred. This procedure is repeated for each day in the time
series.
The prediction error, defined as the absolute difference between the relative number of outliers α̂ and the predicted number α, is averaged over different portfolios and confidence levels. The average over the portfolios (a1 = (−3, −2), a2 = (+3, −2), a3 = (−3, +2), a4 = (+3, +2)) uses equal weights, while the average over the confidence levels i emphasizes the tails by a weighting scheme wi (w1 = 1, w2 = 5, w3 = 10). Based on the result, an overall error and a relative ranking
of the different methods is obtained (see Table 2.2).
As benchmark methods for Value-at-Risk we use the variance-covariance (vcv)
method and historical simulation (his), for details see Deutsch and Eller (1999).
The variance covariance method is an analytical method which uses a multi-
variate normal distribution. The historical simulation method not only includes
the empirical copula, but also empirical marginal distributions. For the cop-
ula VaR methods, the margins are assumed to be normal, the only difference
between the copula VaR’s is due to different dependence structures (see Table
2.1). Mainly as a consequence of non-normal margins, the historical simulation
has the best backtest results. However, even assuming normal margins, certain
copulas (5, 12–14) give better backtest results than the traditional variance-
covariance method.
Copula as in Table 2.1
α= a= his vcv 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 21
.10 a1 .103 .084 .111 .074 .100 .086 .080 .086 .129 .101 .128 .129 .249 .090 .087 .084 .073 .104 .080
.05 a1 .053 .045 .066 .037 .059 .041 .044 .040 .079 .062 .076 .079 .171 .052 .051 .046 .038 .061 .041
.01 a1 .015 .019 .027 .013 .027 .017 .020 .016 .032 .027 .033 .034 .075 .020 .022 .018 .015 .027 .018
.10 a2 .092 .078 .066 .064 .057 .076 .086 .062 .031 .049 .031 .031 .011 .086 .080 .092 .085 .065 .070
.05 a2 .052 .044 .045 .023 .033 .041 .049 .031 .012 .024 .012 .013 .003 .051 .046 .054 .049 .039 .032
.01 a2 .010 .011 .016 .002 .007 .008 .009 .006 .002 .002 .002 .002 .001 .015 .010 .018 .025 .011 .005
.10 a3 .099 .086 .126 .086 .064 .088 .096 .073 .032 .054 .033 .031 .016 .094 .086 .105 .133 .070 .086
.05 a3 .045 .048 .093 .047 .032 .052 .050 .040 .017 .026 .017 .016 .009 .049 .047 .058 .101 .034 .050
.01 a3 .009 .018 .069 .018 .012 .018 .016 .012 .007 .009 .006 .006 .002 .018 .015 .018 .073 .013 .020
.10 a4 .103 .090 .174 .147 .094 .095 .086 .103 .127 .094 .129 .127 .257 .085 .085 .085 .136 .088 .111
.05 a4 .052 .058 .139 .131 .056 .060 .058 .071 .084 .068 .084 .085 .228 .053 .054 .051 .114 .053 .098
.01 a4 .011 .020 .098 .108 .017 .025 .025 .035 .042 .056 .041 .042 .176 .016 .017 .016 .087 .015 .071
.10 Avg .014 .062 .145 .123 .085 .055 .052 .082 .193 .104 .194 .194 .478 .045 .061 .045 .110 .082 .075
.05 Avg .011 .021 .154 .124 .051 .030 .016 .060 .134 .080 .132 .136 .387 .006 .012 .017 .127 .041 .075
.01 Avg .007 .029 .169 .117 .028 .031 .032 .036 .065 .071 .065 .067 .249 .029 .025 .029 .160 .026 .083
Avg Avg .009 .028 .163 .120 .039 .032 .028 .047 .095 .076 .094 .096 .306 .022 .023 .026 .147 .034 .080
Rank 1 6 18 16 9 7 5 10 14 11 13 15 19 2 3 4 17 8 12
Table 2.2. Relative number of backtest outliers α̂ for the VaR with
confidence 1 − α, weighted average error |α̂ − α| and error ranking.
XFGaccvar4.xpl
Bibliography
H.-P. Deutsch, R. Eller (1999). Derivatives and Internal Models. Macmillan
Press.
T. P. Hutchinson, C. D. Lai (1990). Continuous Bivariate Distributions, Em-
phasising Applications. Rumsby Scientific Publishing, Adelaide.
P. Embrechts, A. McNeil, D. Straumann (1999).Correlation: Pitfalls and Al-
ternatives. RISK, May, pages 69-71.
P. Embrechts, A. McNeil, D. Straumann (1999).Correlation and Dependence
in Risk Management: Properties and Pitfalls. Preprint ETH Zürich.
3 Quantification of Spread Risk by Means of Historical Simulation

3.1 Introduction
Modeling spread risk for interest rate products, i.e., changes of the yield differ-
ence between a yield curve characterizing a class of equally risky assets and a
riskless benchmark curve, is a challenge for any financial institution seeking to
estimate the amount of economic capital utilized by trading and treasury activ-
ities. With the help of standard tools this contribution investigates some of the
characteristic features of yield spread time series available from commercial
data providers. From the properties of these time series it becomes obvious
that the application of the parametric variance-covariance-approach for esti-
mating idiosyncratic interest rate risk should be called into question. Instead
we apply the non-parametric technique of historical simulation to synthetic
zero-bonds of different riskiness, in order to quantify general market risk and
spread risk of the bond. The quality of value-at-risk predictions is checked by a
backtesting procedure based on a mark-to-model profit/loss calculation for the
zero-bond market values. From the backtesting results we derive conclusions
for the implementation of internal risk models within financial institutions.
Residual risk and event risk form the two components of so-called specific price
risk or specific risk — a term used in documents on banking regulation, Bank for
International Settlements (1998a), Bank for International Settlements (1998b)
— and characterize the contribution of the individual risk of a given financial
instrument to its overall risk.
The distinction between general market risk and residual risk is not unique but
depends on the choice of the benchmark curve, which is used in the analysis
of general market risk: The market for interest rate products in a given cur-
rency has a substructure (market-sectors), which is reflected by product-specific
(swaps, bonds, etc.), industry-specific (bank, financial institution, retail com-
pany, etc.) and rating-specific (AAA, AA, A, BBB, etc.) yield curves. For the
most liquid markets (USD, EUR, JPY), data for these sub-markets is available
from commercial data providers like Bloomberg. Moreover, there are addi-
tional influencing factors like collateral, financial restrictions etc., which give
rise to further variants of the yield curves mentioned above. Presently, however,
hardly any standardized data on these factors is available from data providers.
The larger the universe of benchmark curves a bank uses for modeling its
interest risk, the smaller is the residual risk. A bank, which e.g. only uses
product-specific yield curves but neglects the influence of industry- and rating-
specific effects in modelling its general market risk, can expect specific price
risk to be significantly larger than in a bank which includes these influences
in modeling general market risk. The difference is due to the consideration of
product-, industry- and rating-specific spreads over the benchmark curve for
(almost) riskless government bonds. This leads to the question, whether the
risk of a spread change, the spread risk, should be interpreted as part of the
general market risk or as part of the specific risk. The uncertainty is due to
the fact that it is hard to define what a market-sector is. The definition of
benchmark curves for the analysis of general market risk depends, however,
critically on the market sectors identified.
We will not further pursue this question in the following but will instead inves-
tigate some properties of this spread risk and draw conclusions for modeling
spread risk within internal risk models. We restrict ourselves to the continuous
changes of the yield curves and the spreads, respectively, and do not discuss
event risk. In this contribution different methods for the quantification of the
risk of a fictive USD zero bond are analyzed. Our investigation is based on
time series of daily market yields of US treasury bonds and US bonds (banks
and industry) of different credit quality (rating) and time to maturity.
institutions covers the interval from March 09 1992 to September 14 1999 and
corresponds to 1955 observations. We use yields for 3 and 6 month (3M, 6M)
as well as 1, 2, 3, 4, 5, 7, and 10 year maturities (1Y, 2Y, 3Y, 4Y, 5Y, 7Y, 10Y).
Each yield curve is based on information on the prices of a set of representative
bonds with different maturities. The yield curve, of course, depends on the
choice of bonds. Yields are option-adjusted but not corrected for coupon pay-
ments. The yields for the chosen maturities are constructed by Bloomberg’s
interpolation algorithm for yield curves. We use the USD treasury curve as a
benchmark for riskless rates and calculate yield spreads relative to the bench-
mark curve for the different rating categories and the two industries. We correct
the data history for obvious flaws using complementary information from other
data sources. Some parts of our analysis in this section can be compared with
the results given in Kiesel, Perraudin and Taylor (1999).
We store the time series of the different yield curves in individual files. The file
names, the corresponding industries and ratings and the names of the matrices
used in the XploRe code are listed in Table 3.2. Each file contains data for
the maturities 3M to 10Y in columns 4 to 12. XploRe creates matrices from
the data listed in column 4 of Table 3.2 and produces summary statistics for
the different yield curves. As example files the data sets for US treasury and
industry bonds with rating AAA are provided. The output of the summarize
command for the INAAA curve is given in Table 3.1.
Contents of summ
The long term means are of particular interest. Therefore, we summarize them
in Table 3.3. In order to get an impression of the development of the treasury
yields in time, we plot the time series for the USTF 3M, 1Y, 2Y, 5Y, and 10Y
yields. The results are displayed in Figure 3.1, XFGtreasury.xpl. The
averaged yields within the observation period are displayed in Figure 3.2 for
USTF, INAAA, INBBB2, INBB2 and INB2, XFGyields.xpl.
In the next step we calculate spreads relative to the treasury curve by sub-
tracting the treasury curve from the rating-specific yield curves and store them
to variables SINAAA, SINAA2, etc. For illustrative purposes we display time
series of the 1Y, 2Y, 3Y, 5Y, 7Y, and 10Y spreads for the curves INAAA, INA2,
INBBB2, INBB2, INB2 in Figure 3.3, XFGseries.xpl.
We run the summary statistics to obtain information on the mean spreads.
Our results, which can also be obtained with the mean command, are collected
in Table 3.4, XFGmeans.xpl.
Curve 3M 6M 1Y 2Y 3Y 4Y 5Y 7Y 10Y
USTF 4.73 4.92 5.16 5.50 5.71 5.89 6.00 6.19 6.33
INAAA 5.10 5.26 5.51 5.82 6.04 6.21 6.35 6.52 6.70
INAA2 5.19 5.37 5.59 5.87 6.08 6.26 6.39 6.59 6.76
INAA3 5.25 - 5.64 5.92 6.13 6.30 6.43 6.63 6.81
INA1 5.32 5.50 5.71 5.99 6.20 6.38 6.51 6.73 6.90
INA2 5.37 5.55 5.76 6.03 6.27 6.47 6.61 6.83 7.00
INA3 - - 5.84 6.12 6.34 6.54 6.69 6.91 7.09
INBBB1 5.54 5.73 5.94 6.21 6.44 6.63 6.78 7.02 7.19
INBBB2 5.65 5.83 6.03 6.31 6.54 6.72 6.86 7.10 7.27
INBBB3 5.83 5.98 6.19 6.45 6.69 6.88 7.03 7.29 7.52
INBB1 6.33 6.48 6.67 6.92 7.13 7.29 7.44 7.71 7.97
INBB2 6.56 6.74 6.95 7.24 7.50 7.74 7.97 8.34 8.69
INBB3 6.98 7.17 7.41 7.71 7.99 8.23 8.46 8.79 9.06
INB1 7.32 7.53 7.79 8.09 8.35 8.61 8.82 9.13 9.39
INB2 7.80 7.96 8.21 8.54 8.83 9.12 9.37 9.68 9.96
INB3 8.47 8.69 8.97 9.33 9.60 9.89 10.13 10.45 10.74
BNAAA 5.05 5.22 5.45 5.76 5.99 6.20 6.36 6.60 6.79
BNAA12 5.14 5.30 5.52 5.83 6.06 6.27 6.45 6.68 6.87
BNA1 5.22 5.41 5.63 5.94 6.19 6.39 6.55 6.80 7.00
BNA2 5.28 5.47 5.68 5.99 6.24 6.45 6.61 6.88 7.07
BNA3 5.36 5.54 5.76 6.07 6.32 6.52 6.68 6.94 7.13
Table 3.3. Long term mean for different USD yield curves
Now we calculate the 1-day spread changes from the observed yields and store
them to variables DASIN01AAA, etc. We run the descriptive routine to cal-
culate the first four moments of the distribution of absolute spread changes.
Volatility as well as skewness and kurtosis for selected curves are displayed in
Tables 3.5, 3.6 and 3.7.
XFGchange.xpl
Figure 3.1. Time series of the USTF 3M, 1Y, 2Y, 5Y and 10Y yields.
Figure 3.2. Averaged yields for USTF, INAAA, INBBB2, INBB2 and INB2 plotted against time to maturity in years.
Figure 3.3. Time series of the 1Y, 2Y, 3Y, 5Y, 7Y and 10Y spreads (in %) for the curves INAAA, INA2, INBBB2, INBB2 and INB2.
Curve 3M 6M 1Y 2Y 3Y 4Y 5Y 7Y 10Y
INAAA 36 35 35 31 33 31 35 33 37
INAA2 45 45 43 37 37 36 40 39 44
INAA3 52 - 48 42 42 40 44 44 49
INA1 58 58 55 49 49 49 52 53 57
INA2 63 63 60 53 56 57 62 64 68
INA3 - - 68 62 63 64 69 72 76
INBBB1 81 82 78 71 72 74 79 83 86
INBBB2 91 91 87 80 82 82 87 90 94
INBBB3 110 106 103 95 98 98 104 110 119
INBB1 160 156 151 142 141 140 145 151 164
INBB2 183 182 179 173 179 185 197 215 236
INBB3 225 225 225 221 228 233 247 259 273
INB1 259 261 263 259 264 271 282 294 306
INB2 306 304 305 304 311 322 336 348 363
INB3 373 377 380 382 389 400 413 425 441
BNAAA 41 39 38 33 35 35 41 43 47
BNAA12 50 47 45 40 42 42 49 52 56
BNA1 57 59 57 52 54 54 59 64 68
BNA2 64 65 62 57 59 60 65 71 75
BNA3 72 72 70 65 67 67 72 76 81
Table 3.4. Mean spreads in basis points for different USD yield curves
Curve 3M 6M 1Y 2Y 3Y 4Y 5Y 7Y 10Y
INAAA 4.1 3.5 3.3 2.3 2.4 2.2 2.1 2.2 2.5
INAA2 4.0 3.5 3.3 2.3 2.4 2.2 2.2 2.2 2.5
INAA3 4.0 - 3.3 2.2 2.3 2.2 2.2 2.2 2.5
INA1 4.0 3.7 3.3 2.3 2.4 2.2 2.2 2.2 2.6
INA2 4.1 3.7 3.3 2.4 2.4 2.1 2.2 2.3 2.5
INA3 - - 3.4 2.4 2.4 2.2 2.2 2.3 2.6
INBBB1 4.2 3.6 3.2 2.3 2.3 2.2 2.1 2.3 2.6
INBBB2 4.0 3.5 3.4 2.3 2.4 2.1 2.2 2.3 2.6
INBBB3 4.2 3.6 3.5 2.4 2.5 2.2 2.3 2.5 2.9
INBB1 4.8 4.4 4.1 3.3 3.3 3.1 3.1 3.9 3.4
INBB2 4.9 4.6 4.5 3.8 3.8 3.8 3.7 4.3 4.0
INBB3 5.5 5.1 4.9 4.3 4.4 4.2 4.1 4.7 4.3
INB1 6.0 5.2 4.9 4.5 4.5 4.4 4.4 4.9 4.6
INB2 5.6 5.2 5.2 4.8 4.9 4.8 4.8 5.3 4.9
INB3 5.8 6.1 6.4 5.1 5.2 5.1 5.1 5.7 5.3
BNAAA 3.9 3.5 3.3 2.5 2.5 2.3 2.2 2.3 2.6
BNAA12 5.4 3.6 3.3 2.4 2.3 2.2 2.1 2.3 2.6
BNA1 4.1 3.7 3.2 2.1 2.2 2.1 2.0 2.2 2.6
BNA2 3.8 3.5 3.1 2.3 2.2 2.0 2.1 2.2 2.5
BNA3 3.8 3.5 3.2 2.2 2.2 2.1 2.1 2.2 2.5
Table 3.5. Volatility of absolute spread changes in basis points p.a.
Curve 3M 6M 1Y 2Y 3Y 4Y 5Y 10Y
INAAA 0.1 0.0 -0.1 0.6 0.5 0.0 -0.5 0.6
INAA2 0.0 -0.2 0.0 0.4 0.5 -0.1 -0.2 0.3
INA2 0.0 -0.3 0.1 0.2 0.4 0.1 -0.1 0.4
INBBB2 0.2 0.0 0.2 1.0 1.1 0.5 0.5 0.9
INBB2 -0.2 -0.5 -0.4 -0.3 0.3 0.5 0.4 -0.3
Table 3.6. Skewness of absolute spread changes for selected curves
Curve 3M 6M 1Y 2Y 3Y 4Y 5Y 10Y
INAAA 12.7 6.0 8.1 10.1 16.8 9.1 11.2 12.8
INAA2 10.5 6.4 7.8 10.1 15.8 7.8 9.5 10.0
INA2 13.5 8.5 9.2 12.3 18.2 8.2 9.4 9.8
INBBB2 13.7 7.0 9.9 14.5 21.8 10.5 13.9 14.7
INBB2 11.2 13.0 11.0 15.8 12.3 13.2 11.0 11.3
Table 3.7. Kurtosis of absolute spread changes for selected curves
Mean 0.000354147
Std.Error 0.0253712 Variance 0.000643697
Median 0
25% Quartile -0.01 75% Quartile 0.01
Observations 2146
Distinct observations 75
XFGdist.xpl
Curve 3M 6M 1Y 2Y 3Y 4Y 5Y 7Y 10Y
INAAA 36.0 19.2 15.5 8.9 8.4 8.0 6.4 7.8 10.4
INAA2 23.5 13.1 11.2 7.2 7.4 6.4 5.8 6.2 7.6
INAA3 13.4 - 9.0 5.8 6.2 5.3 5.0 5.8 6.4
INA1 13.9 9.2 7.7 5.7 5.6 4.7 4.5 4.6 5.7
INA2 11.5 8.1 7.1 5.1 4.9 4.3 4.0 4.0 4.5
INA3 - - 6.4 4.6 4.3 3.8 3.5 3.5 4.1
INBBB1 8.1 6.0 5.4 3.9 3.7 3.3 3.0 3.2 3.8
INBBB2 7.0 5.3 5.0 3.3 3.3 2.9 2.8 2.9 3.3
INBBB3 5.7 4.7 4.4 3.2 3.0 2.7 2.5 2.6 2.9
INBB1 4.3 3.8 3.4 2.5 2.4 2.2 2.1 2.5 2.2
INBB2 3.7 3.3 3.0 2.2 2.1 2.0 1.8 2.0 1.7
INBB3 3.2 2.8 2.5 2.0 1.9 1.8 1.6 1.8 1.5
INB1 3.0 2.4 2.1 1.7 1.7 1.6 1.5 1.6 1.5
INB2 2.3 2.1 1.9 1.6 1.6 1.5 1.4 1.5 1.3
INB3 1.8 2.2 2.3 1.3 1.3 1.2 1.2 1.3 1.1
BNAAA 37.0 36.6 16.9 9.8 9.0 8.2 6.1 5.9 6.5
BNAA12 22.8 9.7 8.3 7.0 6.3 5.8 4.6 4.8 5.5
BNA1 36.6 10.1 7.9 5.6 4.8 4.4 3.8 3.9 4.4
BNA2 17.8 8.0 6.6 4.5 4.1 3.6 3.4 3.3 3.7
BNA3 9.9 6.9 5.6 3.7 3.6 3.3 3.1 3.1 3.4
Curve 3M 6M 1Y 2Y 3Y 4Y 5Y 10Y
INAAA 2.3 4.6 4.3 2.2 2.3 2.1 0.6 4.6
INAA2 5.4 2.6 3.7 1.6 2.0 0.6 0.8 1.8
INA2 7.6 1.5 1.2 0.9 1.6 0.8 0.9 0.8
INBBB2 5.5 0.7 0.8 0.8 1.4 0.8 0.7 0.8
INBB2 0.8 0.4 0.6 0.3 0.4 0.5 0.3 -0.2
Curve 3M 6M 1Y 2Y 3Y 4Y 5Y 10Y
INAAA 200.7 54.1 60.1 27.8 28.3 33.9 16.8 69.3
INAA2 185.3 29.5 60.5 22.1 27.4 11.0 17.5 23.0
INA2 131.1 22.1 18.0 13.9 26.5 16.4 18.5 13.9
INBBB2 107.1 13.9 16.9 12.0 20.0 14.0 16.6 16.7
INBB2 16.3 11.9 12.9 12.4 11.0 10.1 10.2 12.0
risk factors with different correlations, the difference in the shape of the distri-
bution can play an important role. That is why a simple variance-covariance approach, as in J.P. Morgan (1996) and Kiesel et al. (1999), does not seem adequate to capture spread risk.
period, i.e., the time window investigated, consists of N ≥ 1 trading days and
the holding period of h ≥ 1 trading days. The confidence level for the VaR is
α ∈ [0, 1]. At each point in time 0 ≤ t ≤ t1 the risky yields Ri (t) (full yield
curve) and the riskless treasury yields Bi (t) (benchmark curve) for any time to
maturity 0 < T1 < · · · < Tn are contained in our data set for 1 ≤ i ≤ n, where
n is the number of different maturities. The corresponding spreads are defined
by Si (t) = Ri (t) − Bi (t) for 1 ≤ i ≤ n.
In the following subsections 3.4.1 to 3.4.5 we specify different variants of the
historical simulation method which we use for estimating the distribution of
losses from the zero-bond position. The estimate for the distribution of losses
can then be used to calculate the quantile-based risk measure Value-at-Risk.
The variants differ in the choice of risk factors, i.e., in our case the compo-
nents of the historical yield time series. In Section 3.6 we describe how the
VaR estimation is carried out with XploRe commands provided that the loss
distribution has been estimated by means of one of the methods introduced
and can be used as an input variable.
The present value of the bond $PV(t)$ at time $t$ can be obtained by discounting,
$$PV(t) = \frac{1}{\{1 + R(t, T-t)\}^{T-t}}, \quad t_0 \leq t \leq t_1. \tag{3.2}$$
In the historical simulation the relative risk factor changes
$$\Delta_i^{(k)}(t) = \frac{R_i(t - k/N) - R_i(t - (k+h)/N)}{R_i(t - (k+h)/N)}, \quad 0 \leq k \leq N-1, \tag{3.3}$$
for the yield. With (3.2) we obtain a new fictive present value at time $t + h$:
$$PV^{(k)}(t+h) = \frac{1}{\{1 + R^{(k)}(t+h, T-t)\}^{T-t}}. \tag{3.6}$$
In this equation we neglected the effect of the shortening of the time to maturity
in the transition from t to t + h on the present value. Such an approximation
should be refined for financial instruments whose time to maturity/time to
expiration is of the order of h, which is not relevant for the constellations
investigated in the following.
Now the fictive present value P V (k) (t + h) is compared with the present value
for unchanged yield R(t + h, T − t) = R(t, T − t) for each scenario k (here the
remaining time to maturity is not changed, either).
$$PV(t+h) = \frac{1}{\{1 + R(t+h, T-t)\}^{T-t}}. \qquad (3.7)$$
i.e., losses in the economic sense are positive while profits are negative. The
VaR is the loss which is not exceeded with a probability α and is estimated as
the [(1 − α)N + 1]-th-largest value in the set
{L(k) (t + h) | 0 ≤ k ≤ N − 1}.
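As an illustration of the complete recipe (relative changes (3.3), fictive present values (3.6) and the order statistic above), the following minimal sketch computes the historical simulation VaR for a single zero-bond position. It is written in Python rather than the XploRe quantlets used in this book, and the function name and array layout are assumptions of this illustration.

import numpy as np

def hist_sim_var(R_hist, R_t, T_minus_t, N=250, h=1, alpha=0.99):
    # R_hist: historical yields, oldest first, most recent last; needs at least N + h observations
    # R_t: current yield R(t, T - t); T_minus_t: remaining time to maturity
    pv_t = 1.0 / (1.0 + R_t) ** T_minus_t                      # present value, cf. (3.2)
    delta = np.array([(R_hist[-1 - k] - R_hist[-1 - k - h]) / R_hist[-1 - k - h]
                      for k in range(N)])                       # relative changes, cf. (3.3)
    R_scen = R_t * (1.0 + delta)                                # fictive yields for the N scenarios
    pv_scen = 1.0 / (1.0 + R_scen) ** T_minus_t                 # fictive present values, cf. (3.6)
    losses = pv_t - pv_scen                                     # losses are positive, profits negative
    losses_desc = np.sort(losses)[::-1]                         # order statistics, largest first
    k_var = int(np.floor((1.0 - alpha) * N)) + 1                # [(1 - alpha)N + 1]-th largest value
    return losses_desc[k_var - 1]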
Volatility updating consists of a renormalization of the relative risk factor changes from (3.3) with the corresponding estimation of volatility for the observation day and a multiplication with the estimate for the volatility valid at time t. Thus, we calculate the quantity
$$\delta_i^{(k)}(t) = \sigma_i(t)\,\frac{\Delta_i^{(k)}(t)}{\sigma_i\{t-(k+h)/N\}}, \qquad 0 \le k \le N-1, \qquad (3.12)$$
and the corresponding fictive yields
$$R_i^{(k)}(t+h) = R_i(t)\,\{1 + \delta_i^{(k)}(t)\}, \qquad 1 \le i \le n. \qquad (3.13)$$
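A hedged sketch of the volatility updating step (3.12), again in Python: the exponentially weighted volatility uses the decay factor 0.94 mentioned later for the empirical study, and the alignment of the volatility estimate with the observation day of each change is a simplifying assumption of this illustration.

import numpy as np

def ewma_vol(changes, lam=0.94):
    # exponentially weighted volatility of the relative changes, oldest change first
    var = changes[0] ** 2
    out = []
    for x in changes:
        var = lam * var + (1.0 - lam) * x ** 2
        out.append(np.sqrt(var))
    return np.array(out)

def vol_updated_changes(delta_hist, lam=0.94):
    vols = ewma_vol(delta_hist, lam)            # volatility valid around each observation day
    return vols[-1] * delta_hist / vols         # rescaled changes, cf. (3.12)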
In this subsection the risk factors are relative changes of the benchmark curve instead of the full yield curve. This restriction is adequate for quantifying general market risk, when there is no need to include spread risk. The risk factors are the yields Bi(t) for i = 1, . . . , n. The yield B(t, T − t) at time t for time to maturity T − t is again obtained by linear interpolation.
The generation of scenarios and the interpolation of the fictive benchmark curve is carried out in analogy to the procedure for the full yield curve. We use
$$\Delta_i^{(k)}(t) = \frac{B_i\{t - k/N\} - B_i\{t-(k+h)/N\}}{B_i\{t-(k+h)/N\}}, \qquad 0 \le k \le N-1, \qquad (3.16)$$
and
$$B_i^{(k)}(t+h) = B_i(t)\,\{1 + \Delta_i^{(k)}(t)\}, \qquad 1 \le i \le n. \qquad (3.17)$$
In the determination of the fictive full yield we now assume that the spread
remains unchanged within the holding period. Thus, for the k-th scenario we
obtain the representation
which is used for the calculation of a new fictive present value and the corre-
sponding loss. With this choice of risk factors we can introduce an adjustment
for the average relative changes or/and volatility updating in complete analogy
to the four variants described in the preceding subsection.
When we take the view that risk is only caused by spread changes but not
by changes of the benchmark curve, we investigate the behavior of the spread
risk factors Si (t) for i = 1, . . . , n. The spread S(t, T − t) at time t for time to
maturity T − t is again obtained by linear interpolation. We now use
$$\Delta_i^{(k)}(t) = \frac{S_i\{t - k/N\} - S_i\{t-(k+h)/N\}}{S_i\{t-(k+h)/N\}}, \qquad 0 \le k \le N-1, \qquad (3.19)$$
and
$$S_i^{(k)}(t+h) = S_i(t)\,\{1 + \Delta_i^{(k)}(t)\}, \qquad 1 \le i \le n. \qquad (3.20)$$
Here, linear interpolation yields
$$S^{(k)}(t+h, T-t) = \frac{\{T_{i+1} - (T-t)\}\,S_i^{(k)}(t+h) + \{(T-t) - T_i\}\,S_{i+1}^{(k)}(t+h)}{T_{i+1} - T_i}.$$
Thus, in the determination of the fictive full yield the benchmark curve is
considered deterministic and the spread stochastic. This constellation is the
opposite of the constellation in the preceding subsection. For the k-th scenario
one obtains
In this context we can also work with adjustment for average relative spread
changes and volatility updating.
In the conservative approach we assume full correlation between risk from the
benchmark curve and risk from the spread changes. In this worst case scenario
we add (ordered) losses, which are calculated as in the two preceding sections
from each scenario. From this loss distribution the VaR is determined.
where, again, corrections for average risk factor changes or/and volatility up-
dating can be added. We note that the use of relative risk factor changes
is the reason for different results of the variants in subsection 3.4.1 and this
subsection.
This corresponds to a loss L(t) = PV(t) − PV(t + h), where, again, the shortening of the time to maturity is not taken into account.
The different frameworks for the VaR estimation can easily be integrated into the backtesting procedure. When we, e.g., only consider changes of the benchmark curve, R(t + h, T − t) in (3.23) is replaced with B(t + h, T − t) + S(t, T − t). On average, (1 − α) · 100 percent of the observed losses in a given time interval should exceed the corresponding VaR (outliers). Thus, the percentage of outliers among the observed losses is a measure for the predictive power of the historical simulation.
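The backtesting measure just described can be coded in a few lines; the snippet below is a Python illustration with assumed names, not one of the chapter's quantlets.

import numpy as np

def violation_rate(realised_losses, var_forecasts):
    # share of days on which the realised loss exceeds the VaR forecast;
    # for a well calibrated 99% VaR this rate should be close to 1%
    return float(np.mean(np.asarray(realised_losses) > np.asarray(var_forecasts)))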
Figure 3.5. VaR time plot, basic historical simulation.
The result is displayed for the INAAA curve in Figures 3.5 (basic historical
simulation) and 3.6 (historical simulation with volatility updating). The time
plots allow for a quick detection of violations of the VaR prediction. A striking
feature in the basic historical simulation with the full yield curve as risk fac-
tor is the platform-shaped VaR prediction, while with volatility updating the
VaR prediction decays exponentially after the occurrence of peak events in the
market data. This is a consequence of the exponentially weighted historical
Figure 3.6. VaR time plot historical simulation with volatility updating.
XFGtimeseries2.xpl
volatility in the scenarios. The peak VaR values are much larger for volatility
updating than for the basic historical simulation.
In order to find out which framework for VaR estimation has the best predictive power, we count the number of violations of the VaR prediction and divide it by the number of actually observed losses. We use the 99% quantile, for which we would expect a violation rate of 1% for an optimal VaR estimator. The
history used for the drawings of the scenarios consists of N = 250 days, and the
holding period is h = 1 day. For the volatility updating we use a decay factor of
γ = 0.94, J.P. Morgan (1996). For the simulation we assume that the synthetic
zero-bond has a remaining time to maturity of 10 years at the beginning of
the simulations. For the calculation of the first scenario of a basic historical
simulation N + h − 1 observations are required. A historical simulation with
volatility updating requires 2(N + h − 1) observations preceding the trading
day the first scenario refers to. In order to allow for a comparison between
different methods for the VaR calculation, the beginning of the simulations
is t0 = [2(N + h − 1)/N ]. With these simulation parameters we obtain 1646
observations for a zero-bond in the industry sector and 1454 observations for a
zero-bond in the banking sector.
In Tables 3.12 to 3.14 we list the percentage of violations for all yield curves and the four variants of historical simulation V1 to V4 (V1 = Basic Historical Simulation; V2 = Basic Historical Simulation with Mean Adjustment; V3 = Historical Simulation with Volatility Updating; V4 = Historical Simulation with Volatility Updating and Mean Adjustment). In the last row we display the
average of the violations of all curves. Table 3.12 contains the results for the
simulation with relative changes of the full yield curves and of the yield spreads
over the benchmark curve as risk factors. In Table 3.13 the risk factors are
changes of the benchmark curves. The violations in the conservative approach
and in the simultaneous simulation of relative spread and benchmark changes
are listed in Table 3.14.
XFGexc.xpl
Curve                                      V1    V2    V3    V4
INAAA, INAA2, INAA3, INA1, INA2,          1.52  1.28  1.22  1.15
INA3, INBBB1, INBBB2, INBBB3,
INBB1, INBB2, INBB3, INB1, INB2,
INB3
BNAAA, BNAA1/2, BNA1, BNA2, BNA3          1.72  1.44  1.17  1.10
Average                                   1.57  1.32  1.20  1.14
VaRqqplot(matrix(N,1)|MMPL,VaR,opt)
Figure 3.7. P-P plots of the empirical distribution against the uniform distribution.
Figure 3.8 displays the P-P plots for the same data set and the basic historical
simulation with different choices of risk factors. A striking feature is the poor
predictive power for a model with the spread as risk factor. Moreover, the
over-estimation of the risk in the conservative approach is clearly reflected by
a sine-shaped function, which is superposed on the ideal diagonal function.
In Figures 3.9 and 3.10 we show the Q-Q plots for the basic historical simulation and
for volatility updating using the INAAA data set and the full yield curve as risk
factors. A striking feature of all Q-Q plots is the deviation from linearity (and,
thus, normality) for extreme quantiles. This observation corresponds to the
leptokurtic distributions of time series of market data changes (e.g. spread
changes as discussed in section 3.3.2).
Figure 3.8. P-P plots for the basic historical simulation with different choices of risk factors.
Figure 3.9. Q-Q plot of L/VaR quantiles against normal quantiles, basic historical simulation.
The results for the number of violations in Table 3.13 and the mean squared
deviations in Table 3.16 are comparable to the analysis, where risk factors are
changes of the full yield. Since the same relative changes are applied for all
yield curves, the results are the same for all yield curves. Again, the application
of volatility updating improves the predictive power and mean adjustment also
has a positive effect.
The number of violations (see Table 3.12) is comparable to the latter two
variants. Volatility updating leads to better results, while the effect of mean
adjustment is only marginal.
Figure 3.10. Q-Q plot of L/VaR quantiles against normal quantiles, historical simulation with volatility updating.
However, the mean squared deviations (see Table 3.15) in the P-P plots are significantly larger than in the case where the
risk factors are contained in the benchmark curve. This can be traced back to a
partly poor predictive power for intermediate confidence levels (see Figure 3.8).
Mean adjustment leads to larger errors in the P-P plots.
From Table 3.14 the conclusion can be drawn that the conservative approach
significantly over-estimates the risk for all credit qualities. Table 3.17 indicates
the poor predictive power of the conservative approach over the full range of
confidence levels.
Table 3.15. MSD P-P plot for the full yield and the spread curve (×10 000)
The mean squared deviations are the worst of all approaches. Volatility updat-
ing and/or mean adjustment does not lead to any significant improvements.
From Tables 3.14 and 3.17 it is apparent that simultaneous simulation leads to
much better results than the model with risk factors from the full yield curve,
when volatility updating is included. Again, the effect of mean adjustment
does not in general lead to a significant improvement. These results lead to
the conclusion that general market risk and spread risk should be modeled
independently, i.e., that the yield curve of an instrument exposed to credit
risk should be modeled with two risk factors: benchmark changes and spread
changes.
Curve                                        V1    V2    V3    V4
INAAA, INAA2, INAA3                         0.49  0.23  0.26  0.12
INA1                                        0.48  0.23  0.26  0.12
INA2, INA3, INBBB1, INBBB2, INBBB3,         0.49  0.23  0.26  0.12
INBB1, INBB2
INBB3                                       0.47  0.23  0.25  0.12
INB1                                        0.49  0.23  0.26  0.12
INB2                                        0.47  0.23  0.25  0.12
INB3                                        0.48  0.23  0.26  0.12
BNAAA, BNAA1/2                              0.42  0.18  0.25  0.33
BNA1                                        0.41  0.18  0.23  0.33
BNA2                                        0.42  0.18  0.25  0.33
BNA3                                        0.41  0.18  0.24  0.33
Average                                     0.47  0.22  0.25  0.17
Table 3.17. MSD P-P plot for the conservative approach and the simultaneous simulation (×10 000)
Bibliography
Bank for International Settlements (1998a). Amendment to the Capital Accord
to incorporate market risks, www.bis.org. (January 1996, updated to April
1998).
Credit Risk
4 Rating Migrations
Steffi Höse, Stefan Huschens and Robert Wania
Rating migrations depend on factors such as the obligor's domicile and industry and the stage of the business cycle.
Rating migrations are reviewed from a statistical point of view throughout this
chapter using XploRe. The way from the observed data to the estimated one-
year transition probabilities is shown and estimates for the standard deviations
of the transition rates are given. As a further extension, dependent rating migra-
tions are discussed. In particular, the modeling by a threshold normal model
is presented.
Time stability of transition matrices is one of the major issues for credit risk
estimation. Therefore, a chi-square test of homogeneity for the estimated rating
transition probabilities is applied. The test is illustrated by an example and
is compared to a simpler approach using standard errors. Further, assuming
time stability, multi-period rating transitions are discussed. An estimator for
multi-period transition matrices is given and its distribution is approximated
by bootstrapping. Finally, the change of the composition of a credit portfolio
caused by rating migrations is considered. The expected composition and its
variance are calculated for independent migrations.
We assume that credits or credit obligors are rated in d categories ranging from
1, the best rating category, to the category d containing defaulted credits. The
raw data consist of a collection of migration events. The n observed migration
events form an n × 2 matrix with rows (e_{i1}, e_{i2}), i = 1, . . . , n.
Thereby, e_{i1} characterizes the rating of the i-th credit at the beginning and e_{i2} the
rating at the end of the risk horizon, which is usually one year. Subsequently,
where pjk is the probability that a credit migrates from an initial rating j to
rating k. These probabilities are the so called rating transition (or migration)
probabilities. Note that the indicator variable 1{ẽi2 = k} conditional on ẽi1 = j
is a Bernoulli distributed random variable with success parameter pjk ,
is the composition of the portfolio at the end of the period, where the last
element is the number of defaulted credits. The observed migration rate from
j to k,
$$\hat p_{jk} \stackrel{\mathrm{def}}{=} \frac{c_{jk}}{n_j}, \qquad (4.4)$$
is the natural estimate of the unknown transition probability pjk .
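To make the step from migration events to the estimates in (4.4) concrete, here is a small Python sketch; the function name and the (n × 2) integer array layout follow the description above but are otherwise assumptions of this illustration (the book's computations use XploRe quantlets).

import numpy as np

def migration_rates(events, d):
    # events: (n, 2) array with rows (e_i1, e_i2), ratings coded 1, ..., d;
    # credits are assumed not to start in the default category d
    events = np.asarray(events, dtype=int)
    c = np.zeros((d - 1, d), dtype=int)
    for start, end in events:
        c[start - 1, end - 1] += 1               # migration counts c_jk
    n_j = c.sum(axis=1)                           # credits with initial rating j
    p_hat = c / np.maximum(n_j, 1)[:, None]       # transition rates, cf. (4.4)
    return c, n_j, p_hat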
If the migration events are independent, i. e., the variables ẽ12 , . . . , ẽn2 are
stochastically independent, cjk is the observed value of the binomially dis-
tributed random variable
c̃jk ∼ B(nj , pjk ),
and therefore the standard deviation of \(\hat p_{jk}\) is
$$\sigma_{jk} = \sqrt{\frac{p_{jk}(1-p_{jk})}{n_j}},$$
which is estimated by
$$\hat\sigma_{jk} = \sqrt{\frac{\hat p_{jk}(1-\hat p_{jk})}{n_j}}. \qquad (4.5)$$
The estimated standard errors must be carefully interpreted, because they are
based on the assumption of independence.
The case of dependent rating migrations raises new problems. In this context, c̃_jk is distributed as a sum of n_j correlated Bernoulli variables, see (4.1), indicating for each credit with initial rating j a migration to k by 1. If these Bernoulli variables are pairwise correlated with correlation ρ_jk, then the variance σ²_jk of the unbiased estimator p̂_jk for p_jk is (Huschens and Locarek-Junge, 2000, p. 44)
$$\sigma_{jk}^2 = \frac{p_{jk}(1-p_{jk})}{n_j} + \frac{n_j-1}{n_j}\,\rho_{jk}\,p_{jk}(1-p_{jk}).$$
The limit
$$\lim_{n_j \to \infty} \sigma_{jk}^2 = \rho_{jk}\,p_{jk}(1-p_{jk})$$
shows that the sequence p̂jk does not obey a law of large numbers for ρjk > 0.
Generally, the failing of convergence in quadratic mean does not imply the
For a detailed example see Saunders (1999, pp. 122-125). In the special case of independence we have p_{jj:kk} = p²_{jk}. Defining a migration from j to k as success we obtain correlated Bernoulli variables with common success parameter p_{jk}, with probability p_{jj:kk} of a simultaneous success, and with the migration correlation
$$\rho_{jk} = \frac{p_{jj:kk} - p_{jk}^2}{p_{jk}(1-p_{jk})}.$$
Note that ρjk = 0 if ρ = 0.
Given ρ ≥ 0 we can estimate the migration correlation ρ_jk ≥ 0 by the restricted Maximum-Likelihood estimator
$$\hat\rho_{jk} = \max\left\{0,\; \frac{\beta(\hat z_{j,k-1}, \hat z_{jk}; \rho) - \hat p_{jk}^2}{\hat p_{jk}(1-\hat p_{jk})}\right\} \qquad (4.7)$$
with
$$\hat z_{jk} = \Phi^{-1}\Big(\sum_{i=1}^{k} \hat p_{ji}\Big). \qquad (4.8)$$
The estimate
$$\hat\sigma_{jk} = \sqrt{\frac{\hat p_{jk}(1-\hat p_{jk})}{n_j} + \frac{n_j-1}{n_j}\,\hat\rho_{jk}\,\hat p_{jk}(1-\hat p_{jk})} \qquad (4.9)$$
is used. The estimator in (4.9) generalizes (4.5), which results in the special
case ρ = 0.
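The estimators (4.7)-(4.9) can be sketched in Python as follows. The function β(·, ·; ρ) is defined in a part of the chapter not reproduced here; the sketch assumes it is the probability that two standard normal variables with correlation ρ both fall into the given interval, in line with the threshold normal model, and all helper names are assumptions of this illustration.

import numpy as np
from scipy.stats import norm, multivariate_normal

def beta_rect(a, b, rho):
    # assumed meaning of beta(a, b; rho): P(a < X <= b, a < Y <= b) for a standard
    # bivariate normal pair (X, Y) with correlation rho
    F = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]]).cdf
    return F([b, b]) - 2.0 * F([a, b]) + F([a, a])

def corr_std_error(p_hat_row, n_j, k, rho):
    # restricted ML estimate (4.7) of the migration correlation and the estimate (4.9)
    # of the standard deviation for p_hat_row[k] (0-based index, only for 0 < p < 1)
    p = p_hat_row[k]
    z = norm.ppf(np.clip(np.cumsum(p_hat_row), 1e-12, 1.0 - 1e-12))   # thresholds, cf. (4.8)
    z_lo = z[k - 1] if k > 0 else -8.0                                 # z_{j,0} = -infinity
    rho_jk = max(0.0, (beta_rect(z_lo, z[k], rho) - p ** 2) / (p * (1.0 - p)))
    var = p * (1.0 - p) / n_j + (n_j - 1) / n_j * rho_jk * p * (1.0 - p)
    return rho_jk, np.sqrt(var)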
• b.nstart
the (d − 1) × 1 × m array of portfolio weights before migration
• b.nend
the d × 1 × m array of portfolio weights after migration
• b.etp
the (d − 1) × d × m array of estimated transition probabilities
• b.etv
the (d − 1) × (d − 1) × m array of estimated threshold values
• b.emc
the (d − 1) × d × m array of estimated migration correlations
• b.esd
the (d − 1) × d × m array of estimated standard deviations
The matrices b.nstart and b.nend have components given by (4.2) and (4.3).
The matrices b.etp, b.emc, and b.esd contain the p̂jk , ρ̂jk , and σ̂jk from
(4.4), (4.7), and (4.9) for j = 1, . . . , d − 1 and k = 1, . . . , d. The estimates ρ̂jk
are given only for p̂jk > 0. The matrix b.etv contains the ẑjk from (4.8) for
j, k = 1, . . . , d − 1. Note that zj0 = −∞ and zjd = +∞.
XFGRatMig2.xpl
We assume that migration data are given for m periods. These data consist of m
matrices of migration counts C(t) for t = 1, . . . , m, each of type (d − 1) × d. The
generic element cjk (t) of the matrix C(t) is the number of migrations from j to
k in period t. These matrices may be computed from m data sets of migration
events.
An obvious question in this context is whether the transition probabilities can
be assumed to be constant in time or not. A first approach to analyze the
time-stability of transition probabilities is to compare the estimated transition
probabilities per period for m periods with estimates from pooled data.
The aggregated migration counts from m periods are
$$c^+_{jk} \stackrel{\mathrm{def}}{=} \sum_{t=1}^{m} c_{jk}(t), \qquad (4.10)$$
from which the pooled transition rates
$$\hat p^+_{jk} \stackrel{\mathrm{def}}{=} \frac{c^+_{jk}}{n^+_j} \qquad (4.12)$$
with
$$n^+_j \stackrel{\mathrm{def}}{=} \sum_{k=1}^{d} c^+_{jk} = \sum_{t=1}^{m} n_j(t), \qquad j = 1, \ldots, d-1,$$
can be computed.
Under the assumption of independence for the migration events the vector
of migration counts (c_{j1}(t), . . . , c_{jd}(t)) starting from j is in each period t a
realization from a multinomially distributed random vector
(c̃j1 (t), . . . , c̃jd (t)) ∼ Mult(nj (t); pj1 (t), . . . , pjd (t)),
where pjk (t) denotes the transition probability from j to k in period t. For
fixed j ∈ {1, . . . , d − 1} the hypothesis of homogeneity
H0 : pj1 (1) = . . . = pj1 (m), pj2 (1) = . . . = pj2 (m), . . . , pjd (1) = . . . = pjd (m)
may be tested with the statistic
$$X_j^2 = \sum_{k=1}^{d}\sum_{t=1}^{m} \frac{\big[\tilde c_{jk}(t) - n_j(t)\,\hat p^+_{jk}\big]^2}{n_j(t)\,\hat p^+_{jk}}. \qquad (4.13)$$
and the χ²-statistics
$$G_j^2 = 2\sum_{k=1}^{d}\sum_{t=1}^{m} \tilde c_{jk}(t)\,\ln\!\left[\frac{\tilde c_{jk}(t)}{n_j(t)\,\hat p^+_{jk}}\right], \qquad G^2 = \sum_{j=1}^{d-1} G_j^2,$$
for each period t ∈ {1, . . . , m}. For correlated migrations the estimated stan-
dard deviation is computed analogously to (4.9). This may graphically be
visualized by showing
$$\hat p^+_{jk}, \quad \hat p_{jk}(t), \quad \hat p_{jk}(t) \pm 2\,\hat\sigma_{jk}(t), \qquad t = 1, \ldots, m \qquad (4.15)$$
• out.cagg
the (d − 1) × d matrix with aggregated counts
• out.etpagg
the (d − 1) × d matrix with estimated aggregated transition probabilities
• out.esdagg
the (d − 1) × d matrix with estimated aggregated standard deviations
• out.etp
the (d−1)×d×m array with estimated transition probabilities per period
• out.esd
the (d − 1) × d × m array with estimated standard deviations per period
• out.chi
the 3 × d matrix with χ2 -statistics, degrees of freedom and p-values
• etp
the (d−1)×d×m array with estimated transition probabilities per period
• esd
the (d − 1) × d × m array with estimated standard deviations per period
• etpagg
the (d − 1) × d matrix with estimated aggregated transition probabilities
The following examples are based on transition matrices given by Nickell et al.
(2000, pp. 208, 213). The data set covers long-term bonds rated by Moody’s
in the period 1970–1997. Instead of the original matrices of type 8 × 9 we
use condensed matrices of type 3 × 4 by combining the original data in the
d = 4 basic rating categories A, B, C, and D, where D stands for the category
of defaulted credits.
The aggregated data for the full period from 1970 to 1997 are
$$C = \begin{pmatrix} 21726 & 790 & 0 & 0 \\ 639 & 21484 & 139 & 421 \\ 0 & 44 & 307 & 82 \end{pmatrix}, \qquad \hat P = \begin{pmatrix} 0.965 & 0.035 & 0 & 0 \\ 0.028 & 0.947 & 0.006 & 0.019 \\ 0 & 0.102 & 0.709 & 0.189 \end{pmatrix},$$
for the peak of the business cycle. The three categories depend on whether
real GDP growth in the country was in the upper, middle or lower third of the
growth rates recorded in the sample period (Nickell et al., 2000, Sec. 2.4).
In the following we use these matrices for illustrative purposes as if data from
m = 3 periods are given. Figure 4.1 gives a graphical presentation for d = 4
rating categories and m = 3 periods.
In order to illustrate the testing procedures presented in Section 4.2.2, the hypothesis is tested that the data from the three periods come from the same theoretical transition probabilities. Clearly, from the construction of the three periods we may expect that the test rejects the null hypothesis. The three χ²-statistics with 6 = 3(3 − 1) degrees of freedom for testing the equality of the rows of the transition matrices have values of the χ²(6) distribution function of 0.994, > 0.9999, and 0.303. Thus, the null hypothesis must be clearly rejected for the first two rows at any usual level of confidence, while the test for the last row suffers from the limited sample size. Nevertheless, the χ²-statistic for the simultaneous test of the equality of the transition matrices has 18 = 3² · (3 − 1) degrees of freedom and a distribution-function value > 0.9999. Consequently, the null hypothesis must be rejected at any usual level of confidence.
XFGRatMig3.xpl
Figure 4.1. Estimated transition probabilities per period (d = 4 rating categories, m = 3 periods).

A second example is given by comparing the matrix P̂ based on the whole data with the matrix P̂(2) based on the data of the normal phase of the business cycle. In this case a test may not indicate that differences between
P and P(2) are significant. Indeed, the χ2 -statistics for testing the equality
of the rows of the transition matrices with 3 degrees of freedom have distribution-function values 0.85, 0.82, and 0.02. The statistic of the simultaneous test with 9 degrees of freedom has a distribution-function value of 0.69.
whenever both sides are well-defined. Further, the process is called a homoge-
neous first-order Markov chain if the right-hand side of (4.16) is independent
of t (Brémaud, 1999).
Transferred to rating transitions, homogeneity and the Markov property imply
constant one-period transition matrices P independent of the time t, i. e. P
obeys time-stability. Then the one-period d × d transition matrix P contains
the non-negative rating transition probabilities
and
(pd1 , pd2 , . . . , pdd ) = (0, . . . , 0, 1).
The latter reflects the absorbing boundary of the transition matrix P.
The recursive scheme can also be applied to non-homogeneous transitions, i.e., to one-period transition matrices that are not all equal, which is the general case.
for all initial rating categories j = 1, . . . , d − 1. Here, c̃∗jk denotes the bootstrap
random variable of migration counts from j to k in one period and p̂jk is the
estimated one-period transition probability (transition rate) from j to k.
Then the bootstrap sample {c*_{jk}}_{j=1,...,d-1, k=1,...,d} is used to estimate a bootstrap transition matrix P̂* with generic elements
$$\hat p^{*}_{jk} = \frac{c^{*}_{jk}}{n_j}. \qquad (4.19)$$
Obviously, defaulted credits can not upgrade. Therefore, the bootstrap is not necessary for obtaining the last row of P̂*, which is (p̂*_{d1}, . . . , p̂*_{dd}) = (0, . . . , 0, 1). Then matrix multiplication gives the m-period transition matrix estimated from the bootstrap sample,
$$\hat P^{*(m)} = \big(\hat P^{*}\big)^{m},$$
with generic elements p̂*(m)_{jk}.
We can now assess the distribution of P̂*(m) by Monte Carlo sampling, e.g. B samples are drawn and labeled P̂*(m)_b for b = 1, . . . , B. Then the distribution of P̂*(m) estimates the distribution of P̂(m). This is justified since the consistency of this bootstrap estimator has been proven by Basawa, Green, McCormick, and Taylor (1990). In order to characterize the distribution of P̂*(m), the standard deviation Std(p̂*(m)_{jk}), which is the bootstrap estimator of Std(p̂(m)_{jk}), is estimated by
$$\widehat{\mathrm{Std}}\big(\hat p^{*(m)}_{jk}\big) = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\Big[\hat p^{*(m)}_{jk,b} - \hat{\mathrm E}\big(\hat p^{*(m)}_{jk}\big)\Big]^2} \qquad (4.20)$$
with
$$\hat{\mathrm E}\big(\hat p^{*(m)}_{jk}\big) = \frac{1}{B}\sum_{b=1}^{B} \hat p^{*(m)}_{jk,b}$$
for all j = 1, . . . , d − 1 and k = 1, . . . , d. Here, p̂*(m)_{jk,b} is the generic element of the b-th m-period bootstrap sample P̂*(m)_b. So (4.20) estimates the unknown standard deviation of the m-period transition rate Std(p̂(m)_{jk}) using B Monte Carlo samples.
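The bootstrap just described translates almost literally into code. The sketch below draws the counts from a multinomial distribution as in (4.18), forms the bootstrap transition matrix (4.19), raises it to the m-th power and returns the bootstrap mean and standard deviation (4.20); it is a Python illustration of the procedure, not the quantlet used in the book.

import numpy as np

def bootstrap_multiperiod(p_hat, n_j, m, B=1000, seed=None):
    # p_hat: (d, d) one-period transition matrix, last row (0, ..., 0, 1);
    # n_j: (d-1,) numbers of credits per non-default initial rating
    rng = np.random.default_rng(seed)
    d = p_hat.shape[0]
    samples = np.empty((B, d, d))
    for b in range(B):
        P_star = np.zeros((d, d))
        P_star[-1, -1] = 1.0                                    # defaulted credits cannot upgrade
        for j in range(d - 1):
            c_star = rng.multinomial(n_j[j], p_hat[j])          # bootstrap counts, cf. (4.18)
            P_star[j] = c_star / n_j[j]                         # bootstrap rates, cf. (4.19)
        samples[b] = np.linalg.matrix_power(P_star, m)          # m-period bootstrap matrix
    return samples.mean(axis=0), samples.std(axis=0, ddof=1)    # cf. (4.20)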
For time homogeneity, the m-period rating transition matrices are obtained by the quantlet XFGRatMig5.xpl (q = XFGRatMig5(p, m)). It computes all t = 1, 2, . . . , m multi-period transition matrices given the one-period d × d matrix p. Note that the output q is a d × d × m array, which can be directly visualized by XFGRatMig6.xpl (XFGRatMig6(q)) returning a graphical output. To visualize the t-period transition matrices, each with d² elements, for t = 1, . . . , m, we plot the d² aggregated values
$$j - 1 + \sum_{l=1}^{k} p^{(t)}_{jl}, \qquad j, k = 1, \ldots, d \qquad (4.21)$$
• out.btm
the (d−1)×d×B array of bootstrapped m-period transition probabilities
• out.etm
the (d − 1) × d matrix of m-period transition rates
• out.stm
the (d − 1) × d matrix of estimated standard deviations of the m-period
transition rates
Figure 4.2. Example for XFGRatMig6.xpl:
Aggregated values of multi-period transition matrices.
The components of the matrices out.btm are calculated according to (4.18) and (4.19). The matrices out.etm and out.stm have components given by (4.17) and (4.20).
To k
From j 1 2 3 4 5 6 Default nj
1 0.51 0.40 0.09 0.00 0.00 0.00 0.00 35
2 0.08 0.62 0.19 0.08 0.02 0.01 0.00 103
3 0.00 0.08 0.69 0.17 0.06 0.00 0.00 226
4 0.01 0.01 0.10 0.64 0.21 0.03 0.00 222
5 0.00 0.01 0.02 0.19 0.66 0.12 0.00 137
6 0.00 0.00 0.00 0.02 0.16 0.70 0.12 58
Default 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0
Based on the techniques presented in the last sections we can now tackle the
problem of portfolio migration, i. e. we can assess the distribution of n(t) credits
over the d rating categories and its evolution over periods t ∈ {1, . . . , m}. Here, a stationary transition matrix P is assumed. The randomly changing number of credits in category j at time t is labeled by ñ_j(t) and allows us to define non-negative portfolio weights w̃_j(t) = ñ_j(t)/n(t),
which are also random variables. They can be related to the migration counts c̃_jk(t) of period t by
$$\tilde w_k(t+1) = \frac{1}{n(t)}\sum_{j=1}^{d} \tilde c_{jk}(t), \qquad (4.22)$$
counting all migrations going from any category to the rating category k. Given
the weights w̃j (t) = wj (t) at t, the migration counts c̃jk (t) are binomially
distributed
$$\tilde c_{jk}(t)\,\big|\,\tilde w_j(t) = w_j(t) \;\sim\; \mathrm B\big(n(t)\,w_j(t),\,p_{jk}\big). \qquad (4.23)$$
The non-negative weights are aggregated in a row vector w̃(t) = (w̃_1(t), . . . , w̃_d(t)). The conditional mean is E[w̃(t + 1) | w̃(t) = w(t)] = w(t)P, and the conditional covariance matrix V[w̃(t + 1) | w̃(t) = w(t)] has elements
$$v_{kl} \stackrel{\mathrm{def}}{=} \begin{cases} \dfrac{1}{n(t)}\displaystyle\sum_{j=1}^{d} w_j(t)\,p_{jk}(1-p_{jk}) & \text{for } k = l, \\[2ex] -\dfrac{1}{n(t)}\displaystyle\sum_{j=1}^{d} w_j(t)\,p_{jk}\,p_{jl} & \text{for } k \ne l. \end{cases} \qquad (4.24)$$
Analogously, for m periods,
$$\tilde c^{(m)}_{jk}(t)\,\big|\,\tilde w_j(t) = w_j(t) \;\sim\; \mathrm B\big(n(t)\,w_j(t),\,p^{(m)}_{jk}\big).$$
Here, c^{(m)}_{jk}(t) denotes the number of credits migrating from j to k over m periods starting in t. The conditional mean of the portfolio weights is now given by
$$\mathrm E[\tilde w(t+m)\,|\,\tilde w(t) = w(t)] = w(t)\,\mathbf P^{(m)}$$
and the elements of the conditional covariance matrix V[w̃(t + m) | w̃(t) = w(t)] result from replacing p_{jk} and p_{jl} in (4.24) by p^{(m)}_{jk} and p^{(m)}_{jl}.
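The conditional moments of the portfolio weights are easily evaluated once P (or P^(m)) is given; the following Python lines implement (4.24) and the corresponding mean, with names chosen for this illustration only.

import numpy as np

def portfolio_moments(w, P, n):
    # w: (d,) current weights, P: (d, d) transition matrix
    # (use np.linalg.matrix_power(P, m) for the m-period case), n: number of credits
    w = np.asarray(w, dtype=float)
    mean = w @ P                                   # conditional mean of the weights
    d = P.shape[0]
    V = np.empty((d, d))
    for k in range(d):
        for l in range(d):
            if k == l:
                V[k, l] = np.sum(w * P[:, k] * (1.0 - P[:, k])) / n     # diagonal of (4.24)
            else:
                V[k, l] = -np.sum(w * P[:, k] * P[:, l]) / n            # off-diagonal of (4.24)
    return mean, V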
Bibliography
Athreya, K. B. and Fuh, C. D. (1992). Bootstrapping Markov chains, in
R. LePage and L. Billard (eds), Exploring the Limits of Bootstrap, Wi-
ley, New York, pp. 49–64.
Basawa, I. V., Green, T. A., McCormick, W. P., and Taylor, R. L. (1990).
Asymptotic bootstrap validity for finite Markov chains, Communications
in Statistics A 19: 1493–1510.
Basel Committee on Banking Supervision (2001). The Internal Ratings-Based
Approach. Consultative Document.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975). Discrete Multi-
variate Analysis: Theory and Practice, MIT Press, Cambridge.
5 Sensitivity analysis of credit portfolio models

5.1 Introduction
Understanding the principal components of portfolio credit risk and their in-
teraction is of considerable importance. Investment banks use risk-adjusted
capital ratios such as risk-adjusted return on capital (RAROC) to allocate eco-
nomic capital and measure performance of business units and trading desks.
The current attempt by the Basel Committee on Banking Supervision in its
Basel II proposals to develop an appropriate framework for a global financial
regulation system emphasizes the need for an accurate understanding of credit
risk; see BIS (2001). Thus bankers, regulators and academics have put con-
siderable effort into attempts to study and model the contribution of various
ingredients of credit risk to overall credit portfolio risk. A key development
has been the introduction of credit portfolio models to obtain portfolio loss
distributions either analytically or by simulation. These models can roughly be distinguished by their treatment of the following building blocks of credit risk:
(1i) Default Probability: the probability that the obligor or counterparty will
default on its contractual obligations to repay its debt,
(2i) Recovery Rates: the extent to which the face value of an obligation can
be recovered once the obligor has defaulted,
(3i) Credit Migration: the extent to which the credit quality of the obligor or
counterparty improves or deteriorates;
(1p) Default and Credit Quality Correlation: the degree to which the default
or credit quality of one obligor is related to the default or credit quality
of another,
(2p) Risk Contribution and Credit Concentration: the extent to which an indi-
vidual instrument or the presence of an obligor in the portfolio contributes
to the totality of risk in the overall portfolio.
From the above building blocks a rating-based credit risk model is generated
by
(1m) the definition of the possible states for each obligor’s credit quality, and
a description of how likely obligors are to be in any of these states at the
horizon date, i.e. specification of rating classes and of the corresponding
matrix of transition probabilities (relating to (1i) and (3i)).
During this study we will focus on the effects of default dependence modelling.
Furthermore, we assume that on default we are faced with a zero recovery rate.
Thus, only aspects (1i) and (1p) are of importance in our context and only
two rating classes – default and non-default – are needed. A general discussion
of further aspects can be found in any of the books Caouette, Altman and
Narayanan (1998), Ong (1999), Jorion (2000) and Crouhy et al. (2001). For
practical purposes we emphasize the importance of a proper mark-to-market
methodology (as pointed out in Kiesel et al. (1999)). However, to study the
effects of dependence modelling more precisely, we feel a simple portfolio risk
model is sufficient.
As the basis for comparison we use Value at Risk (VaR) – the loss which will
be exceeded on a given fraction of occasions (the confidence level) if a
portfolio is held for a particular time (the holding period).
(1) Sj is the driving stochastic process for defaults and rating migrations,
(2) kj , lj represent the initial and end-of-period rating category,
(3) π(.) represents the credit loss (end-of-period exposure value).
In this context Sj (which is, with reference to the Merton model, often in-
terpreted as a proxy of the obligor’s underlying equity) is used to obtain the
end-of-period state of the obligor. If we assume N rating classes, we obtain
cut-off points −∞ = zk,0 , zk,1 , zk,2 , . . . , zk,N −1 , zk,N = ∞ using the matrix of
transition probabilities together with a distributional assumption on Sj . Then,
obligor j changes from rating k to rating l if the variable Sj falls in the range
[zk,l−1 , zkl ]. Our default-mode framework implies two rating classes, default
resp. no-default, labeled as 1 resp. 0 (and thus only a single cut-off point
obtained from the probability of default). Furthermore, interpreting π(•) as
the individual loss function, π(j, 0, 0) = 0 (no default) and according to our
zero recovery assumption π(j, 0, 1) = 1. To illustrate the methodology we plot
in Figure 5.1 two simulated drivers S1 and S2 together with the corresponding
cut-off points z1,1 and z2,1 .
Figure 5.1. Two simulated drivers S1 and S2 with the corresponding cut-off points z1,1 and z2,1.
Here a_ji describes the exposure of obligor j to factor i, i.e. the so-called factor loading, and σ_j is the volatility of the idiosyncratic risk contribution. In such a framework one can easily infer default correlation from the correlation of the underlying drivers S_j. To do so, we define default indicators
Yj = 1(Sj ≤ Dj ),
where Dj is the cut-off point for default of obligor j. The individual default
probabilities are
πj = P(Yj = 1) = P(Sj ≤ Dj ),
and the joint default probability is
$$\pi_{ij} = \mathrm P(Y_i = 1, Y_j = 1) = \mathrm P(S_i \le D_i,\, S_j \le D_j).$$
Under the assumption that (S_i, S_j) are bivariate normal, we obtain for the joint default probability
$$\pi_{ij} = \int_{-\infty}^{D_i}\!\int_{-\infty}^{D_j} \varphi(u, v; \rho_{ij})\,\mathrm du\,\mathrm dv,$$
where φ(u, v; ρ_ij) denotes the bivariate normal density with correlation ρ_ij.
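For two obligors the integral above can be evaluated directly with a bivariate normal distribution function. The short Python sketch below is an illustration only; the function name and the use of scipy are assumptions made here, the book's computations rely on XploRe.

from scipy.stats import norm, multivariate_normal

def joint_default_prob(pi_i, pi_j, rho):
    # cut-off points from the individual default probabilities
    D_i, D_j = norm.ppf(pi_i), norm.ppf(pi_j)
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return mvn.cdf([D_i, D_j])          # P(S_i <= D_i, S_j <= D_j)

# e.g. joint_default_prob(0.05, 0.05, 0.2) exceeds the independence value 0.05**2 = 0.0025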
0.1 0.0094
0.2 0.0241
0.3 0.0461
P (Yj1 = 1, . . . , Yjm = 1)
To study the effect of different copulae on default correlation, we use the fol-
lowing examples of copulae (further details on these copulae can be found in
Embrechts, Lindskog and McNeil (2001)).
1. Gaussian copula:
$$C^{\mathrm{Gauss}}_{R}(u) = \Phi^n_R\big(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n)\big).$$
Here ΦnR denotes the joint distribution function of the n-variate normal
with linear correlation matrix R, and Φ−1 the inverse of the distribution
function of the univariate standard normal.
2. t-copula:
$$C^{t}_{\nu,R}(u) = t^n_{\nu,R}\big(t_\nu^{-1}(u_1), \ldots, t_\nu^{-1}(u_n)\big),$$
where tnν,R denotes the distribution function of an n-variate t-distributed
random vector with parameter ν > 2 and linear correlation matrix R.
Furthermore, tν is the univariate t-distribution function with parameter
ν.
3. Gumbel copula:
$$C^{\mathrm{Gumbel}}_{\theta}(u) = \exp\Big\{-\big[(-\log u_1)^{\theta} + \ldots + (-\log u_n)^{\theta}\big]^{1/\theta}\Big\},$$
In Table 5.2 joint default probabilities of two obligors are reported using three
types of obligors with individual default probabilities roughly corresponding
to rating classes A, B, C. We assume that the underlying variables S are univariate
normally distributed and model the joint dependence structure using the above
copulae.
The computation shows that t and Gumbel copulae have higher joint default
probabilities than the Gaussian copula (with obvious implication for default
correlation, see equation (5.2)). To explain the reason for this we need the
concept of tail dependence:
provided that the limit λU ∈ [0, 1] exists. If λU ∈ (0, 1], X and Y are said to
be asymptotically dependent in the upper tail; if λU = 0, X and Y are said to
be asymptotically independent in the upper tail.
For continuous distributions F and G one can replace (5.3) by a version involving the bivariate copula directly:
$$\lim_{u \to 1} \frac{1 - 2u + C(u, u)}{1 - u} = \lambda_U. \qquad (5.4)$$
Lower tail dependence, which is more relevant to our current purpose, is defined in a similar way. Indeed, if
$$\lim_{u \to 0} \frac{C(u, u)}{u} = \lambda_L \qquad (5.5)$$
exists, then C exhibits lower tail dependence if λ_L ∈ (0, 1], and lower tail independence if λ_L = 0.
It can be shown that random variables linked by Gaussian copulae have no
tail-dependence, while the use of tν and the Gumbel copulae results in tail-
dependence. In fact, in case of the tν copula, we have increasing tail dependence
with decreasing parameter ν, while for the Gumbel family tail dependence
increases with increasing parameter θ.
5.4 Simulations
The purpose here is to generate portfolios with given marginals (normal) and the above copulae. We focus on the Gaussian and t-copula case. Random variates from the Gaussian copula C^Gauss_R can be generated as follows:
• Find the Cholesky decomposition A of the correlation matrix R.
• Simulate n independent standard normal variates z = (z_1, . . . , z_n)^>.
• Set x = Az.
• Set u_i = Φ(x_i), i = 1, . . . , n.
• Then (u_1, . . . , u_n)^> ∼ C^Gauss_R.
To generate random variates from the t-copula C^t_{ν,R} we recall that if the random vector X admits the stochastic representation
$$X = \mu + \sqrt{\frac{\nu}{Z}}\, Y \quad \text{(in distribution)}, \qquad (5.6)$$
with μ ∈ R^n, Z ∼ χ²_ν and Y ∼ N(0, Σ), where Z and Y are independent, then X is t_ν distributed with mean μ and covariance matrix ν/(ν − 2) Σ. Here we assume, as above, that ν > 2. While the stochastic representation (5.6) is still valid, the interpretation of the parameters has to change for ν ≤ 2. Thus, the following algorithm can be used (this is Algorithm 5.2 in Embrechts et al. (2001)):
• Simulate z from the χ²_ν distribution and, independently, y ∼ N(0, R), e.g. via the Cholesky decomposition of R.
• Set x = sqrt(ν/z) y, cf. (5.6) with μ = 0 and Σ = R.
• Set u_i = t_ν(x_i), i = 1, . . . , n.
• Then (u_1, . . . , u_n)^> ∼ C^t_{ν,R}.
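Both sampling schemes are straightforward to reproduce; the following Python sketch generates copula variates with a given correlation matrix R. The function names are chosen for this illustration (the book works with the quantlets VaRcredN and VaRcredTcop introduced below).

import numpy as np
from scipy.stats import norm, t

def gaussian_copula_sample(R, size, seed=None):
    # draws u ~ C^Gauss_R via the Cholesky factor of R
    rng = np.random.default_rng(seed)
    A = np.linalg.cholesky(R)
    z = rng.standard_normal((size, R.shape[0]))
    return norm.cdf(z @ A.T)

def t_copula_sample(R, nu, size, seed=None):
    # draws u ~ C^t_{nu,R} using the representation (5.6) with mu = 0 and Sigma = R
    rng = np.random.default_rng(seed)
    A = np.linalg.cholesky(R)
    y = rng.standard_normal((size, R.shape[0])) @ A.T      # Y ~ N(0, R)
    w = rng.chisquare(nu, size=size)                       # Z ~ chi^2_nu
    x = np.sqrt(nu / w)[:, None] * y
    return t.cdf(x, df=nu)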
We simulate standard portfolios of size 500 with all obligors belonging to one
rating class. We use three rating classes, named A,B,C with default prob-
abilities 0.005, 0.05, 0.15 roughly corresponding to default probabilities from
standard rating classes, Ong (1999), p. 77.
For our first simulation exercise we assume that the underlying variables Sj
are normally distributed within a single factor framework, i.e. p = 1 in (5.1).
The factor loadings aj1 in (5.1) are constant and chosen so that the correlation
for the underlying latent variables Sj is ρ = 0.2, which is a standard baseline
value for credit portfolio simulations, Kiesel et al. (1999). To generate different
degrees of tail correlation, we link the individual assets together using a Gaus-
sian, a t10 and a t4 -copula as implemented in VaRcredN and VaRcredTcop.
The default drivers Sj are normal for all obligors j in both quantlets. p de-
notes the default probability πj of an individual obligor and rho is the asset
correlation ρ. opt is an optional list parameter consisting of opt.alpha, the
significance level for VaR estimation and opt.nsimu, the number of simula-
tions. Both quantlets return a list containing the mean, the variance and the
opt.alpha-quantile of the portfolio default distribution.
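A stripped-down Python analogue of these quantlets, written for this illustration under a one-factor structure with equicorrelation rho and zero recovery, could look as follows; it returns the mean, the variance and the alpha-quantile of the simulated number of defaults. It is a sketch under these assumptions, not the quantlets themselves.

import numpy as np
from scipy.stats import norm, t

def credit_portfolio_var(p, rho, nu=None, n_obl=500, alpha=0.99, n_simu=10000, seed=None):
    # p: individual default probability; rho: asset correlation;
    # nu=None gives the Gaussian copula, otherwise a t-copula with nu degrees of freedom
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_simu, 1))                    # common factor
    eps = rng.standard_normal((n_simu, n_obl))              # idiosyncratic terms
    s = np.sqrt(rho) * z + np.sqrt(1.0 - rho) * eps         # latent drivers, corr(s_i, s_j) = rho
    if nu is None:
        u = norm.cdf(s)                                     # Gaussian copula
    else:
        w = rng.chisquare(nu, size=(n_simu, 1))
        u = t.cdf(np.sqrt(nu / w) * s, df=nu)               # t-copula with the same R
    defaults = (u <= p).sum(axis=1)                         # defaults per scenario (unit exposures)
    return defaults.mean(), defaults.var(ddof=1), np.quantile(defaults, alpha)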
The most striking observation from Table 5.3 is the effect tail-dependence has
on the high quantiles of highly-rated portfolios: the 99%-quantile for the t4 -
copula is more than three times larger than the corresponding quantile for the
Gaussian copula. The same effect can be observed for lower rated portfolios
Bibliography
BIS (2001). Overview of the new Basel capital accord, Technical report, Basel
Committee on Banking Supervision.
Caouette, J., Altman, E. and Narayanan, P. (1998). Managing Credit Risk: The Next Great Financial Challenge, Wiley Frontiers in Finance, Wiley & Sons, Inc., New York.
Carey, M. (1998). Credit risk in private debt portfolios, Journal of Finance
53(4): 1363–1387.
Carey, M. (2000). Dimensions of credit risk and their relationship to economic
capital requirements. Preprint, Federal Reserve Board.
Crouhy, M., Galai, D. and Mark, R. (2000). A comparative analysis of current
credit risk models, Journal of Banking and Finance 24(1-2): 59–117.
Crouhy, M., Galai, D. and Mark, R. (2001). Risk management, McGraw Hill.
Embrechts, P., Lindskog, F. and McNeil, A. (2001). Modelling dependence
with copulas and applications to risk management. Working paper, ETH
Zürich.
Frey, R. and McNeil, A. (2001). Modelling dependent defaults. Working paper,
ETH Zürich.
Implied Volatility
6 The Analysis of Implied
Volatilities
Matthias R. Fengler, Wolfgang Härdle and Peter Schmidt
The analysis of volatility in financial markets has become a first rank issue in
modern financial theory and practice: Whether in risk management, portfolio
hedging, or option pricing, we need to have a precise notion of the market’s
expectation of volatility. Much research has been done on the analysis of real-
ized historic volatilities, see Roll (1977) and references therein. However, since it
seems unsettling to draw conclusions from past to expected market behavior,
the focus shifted to implied volatilities, Dumas, Fleming and Whaley (1998).
To derive implied volatilities the Black and Scholes (BS) formula is solved for
the constant volatility parameter σ using observed option prices. This is a more
natural approach as the option value is decisively determined by the market’s
assessment of current and future volatility. Hence implied volatility may be
used as an indicator for market expectations over the remaining lifetime of the
option.
It is well known that the volatilities implied by observed market prices exhibit
a pattern that is far different from the flat constant one used in the BS formula.
Instead of finding a constant volatility across strikes, implied volatility appears
to be non-flat, a stylized fact which has been called the "smile" effect. In this
chapter we illustrate how implied volatilities can be analyzed. We focus first
on a static and visual investigation of implied volatilities, then we concentrate
on a dynamic analysis with two variants of principal components and interpret
the results in the context of risk management.
6.1 Introduction
Implied volatilities are the focus of interest both in volatility trading and in
risk management. As common practice, traders directly trade the so-called
"vega", i.e. the sensitivity of their portfolios with respect to volatility changes.
In order to establish vega trades market professionals use delta-gamma neutral
hedging strategies which are insensitive to changes in the underlying and to time
decay, Taleb (1997). To accomplish this, traders depend on reliable estimates
of implied volatilities and - most importantly - their dynamics.
One of the key issues in option risk management is the measurement of the
inherent volatility risk, the so-called "vega" exposure. Analytically, the "vega"
is the first derivative of the BS formula with respect to the volatility parameter
σ, and can be interpreted as a sensitivity of the option value with respect to
changes in (implied) volatility. When considering portfolios composed out of
a large number of different options, a reduction of the risk factor space can
be very useful for assessing the riskiness of the current position. Härdle and
Schmidt (2002) outline a procedure for using principal components analysis
(PCA) to determine the maximum loss of option portfolios bearing vega expo-
sure. They decompose the term structure of DAX implied volatilities "at the
money" (ATM) into orthogonal factors. The maximum loss, which is defined
directly in the risk factor space, is then modeled by the first two factors.
Our study on DAX options is organized as follows: First, we show how to de-
rive and to estimate implied volatilities and the implied volatility surface. A
data description follows. In Section 6.3.2, we perform a standard PCA on the co-
variance matrix of VDAX returns to identify the dominant factor components
driving term structure movements of ATM DAX options. Section 6.3.3 intro-
duces a common principal components approach that enables us to model not
only ATM term structure movements of implied volatilities but the dynamics
of the "smile" as well.
$$C_t - P_t = S_t - K e^{-r\tau}.$$
XploRe offers a fast and convenient numerical way to invert the BS formula in
order to recover σ̂ from the market prices of Ct or Pt .
y = ImplVola(x{, IVmethod})
calculates implied volatilities.
which calculates European option prices according to the Black and Scholes
model, when no dividend is assumed. The first 5 input parameters follow the
notation in this paper, and task specifies whether one desires to know a call
price, task=1, or a put price, task=0. Indeed, for σ = 24.94% we reproduce
the assumed option call price of Ct = 1.94. XFGiv00.xpl
Now we present a more complex example using option data from the German
and Swiss Futures Exchange (EUREX). The data set volsurfdata2 contains
the full set of option prices (settlement prices) as observed on January 4th,
1999. The first column contains the settlement price S of the DAX, the second
the strike price K of the option, the third the interest rate r, the fourth time
to maturity τ , the fifth the option prices Ct or Pt and the last column finally
the type of option, either 0, i.e. a put, or 1, i.e. a call. Hence the data set is
already in the form as required by the quantlet ImplVola. We may therefore
use the following code to calculate the implied volatilities:
library ("finance")
x=read("volsurfdata2.dat") ; read the data
x=paf(x,x[,4]>0.14&&x[,4]<0.22) ; select 2 months maturity
y=ImplVola(x,"bisect") ; calculate ImplVola
sort(x[,2]~y) ; sort data according to strikes
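The bisection idea used by ImplVola can be reproduced in a few lines; the following Python sketch is an illustration with assumed function names and is not the XploRe implementation.

import numpy as np
from scipy.stats import norm

def bs_call(S, K, r, tau, sigma):
    # Black-Scholes price of a European call without dividends
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

def implied_vol(price, S, K, r, tau, lo=1e-4, hi=5.0, tol=1e-8):
    # bisection works because the call price is increasing in sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, tau, mid) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)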
Figure 6.1. Volatility smile: implied volatilities across strikes.
In Figure 6.1 we display the output for the strike dimension. The deviation from
the BS model is clearly visible: implied volatilities form a convex ”smile” in
strikes. One finds a curved shape also across different maturities. In combina-
tion with the strike dimension this yields a surface with pronounced curvature
(Figure 6.2). The discontinuity of the ATM position is related to tax effects
exerting different influences on puts and calls, Hafner and Wallmeier (2001).
In our case this effect is not so important, since we smooth the observations
and calculate the returns of the implied volatility time series before applying
the PCA.
where σ̂i is the volatility implied by the observed option prices Cti or Pti . K1
and K2 are univariate kernel functions, and h1 and h2 are bandwidths. The
order 2 quartic kernel is given by
$$K_i(u) = \frac{15}{16}\,\big(1 - u^2\big)^2\, \mathbf 1(|u| \le 1).$$
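A minimal Nadaraya-Watson smoother with this kernel (the p = 0 case of the volsurf quantlet described below) might be sketched in Python as follows; grid handling and names are assumptions of the illustration.

import numpy as np

def quartic(u):
    # order 2 quartic kernel, K(u) = 15/16 (1 - u^2)^2 on [-1, 1]
    return 15.0 / 16.0 * (1.0 - u ** 2) ** 2 * (np.abs(u) <= 1)

def nw_iv_surface(K_obs, tau_obs, iv_obs, K_grid, tau_grid, h1, h2):
    # Nadaraya-Watson estimate of the implied volatility surface on a strike/maturity grid
    surface = np.full((len(K_grid), len(tau_grid)), np.nan)
    for a, k in enumerate(K_grid):
        for b, m in enumerate(tau_grid):
            w = quartic((K_obs - k) / h1) * quartic((tau_obs - m) / h2)
            if w.sum() > 0:
                surface[a, b] = np.sum(w * iv_obs) / np.sum(w)
    return surface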
As input parameters we first have the n × 6 matrix x which has been explained
in section 6.2.1. The remaining parameters concern the surface: stepwidth
is a 2 × 1 vector determining the stepwidth in the grid of the surface; the first
entry relates to the strike dimension, the second to the dimension across time
to maturity. firstXF, lastXF, firstMat, lastMat are scalar constants
giving the lowest limit and the highest limit in the strike dimension, and the
lowest and the highest limit of time to maturity in the volatility surface. The
option metric gives the choice whether to compute the surface in a moneyness
or in a strike metric. Setting metric = 0 will generate a surface computed
in a moneyness metric K/F , i.e. strike divided by the (implied) forward price
of the underlying, where the forward price is computed by Ft = St erτ . If
metric = 1, the surface is computed in the original strike dimension in terms
of K. bandwidth is a 2 × 1 vector determining the width of the bins for the
kernel estimator. p determines whether for computation a simple Nadaraya-
Watson estimator, p = 0, or a local polynomial regression, p ≠ 0, is used.
The last and optional parameter IVmethod has the same meaning as in the
ImplVola quantlet. It tells XploRe which method to use for calculating the
implied volatilities, default again is Newton-Raphson.
The output are two variables. IVsurf is an N × 3 matrix containing the
coordinates of the points computed for the implied volatility surface, where
the first column contains the values of the strike dimension, the second those
of time to maturity, the third estimated implied volatilities. N is the number of
grid points. IVpoints is a M × 3 matrix containing the coordinates of the M
options used to estimate the surface. As before, the first column contains the
values for the strike dimension, the second the maturity, the third the implied
volatilities.
Before presenting an example we briefly introduce a graphical tool for display-
ing the volatility surface. The following quantlet plots the implied surface:
As input parameters we have the output of volsurf, i.e. the volatility sur-
face IVsurf, and the original observations IVpoints. An optional parame-
ter AdjustToSurface determines whether the surface plot is shown based on
the surface data given in IVsurf, or on the basis of the original observations
IVpoints. This option might be useful in a situation where one has estimated a
smaller part of the surface than would be possible given the data. By default,
or AdjustToSurface = 1, the graph is adjusted according to the estimated
surface.
XFGiv02.xpl
Figure 6.2. Implied volatility surface.
Options on the DAX are the most actively traded contracts at the derivatives
exchange EUREX. Contracts of various strikes and maturities constitute a
liquid market at any specific time. This liquidity yields a rich basket of implied
volatilities for many pairs (K, τ ). One subject of our research concerning the
Figure 6.3. Term Structure of VDAX Subindices
XFGiv03.xpl
Proceeding this way we obtain 8 time series of fixed maturity. Each time series
is a weighted average of two neighboring maturities and contains n = 440 data
points of implied volatilities.
The data set for the analysis of variations of implied volatilities is a collection
of term structures as given in Figure 6.3. In order to identify common factors
we use Principal Components Analysis (PCA). Changes in the term structure
can be decomposed by PCA into a set of orthogonal factors.
Define Xc = (x_tj) as the T × J matrix of centered first differences of ATM implied volatilities for subindex j = 1, . . . , J in time t = 1, . . . , T, where in our case J = 8 and T = 440. The sample covariance matrix S = T^{-1} X_c^> X_c can be decomposed by the spectral decomposition into
$$S = \Gamma \Lambda \Gamma^{>}, \qquad (6.6)$$
where Λ is the diagonal matrix of eigenvalues λ_j and Γ the matrix of eigenvectors. The proportion of variance explained by the first l principal components is
$$\zeta_l = \frac{\sum_{j=1}^{l} \lambda_j}{\sum_{j=1}^{8} \lambda_j} = \frac{\sum_{j=1}^{l} \mathrm{Var}(y_j)}{\sum_{j=1}^{8} \mathrm{Var}(y_j)} \qquad \text{for } l < 8. \qquad (6.7)$$
The quantlet XFGiv04.xpl uses the VDAX data to estimate the proportion
of variance ζl explained by the first l PCs.
XFGiv04.xpl
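The decomposition (6.6) and the ratios (6.7) amount to an eigenvalue decomposition of the sample covariance matrix; the short Python sketch below mirrors this computation (it is an illustration, not the XFGiv04 quantlet).

import numpy as np

def pca_explained(X):
    # X: (T, J) matrix of first differences of ATM implied volatilities
    Xc = X - X.mean(axis=0)                      # center the columns
    S = Xc.T @ Xc / Xc.shape[0]                  # sample covariance, S = T^{-1} Xc' Xc
    lam, gamma = np.linalg.eigh(S)               # spectral decomposition, cf. (6.6)
    order = np.argsort(lam)[::-1]                # eigenvalues in decreasing order
    lam, gamma = lam[order], gamma[:, order]
    zeta = np.cumsum(lam) / lam.sum()            # explained variance, cf. (6.7)
    return lam, gamma, zeta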
As the result shows the first PC captures around 70% of the total data vari-
ability. The second PC captures an additional 13%. The third PC explains a
considerably smaller amount of total variation. Thus, the two dominant PCs
together explain around 83% of the total variance in implied ATM volatilities
for DAX options. Taking only the first two factors, i.e. those capturing around
83% in the data, the time series of implied ATM volatilities can therefore be
represented by a factor model of reduced dimension,
$$x_{tj} = \gamma_{j1}\, y_{t1} + \gamma_{j2}\, y_{t2} + \varepsilon_t,$$
where γ_{jk} denotes the jk-th element of Γ = (γ_{jk}), y_{tk} is taken from the matrix of principal components Y, and ε_t denotes white noise. The γ_j are in fact
the sensitivities of the implied volatility time series to shocks on the principal
components. As is evident from Figure 6.4, a shock on the first factor tends
to affect all maturities in a similar manner, causing a non-parallel shift of the
term structure. A shock in the second factor has a strong negative impact on
the front maturity but a positive impact on the longer ones, thus causing a
change of curvature in the term structure of implied volatilities.
Figure 6.4. Factor Loadings of First and Second PC
XFGiv05.xpl
$$nS \sim W_p(\Psi, n-1),$$
and the likelihood function for the k groups is
$$L(\Psi_1, \ldots, \Psi_k) = C \prod_{i=1}^{k} \exp\Big\{\mathrm{tr}\Big(-\tfrac{1}{2}(n_i - 1)\,\Psi_i^{-1} S_i\Big)\Big\}\, |\Psi_i|^{-\frac{1}{2}(n_i - 1)}. \qquad (6.10)$$
Assuming that H_CPC holds, i.e. replacing Ψ_i by ΓΛ_iΓ^>, one gets after some manipulations
$$g(\Gamma, \Lambda_1, \ldots, \Lambda_k) = \sum_{i=1}^{k} (n_i - 1) \sum_{j=1}^{p} \Big(\ln \lambda_{ij} + \frac{\gamma_j^{>} S_i \gamma_j}{\lambda_{ij}}\Big).$$
In order to take the orthonormality of Γ into account, we impose the p constraints γ_j^> γ_j = 1 using the Lagrange multipliers μ_j, and the remaining p(p − 1)/2 constraints γ_h^> γ_j = 0 for (h ≠ j) using the multipliers μ_{hj}. This yields
$$g^{*}(\Gamma, \Lambda_1, \ldots, \Lambda_k) = g(\cdot) - \sum_{j=1}^{p} \mu_j\big(\gamma_j^{>}\gamma_j - 1\big) - 2\sum_{h<j} \mu_{hj}\, \gamma_h^{>}\gamma_j.$$
Taking partial derivatives with respect to all λ_{im} and γ_m, it can be shown (Flury, 1988) that the solution of the CPC model is given by the generalized system of characteristic equations
$$\gamma_m^{>}\Big(\sum_{i=1}^{k} (n_i - 1)\, \frac{\lambda_{im} - \lambda_{ij}}{\lambda_{im}\lambda_{ij}}\, S_i\Big)\gamma_j = 0, \qquad m, j = 1, \ldots, p, \quad m \ne j. \qquad (6.11)$$
Flury (1988) proves existence and uniqueness of the maximum of the likelihood function, and Flury and Gautschi (1986) provide a numerical algorithm, which has been implemented in the quantlet CPC.
CPC-Analysis
A number of quantlets are designed for an analysis of covariance matrices,
amongst them the CPC quantlet:
We plot the first three eigenvectors in a parallel coordinate plot in Figure 6.5.
The basic structure of the first three eigenvectors is not altered. We find a
shift, a slope and a twist structure. This structure is common to all maturity
groups, i.e. when exploiting PCA as a dimension reducing tool, the same
transformation applies to each group! However, from comparing the size of
eigenvalues among groups, i.e. ZZ.lambda, we find that variability is dropping
across groups as we move from the front contracts to long term contracts.
Before drawing conclusions we should convince ourselves that the CPC model
is truly a good description of the data. This can be done by using a likelihood
ratio test. The likelihood ratio statistic for comparing a restricted (the CPC)
model against the unrestricted model (the model where all covariances are
treated separately) is given by
$$T_{(n_1, n_2, \ldots, n_k)} = -2 \ln \frac{L(\hat\Psi_1, \ldots, \hat\Psi_k)}{L(S_1, \ldots, S_k)}.$$
Figure 6.5. Factor loadings of the first (blue), the second (green), and
the third PC (red)
XFGiv06.xpl
The calculations yield T(n1 ,n2 ,...,nk ) = 31.836, which corresponds to the p-value
p = 0.37512 for the χ2 (30) distribution. Hence we cannot reject the CPC
model against the unrestricted model, where PCA is applied to each maturity
separately.
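The reported p-value can be checked directly from the upper tail of the χ²(30) distribution; the two Python lines below are only a verification aid.

from scipy.stats import chi2

p_value = chi2.sf(31.836, df=30)    # upper-tail probability, approximately 0.375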
Using the methods in Section 6.3.2, we can estimate the amount of variability ζl
explained by the first l principal components: again a small number of factors, up
to three at the most, is capable of capturing a large amount of total variability
present in the data. Since the model now captures variability both in strike
and maturity dimension, this can be a suitable starting point for a simplified
VaR calculation for delta-gamma neutral option portfolios using Monte Carlo
methods, and is hence a valuable insight for risk management.
Bibliography
Aït-Sahalia, Y. and Lo, A. W. (1998). Nonparametric Estimation of State-Price
Densities Implicit in Financial Assets, Journal of Finance Vol. LIII, 2,
pp. 499–547.
Aït-Sahalia, Y. and Lo, A. W. (2000). Nonparametric Risk management and
implied risk aversion, Journal of Econometrics 94, pp. 9–51.
Dumas, B., Fleming, J. and Whaley, R. E. (1998). Implied Volatility Functions:
Empirical Tests, Journal of Finance Vol. LIII, 6, pp. 2059–2106.
Fengler, M. R., Härdle, W. and Villa, Chr. (2001). The Dynamics of Implied
Volatilities: A Common Principal Components Approach, SfB 373 Discus-
sion Paper No. 2001/38, HU Berlin.
Flury, B. (1988). Common Principal Components Analysis and Related Multi-
variate Models, Wiley Series in Probability and Mathematical Statistics,
John Wiley & Sons, New York.
Flury, B. and Gautschi, W. (1986). An Algorithm for simultaneous orthogonal
transformation of several positive definite symmetric matrices to nearly
diagonal form, SIAM Journal on Scientific and Statistical Computing, 7,
pp. 169–184.
Härdle, W. (1990). Applied Nonparametric Regression, Econometric Society
Monographs 19, Cambridge University Press.
Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2002). Non- and
Semiparametric Modelling, Springer, e-book http://www.xplore-stat.de
Härdle, W. and Schmidt, P. (2002). Common Factors Governing VDAX Move-
ments and the Maximum Loss, Financial Markets and Portfolio Manage-
ment, forthcoming.
Hafner, R. and Wallmeier, M. (2001). The Dynamics of DAX Implied Volatil-
ities, International Quarterly Journal of Finance,1, 1, pp. 1–27.
Villa, C. and Sylla, A. (2000). Measuring implied surface risk using PCA
in Franke, J., Härdle, W. and Stahl, G.: Measuring Risk in Complex
Stochastic Systems, LNS 147, Springer Verlag, New York, pp. 131–147.
7 How Precise Are Price
Distributions Predicted by
Implied Binomial Trees?
Wolfgang Härdle and Jun Zheng
In recent years, especially after the 1987 market crash, it became clear that
the prices of the underlying asset do not exactly follow the Geometric Brow-
nian Motion (GBM) model of Black and Scholes. The GBM model with con-
stant volatility leads to a log-normal price distribution at any expiration date:
All options on the underlying must have the same Black-Scholes (BS) implied
volatility, and the Cox-Ross-Rubinstein (CRR) binomial tree makes use of this
fact via the construction of constant transition probability from one node to
the corresponding node at the next level in the tree. In contrast, the implied bi-
nomial tree (IBT) method simply constructs a numerical procedure consistent
with the volatility smile. The empirical fact that the market implied volatil-
ities decrease with the strike level, and increase with the time to maturity of
options is better reflected by this construction. The algorithm of the IBT is a
data adaptive modification of the CRR method.
An implied tree should satisfy the following principles:
Besides these practical issues, the IBT can be used to estimate future stock price distributions from the BS implied volatility surfaces which are calculated from currently observed daily market option prices.
We describe the construction of the IBT and analyze the precision of the pre-
dicted implied price distributions. In Section 7.1, a detailed outline of the IBT
algorithm for a liquid European-style option is given. We follow first the Der-
man and Kani (1994) algorithm, discuss its possible shortcomings, and then
present the Barle and Cakici (1998) construction. This method is character-
ized by a normalization of the central nodes according to the forward price.
In Section 7.2, we study the properties of the IBT via Monte Carlo simulations and a comparison with the simulated conditional density from a diffusion process with non-constant volatility. In Section 7.3, we apply the IBT to a DAX index data set containing the underlying asset price, strike price, interest rate, time to maturity, and call or put option prices from the MD*BASE database (included in XploRe), and compare the SPDs estimated from historical index prices with those predicted by the IBT. Conclusions and a discussion of practical issues are presented in the last section.
The GBM model with constant volatility implies the log-normal state price density

p(S_T, S_t, r, \tau) = \frac{1}{S_T \sqrt{2\pi\sigma^2\tau}} \exp\left\{ -\frac{\left[\ln(S_T/S_t) - (r - \sigma^2/2)\tau\right]^2}{2\sigma^2\tau} \right\}   (7.1)

at any option expiration T, where S_t is the stock price at time t, r is the riskless interest rate, τ = T − t is the time to maturity, and σ the volatility. The model also has the characteristic that all options on the underlying must have the same BS implied volatility.
However, the market implied volatilities of stock index options often show "the volatility smile": implied volatility decreases with the strike level and increases with the time to maturity τ. There are various proposed extensions of this GBM model to account for the volatility smile. One approach is to incorporate a stochastic volatility factor, Hull and White (1987); another allows for discontinuous jumps in the stock price, Merton (1976). However, these extensions cause several practical difficulties. For example, they violate the risk-neutral condition.
The IBT technique proposed by Rubinstein (1994), Derman and Kani (1994),
Dupire (1994), and Barle and Cakici (1998) accounts for this phenomenon.
These papers assume the stock prices in the future are generated by a modified
random walk where the underlying asset has a variable volatility that depends
on both stock price and time. Since the implied binomial trees allow for non-
constant volatility σ = σ(St , t), they are in fact modifications of the original
Cox, Ross and Rubinstein (1979) binomial trees. The IBT construction uses
the observable market option prices in order to estimate the implied distribu-
tion. It is therefore nonparametric in nature. Alternative approaches may be
based on the kernel method, Aı̈t-Sahalia and Lo (1998), nonparametric con-
strained least squares, Härdle and Yatchew (2001), and curve-fitting methods,
Jackwerth and Rubinstein (1996).
The CRR binomial tree is the discrete implementation of the GBM process
\frac{dS_t}{S_t} = \mu\, dt + \sigma\, dZ_t,   (7.2)
where Zt is a standard Wiener process, and µ and σ are constants. Similarly,
the IBT can be viewed as a discretization of the following model in which the
generalized volatility parameter is allowed to be a function of time and the
underlying price,
\frac{dS_t}{S_t} = \mu_t\, dt + \sigma(S_t, t)\, dZ_t,   (7.3)
where σ(St , t) is the instantaneous local volatility function. The aim of the
IBT is to construct a discrete approximation of the model on the basis of the
observed option prices yielding the variable volatility σ(St , t). In addition, the
IBT may reflect a non-constant drift µt .
"
!
$
#
#
#
#
#
% &' % (
)
*,+ - . / .
0 .
The transition probability p_{n,i} is the probability of making a transition from node (n, i) to node (n + 1, i + 1). Figure 7.1 illustrates the construction of an IBT.
We assume the forward price F_{n,i} satisfies the risk-neutral condition

F_{n,i} = p_{n,i}\, s_{n+1,i+1} + (1 - p_{n,i})\, s_{n+1,i}.   (7.4)

Thus the transition probability can be obtained from the following equation:

p_{n,i} = \frac{F_{n,i} - s_{n+1,i}}{s_{n+1,i+1} - s_{n+1,i}}.   (7.5)
The Arrow-Debreu price λ_{n,i} is the price of a security that pays one unit if and only if state i is reached at the nth level, and zero otherwise. In general, the Arrow-Debreu prices satisfy the forward recursion

\lambda_{n+1,i} = e^{-r\Delta t} \left\{ \lambda_{n,i-1}\, p_{n,i-1} + \lambda_{n,i}\, (1 - p_{n,i}) \right\},   (7.6)

with λ_{1,1} = 1 and λ_{n,i} = 0 for i < 1 or i > n.
stock price
level 1: 100.00
level 2: 90.48, 110.52
level 3: 81.88, 100.00, 122.15

Arrow-Debreu price
level 1: 1.00
level 2: 0.36, 0.61
level 3: 0.13, 0.44, 0.37
For example, using the CRR method, s_{2,1} = s_{1,1} e^{-\sigma\Delta t} = 100 × e^{-0.1} = 90.48 and s_{2,2} = s_{1,1} e^{\sigma\Delta t} = 110.52; the transition probability p_{1,1} = 0.61 is obtained from formula (7.5), and according to formula (7.6), λ_{2,1} = e^{-r\Delta t}(1 − p_{1,1}) = 0.36. At the third level, the stock prices are calculated from the corresponding nodes at the second level; for example, s_{3,1} = s_{2,1} e^{-\sigma\Delta t} = 81.88 and s_{3,2} = s_{1,1} = 100.
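The first construction step of this small example can be reproduced with the following Python sketch. The parameter values S0, sigma, r and dt are assumptions chosen for illustration (the text only quotes the resulting node values), so the printed numbers match the figures above only approximately.

import numpy as np

S0, sigma, r, dt = 100.0, 0.10, 0.03, 1.0   # assumed example parameters

# CRR-style node prices at the second level
s_down = S0 * np.exp(-sigma * dt)      # ~ 90.48
s_up   = S0 * np.exp( sigma * dt)      # ~ 110.52

# risk-neutral condition (7.4): forward price of the root node
F = S0 * np.exp(r * dt)

# transition probability (7.5)
p = (F - s_down) / (s_up - s_down)

# Arrow-Debreu recursion (7.6), starting from lambda_{1,1} = 1
lam_up   = np.exp(-r * dt) * p          # ~ 0.61
lam_down = np.exp(-r * dt) * (1 - p)    # ~ 0.36

print(s_down, s_up, p, lam_up, lam_down)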
where C(K, τ) and P(K, τ) are the call and put option prices respectively, and K is the strike price. In the IBT, option prices are calculated analogously for τ = nΔt,

C(K, n\Delta t) = \sum_{i=1}^{n+1} \lambda_{n+1,i}\, \max(s_{n+1,i} - K, 0),   (7.9)

P(K, n\Delta t) = \sum_{i=1}^{n+1} \lambda_{n+1,i}\, \max(K - s_{n+1,i}, 0).   (7.10)
Using the risk-neutral condition (7.4) and the discrete option price calculation
from (7.9) or (7.10), one obtains the iteration formulae for constructing the
IBT.
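As an illustration of (7.9) and (7.10), the following Python sketch values calls and puts from the node prices and Arrow-Debreu prices of a given level; the node values are taken from the small worked example above and serve only as an illustration.

import numpy as np

s_nodes   = np.array([81.88, 100.00, 122.15])   # s_{n+1,i} at the final level
lam_nodes = np.array([0.13, 0.44, 0.37])        # lambda_{n+1,i}

def call_price(K):
    # C(K, n*dt) = sum_i lambda_{n+1,i} * max(s_{n+1,i} - K, 0)
    return np.sum(lam_nodes * np.maximum(s_nodes - K, 0.0))

def put_price(K):
    # P(K, n*dt) = sum_i lambda_{n+1,i} * max(K - s_{n+1,i}, 0)
    return np.sum(lam_nodes * np.maximum(K - s_nodes, 0.0))

print(call_price(100.0), put_price(100.0))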
There are (2n + 1) parameters which define the transition from the nth to
the (n + 1)th level of the tree, i.e., (n + 1) stock prices of the nodes at the
(n + 1)th level, and n transition probabilities. Suppose the (2n − 1) parameters corresponding to the nth level are known; then the s_{n+1,i} and p_{n,i} corresponding to the (n + 1)th level can be calculated according to the following principles. We always start from the center nodes of a level: if n is even, define s_{n+1,i} = s_{1,1} = S for i = n/2 + 1; if n is odd, start from the two central nodes s_{n+1,i} and s_{n+1,i+1} for i = (n + 1)/2, and suppose s_{n+1,i} = s_{n,i}^2 / s_{n+1,i+1} = S^2 / s_{n+1,i+1}, which adjusts the logarithmic spacing between s_{n,i} and s_{n+1,i+1} to be the same as that between s_{n,i} and s_{n+1,i}. This principle yields the calculation formula of s_{n+1,i+1}, see Derman and Kani (1994),
C(K, τ ) is the interpolated value for a call struck today at strike price K and
time to maturity τ . In the D & K construction, the interpolated option price
entering (7.11) is based on a CRR binomial tree with constant parameters
σ = σimp (K, τ ), where the BS implied volatility σimp (K, τ ) can be calculated
from the known market option prices. Calculating interpolated option prices by the CRR method has a drawback: it is computationally intensive.
Once we have the initial nodes’ stock prices, according to the relationships
among the different parameters, we can continue to calculate those at higher
nodes (n + 1, j), j = i + 2, . . . n + 1 and transition probabilities one by one
using the formula:
7.1.2 Compensation
In order to avoid arbitrage, the transition probability p_{n,i} at any node should lie between 0 and 1; it therefore makes sense to limit the estimated stock prices. Sometimes the obtained price still does not satisfy inequality (7.15); in that case we choose the average of F_{n,i} and F_{n,i+1} as a proxy for s_{n+1,i+1}.
In fact, the product of the Arrow-Debreu prices λ_{n,i} at the nth level with the interest rate factor e^{r(n-1)\Delta t} can be considered as a discrete estimate of the implied distribution, the SPD p(S_T, S_t, r, τ), at τ = (n − 1)Δt. In the case of the GBM model with constant volatility, this density corresponds to (7.1).
After the construction of an IBT, we know all stock prices, transition probabilities, and Arrow-Debreu prices at any node in the tree. We are thus able to calculate the implied local volatility σ_loc(s_{n,i}, mΔt) (which describes the structure of the second moment of the underlying process) at any level m as a discrete estimate of the following conditional variance at s = s_{n,i}, τ = mΔt.
Under the risk-neutral assumption,

\sigma^2_{loc}(s, \tau) = \mathrm{Var}(\log S_{t+\tau} \mid S_t = s)
= \int (\log S_{t+\tau} - E \log S_{t+\tau})^2\, p(S_{t+\tau} \mid S_t = s)\, dS_{t+\tau}
= \int (\log S_{t+\tau} - E \log S_{t+\tau})^2\, p(S_{t+\tau}, S_t, r, \tau)\, dS_{t+\tau}.   (7.16)
Notice that the instantaneous volatility function used in (7.3) is different from the implied local volatility function defined in (7.16), but in the GBM model they are identical.
Barle and Cakici (1998) proposed an improvement of the Derman and Kani
construction. The major modification is the choice of the stock price of the
central nodes in the tree: their algorithm takes the riskless interest rate into
account. If (n + 1) is odd, then s_{n+1,i} = s_{1,1} e^{rn\Delta t} = S e^{rn\Delta t} for i = n/2 + 1; if (n + 1) is even, start from the two central nodes s_{n+1,i} and s_{n+1,i+1} for i = (n + 1)/2, and suppose s_{n+1,i} = F_{n,i}^2 / s_{n+1,i+1}. Thus s_{n+1,i} can be calculated as:
where C(K, τ) is defined as in the Derman and Kani algorithm, and ρ_u is

\rho_u = \sum_{j=i+1}^{n} \lambda_{n,j}\, (F_{n,j} - F_{n,i}).   (7.19)
After the stock prices of the initial nodes are obtained, one continues to calculate those at the higher nodes (n + 1, j), j = i + 2, ..., n + 1, and the transition probabilities one by one using the following recursion, and analogously for the lower nodes (n + 1, j), j = i − 1, ..., 1.
XFGIBT01.xpl
Derman and Kani one-year (four-step) implied binomial tree

stock price
level 1: 100.00
level 2: 95.12, 105.13
level 3: 89.93, 100.00, 110.04
level 4: 85.22, 95.12, 105.13, 115.06
level 5: 80.01, 89.92, 100.00, 110.06, 119.91

transition probability
level 1: 0.56
level 2: 0.59, 0.59
level 3: 0.54, 0.56, 0.58
level 4: 0.59, 0.59, 0.59, 0.60

Arrow-Debreu price
level 1: 1.000
level 2: 0.434, 0.559
level 3: 0.178, 0.480, 0.327
level 4: 0.080, 0.305, 0.405, 0.187
level 5: 0.033, 0.172, 0.343, 0.312, 0.111
This IBT corresponds to τ = 1 year and Δt = 0.25 year. The first tree shows the stock prices; the elements in the jth column correspond to the stock prices of the nodes at the (j − 1)th level of the tree. In the second one, the (n, j) element corresponds to the transition probability from the node (n, j) to the node (n + 1, j + 1). The third tree contains the Arrow-Debreu prices of the nodes. Using the stock prices together with the Arrow-Debreu prices of the nodes at the final level, a discrete approximation of the implied distribution can be obtained. Notice that, by the definition of the Arrow-Debreu price, the risk-neutral probability corresponding to each node is calculated as the product of the Arrow-Debreu price and the factor e^{rτ}.
If we choose sufficiently small time steps, we obtain an estimate of the implied price distribution and of the implied local volatility surface σ_loc(s, τ). We use the same assumption on the BS implied volatility surface as above, namely σ_imp(K, τ) = 0.15 − 0.0005 K, and assume S_0 = 100, r = 0.03, and T = 5 years.
XFGIBT02.xpl
Two figures are generated by running the quantlet XFGIBT02.xpl. Figure 7.2 shows the plot of the SPD estimate resulting from fitting an implied five-year tree with 20 levels. The implied local volatilities σ_loc(s, τ) in the implied tree at different times to maturity and stock price levels are shown in Figure 7.3; as expected, they decrease with the stock price and increase with the time to maturity.
The Barle and Cakici algorithm can be applied in analogy to Derman and Kani's. The XploRe quantlets used here are similar to those presented in Section 7.2.1; one only has to replace the quantlet IBTdk by IBTbc. The following figure displays the one-year (four-step) stock price tree, transition probability tree, and Arrow-Debreu tree. Figure 7.4 presents the plot of the SPD estimated by fitting a five-year implied binomial tree with 20 levels to the volatility smile using the Barle and Cakici algorithm, and Figure 7.5 shows the implied local volatility surface of the generated IBT, which decreases with the stock price and increases with time.
Barle and Cakici one-year (four-step) implied binomial tree

stock price
level 1: 100.00
level 2: 96.83, 104.84
level 3: 90.53, 101.51, 112.23
level 4: 87.60, 97.73, 107.03, 117.02
level 5: 82.00, 93.08, 103.05, 112.93, 123.85

transition probability
level 1: 0.49
level 2: 0.64, 0.38
level 3: 0.36, 0.49, 0.61
level 4: 0.57, 0.54, 0.48, 0.46

Arrow-Debreu price
level 1: 1.000
level 2: 0.506, 0.486
level 3: 0.181, 0.619, 0.185
level 4: 0.116, 0.378, 0.373, 0.111
level 5: 0.050, 0.237, 0.394, 0.240, 0.050
We now compare the SPD estimation at the fifth year obtained by the two IBT
methods with the estimated density function of the Monte-Carlo simulation
of St , t = 5 generated from the model (7.3), where σ(St , t) = 0.15 − 0.0005 St ,
Figure 7.5. Implied local volatility surface by the Barle and Cakici IBT.
µ_t = r = 0.03. We use the Milstein scheme, Kloeden, Platen and Schurz (1994), to perform the discrete-time approximation of (7.3); it has strong convergence rate δ^1. We set the time step to δ = 1/1000 here.
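The following Python sketch illustrates such a Milstein discretisation of (7.3) under the stated assumptions σ(S_t, t) = 0.15 − 0.0005 S_t and µ_t = r = 0.03; it is a simplified stand-in for the XploRe code used in the chapter, not a reproduction of it.

import numpy as np

rng = np.random.default_rng(0)

r, T, delta = 0.03, 5.0, 1.0 / 1000
n_steps = int(T / delta)
n_paths = 10_000
S0 = 100.0

def b(S):            # diffusion coefficient b(S) = sigma(S) * S
    return (0.15 - 0.0005 * S) * S

def b_prime(S):      # derivative of b with respect to S
    return 0.15 - 0.001 * S

S = np.full(n_paths, S0)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(delta), n_paths)
    # Milstein step: Euler term plus the 0.5 * b * b' * (dW^2 - delta) correction
    S = (S + r * S * delta + b(S) * dW
         + 0.5 * b(S) * b_prime(S) * (dW**2 - delta))

# The terminal values S can now be compared with the SPDs estimated by the IBTs,
# e.g. via a kernel density estimate of the simulated S_T.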
Estimated State Price Density (y-axis: probability*0.1)
Compare Figure 7.7 and Figure 7.8 with Figure 7.9, and notice that in the first two figures some edge values cannot be obtained directly from the five-year IBT. However, all three implied local volatility surface plots exhibit the volatility smile characteristic: the implied local volatility of out-of-the-money options decreases with increasing stock price and increases with time.
XFGIBT05.xpl
Figure 7.11 shows the price distribution estimation obtained by the Barle and Cakici IBT for τ = 0.5 year. Obviously, the SPD estimated by the Derman and Kani IBT can be obtained similarly. In order to check the precision of the estimated price distribution obtained by the IBT method, we compare it with a historical time series estimate based on DAX daily prices between January 1, 1997, and January 4, 1999. The historical time series density estimation method described in Aı̈t-Sahalia, Wang and Yared (2000) is used here. Notice that risk-neutrality implies that the two kinds of SPD should be equal; the historical time series SPD is in fact the conditional density function of the diffusion process. We obtain the historical time series SPD estimation by the following procedure:
5. Estimate conditional density function g = p(ST |St , µ̂, σ̂) from Monte-
Carlo simulated process
From Figure 7.12 we conclude that the SPD estimated by the Derman and Kani IBT and the one obtained by the Barle and Cakici IBT can be used to forecast the future SPD. The SPDs estimated by the different methods sometimes show deviations in skewness and kurtosis. In fact, the detection of the difference between the historical time series SPD estimation and the SPD recovered from daily option prices may be used as a trading rule, see Table 7.1 and Chapter 9. In Table 7.1, the SPD estimated from the daily option price data set is denoted by f and the time series SPD by g. A far out of the money (OTM) call (put) is defined as one whose exercise price is 10% higher (lower) than the futures price, while a near OTM call (put) is one whose exercise price is more than 5% but less than 10% higher (lower) than the futures price. When skew(f) < skew(g), agents apparently assign a lower probability to high outcomes of the underlying than would be justified by the time series SPD (see Figure 7.13). Since for call options only the right 'tail' of the support determines the theoretical price, the latter is smaller than the price implied by the diffusion process using the time series SPD. That is, we buy calls. The same reasoning applies to put options.
From the simulations and real data example, we find that the implied binomial
tree is an easy way to assess the future stock prices, capture the term structure
of the underlying asset, and replicate the volatility smile. But the algorithms
still have some deficiencies. When the time step is chosen too small, negative
transition probabilities are encountered more and more often. The modification
of these values loses the information about the smile at the corresponding
nodes. The Barle and Cakici algorithm is a better choice when the interest rate is high. Figure 7.15 shows the deviation of the two methods under the
Figure 7.13. Skewness Trade, skew(f) < skew(g).
Figure 7.14. Kurtosis Trade, kurt(f) > kurt(g).
situation where r = 0.2. When the interest rate is somewhat higher, the Barle and Cakici algorithm can still be used to construct the IBT while Derman and Kani's no longer works. Negative probabilities also occur less often than in the Derman and Kani construction (see Jackwerth (1999)).
Besides its basic purpose of pricing derivatives consistently with market prices, the IBT is useful for other kinds of analysis, such as hedging and the calculation of implied probability distributions and volatility surfaces. It estimates the future price distribution according to the historical data. On the practical side, the reliability of the approach depends critically on the quality of the estimate of the dynamics of the underlying price process, such as the BS implied volatility surface obtained from market option prices.
The IBT can be used to produce recombining and arbitrage-free binomial trees to describe stochastic processes with variable volatility. However, it has some serious limitations, such as negative probabilities, even though most of them appear at the edge of the trees. Overriding them causes a loss of the information about the smile at the corresponding nodes. These defects are a consequence of the requirement that a continuous diffusion is approximated by a binomial process. Relaxation of this requirement, using multinomial or varinomial trees, is possible.
Bibliography
Aı̈t-Sahalia, Y. and Lo, A. (1998). Nonparametric Estimation of State-Price
Densities Implicit in Financial Asset Prices, Journal of Finance, 53: 499–
547.
8.1 Introduction
Derivative markets offer a rich source of information to extract the market’s
expectations of the future price of an asset. Using option prices, one may derive
the whole risk-neutral probability distribution of the underlying asset price at
the maturity date of the options. Once this distribution, also called the State-Price Density (SPD), is estimated, it may serve for pricing new, complex or illiquid derivative securities.
There exist numerous methods to recover the SPD empirically. They can be separated into two classes:
The first class includes methods which consist in estimating the parameters of a
mixture of log-normal densities to match the observed option prices, Melick and
Thomas (1997). Another popular approach in this class is the implied binomial
trees method, see Rubinstein (1994), Derman and Kani (1994) and Chapter 7.
Another technique is based on learning networks suggested by Hutchinson, Lo
and Poggio (1994), a nonparametric approach using artificial neural networks,
radial basis functions, and projection pursuits.
The second class of methods is based on the result of Breeden and Litzenberger (1978). This methodology is based on European options with identical time to maturity; it may therefore be applied to fewer cases than some of the techniques in the first class. Moreover, it also assumes a continuum of strike prices on R+, which cannot be found on any stock exchange. Indeed, the
strike prices are always discretely spaced on a finite range around the actual
underlying price. Hence, to handle this problem an interpolation of the call
pricing function inside the range and extrapolation outside may be performed.
In the following, a semiparametric technique using nonparametric regression of
the implied volatility surface will be introduced to provide this interpolation
task. A new approach using constrained least squares has been suggested by
Yatchew and Härdle (2002) but will not be explored here.
The concept of Arrow-Debreu securities is the building block for the analysis of
economic equilibrium under uncertainty. Rubinstein (1976) and Lucas (1978)
used this concept as a basis to construct dynamic general equilibrium models
in order to determine the price of assets in an economy. The central idea of this
methodology is that the price of a financial security is equal to the expected
net present value of its future payoffs under the risk-neutral probability density
function (PDF). The net present value is calculated using the risk-free interest
rate, while the expectation is taken with respect to the weighted-marginal-rate-
of-substitution PDF of the payoffs. The latter term is known as the state-price
density (SPD), risk-neutral PDF, or equivalent martingale measure. The price
of a security at time t (Pt ) with a single liquidation date T and payoff Z(ST )
is then:
P_t = e^{-r_{t,\tau}\tau}\, E^*_t[Z(S_T)] = e^{-r_{t,\tau}\tau} \int_{-\infty}^{\infty} Z(S_T)\, f^*_t(S_T)\, dS_T   (8.1)
where E∗t is the conditional expectation given the information set in t under the
equivalent martingale probability, ST is the state variable, rt,τ is the risk-free
rate at time t with time to maturity τ , and ft∗ (ST ) is the SPD at time t for
date T payoffs.
Rubinstein (1985) shows that if one has two of the three following pieces of
information:
then one can recover the third. Since the agent’s preferences and the true data-
Z(S_T, K; \Delta K) = \left. P(S_{T-\tau}, \tau, K; \Delta K) \right|_{\tau=0} = \left. \frac{u_1 - u_2}{\Delta K} \right|_{S_T=K,\, \tau=0} = 1   (8.2)
where C(S, τ, K) denotes the price of a European call with an actual underlying price S, a time to maturity τ and a strike price K. Here, P(S_{T−τ}, τ, K; ΔK) is the corresponding price of this security ((1/ΔK) · butterfly spread(K; ΔK)) at time T − τ.
As ΔK tends to zero, this security becomes an Arrow-Debreu security paying 1 if S_T = K and zero in other states. As it is assumed that S_T has a continuous distribution function on R+, the probability of any given level of S_T is zero and thus, in this case, the price of an Arrow-Debreu security is zero. However, dividing one more time by ΔK, one obtains the price of ((1/(ΔK)²) · butterfly spread(K; ΔK)), and as ΔK tends to 0 this price tends to f*(S_T) e^{-r_{t,τ}τ} for S_T = K. Indeed,

\lim_{\Delta K \to 0} \left. \frac{P(S_t, \tau, K; \Delta K)}{\Delta K} \right|_{K=S_T} = f^*(S_T)\, e^{-r_{t,\tau}\tau}.   (8.3)
\lim_{\Delta K \to 0} \frac{P(S_t, \tau, K; \Delta K)}{\Delta K} = \frac{\partial^2 C_t(\cdot)}{\partial K^2}.   (8.4)

where r_{t,τ} denotes the risk-free interest rate at time t with time to maturity τ and f*_t(·) denotes the risk-neutral PDF or the SPD in t. Therefore, the SPD is defined as:

f^*_t(S_T) = e^{r_{t,\tau}\tau} \left. \frac{\partial^2 C_t(\cdot)}{\partial K^2} \right|_{K=S_T}.   (8.5)
This method constitutes a no-arbitrage approach to recover the SPD. No assumptions on the underlying asset dynamics are required. Preferences are not restricted since the no-arbitrage method only assumes risk-neutrality with respect to the underlying asset. The only requirements for this method are that markets are perfect (i.e. no sales restrictions, transaction costs or taxes, and agents are able to borrow at the risk-free interest rate) and that
C(·) is twice differentiable. The same result can be obtained by differentiat-
ing (8.1) twice with respect to K after setting for Z the call payoff function
Z(ST ) = (ST − K)+ .
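A minimal numerical illustration of this no-arbitrage relation is given below: the SPD is obtained as the discounted second finite difference of call prices over a strike grid, as in (8.5). The Black-Scholes prices used as input are only an assumed example; any observed call price curve could be plugged in instead.

import numpy as np
from scipy.stats import norm

def spd_from_calls(strikes, call_prices, r, tau):
    # central second difference approximates d^2 C / dK^2 on an equally spaced grid
    dK = strikes[1] - strikes[0]
    d2C = (call_prices[2:] - 2.0 * call_prices[1:-1] + call_prices[:-2]) / dK**2
    spd = np.exp(r * tau) * d2C            # f*(S_T) = e^{r tau} d^2C/dK^2
    return strikes[1:-1], spd

# Illustration with Black-Scholes call prices (assumed parameters)
S, r, tau, sigma = 100.0, 0.03, 0.5, 0.2
K = np.linspace(60.0, 160.0, 201)
d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
d2 = d1 - sigma * np.sqrt(tau)
C = S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

grid, f_star = spd_from_calls(K, C, r, tau)   # recovers the log-normal SPD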
The Black-Scholes call option pricing formula is due to Black and Scholes (1973)
and Merton (1973). In this model there are no assumptions regarding prefer-
ences, rather it relies on no-arbitrage conditions and assumes that the evolution
of the underlying asset price St follows a geometric Brownian motion defined
through
\frac{dS_t}{S_t} = \mu\, dt + \sigma\, dW_t.   (8.6)
The Black-Scholes SPD can be calculated in XploRe using the following quant-
let:
bsspd = spdbs(K,s,r,div,sigma,tau)
estimates the Black-Scholes SPD
The arguments are the strike prices (K), underlying price (s), risk-free interest
rate (r), dividend yields (div), implied volatility of the option (sigma), and
the time to maturity (tau). The output consists of the Black-Scholes SPD
(bsspd.fbs), ∆ (bsspd.delta), and the Γ (bsspd.gamma) of the call options.
Please note that spdbs can be applied to put options by using the Put-Call
parity.
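For readers without XploRe, the following Python sketch mirrors what spdbs computes (it is not the quantlet itself): the log-normal Black-Scholes SPD together with the call Delta and Gamma, for the same inputs.

import numpy as np
from scipy.stats import norm

def spd_bs(K, s, r, div, sigma, tau):
    d1 = (np.log(s / K) + (r - div + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    fbs = norm.pdf(d2) / (K * sigma * np.sqrt(tau))                         # log-normal SPD at S_T = K
    delta = np.exp(-div * tau) * norm.cdf(d1)                               # call Delta
    gamma = np.exp(-div * tau) * norm.pdf(d1) / (s * sigma * np.sqrt(tau))  # call Gamma
    return fbs, delta, gamma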
However, it is widely known that the Black-Scholes call option formula is not
valid empirically. For more details, please refer to Chapter 6. Since the Black-
Scholes model contains empirical irregularities, its SPD will not be consistent
with the data. Consequently, some other techniques for estimating the SPD
without any assumptions on the underlying diffusion process have been devel-
oped in recent years.
The use of nonparametric regression to recover the SPD was first investigated
by Aı̈t-Sahalia and Lo (1998). They propose to use the Nadaraya-Watson esti-
mator to estimate the historical call prices Ct (·) as a function of the following
state variables (St , K, τ, rt,τ , δt,τ )> . Kernel regressions are advocated because
there is no need to specify a functional form and the only required assumption
is that the function is smooth and differentiable, Härdle (1990). When the re-
gressor dimension is 5, the estimator is inaccurate in practice. Hence, there is
a need to reduce the dimension or equivalently the number of regressors. One
method is to appeal to no-arbitrage arguments and collapse S_t, r_{t,τ} and δ_{t,τ} into the forward price F_t = S_t e^{(r_{t,τ} − δ_{t,τ})τ} in order to express the call pricing function as:
Combining the assumptions of (8.7) and (8.8), the call pricing function can be further reduced to a function of three variables (K/F_{t,τ}, τ, r_{t,τ}).
Another approach is to use a semiparametric specification based on the Black-Scholes implied volatility. Here, the implied volatility σ is modelled as a nonparametric function σ(F_{t,τ}, K, τ). Once a smooth estimate σ̂(·) is obtained, estimates of Ĉ_t(·), ∆̂_t = ∂Ĉ_t(·)/∂S_t, Γ̂_t = ∂²Ĉ_t(·)/∂S_t², and f̂*_t = e^{r_{t,τ}τ} ∂²Ĉ_t(·)/∂K² can be calculated.
uses intraday data for one maturity and estimates an implied volatility surface where the dimensions are the intraday time and the moneyness of the options.
Here, a slightly different method is used which relies on all settlement prices
of options of one trading day for different maturities to estimate the implied
volatility surface σ(K/Ft,τ , τ ). In the second step, these estimates are used for
a given time to maturity which may not necessarily correspond to the maturity
of a series of options. This method allows one to compare the SPD at different
dates because of the fixed maturity provided by the first step. This is interesting
if one wants to study the dynamics and the stability of these densities.
Fixing the maturity also allows us to eliminate τ from the specification of the implied volatility function. In the following, for convenience, the moneyness is defined as M = S̃_t/K and we denote by σ the implied volatility. The notation ∂f(x_1, ..., x_n)/∂x_i denotes the partial derivative of f with respect to x_i, and df(x)/dx the total derivative of f with respect to x.
Moreover, we use the following rescaled call option function:

c_{it} = \frac{C_{it}}{\tilde{S}_t}, \qquad M_{it} = \frac{\tilde{S}_t}{K_i},

where C_{it} is the price of the ith option at time t and K_i is its strike price. The rescaled call option function can be expressed as:

c_{it} = c(M_{it}; \sigma(M_{it})) = \Phi(d_1) - \frac{e^{-r\tau}\, \Phi(d_2)}{M_{it}},

d_1 = \frac{\log(M_{it}) + \left(r + \frac{1}{2}\sigma(M_{it})^2\right)\tau}{\sigma(M_{it})\sqrt{\tau}},

d_2 = d_1 - \sigma(M_{it})\sqrt{\tau}.
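The rescaled call function and the corresponding d_1, d_2 can be evaluated directly; the following Python sketch does this for assumed values of r and τ.

import numpy as np
from scipy.stats import norm

def rescaled_call(M, sigma, r, tau):
    # c(M; sigma(M)) = Phi(d1) - exp(-r*tau) * Phi(d2) / M
    d1 = (np.log(M) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return norm.cdf(d1) - np.exp(-r * tau) * norm.cdf(d2) / M

# Example: moneyness 1.05, implied volatility 20%, r = 3%, tau = 0.25 years (assumed)
print(rescaled_call(1.05, 0.20, 0.03, 0.25))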
The standard risk measures are then the following partial derivatives (for notational convenience subscripts are dropped):

\Delta = \frac{\partial C}{\partial S} = \frac{\partial C}{\partial \tilde{S}} = c(M, \sigma(M)) + \tilde{S}\, \frac{\partial c}{\partial \tilde{S}},

where

\frac{\partial c}{\partial \tilde{S}} = \frac{dc}{dM}\, \frac{\partial M}{\partial \tilde{S}} = \frac{dc}{dM}\, \frac{1}{K}, \qquad \frac{\partial^2 c}{\partial \tilde{S}^2} = \frac{d^2 c}{dM^2} \left(\frac{1}{K}\right)^2.
The SPD is then the second derivative of the call option function with respect
to the strike price:
f^*(\cdot) = e^{r\tau}\, \frac{\partial^2 C}{\partial K^2} = e^{r\tau}\, \tilde{S}\, \frac{\partial^2 c}{\partial K^2}.   (8.13)
The conversion is needed because c(·) is being estimated, not C(·). The analytical expression of (8.13) depends on

\frac{\partial^2 c}{\partial K^2} = \frac{d^2 c}{dM^2} \left(\frac{M}{K}\right)^2 + 2\, \frac{dc}{dM}\, \frac{M}{K^2}.   (8.14)

The functional form of dc/dM is:
The quantities in (8.14) and (8.15) are a function of the following first derivatives:

\frac{dd_1}{dM} = \frac{\partial d_1}{\partial M} + \frac{\partial d_1}{\partial \sigma}\, \frac{\partial \sigma}{\partial M},

\frac{dd_2}{dM} = \frac{\partial d_2}{\partial M} + \frac{\partial d_2}{\partial \sigma}\, \frac{\partial \sigma}{\partial M},

\frac{\partial d_1}{\partial M} = \frac{\partial d_2}{\partial M} = \frac{1}{M\sigma\sqrt{\tau}},

\frac{\partial d_1}{\partial \sigma} = -\frac{\log(M) + r\tau}{\sigma^2\sqrt{\tau}} + \frac{\sqrt{\tau}}{2},

\frac{\partial d_2}{\partial \sigma} = -\frac{\log(M) + r\tau}{\sigma^2\sqrt{\tau}} - \frac{\sqrt{\tau}}{2}.
We use the shorthand notation

V = \sigma(M), \qquad V' = \frac{\partial \sigma(M)}{\partial M}, \qquad V'' = \frac{\partial^2 \sigma(M)}{\partial M^2}.   (8.16)
The quantities in (8.14) and (8.15) also depend on the following second derivative functions:

\frac{d^2 d_1}{dM^2} = -\frac{1}{M\sigma\sqrt{\tau}} \left( \frac{1}{M} + \frac{V'}{\sigma} \right) + V'' \left( \frac{\sqrt{\tau}}{2} - \frac{\log(M) + r\tau}{\sigma^2\sqrt{\tau}} \right) + V' \left( \frac{2V'\{\log(M) + r\tau\}}{\sigma^3\sqrt{\tau}} - \frac{1}{M\sigma^2\sqrt{\tau}} \right),   (8.17)

\frac{d^2 d_2}{dM^2} = -\frac{1}{M\sigma\sqrt{\tau}} \left( \frac{1}{M} + \frac{V'}{\sigma} \right) - V'' \left( \frac{\sqrt{\tau}}{2} + \frac{\log(M) + r\tau}{\sigma^2\sqrt{\tau}} \right) + V' \left( \frac{2V'\{\log(M) + r\tau\}}{\sigma^3\sqrt{\tau}} - \frac{1}{M\sigma^2\sqrt{\tau}} \right).   (8.18)
Local polynomial estimation is used to estimate the implied volatility smile and its first two derivatives in (8.16). A brief explanation follows.
Consider the following data generating process for the implied volatilities: around a target point (m_0, τ_0), the regression function g is approximated by the local expansion

g(m, \tau) \approx g(m_0, \tau_0) + \left.\frac{\partial g}{\partial M}\right|_{m_0,\tau_0} (m - m_0) + \frac{1}{2} \left.\frac{\partial^2 g}{\partial M^2}\right|_{m_0,\tau_0} (m - m_0)^2 + \left.\frac{\partial g}{\partial \tau}\right|_{m_0,\tau_0} (\tau - \tau_0) + \frac{1}{2} \left.\frac{\partial^2 g}{\partial \tau^2}\right|_{m_0,\tau_0} (\tau - \tau_0)^2 + \frac{1}{2} \left.\frac{\partial^2 g}{\partial M\,\partial \tau}\right|_{m_0,\tau_0} (m - m_0)(\tau - \tau_0).   (8.19)
\sigma = (\sigma_1, \ldots, \sigma_n)^\top, \quad W = \mathrm{diag}\{K_{h_M,h_\tau}(M_j - m_0, \tau_j - \tau_0)\} \quad \text{and} \quad \beta = (\beta_0, \ldots, \beta_5)^\top.
A nice feature of the local polynomial method is that it provides the estimated
implied volatility and its first two derivatives in one step. Indeed, one has from
(8.19) and (8.20):
\left.\widehat{\frac{\partial g}{\partial M}}\right|_{m_0,\tau_0} = \hat{\beta}_1, \qquad \left.\widehat{\frac{\partial^2 g}{\partial M^2}}\right|_{m_0,\tau_0} = 2\hat{\beta}_2.
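As an illustration of this one-step property, the following Python sketch fits a local quadratic polynomial in moneyness at a target point m_0 (the univariate, fixed-maturity case used later in this chapter); the Gaussian kernel and the bandwidth h are assumptions for illustration.

import numpy as np

def local_quadratic(m, sigma_iv, m0, h):
    # weighted least squares fit of sigma_iv on (1, m - m0, (m - m0)^2)
    X = np.column_stack([np.ones_like(m), m - m0, (m - m0) ** 2])
    w = np.exp(-0.5 * ((m - m0) / h) ** 2)          # Gaussian kernel weights
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ sigma_iv)
    V, V1, V2 = beta[0], beta[1], 2.0 * beta[2]     # sigma, sigma', sigma'' at m0
    return V, V1, V2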
One of the concerns regarding this estimation method is the dependence on the
bandwidth which governs how much weight the kernel function should place
on an observed point for the estimation at a target point. Moreover, as the
call options are not always symmetrically and equally distributed around the
ATM point, the choice of the bandwidth is a key issue, especially for estimation
at the border of the implied volatility surface. The bandwidth can be chosen globally or locally dependent on (M, τ). There are methods providing "optimal" bandwidths which rely on plug-in rules or on data-based selectors.
In the case of the volatility surface, it is vital to determine one bandwidth for the
maturity and one for the moneyness directions. An algorithm called Empirical-
Bias Bandwidth Selector (EBBS) for finding local bandwidths is suggested by
Ruppert (1997) and Ruppert, Wand, Holst and Hössler (1997). The basic idea
of this method is to minimize the estimate of the local mean squared error at each target point, without relying on asymptotic results. The variance and the bias terms are estimated empirically in this algorithm.
Using the local polynomial estimations, the empirical SPD can be calculated
with the following quantlet:
lpspd = spdbl(m,sigma,sigma1,sigma2,s,r,tau)
estimates the semi-parametric SPD.
The arguments for this quantlet are the moneyness (m), V (sigma), V 0 (sigma1),
V 00 (sigma2), underlying price (s) corrected for future dividends, risk-free in-
terest rate (r), and the time to maturity (tau). The output consists of the local
polynomial SPD (lpspd.fstar), ∆ (lpspd.delta), and the Γ (lpspd.gamma)
of the call-options.
8.4.1 Data
The dataset was taken from the financial database MD*BASE located at CASE
(Center for Applied Statistics and Economics) at Humboldt-Universität zu
Berlin. Since MD*BASE is a proprietary database, only a limited dataset
is provided for demonstration purposes.
This database is filled with options and futures data provided by Eurex. Daily series of 1, 3, 6 and 12 month DM-LIBOR rates taken from Thomson Financial Datastream serve as riskless interest rates. The DAX 30 futures
and options settlement data of January 1997 (21 trading days) were used in this
study. Daily settlement prices for each option contract are extracted along with
contract type, maturity and strike. For the futures, the daily settlement prices,
maturities and volumes are the relevant information. To compute the interest
rates corresponding to the option maturities a linear interpolation between the
available rates was used.
The DAX is a performance index which means that dividends are reinvested.
However, assuming no dividend yields when inverting the Black-Scholes for-
mula results in different volatilities for pairs of puts and calls contrary to the
C_t - P_t = S_t - D_{t,\tau_O} - K e^{-r_{t,\tau_O}\tau_O}
we obtain:
1 Day
2 Month
3 Year
4 Type of option (1 for calls, 0 for puts)
5 Time to maturity (in calendar days)
6 Strike prices
7 Option prices
8 Corrected spot price (implied dividends taken into account)
9 Risk-free interest rate
10 Implied volatility
11 Non-corrected spot price
The data can be read into XploRe by loading the quantlib finance and then
issuing the following command:
data=read("XFGData9701.dat")
Next extract all call options on January 3, 1997 with the paf command:
data=paf(data,(data[,1]==3)&&(data[,4]==1))
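For readers working outside XploRe, a rough pandas analogue of this extraction step could look as follows; the file layout (whitespace-separated columns ordered as in the table above) is an assumption.

import pandas as pd

cols = ["day", "month", "year", "type", "ttm", "strike", "price",
        "spot_corr", "rate", "impl_vol", "spot"]
data = pd.read_csv("XFGData9701.dat", sep=r"\s+", names=cols)

# all call options (type == 1) on January 3, 1997
calls_jan3 = data[(data["day"] == 3) & (data["type"] == 1)]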
This figure shows the expected effect of time to maturity on the SPD, which
is a loss of kurtosis. The x-axis represents the terminal prices ST . The local
polynomial SPD displays a negative skew compared to a theoretical Black-
Scholes SPD. The major reason for the difference is the measure of implied
volatility. Using the local polynomial estimators one captures the effect of the
“volatility smile” and its effects on the higher moments such as skewness and
kurtosis. This result is similar to what Aı̈t-Sahalia and Lo (1998) and Rookley
(1997) found in their study.
Figure 8.2 and Figure 8.3 show Delta and Gamma for the full range of strikes
and for three different maturities. This method allows the user to obtain both greeks in one estimation step for all strikes and maturities.
A natural question that may arise is how do the SPDs evolve over time. In
this section an illustrative example is used to show the dynamics of the SPD
over the month of January 1997. XFGSPDonemonth.xpl estimates and plots
the SPD for each trading day in January 1997. The x-axis is the moneyness,
y-axis is the trading day, and the z-axis is the SPD. Figure 8.4 shows the local
polynomial SPD for the three first weeks of January, 1997.
Rookley’s method serves to estimate the SPD, where V , V 0 and V 00 from (8.16)
are computed via local polynomials. The method is now applied to estimate
a SPD whose maturity is equal to the maturity of a series of options. In this
case, the nonparametric regression is a univariate one.
It can be shown that

E|\hat{f}^*_n - f^*|^2 = O\left(n^{-4/9}\right),

because

E|\hat{V}_n - V|^2 = O\left(n^{-8/9}\right), \quad E|\hat{V}'_n - V'|^2 = O\left(n^{-4/9}\right), \quad E|\hat{V}''_n - V''|^2 = O\left(n^{-4/9}\right).
This result can be obtained using some theorems related to local polynomial
estimation, for example in Fan and Gijbels (1996), if some boundary conditions
are satisfied.
An asymptotic approximation of fˆn∗ is complicated by the fact that fˆn∗ is a
non linear function of V , V 0 and V 00 . Analytical confidence intervals can be
obtained using delta methods proposed by Aı̈t-Sahalia (1996). However, an
alternative method is to use the bootstrap to construct confidence bands. The
idea for estimating the bootstrap bands is to approximate the distribution of
1. Collect daily option prices from MD*BASE, only choose those options
with the same expiration date, for example, those with time to maturity
49 days on Jan 3, 1997.
2. Use the local polynomial estimation method to obtain the empirical SPD.
Notice that when τ is fixed, the forward price F is also fixed, so that the implied volatility function σ(K/F) can be considered in a fixed design situation, where K is the strike price.
3. Obtain the confidence band using the wild bootstrap method (a sketch of this step is given after this list). The wild bootstrap method entails:
• Suppose that the regression model for the implied volatility function σ(K/F) is
Y_i = \sigma\left(\frac{K_i}{F}\right) + \varepsilon_i, \quad i = 1, \ldots, n.
• Choose a bandwidth g which is larger than the optimal h in order to have oversmoothing. Estimate the implied volatility function σ(K/F) nonparametrically and then calculate the residual errors
\tilde{\varepsilon}_i = Y_i - \hat{\sigma}_h\left(\frac{K_i}{F}\right).
• Replicate B times the series of the \tilde{\varepsilon}_i with the wild bootstrap, obtaining \varepsilon^{*,j}_i for j = 1, ..., B, Härdle (1990), and build B new bootstrapped samples:
Y^{*,j}_i = \hat{\sigma}_g\left(\frac{K_i}{F}\right) + \varepsilon^{*,j}_i.
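The wild bootstrap step can be sketched as follows in Python; sigma_hat_h and sigma_hat_g stand for fitted values from nonparametric fits with bandwidths h and g, and the two-point weight distribution (Mammen's) is one common choice and an assumption here.

import numpy as np

def wild_bootstrap_samples(Y, sigma_hat_h, sigma_hat_g, B, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    resid = Y - sigma_hat_h                       # residuals from the h-fit
    n = len(Y)
    # two-point wild bootstrap weights with E[v] = 0 and E[v^2] = 1
    a, b = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
    p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))
    samples = np.empty((B, n))
    for j in range(B):
        v = np.where(rng.random(n) < p, a, b)
        samples[j] = sigma_hat_g + resid * v      # Y_i^{*,j} = sigma_g(K_i/F) + eps_i^{*,j}
    return samples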
Two SPDs (Jan 3 and Jan 31, 1997) whose times to maturity are 49 days were estimated and are plotted in Figure 8.5. The bootstrap confidence band corresponding to the first SPD (Jan 3) is also visible on the chart. In Figure 8.6, the SPDs are displayed on a moneyness metric. It seems that the differences between the SPDs can be eliminated by switching to the moneyness metric. Indeed, as can be seen from Figure 8.6, both SPDs lie within the 95 percent confidence bands. The number of bootstrap samples is set to B = 100. The local polynomial estimation was done on standardized data; h is set to 0.75 for both plots and g is equal to 1.1 times h. Notice that larger values of g were also tried, and the confidence bands are stable to an increase of g.
In Chapter 7, Implied Binomial Trees (IBT) are discussed. This method is a closely related approach to estimating the SPD. It also recovers the SPD nonparametrically from market option prices and uses the Black-Scholes formula to establish the relationship between option prices and implied volatilities, as in Rookley's method. In Chapter 7, the Black-Scholes formula is only used for the Barle and Cakici IBT procedure, but the CRR binomial tree method used by Derman and Kani (1994) does not differ much from it in nature. However, IBT and nonparametric regression methods have some differences caused by different modelling strategies.
The IBT method might be less data-intensive than the nonparametric regres-
sion method. By construction, it only requires one cross section of prices. In the
earlier application with DAX data, option prices are used with different times
to maturity for one day to estimate the implied volatility surface first in order
to construct the tree using the relation formula between option prices and risk-
neutral probabilities. The precision of the SPD estimation using IBT is heavily
affected by the quality of the implied volatility surface and the choice of the
levels of the implied tree. Furthermore, from the IBT method only risk-neutral
probabilities are obtained. They can be considered as a discrete estimation of
the SPD. However, the IBT method is not only useful for estimating SPD, but
also for giving a discrete approximation of the underlying process.
The greatest difference between IBTs and nonparametric regression is the re-
quirement of smoothness. The precision of Rookley’s SPD estimation is highly
dependent on the selected bandwidth. Even if very limited option prices are
given, a part of the SPD estimation still can be obtained using nonparametric
regression, while the IBT construction has to be given up if no further struc-
ture is invoked on the volatility surface. At first sight, Rookley's method has no obvious theoretical difference from Aı̈t-Sahalia's method, Aı̈t-Sahalia and Lo (1998). But investigating the convergence rate of the SPD estimation using Aı̈t-Sahalia's method allows one to conduct statistical inference such as tests of the stability of the SPD and tests of risk neutrality.
Bibliography
Aı̈t-Sahalia, Y. (1996). The Delta method for Nonparametric Kernel Function-
als, mimeo.
Aı̈t-Sahalia, Y. and Lo, A. W. (1998). Nonparametric estimation of state-price
densities implicit in financial asset prices, Journal of Finance 53: 499–547.
Figure 8.8. The upper graph displays the relative pricing errors (in %) for the butterfly spread centered on the nearest strike on the left side of the ATM point. The second graph corresponds to the butterfly spread centered on the nearest strike on the right side of the ATM point. The black lines represent the IBT's pricing errors and the blue lines Rookley's errors.
Arrow, K. (1964). The role of securities in the optimal allocation of risk bearing,
Review of Economic Studies 31: 91–96.
Bahra, B. (1997). Implied risk-neutral probability density functions from option
9.1 Introduction
In recent years a number of methods have been developed to infer implied
state price densities (SPD) from cross sectional option prices, Chapter 7 and
8. Instead of comparing this density to a historical density extracted from the
observed time series of the underlying asset prices, i.e. a risk neutral density to
an actual density, Ait–Sahalia, Wang and Yared (2000) propose to compare two
risk neutral densities, one obtained from cross sectional S&P 500 option data
and the other from the S&P 500 index time series. Furthermore, they propose
trading strategies designed to exploit differences in skewness and kurtosis of
both densities. The goal of this article is to apply the procedure to the german
DAX index. While the option implied SPD is estimated by means of the Barle
and Cakici, Barle and Cakici (1998), implied binomial tree version, the time
series density is inferred from the time series of the DAX index by applying a
method used by Ait–Sahalia, Wang and Yared (2000). Based on the comparison
of both SPDs the performance of skewness and kurtosis trades is investigated.
We use options data included in MD*BASE. This is a database located at
CASE (Center for Applied Statistics and Economics) of Humboldt–Universität
zu Berlin. The time period is limited to data of the period between 01/01/97
and 12/31/99 for which MD*BASE contains daily closing prices of the DAX
index, EUREX DAX option settlement prices and annual interest rates which
are adjusted to the time to maturity of the above mentioned EUREX DAX
options.
While Section 9.2 applies the Barle and Cakici implied binomial tree algorithm, which estimates the option implied SPD using a two week cross section of DAX index options, Section 9.3 explains and applies the method to estimate the DAX time series SPD from 3 months of historical index prices. In Section 9.4 we compare the conditional skewness and kurtosis of both densities. Sections 9.5 and 9.6 complete the chapter with the investigation of four trading strategies, and Section 9.7 concludes with some critical remarks.
Using the DAX index data from MD*BASE, we estimate the 3 month option
implied IBT SPD f ∗ by means of the XploRe quantlets IBTbc and volsurf and
a two week cross section of DAX index option prices for 30 periods beginning
in April 1997 and ending in September 1999. We measure time to maturity
(TTM) in days and annualize it using the factor 360, giving the annualized
time to maturity τ = TTM/360. For each period, we assume a flat yield curve.
We extract from MD*BASE the maturity consistent interest rate.
We describe the procedure in more detail for the first period. First of all, we
estimate the implied volatility surface given the two week cross section of DAX
option data and utilizing the XploRe quantlet volsurf which computes the 3
dimensional implied volatility surface (implied volatility over time to maturity
and moneyness) using a kernel smoothing procedure. Friday, April 18, 1997
is the 3rd Friday of April 1997. On Monday, April 21, 1997, we estimate the
volatility surface, using two weeks of option data from Monday, April 7, 1997,
to Friday, April 18, 1997. Then we start the IBT computation using the DAX price of this Monday, April 21, 1997. The volatility surface is estimated
for the moneyness interval [0.8, 1.2] and the time to maturity interval [0.0, 1.0].
Following, the XploRe quantlet IBTbc takes the volatility surface as input and
computes the IBT using Barle and Cakici’s method. Note that the observed
smile enters the IBT via the analytical Black–Scholes pricing formula for a call
C(Fn,i , tn+1 ) and for a put P (Fn,i , tn+1 ) which are functions of St1 = s1,1 ,
K = F_{n,i}, r, t_{n+1} and σ_impl(F_{n,i}, t_{n+1}). We note that it may happen that, at the edge of the tree, option prices with associated strike prices F_{n,i} and node prices s_{n+1,i+1} have to be computed for which the moneyness ratio s_{n+1,i+1}/F_{n,i} lies outside the interval [0.8, 1.2] on which the volatility surface has been estimated.
In these cases, we use the volatility at the edge of the surface. Note, as well,
that the mean of the IBT SPD is equal to the futures price by construction of
the IBT.
Finally, we transform the SPD over sN +1,i into a SPD over log–returns uN +1,i =
ln(sN +1,i /s1,1 ) as follows:
P(s_{N+1,i} = x) = P\left( \ln\frac{s_{N+1,i}}{s_{1,1}} = \ln\frac{x}{s_{1,1}} \right) = P(u_{N+1,i} = u),
Figure 9.1 for the SPD computed with parameters N = 10 time steps and
interest rate r = 3.23.
A crucial aspect of using binomial trees is the choice of the number of time steps N into which the time interval [t, T] is divided. In general, the more time steps are used, the better the discrete approximation of the continuous diffusion process and of the SPD. Unfortunately, the bigger N, the more node prices s_{n,i} possibly have to be overridden in the IBT framework, thereby effectively losing the information about the smile at the corresponding nodes. Therefore, we computed IBTs for different numbers of time steps. We found no hint of convergence of the variables of interest, skewness and kurtosis. Since both variables seemed to fluctuate around a mean, we computed IBTs with time steps 10, 20, ..., 100 and consider the average of these ten values for skewness and kurtosis as the option implied SPD skewness and kurtosis.
Applying this procedure for all 30 periods, beginning in April 1997 and ending
in September 1999, we calculate the time series of skewness and kurtosis of the
3 month implied SPD f ∗ shown in Figures 9.3 and 9.4. We see that the implied
SPD is clearly negatively skewed for all periods but one. In September 1999 it
is slightly positively skewed. The pattern is similar for the kurtosis of f ∗ which
is leptokurtic in all but one period. In October 1998 the density is platykurtic.
Figure 9.1. IBT SPD.
Let g_t(S_t, S_T, τ, r_{t,τ}, δ_{t,τ}) denote the conditional density of S_T given S_t generated by the dynamics defined in equation (9.1), and g*_t(S_t, S_T, τ, r_{t,τ}, δ_{t,τ}) the conditional density generated by equation (9.2); then f* can only be compared to the risk neutral density g* and not to g.
A crucial feature of this method is that the diffusion functions are identical
under both the actual and the risk neutral dynamics (which follows from Gir-
sanov’s theorem). Therefore, it is not necessary to observe the risk neutral path
of the DAX index {St∗ }. The function σ(•) is estimated using N ∗ observed
index values {St } and applying Florens–Zmirou’s (1993) (FZ) nonparametric
version of the minimum contrast estimators:
\hat{\sigma}^2_{FZ}(S) = \frac{\sum_{i=1}^{N^*-1} K_{FZ}\!\left(\frac{S_{i/N^*} - S}{h_{FZ}}\right) N^* \left\{ S_{(i+1)/N^*} - S_{i/N^*} \right\}^2}{\sum_{i=1}^{N^*} K_{FZ}\!\left(\frac{S_{i/N^*} - S}{h_{FZ}}\right)},   (9.3)
where u_m is the log-return at the end of the mth path, K_{MC}(•) is a kernel function and h_{MC} is a bandwidth parameter. The equation

P(S_T \le S) = P(u \le \log(S/S_t)) = \int_{-\infty}^{\log(S/S_t)} p^*_t(u)\, du

implies

g^*_t(S) = \frac{\partial}{\partial S} P(S_T \le S) = \frac{p^*_t(\log(S/S_t))}{S}.
This method results in a nonparametric estimator ĝ* which is √N*-consistent as M → ∞, even though σ̂_{FZ} converges at a slower rate (Ait–Sahalia, Wang and Yared (2000)).
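A Python sketch of a Florens-Zmirou-type estimator in the spirit of (9.3) is given below; the Gaussian kernel and the bandwidth h are assumptions for illustration.

import numpy as np

def fz_diffusion(path, S_grid, h):
    # path: observed index path sampled at N* points; returns estimates of the
    # squared diffusion function on S_grid
    N = len(path)
    increments_sq = N * np.diff(path) ** 2             # N* * (S_{(i+1)/N*} - S_{i/N*})^2
    est = np.empty_like(S_grid, dtype=float)
    for k, S in enumerate(S_grid):
        w_num = np.exp(-0.5 * ((path[:-1] - S) / h) ** 2)   # kernel weights, i = 1,...,N*-1
        w_den = np.exp(-0.5 * ((path - S) / h) ** 2)         # kernel weights, i = 1,...,N*
        est[k] = np.sum(w_num * increments_sq) / np.sum(w_den)
    return est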
In the absence of arbitrage, the futures price is the expected future value of the
spot price under the risk neutral measure. Therefore the time series distribu-
tion is translated such that its mean matches the implied future price. Then the
bandwidth hM C is chosen to best match the variance of the IBT implied distri-
bution. In order to avoid over– or undersmoothing of g ∗ , hM C is constrained
to be within 0.5 to 5 times the optimal bandwidth implied by Silverman’s rule
of thumb. This procedure allows us to focus the density comparison on the
skewness and kurtosis of the two densities.
Using the DAX index data from MD*BASE we estimate the diffusion function
σ 2 (•) from equation (9.2) by means of past index prices and simulate (forward)
M = 10, 000 paths to obtain the time series density, g ∗ .
To be more precise, we explain the methodology for the first period in more
detail. First, note that Friday, April 18, 1997, is the 3rd Friday of April 1997.
Thus, on Monday, April 21, 1997, we use 3 months of DAX index prices from
Monday, January 20, 1997, to Friday, April 18, 1997, to estimate σ 2 . Following,
on the same Monday, we start the 3 months ‘forward’ Monte Carlo simulation.
The bandwidth hF Z is determined by Cross Validation applying the XploRe
quantlet regxbwcrit which determines the optimal bandwidth from a range
of bandwidths by using the resubstitution estimator with the penalty function
’Generalized Cross Validation’.
Knowing the diffusion function it is now possible to Monte Carlo simulate the
index evolution. The Milstein scheme applied to equation (9.2) is given by:
S_{i/N^{**}} = S_{(i-1)/N^{**}} + r\, S_{(i-1)/N^{**}} \Delta t + \sigma(S_{(i-1)/N^{**}})\, \Delta W_{i/N^{**}} + \frac{1}{2}\, \sigma(S_{(i-1)/N^{**}})\, \frac{\partial \sigma}{\partial S}(S_{(i-1)/N^{**}}) \left\{ (\Delta W_{i/N^{**}})^2 - \Delta t \right\},
where we set the drift equal to r which is extracted from MD*BASE and
corresponds to the time to maturity used in the simulation and N ∗∗ is the
where x = St eu , we have
and
Therefore, we have as well (see Härdle and Simar (2002) for density transformation techniques)

g^{*\prime}(u) \approx \frac{g^*(S_t e^u)\, \Delta(S_t e^u)}{\Delta u} \approx g^*(S_t e^u)\, S_t e^u.
To simplify notations, we will denote both densities g ∗ . Figure 9.2 displays the
resulting time series density over log–returns on Friday, April 18, 1997. Pro-
ceeding in the same way for all 30 periods beginning in April 1997 and ending
in September 1999, we obtain the time series of the 3 month ‘forward’ skewness
and kurtosis values of g ∗ shown in Figures 9.3 and 9.4. The figures reveal that
the time series distribution is systematically slightly negatively skewed. Skew-
ness is very close to zero. As far as kurtosis is concerned we can extract from
Figure 9.4 that it is systematically smaller than but nevertheless very close to
3. Additionally, all time series density plots looked like the one shown in Figure
9.2.
The kurtosis time series reveals a pattern similar to the skewness time series. The IBT SPD has, except for one period, systematically more kurtosis than the time series SPD. Again, this feature is in line with what Ait–Sahalia, Wang and Yared (2000) found for the S&P 500. The 3 month IBT implied SPD for Friday, October 16, 1998, has a slightly smaller kurtosis than the time series SPD. That is, investors assigned less probability mass to high and low index prices. Note that this implied SPD was estimated in July 1998, after a period of 8 months of booming asset prices (see Figure 9.11). It is understandable that in such an environment high index prices seemed unlikely. Since low index prices seemed unrealistic as well, agents apparently expected the DAX to move rather sideways.
where FOTM, NOTM, ATM stand for far out–of–the–money, near out–of–the–
money and at–the–money respectively.
A skewness trading strategy is supposed to exploit differences in skewness of
two distributions by buying options in the range of strike prices where they
are underpriced and selling options in the range of strike prices where they
are overpriced. More specifically, if the implied SPD f ∗ is less skewed (for
example more negatively skewed) than the time series SPD g ∗ , i.e. skew(f ∗ ) <
skew(g ∗ ), we sell the whole range of strikes of OTM puts and buy the whole
range of strikes of OTM calls (S1 trade). Conversely, if the implied SPD is
more skewed, i.e. skew(f ∗ ) > skew(g ∗ ), we initiate the S2 trade by buying the
whole range of strikes of OTM puts and selling the whole range of strikes of
OTM calls. In both cases we keep the options until expiration.
Skewness s is a measure of asymmetry of a probability distribution. While for a
distribution symmetric around its mean s = 0, for an asymmetric distribution
s > 0 indicates more weight to the left of the mean. Recalling from option
pricing theory the pricing equation for a European call option, Franke, Härdle
and Hafner (2001):
C(S_t, K, r, T - t) = e^{-r(T-t)} \int_0^{\infty} \max(S_T - K, 0)\, f^*(S_T)\, dS_T,   (9.6)
where f ∗ is the implied SPD, we see that when the two SPD’s are such that
skew(f ∗ ) < skew(g ∗ ), agents apparently assign a lower probability to high
outcomes of the underlying than would be justified by the time series density,
see Figure 7.13. Since for call options only the right ‘tail’ of the support
determines the theoretical price, the latter is smaller than the price implied by
equation (9.6) using the time series density. That is, we buy underpriced calls.
The same reasoning applies to European put options. Looking at the pricing
equation for such an option:
Z ∞
P (St , K, r, T − t) = e−r(T −t) max(K − ST , 0)f ∗ (ST )dST , (9.7)
0
we conclude that prices implied by this pricing equation using f ∗ are higher
than the prices using the time series density. That is, we sell puts.
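The role of the SPD in (9.6) and (9.7) can be illustrated numerically: the following Python sketch prices a call or a put by integrating the payoff against a density given on a grid; the log-normal density used below is only an assumed stand-in for an estimated SPD.

import numpy as np

def price_from_spd(S_grid, f_star, K, r, ttm, kind="call"):
    payoff = np.maximum(S_grid - K, 0.0) if kind == "call" else np.maximum(K - S_grid, 0.0)
    return np.exp(-r * ttm) * np.trapz(payoff * f_star, S_grid)   # discounted expectation

# illustrative SPD: log-normal with assumed parameters (S_t = 100, r = 3%, sigma = 20%, ttm = 0.25)
S_grid = np.linspace(1.0, 300.0, 3000)
mu = np.log(100.0) + (0.03 - 0.5 * 0.2**2) * 0.25
sig = 0.2 * np.sqrt(0.25)
f_star = np.exp(-(np.log(S_grid) - mu) ** 2 / (2 * sig**2)) / (S_grid * sig * np.sqrt(2 * np.pi))

print(price_from_spd(S_grid, f_star, K=100.0, r=0.03, ttm=0.25, kind="call"))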
Since we hold all options until expiration, and since options for all strikes are not always available in markets, we are going to investigate the payoff profile at expiration of this strategy for two compositions of the portfolio. To get an idea about the exposure at maturity, let us begin with a simplified portfolio consisting of one short position in a put option with moneyness of 0.95 and one long position in a call option with moneyness of 1.05. To further simplify, we assume that the futures price F is equal to 100 EUR. Thus, the
portfolio has a payoff which is increasing in ST , the price of the underlying at
maturity. For ST < 95 EUR the payoff is negative and for ST > 105 EUR it is
positive.
Figure 9.5 shows the payoff of a portfolio of 10 short puts with strikes ranging
from 86 EUR to 95 EUR and of 10 long calls striking at 105 EUR to 114 EUR,
the future price is still assumed to be 100 EUR. The payoff is still increasing
in ST but it is concave in the left tail and convex in the right tail. This is due
to the fact that our portfolio contains, for example, at ST = 106 EUR two call
options which are in the money instead of only one compared to the portfolio
considered above. These options generate a payoff which is twice as large. At S_T = 107 EUR the payoff is influenced by three ITM calls, producing a payoff which is three times as high as in the situation before, etc. In a similar way we can explain the slower increase in the left tail. Just to sum up, we can state
that this trading rule has a favorable payoff profile in a bull market where the
underlying is increasing. But in bear markets it possibly generates negative
cash flows. Buying (selling) two or more calls (puts) at the same strike would
change the payoff profile in a similar way leading to a faster increase (slower
decrease) with every call (put) bought (sold).
The payoff of the S2 strategy behaves in the opposite way, and the same reasoning
explains its profile. In contrast to the S1 trade, the S2 trade is favorable in a
falling market.
[S1 and OTM–S1 portfolio compositions, by moneyness]
9.5.1 Performance
Given the skewness values for the implied SPD and the time series SPD we
now look at the performance of the skewness trades. Performance is
measured in net EUR cash flows, i.e. the sum of the cash flows generated
at initiation in t = 0 and at expiration in t = T . We ignore any interest rate
between these two dates. Using EUREX settlement prices of 3 month DAX puts
and calls, we initiated the S1 strategy on the Monday immediately following the
3rd Friday of each month, beginning in April 1997 and ending in September
1999. January, February, March 1997 drop out due to the time series density
estimation for the 3rd Friday of April 1997. October, November and December
1999 drop out since we look 3 months forward. The cash flow at initiation stems
from the inflow generated by the written options and the outflow generated by
the bought options and hypothetical 5% transaction costs on prices of bought
and sold options. Since all options are kept in the portfolio until maturity (time
to expiration is approximately 3 months, more precisely τ = TTM/360) the
cash flow in t = T is composed of the sum of the inner values of the options in
the portfolio.
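As an illustration of this performance measure, here is a minimal Python sketch (assumed conventions: premiums received and paid at t = 0, 5% transaction costs on all traded option prices, intrinsic values at expiration, no interest); it is not the code used in the study.

import numpy as np

def s1_net_cash_flow(put_prices, put_strikes, call_prices, call_strikes, s_T, tc=0.05):
    # cash flow at initiation: premiums received for the short puts minus premiums
    # paid for the long calls, minus 5% transaction costs on both legs
    cf_0 = (np.sum(put_prices) - np.sum(call_prices)
            - tc * (np.sum(put_prices) + np.sum(call_prices)))
    # cash flow at expiration: intrinsic values of the long calls minus those of the short puts
    cf_T = (np.sum(np.maximum(s_T - np.asarray(call_strikes), 0.0))
            - np.sum(np.maximum(np.asarray(put_strikes) - s_T, 0.0)))
    return cf_0 + cf_T     # interest between t = 0 and t = T is ignored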
Figure 9.6 shows the EUR cash flows at initiation, at expiration and the re-
sulting net cash flow for each portfolio. The sum of all cash flows, the total net
cash flow, is strongly positive (9855.50 EUR). Note that the net cash flow (blue
bar) is always positive except for the portfolios initiated in June 1998 and in
September 1998 where we incur heavy losses compared to the gains in the other
periods. In other words, this strategy would have produced moderate gains 28
times and large negative cash flows twice. As Figure 9.5 suggests, this
strategy is exposed to a directional risk, a feature that appears in December
1997 and June 1998 where large payoffs at expiration (positive and negative)
occur. Indeed, the period of November and December 1997 was a turning point
of the DAX and the beginning of an 8 month bull market, explaining the large
payoff in March 1998 of the portfolio initiated in December 1997. The same
[Figure 9.6: Performance S1 – cash flows in EUR per portfolio, 07/97 to 10/99]
reasoning explains the large negative payoff of the portfolio set up in June 1998
and expiring in September 1998 (refer to Figure 9.11). Another point to note is
that there is a zero cash flow at expiration in 24 periods. Periods with a zero
cash flow both at initiation and at expiration are due to the fact that no portfolio
was set up (there was no OTM option in the database).
Since there is only one period (June 1999) in which the implied SPD is more
skewed than the time series SPD, a comparison of the S1 trade with and without
knowledge of the two SPD's is not very informative. A comparison
of the skewness measures would have filtered out exactly one positive net cash
flow, namely the cash flow generated by the portfolio set up in June
1999. To what extent this is significant remains uncertain. For the same
reason the S2 trade has little informational content. Applied to real data
it would have produced a negative total net cash flow; in fact, a portfolio would
have been set up only in June 1999. While the S1 trade performance was
independent of the knowledge of the implied and the time series SPD's, the
S2 trade performance changed significantly when it was applied in each period
(without knowing both SPD’s). The cash flow profile seemed to be the inverse
of Figure 9.6 indicating that should there be an options mispricing it would
probably be in the sense that the implied SPD is more negatively skewed than
the time series SPD.
[Figure: payoff of the K1 trade]
a payoff profile (Figure 9.8) which is quite similar to the one from Figure 9.7.
In fact, the payoff function looks like the ‘smooth’ version of Figure 9.7.
Changing the number of long puts and calls in the NOTM regions can produce
a positive payoff. Setting up the portfolio given in Table 9.3, NOTM–K1,
results in a payoff function shown in Figure 9.9. It is quite intuitive that the
more long positions the portfolio contains the more positive the payoff will be.
Conversely, if we added to that portfolio FOTM short puts and calls the payoff
would decrease in the FOTM regions.
In conclusion, the payoff function can take quite different shapes, depending
heavily on the specific options in the portfolio. If it is possible
to implement the K1 trading rule as proposed, the payoff is negative. But it may
happen that the payoff function is positive if more NOTM options (long positions)
are available than FOTM or ATM options (short positions).
[K1, FOTM–NOTM–ATM–K1 and NOTM–K1 portfolio compositions, by moneyness]
9.6.1 Performance
[Figure: Performance K1 – cash flows in EUR per portfolio, 07/97 to 10/99]
the payoff of the portfolios set up in the months of April 1997, May 1997 and in
the months from November 1997 to June 1998 is relatively more negative than
for the portfolios of June 1997 to October 1997 and November 1998 to June
1999. The reason is that the DAX is moving up or down for the former months
and stays within an almost horizontal range of quotes for the latter months
(see the payoff profile depicted in Figure 9.8). In July 1998 no portfolio was
set up since kurt(f ∗ ) < kurt(g ∗ ).
What would have happened if we had implemented the K1 trade without know-
ing both SPD’s? Again, the answer to this question can only be indicated due
to the rare occurrences of periods in which kurt(f ∗ ) < kurt(g ∗ ). Contrary to
the S1 trade, the density comparison would have filtered out a strongly negative
net cash flow that would have been generated by a portfolio set up in July
1998. But the significance of this feature is again uncertain.
All that can be said about the K2 trade is that, without an SPD comparison, it
would have produced heavy losses. The K2 trade applied as proposed cannot be
evaluated completely since there was only one period in which kurt(f ∗ ) <
kurt(g ∗ ).
[Figure: DAX index level, 1997–1999]
developments, it is not clear how these trades will perform when exposed to
‘peso risks’. Given that profits stem from highly positive cash flows at portfolio
initiation, i.e. from possibly mispriced options, it remains an open question how
the pricing behavior of agents will change and how agents will assign probabilities
to future values of the underlying.
We measured performance in net EUR cash flows. This approach does not
take risk into account, unlike, for example, the Sharpe ratio, which measures
the risk adjusted return of an investment. To compute a return, however, an initial
investment has to be made, whereas in the simulation above some portfolios
generated positive payoffs both at initiation and at maturity. It is a challenge
for future research to find a way to adjust for risk in such situations.
The SPD comparison yielded the same result for each period but one. The
implied SPD f ∗ was in all but one period more negatively skewed than the time
series SPD g ∗ . While g ∗ was in all periods platykurtic, f ∗ was in all but one
period leptokurtic. In this period the kurtosis of g ∗ was slightly greater than
that of f ∗ . Therefore, there was no alternating use of type 1 and type 2 trades.
But in more turbulent market environments such an approach might prove
useful. The procedure could be extended and fine tuned by applying a density
distance measure as in Ait–Sahalia, Wang and Yared (2001) to give a signal
when to set up a portfolio of either type 1 or type 2. Furthermore, it is tempting
to modify the time series density estimation method such that the Monte Carlo
paths are simulated by drawing random numbers not from a normal distribution
but from the distribution of the residuals resulting from the nonparametric
estimation of σF Z (•), Härdle and Yatchew (2001).
Bibliography
Ait–Sahalia, Y., Wang, Y. and Yared, F. (2001). Do Option Markets correctly
Price the Probabilities of Movement of the Underlying Asset?, Journal of
Econometrics 102: 67–110.
Barle, S. and Cakici, N. (1998). How to Grow a Smiling Tree, The Journal of
Financial Engineering 7: 127–146.
Black, F. and Scholes, M. (1973). The Pricing of Options and Corporate
Liabilities, Journal of Political Economy 81: 637–654.
Econometrics
10 Multivariate Volatility Models
Matthias R. Fengler and Helmut Herwartz
10.1 Introduction
Volatility clustering, i.e. positive correlation of price variations observed on
speculative markets, motivated the introduction of autoregressive conditionally
heteroskedastic (ARCH) processes by Engle (1982) and its popular generaliza-
tions by Bollerslev (1986) (Generalized ARCH, GARCH) and Nelson (1991)
(exponential GARCH, EGARCH). Being univariate in nature, however, such
models neglect a further stylized fact of empirical price variations, namely con-
temporaneous cross correlation e.g. over a set of assets, stock market indices,
or exchange rates.
Cross section relationships are often implied by economic theory. Interest rate
parities, for instance, provide a close relation between domestic and foreign
bond rates. Assuming absence of arbitrage, the so-called triangular equation
formalizes the equality of an exchange rate between two currencies on the one
hand and an implied rate constructed via exchange rates measured towards a
third currency. Furthermore, stock prices of firms acting on the same market
often show similar patterns in the wake of news that is important for the
entire market (Hafner and Herwartz, 1998). Similarly, analyzing global volatility
transmission, Engle, Ito and Lin (1990) and Hamao, Masulis and Ng (1990)
found evidence in favor of volatility spillovers between the world’s major trading
areas occurring in the wake of floor trading hours. From this point of view,
when modeling time varying volatilities, a multivariate model appears to be a
natural framework to take cross sectional information into account. Moreover,
the covariance between financial assets is of essential importance in finance.
Effectively, many problems in financial practice like portfolio optimization,
hedging strategies, or Value-at-Risk evaluation require multivariate volatility
measures (Bollerslev et al., 1988; Cecchetti, Cumby and Figlewski, 1988).
In (10.2) the matrices Ãi and G̃i each contain {N (N + 1)/2}² elements. Deter-
ministic covariance components are collected in c, a column vector of dimension
N (N + 1)/2. We consider in the following the case p = q = 1 since in applied
work the GARCH(1,1) model has turned out to be particularly useful to de-
scribe a wide variety of financial market data (Bollerslev, Engle and Nelson,
1994).
On the one hand the vec–model in (10.2) allows for a very general dynamic
structure of the multivariate volatility process. On the other hand this specifi-
cation suffers from high dimensionality of the relevant parameter space, which
makes it almost intractable for empirical work. In addition, it might be cumber-
some in applied work to restrict the admissible parameter space such that the
implied matrices Σt , t = 1, . . . , T , are positive definite. These issues motivated
a considerable variety of competing multivariate GARCH specifications.
Prominent proposals reducing the dimensionality of (10.2) are the constant
correlation model (Bollerslev, 1990) and the diagonal model (Bollerslev et al.,
1988). Specifying diagonal elements of Σt both of these approaches assume the
absence of cross equation dynamics, i.e. the only dynamics are
for the constant correlation model (|ρij | < 1), whereas the diagonal model
requires more complicated restrictions to provide positive definite covariance
matrices.
The so-called BEKK-model (named after Baba, Engle, Kraft and Kroner, 1990)
provides a richer dynamic structure compared to both restricted processes men-
tioned before. Defining N × N matrices Aik and Gik and an upper triangular
matrix C0 the BEKK–model reads in a general version as follows:
Σ_t = C_0^⊤ C_0 + \sum_{k=1}^{K}\sum_{i=1}^{q} A_{ik}^⊤ ε_{t−i} ε_{t−i}^⊤ A_{ik} + \sum_{k=1}^{K}\sum_{i=1}^{p} G_{ik}^⊤ Σ_{t−i} G_{ik} .   (10.6)
In the bivariate case with K = p = q = 1, the elements of Σt follow the recursions
σ_{11,t} = c_{11} + a_{11}^2 ε_{1,t−1}^2 + 2 a_{11} a_{21} ε_{1,t−1} ε_{2,t−1} + a_{21}^2 ε_{2,t−1}^2 + g_{11}^2 σ_{11,t−1} + 2 g_{11} g_{21} σ_{12,t−1} + g_{21}^2 σ_{22,t−1} ,
σ_{12,t} = c_{12} + a_{11} a_{12} ε_{1,t−1}^2 + (a_{21} a_{12} + a_{11} a_{22}) ε_{1,t−1} ε_{2,t−1} + a_{21} a_{22} ε_{2,t−1}^2 + g_{11} g_{12} σ_{11,t−1} + (g_{21} g_{12} + g_{11} g_{22}) σ_{12,t−1} + g_{21} g_{22} σ_{22,t−1} ,
σ_{22,t} = c_{22} + a_{12}^2 ε_{1,t−1}^2 + 2 a_{12} a_{22} ε_{1,t−1} ε_{2,t−1} + a_{22}^2 ε_{2,t−1}^2 + g_{12}^2 σ_{11,t−1} + 2 g_{12} g_{22} σ_{12,t−1} + g_{22}^2 σ_{22,t−1} .
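The following short Python sketch (numbers are arbitrary, chosen only for illustration) performs one step of this bivariate recursion in matrix form and makes explicit how the cross terms enter.

import numpy as np

C0 = np.array([[0.001, 0.0003], [0.0, 0.0008]])   # upper triangular constant term
A  = np.array([[0.28, -0.05], [-0.06, 0.29]])     # ARCH parameters A11
G  = np.array([[0.94, 0.03], [0.03, 0.94]])       # GARCH parameters G11

eps_prev   = np.array([[0.01], [-0.02]])              # epsilon_{t-1}
Sigma_prev = np.array([[1e-4, 2e-5], [2e-5, 1.5e-4]]) # Sigma_{t-1}

Sigma_t = (C0.T @ C0
           + A.T @ (eps_prev @ eps_prev.T) @ A
           + G.T @ Sigma_prev @ G)
# Sigma_t[0, 1] and Sigma_t[1, 1] reproduce the sigma_{12,t} and sigma_{22,t} expressions above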
Compared to the diagonal model the BEKK–specification economizes on the
number of parameters by restricting the vec–model within and across equa-
tions. Since Aik and Gik are not required to be diagonal, the BEKK-model
is convenient to allow for cross dynamics of conditional covariances. The pa-
rameter K governs to which extent the general representation in (10.2) can be
approximated by a BEKK-type model. In the following we assume K = 1.
Note that in the bivariate case with K = p = q = 1 the BEKK-model contains
11 parameters. If K = 1, the matrices A11 and −A11 imply the same condi-
tional covariances. Thus, for uniqueness of the BEKK-representation a11 > 0
and g11 > 0 is assumed. Note that the right hand side of (10.6) involves only
quadratic terms and, hence, given convenient initial conditions, Σt is positive
definite under the weak (sufficient) condition that at least one of the matrices
C0 or Gik has full rank (Engle and Kroner, 1995).
The parameters of a BEKK model are estimated by (quasi-) maximum likelihood, i.e. by
maximizing the Gaussian log-likelihood function.
With f denoting the multivariate normal density, the contribution of a single
observation, lt , to the log-likelihood of a sample is given as
l_t = −\frac{N}{2}\log(2π) − \frac{1}{2}\log\{\det(Σ_t)\} − \frac{1}{2}\, ε_t^⊤ Σ_t^{−1} ε_t .
We analyze daily quotes of two European currencies measured against the USD,
namely the DEM and the GBP. The sample period is December 31, 1979 to
April 1, 1994, covering T = 3720 observations. Note that a subperiod of our
sample has already been investigated by Bollerslev and Engle (1993) discussing
common features of volatility processes.
The data is provided in fx. The first column contains DEM/USD and
the second GBP/USD. In XploRe a preliminary statistical analysis is easily
done by the summarize command. Before inspecting the summary statis-
tics, we load the data, Rt , and take log differences, εt = ln(Rt ) − ln(Rt−1 ).
XFGmvol01.xpl produces the following table:
XFGmvol01.xpl
Evidently, the empirical means of both processes are very close to zero (-4.72e-
06 and 1.10e-04, respectively). Also minimum, maximum and standard errors
are of similar size. First differences of the respective log exchange rates are
shown in Figure 10.1. As is apparent from Figure 10.1, variations of exchange
rate returns exhibit an autoregressive pattern: Large returns in foreign ex-
change markets are followed by large returns of either sign. This is most obvious
in periods of excessive returns. Note that these volatility clusters tend to coin-
cide in both series. It is precisely this observation that justifies a multivariate
GARCH specification.
The quantlet bigarch provides a fast algorithm to estimate the BEKK repre-
sentation of a bivariate GARCH(1,1) model. QML-estimation is implemented
by means of the BHHH-algorithm which minimizes the negative Gaussian log-
likelihood function. The algorithm employs analytical first order derivatives of
the log-likelihood function (Lütkepohl, 1996) with respect to the 11-dimensional
vector of parameters containing the elements of C0 , A11 and G11 as given in
(10.6).
[Figure 10.1: returns of DEM/USD (upper panel) and GBP/USD (lower panel)]
where as input parameters we have initial values theta for the iteration algo-
rithm and the data set, e.g. financial returns, stored in et. The estimation
output is the vector coeff containing the stacked elements of the parameter
matrices C0 , A11 and G11 in (10.6) after numerical optimization of the Gaussian
log-likelihood function. Being an iterative procedure, the algorithm requires
suitable initial parameters theta. For the diagonal elements of the
matrices A11 and G11 values around 0.3 and 0.9 appear reasonable, since in uni-
variate GARCH(1,1) models parameter estimates for a1 and g1 in (10.3) often
take values around 0.3² = 0.09 and 0.9² = 0.81. There is no clear guidance on how
to determine initial values for off-diagonal elements of A11 or G11 . Therefore
it might be reasonable to try alternative initializations of these parameters.
Given an initialization of A11 and G11 the starting values for the elements in
C0 are immediately determined by the algorithm assuming the unconditional
covariance of εt to exist, Engle and Kroner (1995).
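For readers without access to XploRe, the estimation step can be sketched in Python; this is not the bigarch quantlet but a simplified stand-in that minimizes the negative Gaussian log-likelihood numerically (constant terms are dropped, the sample covariance starts the recursion, and the exact stacking order of the parameter vector is an assumption of the sketch).

import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, eps):
    # negative Gaussian log-likelihood of a bivariate BEKK(1,1) model;
    # theta stacks C0 (upper triangular), A11 and G11 as 3 + 4 + 4 = 11 parameters
    c11, c12, c22, a11, a21, a12, a22, g11, g21, g12, g22 = theta
    C0 = np.array([[c11, c12], [0.0, c22]])
    A  = np.array([[a11, a12], [a21, a22]])
    G  = np.array([[g11, g12], [g21, g22]])
    CC = C0.T @ C0
    Sigma = np.cov(eps.T)                 # start the recursion at the sample covariance
    nll = 0.0
    for t in range(1, eps.shape[0]):
        e = eps[t - 1][:, None]
        Sigma = CC + A.T @ (e @ e.T) @ A + G.T @ Sigma @ G
        sign, logdet = np.linalg.slogdet(Sigma)
        nll += 0.5 * (logdet + eps[t] @ np.linalg.solve(Sigma, eps[t]))
    return nll

# hypothetical usage, eps being a (T x 2) array of log returns:
# theta0 = np.array([1e-3, 0, 1e-3, 0.3, 0, 0, 0.3, 0.9, 0, 0, 0.9])
# fit = minimize(neg_loglik, theta0, args=(eps,), method="BFGS")

The commented initial values mirror the rule of thumb above: diagonal ARCH entries near 0.3, diagonal GARCH entries near 0.9, off-diagonal entries zero.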
Given our example under investigation the bivariate GARCH estimation yields
as output:
Contents of coeff
[ 1,] 0.0011516
[ 2,] 0.00031009
[ 3,] 0.00075685
[ 4,] 0.28185
[ 5,] -0.057194
[ 6,] -0.050449
[ 7,] 0.29344
[ 8,] 0.93878
[ 9,] 0.025117
[10,] 0.027503
[11,] 0.9391
Contents of likest
[1,] -28599
XFGmvol02.xpl
The last number is the obtained minimum of the negative log-likelihood func-
tion. The vector coeff given first contains as first three elements the parame-
ters of the upper triangular matrix C0 , the following four belong to the ARCH
(A11 ) and the last four to the GARCH parameters (G11 ), i.e. for our model
C_0 = 10^{−3} \begin{pmatrix} 1.15 & .31 \\ 0 & .76 \end{pmatrix} , \quad
A_{11} = \begin{pmatrix} .282 & −.050 \\ −.057 & .293 \end{pmatrix} , \quad
G_{11} = \begin{pmatrix} .939 & .028 \\ .025 & .939 \end{pmatrix} .   (10.8)
[Figure 10.2: estimated variance and covariance processes 10⁵ Σ̂t – Sigma11 (DEM/USD), Sigma12 (covariance), Sigma22 (GBP/USD) – together with the corresponding simulated processes of Figure 10.3]
Figure 10.3. Simulated variance and covariance processes, both bivariate
(blue) and univariate case (green), 10⁵ Σ̂t .
XFGmvol03.xpl
such as first and second order moments or even quantiles. Due to the mul-
tivariate nature of the time series under consideration it is a nontrivial issue
to rank alternative density forecasts in terms of these statistics. Therefore,
we regard a particular volatility model as superior to another if it provides
a higher simulated density estimate of the actual bivariate future exchange
rate. This is accomplished by evaluating both densities at the actually realized
exchange rate obtained from a bivariate kernel estimation. Since the latter
comparison might suffer from different unconditional variances under univari-
ate and multivariate volatility, the two simulated densities were rescaled to
have identical variance. Performing the latter forecasting exercises iteratively
over 3714 time points we can test if the bivariate volatility model outperforms
the univariate one.
To formalize the latter ideas we define a success ratio SRJ as
SR_J = \frac{1}{|J|} \sum_{t \in J} 1\{\hat f_{biv}(R_{t+5}) > \hat f_{uni}(R_{t+5})\} ,   (10.9)
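Given the two series of density evaluations, the success ratio is essentially a one-line computation; the following Python sketch assumes f_biv and f_uni are arrays holding the bivariate and univariate density estimates evaluated at the realized exchange rates R_{t+5}.

import numpy as np

def success_ratio(f_biv, f_uni):
    # SR_J of (10.9): fraction of forecast dates on which the bivariate density
    # estimate at the realization exceeds the univariate one
    return np.mean(np.asarray(f_biv) > np.asarray(f_uni))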
This is highly significant. In Table 10.1 we show that the overall superiority of
the bivariate volatility approach is confirmed when considering subsamples of
two years’ length. A priori, one may expect the bivariate model to outperform
the univariate one the larger (in absolute value) the covariance between both
return processes is. To verify this argument we display in Figure 10.4 the
empirical covariance estimates from Figure 10.2 jointly with the success ratio
evaluated over overlapping time intervals of length |J| = 80.
As is apparent from Figure 10.4 there is a close co-movement between the
success ratio and the general trend of the covariance process, which confirms
our expectations: the forecasting power of the bivariate GARCH model is
Bibliography
Baba, Y., Engle, R.F., Kraft, D.F., and Kroner, K.F. (1990). Multivariate Si-
multaneous Generalized ARCH, mimeo, Department of Economics, Uni-
versity of California, San Diego.
Berndt, E.K., Hall B.H., Hall, R.E., and Hausman, J.A. (1974). Estimation
and Inference in Nonlinear Structural Models, Annals of Economic and
Social Measurement 3/4: 653–665.
Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroscedasticity,
Journal of Econometrics 31: 307–327.
Bollerslev, T. (1990). Modeling the Coherence in Short-Run Nominal Exchange
Rates: A Multivariate Generalized ARCH Approach, Review of Economics
and Statistics 72: 498–505.
Bollerslev, T. and Engle, R.F. (1993). Common Persistence in Conditional
Variances, Econometrica 61: 167–186.
Bollerslev, T., Engle, R.F. and Nelson, D.B. (1994). GARCH Models, in: En-
gle, R.F., and McFadden, D.L. (eds.) Handbook of Econometrics, Vol. 4,
Elsevier, Amsterdam, 2961–3038.
Bollerslev, T., Engle, R.F. and Wooldridge, J.M. (1988). A Capital Asset Pric-
ing Model with Time-Varying Covariances, Journal of Political Economy
96: 116–131.
Statistical Process Control (SPC) is the misleading title of the area of statistics
which is concerned with the statistical monitoring of sequentially observed data.
Together with the theory of sampling plans, capability analysis and similar
topics it forms the field of Statistical Quality Control. SPC started in the
1930s with the pioneering work of Shewhart (1931). Then, SPC became very
popular with the introduction of new quality policies in the industries of Japan
and of the USA. Nowadays, SPC methods are considered not only in industrial
statistics. In finance, medicine, environmental statistics, and in other fields of
applications practitioners and statisticians use and investigate SPC methods.
An SPC scheme – in industry mostly called a control chart – is a sequential scheme
for detecting the so-called change point in the sequence of observed data. Here,
we consider the simplest case. All observations X1 , X2 , . . . are independent and
normally distributed with known variance σ². Up to an unknown time point
m − 1 the expectation of the Xi is equal to µ0 ; starting with the change point
m the expectation switches to µ1 ≠ µ0 . While both expectation values
are known, the change point m is unknown. Now, based on the sequentially
observed data the SPC scheme has to detect whether a change occurred.
SPC schemes can be described by a stopping time L – known as run length –
which is adapted to the sequence of sigma algebras Fn = F(X1 , X2 , . . . , Xn ).
The performance or power of these schemes is usually measured by the Average
Run Length (ARL), the expectation of L. The ARL denotes the average num-
ber of observations until the SPC scheme signals. We distinguish false alarms
– the scheme signals before m, i. e. before the change actually took place – and
correct ones. A suitable scheme provides large ARLs for m = ∞ and small ARLs
for m = 1. In the case of 1 < m < ∞ one has to consider further performance
measures. For the oldest schemes – the Shewhart charts – the typical
inference characteristics such as the error probabilities were used first.
with the reference value k and the critical value c3 (known as the decision
interval). For fastest detection of a shift from µ0 to µ1 the CUSUM scheme has
to be set up with k = (µ1 + µ0 )/(2σ).
The above notation uses normalized data. Thus, it is not important whether
Xt is a single observation or a sample statistic as the empirical mean.
Note that in order to use one-sided lower schemes one applies the upper
schemes to the data multiplied by −1. A slight modification of one-sided
Shewhart and EWMA charts leads to their two-sided versions. One has to
replace in the comparison of chart statistic and threshold the original statistic
Zt and ZtEWMA by their absolute value. The two-sided versions of these schemes
are more popular than the one-sided ones. For two-sided CUSUM schemes we
consider a combination of two one-sided schemes, Lucas (1976) or Lucas and
Crosier (1982), and a scheme based on Crosier (1986). Note, that in some
recent papers the same concept of combination of two one-sided schemes is
used for EWMA charts.
Recall, that Shewhart charts are a special case of EWMA schemes (λ = 1).
Therefore, we distinguish 5 SPC schemes – ewma1, ewma2, cusum1, cusum2
(two one-sided schemes), and cusumC (Crosier’s scheme). For the two-sided
EWMA charts the following quantlets are provided in the XploRe quantlib
spc.
By replacing ewma2 by one of the remaining four scheme titles the related
characteristics can be computed.
The quantlets spcewma1,...,spccusumC generate the chart figure. Here, we ap-
ply the 5 charts to artificial data. 100 pseudo random values from a normal
distribution are generated. The first 80 values have expectation 0, the next 20
values have expectation 1, i. e. model (11.1) with µ0 = 0, µ1 = 1, and m = 81.
We start with the two-sided EWMA scheme and set λ = 0.1, i. e. the chart is
very sensitive to small changes. The critical value c2 (see (11.3)) is computed
to provide an in-control ARL of 300 (see Section 11.2). Thus, in control, the scheme
leads on average to a false alarm after 300 observations.
In Figure 11.1 the graph of ZtEWMA is plotted against time t = 1, 2, . . . , 100.
Further, the design parameter λ, the in-control ARL, and the time of alarm (if
there is one) are printed. One can see, that the above EWMA scheme detects
the change point m = 81 at time point 94, i. e. the delay is equal to 14. The
related average values, i. e. ARL and Average Delay (AD), for µ1 = 1 are 9.33
and 9.13, respectively. Thus, the scheme needs here about 5 observations more
than average.
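A minimal Python sketch of this experiment follows; the EWMA recursion and the control limit scaled by √(λ/(2 − λ)) are the standard textbook forms, and the critical value c2 = 2.7 used below is only a placeholder for the value the Markov chain approach of Section 11.2 would deliver for an in-control ARL of 300.

import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 80),    # in control: mu0 = 0
                    rng.normal(1.0, 1.0, 20)])   # change point m = 81: mu1 = 1

lam = 0.1                                        # smoothing parameter lambda
c2  = 2.7                                        # placeholder critical value (see lead-in)
limit = c2 * np.sqrt(lam / (2.0 - lam))

z, signal = 0.0, None
for t, xt in enumerate(x, start=1):
    z = (1.0 - lam) * z + lam * xt               # two-sided EWMA statistic, z0 = mu0 = 0
    if signal is None and abs(z) > limit:
        signal = t
print("first alarm at t =", signal)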
In the same way the remaining four SPC schemes can be plotted. Note that
in the case of ewma1 one further parameter has to be set. In order to obtain
a suitable figure and an appropriate scheme the EWMA statistic Z_t^EWMA (see
(11.4)) is reflected at a pre-specified border zreflect ≤ 0 (= µ0 ). The numerical
methods used in Section 11.2 for computing the chart characteristics use bounded con-
tinuation regions of the chart. If zreflect is small enough, then the ARL and
the AD (which are not worst case criteria) of the reflected scheme are the
same as for the unbounded scheme. Applying the quantlet XFGewma1fig.xpl
with zreflect = −4 leads to Figure 11.2. Thereby, zreflect has the same
normalization factor √(λ/(2 − λ)) as the critical value c2 (see 2.). The corre-
sponding normalized border is printed as a dotted line (see Figure 11.2). The
chart signals one observation earlier than the two-sided version in Figure 11.1.
The related ARL and AD values for µ1 = 1 are now 7.88 and 7.87, respectively.
In Figure 11.3 the three different CUSUM charts with k = 0.5 are presented.
They signal at the time points 87, 88, and 88 for cusum1, cusum2, and cusumC,
respectively.
[Figure 11.3: the three CUSUM chart statistics Zt against t for the artificial data.   XFGcusum1fig.xpl   XFGcusum2fig.xpl   XFGcusumCfig.xpl]
For the considered dataset the CUSUM charts are faster because of their better worst case performance.
the change point at m = 81 are smaller than average. Therefore, the EWMA
charts need more time to react to the increased average. The related average
values of the run length, i. e. ARL and AD, are 8.17 and 7.52, 9.52 and 8.82,
9.03 and 8.79 for cusum1, cusum2, and cusumC, respectively.
D_µ^{(m)} = E_m( L − m + 1 | L ≥ m ) ,
where µ is the value of µ1 in (11.1), i. e. the expectation after the change.
While Lµ measures the delay for the case m = 1, Dµ determines the delay for
an SPC scheme which has run a long time without a signal. Usually, the convergence
in (11.8) is very fast: already for quite small m the difference between D_µ^{(m)} and Dµ is
very small. Lµ and Dµ are average values of the random variable L.
Unfortunately, L is characterized by a large standard deviation. Therefore, one
might be interested in the whole distribution of L. Again, we restrict ourselves to the
special cases m = 1 and m = ∞. We consider the probability mass function
Pµ (L = n) (PMF) and the cumulative distribution function Pµ (L ≤ n) (CDF).
Based on the CDF, one is able to compute quantiles of the run length L.
Here we use the first approach, which has the advantage that all considered
characteristics can be presented in a straightforward way. Next, the Markov
chain approach is briefly described. Roughly speaking, the continuous statistic
Zt is approximated by a discrete Markov chain Mt . The transition Zt−1 =
x → Zt = y is approximated by the transition Mt−1 = iw → Mt = jw with
x ∈ [iw − w/2, iw + w/2] and y ∈ [jw − w/2, jw + w/2]. That is, given an
integer r the continuation region of the scheme [−c, c], [zreflect, c], or [0, c]
is separated into 2r + 1 or r + 1 intervals of the kind [iw − w/2, iw + w/2]
(one exception is [0, w/2] as the first subinterval of [0, c]). Then, the transition
kernel f of Zt is approximated by the discrete kernel of Mt , i. e.
f (x, y) ≈ P (i w → [j w − w/2, j w + w/2])/w
for all x ∈ [i w − w/2, i w + w/2] and y ∈ [j w − w/2, j w + w/2]. Eventually,
we obtain a Markov chain {Mt } with 2 r + 1 or r + 1 transient states and one
absorbing state. The last one corresponds to the alarm (signal) of the scheme.
Denote by Q = (qij ) the matrix of transition probabilities of the Markov chain
{Mt } on the transient states, 1 a vector of ones, and L = (Li ) the ARL vector.
Li stands for the ARL of a SPC scheme which starts in point i w (corresponds
to z0 ). In the case of a one-sided CUSUM scheme with z0 = 0 ∈ [0, w/2]
the value L0 approximates the original ARL. By using L we generalize the
original schemes to schemes with possibly different starting values z0 . Now,
the following linear equation system is valid, Brook and Evans (1972):
(I − Q) L = 1 , (11.9)
where I denotes the identity matrix. By solving this equation system we get
the ARL vector L and an approximation of the ARL of the considered SPC
scheme. Note that the larger r is, the better the approximation. In the days
of Brook and Evans (1972) the maximal matrix dimension r + 1 (they considered
cusum1) was 15 because of the restrictions of the available computing facilities.
Nowadays, one can use dimensions of several hundred. By looking
at different r one can find a suitable value. The quantlet XFGrarl.xpl
demonstrates this effect for the Brook and Evans (1972) example. Nine different
values of r from 5 to 500 are used to approximate the in-control ARL of a
one-sided CUSUM chart with k = 0.5 and c3 = 3 (variance σ² = 1). We get
XFGrarl.xpl
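For readers without XploRe, the same Brook and Evans (1972) approximation can be coded in a few lines of Python; the function below is a sketch (not part of the spc quantlib) and reproduces the in-control ARL of the example approximately.

import numpy as np
from scipy.stats import norm

def cusum_arl(k, h, mu=0.0, r=200):
    # ARL of a one-sided CUSUM chart S_t = max(0, S_{t-1} + X_t - k), signal if S_t > h,
    # via the Markov chain approximation with r + 1 transient states
    w = 2.0 * h / (2 * r + 1)                    # interval width of the discretisation
    i = np.arange(r + 1)[:, None]                # current state, midpoint i*w
    j = np.arange(r + 1)[None, :]                # next state
    Q = norm.cdf((j - i) * w + w / 2 + k - mu) - norm.cdf((j - i) * w - w / 2 + k - mu)
    Q[:, 0] = norm.cdf(w / 2 - i.ravel() * w + k - mu)      # transitions back to [0, w/2]
    L = np.linalg.solve(np.eye(r + 1) - Q, np.ones(r + 1))  # solve (I - Q) L = 1
    return L[0]                                  # the chart starts at z0 = 0

print(cusum_arl(k=0.5, h=3.0, r=500))            # close to 117.6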
The true value is 117.59570 (obtainable via a very large r or by using the
quadrature methods with a suitably large number of abscissas). The computa-
tion of the average delay (AD) requires more extensive calculations. For details
see, e. g., Knoth (1998) on CUSUM for Erlang distributed data. Here we apply
the Markov chain approach again, Crosier (1986). Given one of the considered
schemes and normally distributed data, the matrix Q is primitive, i. e. there
exists a power of Q which is positive. Then Q has one single eigenvalue which
is larger in magnitude than the remaining eigenvalues. Denote this eigenvalue
by ϱ. The corresponding left eigenvector ψ is strictly positive, i. e.
ψ Q = ϱ ψ ,   ψ > 0 .   (11.10)
D = (ψ^⊤ L)/(ψ^⊤ 1) .
Note, that the left eigenvector ψ is computed for the in-control mean µ0 , while
the ARL vector L is computed for a specific out-of-control mean or µ0 again.
If we replace in the above quantlet ( XFGrarl.xpl) the phrase arl by ad, then
we obtain the following output which demonstrates the effect of the parameter
r again.
XFGrad.xpl
Fortunately, we already get good accuracy for smaller values of r than in the ARL case.
Note that in the case of cusum2 the value of r has to be smaller (less than 30)
than for the other charts, since the computation is based on the dominant
eigenvalue of a very large matrix. The approximation in the case of a combination of
two one-sided schemes needs a two-dimensional approximating Markov chain.
For the ARL, however, a more suitable approach exists. As shown, e. g., by Lucas and
Crosier (1982), it is possible to use the following relation between the ARLs of the
one- and the two-sided schemes. Here, the two-sided scheme is a combination
of two symmetric one-sided schemes which both start at z0 = 0. Therefore,
we get a very simple formula for the ARL L of the two-sided scheme and the
ARLs Lupper and Llower of the upper and lower one-sided CUSUM scheme
L = \frac{L_{upper} \cdot L_{lower}}{L_{upper} + L_{lower}} .   (11.11)
gi = φi /(φT ψ) ,
XFGarl.xpl
Remember that the ARL of the two-sided CUSUM (cusum2) scheme is based on
the one-sided one, i. e. 58.78 = 117.56/2 and 6.4036 = (6.4044·49716)/(6.4044+
49716) with 49716 = L−1 .
For the setup of the SPC scheme it is usual to give the design parameter λ and
k for EWMA and CUSUM, respectively, and a value ξ for the in-control ARL.
Then, the critical value c (c2 or c3 ) is the solution of the equation Lµ0 (c) =
ξ. Here, the regula falsi is used with an accuracy of |Lµ0 (c) − ξ| < 0.001.
The quantlet XFGc.xpl demonstrates the computation of the critical values
for SPC schemes with in-control ARLs of ξ = 300, reference value k = 0.5
(CUSUM), smoothing parameter λ = 0.1 (EWMA), zreflect= −4, and the
Markov chain parameter r = 50.
XFGc.xpl
The usage of the routines for computing the Average Delay (AD) is similar
to the ARL routines. Replace only the code arl by ad. Be aware that the
computing time is larger than in case of the ARL, because of the computation
of the dominant eigenvalue. It would be better to choose smaller r, especially in
the case of the two-sided CUSUM. Unfortunately, there is no relation between
the one- and two-sided schemes as for the ARL in (11.11). Therefore, the library
computes the AD for the two-sided CUSUM based on a two-dimensional Markov
chain with dimension (r + 1)² × (r + 1)². Thus, with values of r larger than 30,
the computing time becomes quite large. Here the results follow for the above
quantlet XFGrarl.xpl with ad instead of arl and r = 30 for spccusum2ad:
XFGad.xpl
The computation of the probability mass function (PMF) and of the cumulative
distribution function (CDF) is implemented in two different types of routines.
The first one with the syntax spcchartpmf returns the values of the PMF
P (L = n) and CDF P (L ≤ n) at given single points of n, where chart has
to be replaced by ewma1, ..., cusumC. The second one written as spcchartpmfm
computes the whole vectors of the PMF and of the CDF up to a given point
n, i. e. P (L = 1), P (L = 2), . . . , P (L = n) and the similar one of the CDF.
Note that the same holds as for the Average Delay (AD): in the case of the
two-sided CUSUM scheme the computations are based on a two-dimensional
XFGpmf1.xpl
Figure 11.4. CDF for two-sided EWMA and Crosier’s CUSUM for
µ = 0 (in control) and µ = 1 (out of control)
XFGpmf2.xpl
11.3 Comparison with existing methods
Here, we compare the ARL and AD computations of Lucas and Saccucci (1990)
with XploRe results. In their paper they use as in-control ARL ξ = 500. Then
for, e. g., λ = 0.5 and λ = 0.1 the critical values are 3.071 and 2.814, respec-
tively. By using XploRe the related values are 3.0712 and 2.8144, respectively.
It is known that the smaller λ is, the worse the accuracy of the Markov chain
approach. Therefore, r is set larger for λ = 0.1 (r = 200) than for λ = 0.5
(r = 50). Table 11.1 shows some results of Lucas and Saccucci (1990) on
ARLs and ADs. Their results are based on the Markov chain approach as
well. However, they used some smaller matrix dimension and fitted a regres-
sion model on r (see Subsection 11.3.2). The corresponding XploRe results
Table 11.1. ARL and AD values from Table 3 of Lucas and Saccucci
(1990)
by using the quantlet XFGlucsac.xpl coincide with the values of Lucas and
Saccucci (1990).
XFGlucsac.xpl
Crosier (1986) derived a new two-sided CUSUM scheme and compared it with
the established combination of two one-sided schemes. Recall Table 3 of Crosier
(1986), where the ARLs of the new and the old scheme were presented. The
reference value k is equal to 0.5. First, we compare the critical values. By
using XploRe ( XFGcrosc.xpl) with r = 100 one gets c = 4.0021 (4), 3.7304
(3.73), 4.9997 (5), 4.7133 (4.713), respectively – the original values of Crosier
are written in parentheses. By comparing the results of Table 11.2 with the
results obtained by the quantlet XFGcrosarl.xpl (r = 100) it turns out
that again the ARL values coincide, with only one exception, namely L1.5 = 4.75
for the old scheme with h = 4.
XFGcrosarl.xpl
Further, we want to compare the results for the Average Delay (AD), which is
called Steady-State ARL in Crosier (1986). In Table 5 of Crosier we find the
related results. A slight modification of the above quantlet XFGcrosarl.xpl
allows one to compute the ADs. Remember that the computation of the AD for
the two-sided CUSUM scheme is based on a two-dimensional Markov chain.
Therefore the parameter r is set to 25 for the scheme called the old scheme by
Crosier. The results are summarized in Table 11.4.
While the ARL values in the paper and computed by XploRe coincide, those
for the AD differ slightly. The most prominent deviation (459 vs. 455) is observed
for the old scheme with h = 5. A further in-control ARL difference is noticeable
for the new scheme with h = 3.73. All other differences are small.
There are different sources for the deviations:
1. Crosier computed D^{(32)} = (p_{32}^⊤ L)/(p_{32}^⊤ 1) and not the actual limit D
(see 11.8, 11.10, and 11.12).
2. Crosier used ARL(r) = ARL_∞ + B/r² + C/r⁴ and fitted this model
for r = 8, 9, 10, 12, 15. Then, ARL_∞ is used as the final approximation. In
order to get the above D^{(32)} one needs the whole vector L, such that this
approach might be more sensitive to approximation errors than in the
single ARL case.
precisely, a capital asset pricing model (CAPM) is fitted for DBK and the DAX
which is used as proxy of the efficient market portfolio. That is, denoting with
rDAX,t and rDBK,t the log returns of the DAX and the DBK, respectively, one
assumes that the following regression model is valid:
Usually, the parameters of the model are estimated by the ordinary least
squares method. The parameter β is a very popular measure in applied fi-
nance, Elton and Gruber (1991). In order to construct a real portfolio, the
β coefficient is frequently taken into account. Research has therefore concen-
trated on the appropriate estimation of constant and time changing β. In the
context of SPC it is therefore useful to construct monitoring rules which signal
changes in β. Contrary to standard SPC application in industry there is no
obvious state of the process which one can call ”in-control”, i. e. there is no
target process. Therefore, pre-run time series of both quotes (DBK, DAX)
are exploited for building the in-control state. The daily quotes and log re-
turns, respectively, from January 6, 1995 to March 18, 1997 (about 450
observations) are used for fitting (11.14):
Multiple R = 0.70619
R^2 = 0.49871
Adjusted R^2 = 0.49759
Standard Error = 0.00746
Figure 11.5. Partial autocorrelation function of CAPM regression resid-
uals
ε_t = ϱ ε_{t−1} + η_t   (11.15)
In our example we use the parameter λ = 0.2 and an in-control ARL of 500,
such that the critical value is equal to c = 2.9623 (the Markov chain parameter
r was set to 100). Note that the computation of c is based on the normality
assumption, which is seldom fulfilled for financial data. In our example the
hypothesis of normality is also rejected, with a very small p value (Jarque-
Bera test with the quantlet jarber). The estimates of skewness 0.136805 and
kurtosis 6.64844 contradict normality too. The fat tails of the distribution are
a typical pattern of financial data. Usually, the fat tails lead to a higher false
alarm rate. However, it would be much more complicated to fit an appropriate
distribution to the residuals and use these results for the ”correct” critical
value.
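A deliberately simplified Python sketch of the monitoring idea is given below: the CAPM regression is fitted by OLS on the pre-run sample and a two-sided EWMA chart with λ = 0.2 and c = 2.9623 is applied to standardized residuals; how the residuals are pre-processed (e.g. whether the AR(1) structure in (11.15) is filtered out first) is an assumption of this sketch, not a statement about the chapter's quantlet.

import numpy as np

def capm_ewma_monitor(r_dax, r_dbk, lam=0.2, c=2.9623):
    # fit r_DBK,t = alpha + beta * r_DAX,t + eps_t by OLS on the pre-run data
    X = np.column_stack([np.ones_like(r_dax), r_dax])
    alpha, beta = np.linalg.lstsq(X, r_dbk, rcond=None)[0]
    resid = r_dbk - alpha - beta * r_dax
    resid = resid / resid.std(ddof=2)            # standardize by the pre-run residual std
    limit = c * np.sqrt(lam / (2.0 - lam))
    z, signals = 0.0, []
    for t, e in enumerate(resid, start=1):
        z = (1.0 - lam) * z + lam * e            # two-sided EWMA on the residuals
        if abs(z) > limit:
            signals.append(t)
    return alpha, beta, signals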
Figures 11.6 and 11.7 present the EWMA graphs of the pre-run and the
monitoring period (from March 19, 1997 to April 16, 1999). In the pre-run
period the EWMA chart signals 4 times. The first 3 alarms seem to be outliers,
while the last one points to a longer change. Nevertheless, the chart performs quite
typically for the pre-run period. The first signal in the monitoring period was
obtained at the 64th observation (i. e. 06/24/97). Then, we observe signals more
frequently than in the pre-run period, the changes are more persistent,
and so one has to assume that the pre-run model is no longer valid. A new
CAPM has therefore to be fitted and, if necessary, the considered portfolio has
to be reweighted. Naturally, a new pre-run can be used for the new monitoring
period.
XFGcapmar1.xpl
Bibliography
Brook, D. and Evans, D. A. (1972). An approach to the probability distribution
of cusum run length, Biometrika 59: 539–548.
Crosier, R. B. (1986). A new two-sided cumulative quality control scheme,
Technometrics 28: 187–194.
Crowder, S. V. (1986). A simple method for studying run-length distributions
of exponentially weighted moving average charts, Technometrics 29: 401–
407.
Elton, E. J. and Gruber, M. J. (1991). Modern Portfolio Theory and Investment
Analysis, Wiley, 4th edition.
Knoth, S. (1998). Quasi-stationarity of CUSUM schemes for Erlang distribu-
tions, Metrika 48: 31–48.
12.1 Introduction
Let us assume a strictly stationary one-dimensional diffusion Z solving the
stochastic differential equation (SDE)
dZ(t) = m{Z(t)} dt + v{Z(t)} dW (t) ,   (12.1)
where Z may, for instance, be the price process of a stock, a stock market index or
any other observable process. For the rest
of the chapter the drift m : R → R and the diffusion coefficient v : R → [0, ∞)
in (12.1) are assumed to be sufficiently smooth, so that a unique solution of
(12.1) exists.
In applications we are mostly interested in the stationary solutions of (12.1).
For the existence of a stationary solution, the drift and the diffusion coefficient
must satisfy some conditions, Bibby and Sørensen (1995). The most important
condition is that the stationary forward Kolmogorov equation
\{ (1/2)\, v^2(z)\, p(z) \}' − m(z)\, p(z) = 0
has a solution p(z) which is a probability density. If the initial value Z(0) is
distributed in accordance with p0 , and if it is independent of the Wiener process
W (t) in (12.1), then (12.1) defines a stationary process. The above condition
holds for the Ornstein-Uhlenbeck process with a normal stationary distribution,
and for the Cox-Ingersoll-Ross process with a Γ-distribution. For the statistical
analysis we assume that Z is observed at discrete times ti = i∆, i = 1, 2, . . . , n,
with a time step size ∆ > 0. From these observations we get a time series Z ∆
with certain dynamics specified in Section 12.2.
The aim of this chapter is to test a parametric model for the drift function m
against a nonparametric alternative.
From now on, we assume that a discrete time approximation Z ∆ exists in the
form of (12.3), and that the property (12.4) holds. For the purposes of this
chapter, ∆ will always be considered small enough that one can substitute Z
by Z ∆ in our interpretation of the observed data. The increments of the Euler
approximation, and so the observed data, will have the form
Z^∆(t_{i+1}) − Z^∆(t_i) = m\{Z^∆(t_i)\}\, ∆ + v\{Z^∆(t_i)\}\, \{W(t_{i+1}) − W(t_i)\} .   (12.5)
We can now apply the empirical likelihood Goodness-of-Fit test for stationary
time series developed by Chen et al. (2001).
(i) The kernel K is Lipschitz continuous in [−1, 1], that is |K(t1 ) − K(t2 )| ≤
C ||t1 − t2 || where || · || is the Euclidean norm, and h = O(n^{−1/5});
(ii) f , m and σ 2 have continuous derivatives up to the second order in S.
(iv) ∆n (x), the local shift in the alternative H1 , is uniformly bounded with
respect to x and n, and cn = n^{−1/2} h^{−1/4} , which is the order of the differ-
ence between H0 and H1 .
(v) The process {(Xi , Yi )} is strictly stationary and α-mixing, i.e.
α(k) \stackrel{def}{=} \sup_{A \in \mathcal{F}_1^i ,\, B \in \mathcal{F}_{i+k}^{\infty}} |P(AB) − P(A)P(B)| ≤ a ρ^k
for some a > 0 and ρ ∈ [0, 1). Here F_k^l denotes the σ-algebra of events
generated by {(Xi , Yi ), k ≤ i ≤ l} for l ≥ k. For an introduction to
α-mixing processes, see Bosq (1998) or Billingsley (1999). As shown by
Genon-Catalot, Jeantheau and Larédo (2000), this assumption is fulfilled
if Zt is an α-mixing process.
(vi) E{exp(a0 |Y1 − m(X1 )|)} < ∞ for some a0 > 0; The conditional density
of X given Y and the joint conditional density of (X1 , Xl ) given (Y1 , Yl )
are bounded for all l > 1.
Assumptions (i) and (ii) are standard in nonparametric curve estimation and
are satisfied for example for bandwidths selected by cross validation, whereas
(iii) and (iv) are common in nonparametric Goodness-of-Fit tests. Assumption
(v) means the data are weakly dependent. It is satisfied for a wide class of
diffusion processes.
where σ_K^2 is a positive constant. Let h be a positive smoothing bandwidth
which will be used to smooth (X, Y ).
The nonparametric estimator considered is the Nadaraya-Watson (NW) esti-
mator
\hat m(x) = \frac{\sum_{i=1}^{n} Y_i K_h(x − X_i)}{\sum_{i=1}^{n} K_h(x − X_i)}   (12.7)
with Kh (u) = h^{−1} K(h^{−1} u). This estimator is calculated in XploRe by the
quantlets regest or regxest.
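For illustration, a direct Python re-implementation of the NW estimator (using an Epanechnikov kernel, which is an assumption of this sketch) might look as follows; the book itself relies on the regest and regxest quantlets.

import numpy as np

def nw_estimator(x, X, Y, h):
    # Nadaraya-Watson estimator (12.7): m_hat(x) = sum_i Y_i K_h(x - X_i) / sum_i K_h(x - X_i)
    u = (np.asarray(x)[:, None] - np.asarray(X)[None, :]) / h
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / h   # Epanechnikov K_h
    return (K * np.asarray(Y)[None, :]).sum(axis=1) / K.sum(axis=1)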
The parameter estimation of θ depends on the null hypothesis. We assume
here that the parameter θ is estimated by a \sqrt{n}-consistent estimator. Let
\tilde m_{\hat θ}(x) = \frac{\sum_{i=1}^{n} K_h(x − X_i)\, m_{\hat θ}(X_i)}{\sum_{i=1}^{n} K_h(x − X_i)}
be the smoothed parametric model. The test statistic we are going to consider
is based on the difference between m̃θ̂ and m̂, rather than directly between m̂
and mθ̂ , in order to avoid the issue of bias associated with the nonparametric
fit.
The local linear estimator can be used to replace the NW estimator in estimat-
ing m. However, as we compare m̂ with m̃θ̂ in formulating the Goodness-of-Fit
test, the possible bias associated with the NW estimator is not an issue here.
In addition, the NW estimator has a simpler analytic form.
Let us now, as in Owen (1988) and Owen (1990), introduce the empirical likeli-
hood (EL) concept. Suppose (U1 , . . . , Un ) is a sample of independent identically
distributed random variables in R with a probability law given by an unknown
distribution function F and unknown density f . For an observation
(u1 , . . . , un ) of (U1 , . . . , Un ) the likelihood function is given by
\bar L(f) = \prod_{i=1}^{n} f(u_i)   (12.8)
On a heuristic level we can reject the null hypothesis “under the true distribu-
tion F , U has expectation θ” if the ratio R(F ) is small relative to 1, i.e. the
test rejects if R(F ) < r for a certain level r ∈ (0, 1). More precisely, Owen
(1990) proves the following
as the basic element of a test about a parametric hypothesis for the drift func-
tion of a diffusion process.
We will now extend the results in Section 12.5.1 to the case of time series data.
For an arbitrary x ∈ [0, 1] and any function µ we have
E\left[ K\left(\frac{x − X_i}{h}\right) \{Y_i − µ(x)\} \;\Big|\; E(Y_i | X_i = x) = µ(x) \right] = 0 .   (12.13)
Let pi (x) be nonnegative numbers representing a density for
K\left(\frac{x − X_i}{h}\right) \{Y_i − µ(x)\} ,   i = 1, . . . , n .
Let γ(x) be a random process with x ∈ [0, 1]. Throughout this chapter we use
the notation γ(x) = Õ_p(δ_n) (respectively õ_p(δ_n)) to denote that
sup_{x∈[0,1]} |γ(x)| = O_p(δ_n) (respectively o_p(δ_n)) for a sequence δ_n .
Let \bar U_j(x) = (nh)^{−1} \sum_{i=1}^{n} K^j\left(\frac{x − X_i}{h}\right) \{Y_i − \tilde m_{\hat θ}(x)\}^j for j = 1, 2, . . .. An appli-
cation of the power series expansion of 1/(1 − •) to (12.16) and Lemma
12.1 yields
\sum_{i=1}^{n} K\left(\frac{x − X_i}{h}\right) \{Y_i − \tilde m_{\hat θ}(x)\} \sum_{j=0}^{\infty} (−λ(x))^j K^j\left(\frac{x − X_i}{h}\right) \{Y_i − \tilde m_{\hat θ}(x)\}^j = 0 .
From (12.15), Lemma 12.1 and the Taylor expansion of log(1 + •) we get
ℓ\{\tilde m_{\hat θ}(x)\} = nh\, \bar U_2^{−1}(x)\, \bar U_1^2(x) + Õ_p\{(nh)^{−1/2} \log^3(n)\} .   (12.19)
be the variance and the bias coefficient functions associated with the NW esti-
mator, respectively, see Wand and Jones (1995). Let
S_{I,h} = \{x ∈ [0, 1] \mid \min(|x − 1|, |x|) > h\} .
For h → 0, S_{I,h} converges to the set of interior points of [0, 1]. If x ∈ S_{I,h} , we
have v(x; h) = \int K^2(x)\,dx and b(x; h) = 1. Define
V(x; h) = \frac{v(x; h)\, σ^2(x)}{f(x)\, b^2(x; h)} .
Clearly, V (x; h)/(nh) is the asymptotic variance of m̂(x) when nh → ∞ which
is one of the conditions we assumed.
It was shown by Chen et al. (2001), that
\bar U_1(x) = n^{−1} \sum_{i=1}^{n} K_h(x − X_i) \{Y_i − \tilde m_{\hat θ}(x)\}
         = n^{−1} \sum_{i=1}^{n} K_h(x − X_i) \{Y_i − m_θ(X_i)\} + Õ_p(n^{−1/2})
We will now discuss the asymptotic distribution of the test statistic ℓ_n(\tilde m_{\hat θ}).
Theorem 12.3 was proven by Chen et al. (2001).
As K is a compact kernel on [−1, 1], when both s and t are in S_I (the interior
part of [0, 1]), we get from (12.22) with u = (s − y)/h
W_0^{(2)}(s, t) = \int_{(s−1)/h}^{s/h} K(u)\, K\{u − (s − t)/h\}\, du
            = \int_{−\infty}^{\infty} K(u)\, K\{u − (s − t)/h\}\, du
            = K^{(2)}\left(\frac{s − t}{h}\right)   (12.23)
where K^{(2)} is the convolution of K. The compactness of K also means that
W_0^{(2)}(s, t) = 0 if |s − t| > 2h, which implies Ω(s, t) = 0 if |s − t| > 2h. Hence
N(s) and N(t) are independent if |s − t| > 2h. As
f(s)\, σ^2(s) = f(t)\, σ^2(t) + O(h)
when |s − t| ≤ 2h, we get
Ω(s, t) = \frac{W_0^{(2)}(s, t)}{\sqrt{W_0^{(2)}(s, s)\, W_0^{(2)}(t, t)}} + O(h) ,   (12.24)
So, the leading order of the covariance function is free of σ 2 and f , i.e. Ω(s, t)
is completely known.
Let
N_0(s) = N(s) − \frac{h^{1/4} ∆_n(s)}{\sqrt{V(s)}} .   (12.25)
Then N_0(s) is a normal process with zero mean and covariance Ω. The bound-
edness of K implies that W_0^{(2)} is bounded, and hence \int_0^1 Ω(t, t)\,dt < ∞. We will
now study the expectation and variance of \int_0^1 N^2(s)\,ds. Let T \stackrel{def}{=} T_1 + T_2 + T_3 =
\int_0^1 N^2(s)\,ds where
T_1 = \int_0^1 N_0^2(s)\,ds ,
T_2 = 2 h^{1/4} \int_0^1 V^{−1/2}(s)\, ∆_n(s)\, N_0(s)\,ds and
T_3 = h^{1/2} \int_0^1 V^{−1}(s)\, ∆_n^2(s)\,ds .
From some basic results on stochastic integrals, Lemma 12.2 and (12.24) it fol-
lows that
E(T_1) = \int_0^1 Ω(s, s)\,ds = 1 and
Var(T_1) = E[T_1^2] − 1   (12.26)
        = \int_0^1\!\!\int_0^1 E\{N_0^2(s)\, N_0^2(t)\}\,ds\,dt − 1   (12.27)
        = 2 \int_0^1\!\!\int_0^1 Ω^2(s, t)\,ds\,dt
        = 2 \int_0^1\!\!\int_0^1 \{W_0^{(2)}(s, t)\}^2 \{W_0^{(2)}(s, s)\, W_0^{(2)}(t, t)\}^{−1}\,ds\,dt\; \{1 + O(h^2)\} .
From (12.23) and the fact that the size of the region [0, 1] \setminus S_{I,h} is O(h), we
have
\int_0^1\!\!\int_0^1 \{W_0^{(2)}(s, t)\}^2 \{W_0^{(2)}(s, s)\, W_0^{(2)}(t, t)\}^{−1}\,ds\,dt
   = \{K^{(2)}(0)\}^{−2} \int_0^1\!\!\int_0^1 [K^{(2)}\{(s − t)/h\}]^2\,ds\,dt\; \{1 + O(1)\}
   = h\, K^{(4)}(0)\, \{K^{(2)}(0)\}^{−2} + O(h) .
Therefore,
Var(T_1) = 2 h\, K^{(4)}(0)\, \{K^{(2)}(0)\}^{−2} + O(h^2) .
It is obvious that E(T_2) = 0 and
Var(T_2) = 4 h^{1/2} \int\!\!\int V^{−1/2}(s)\, ∆_n(s)\, Ω(s, t)\, V^{−1/2}(t)\, ∆_n(t)\,ds\,dt .
As ∆_n and V^{−1} are bounded in [0, 1], there exists a constant C_1 such that
Var(T_2) ≤ C_1 h^{1/2} \int\!\!\int Ω(s, t)\,ds\,dt
with further constants C_1' and C_1'' , and thus there exists a constant C_2 such
that
Var(T_2) ≤ C_2 h^{3/2} .
As T_3 is non-random, we have
E(T) = 1 + h^{1/2} \int_0^1 V^{−1}(s)\, ∆_n^2(s)\,ds and   (12.28)
Var(T) = 2 h\, K^{(4)}(0)\, \{K^{(2)}(0)\}^{−2} + O(h)   (12.29)
(12.28) and (12.29) together with Theorem 12.3 give the asymptotic expecta-
tion and variance of the test statistic k_n^{−1} ℓ_n(\tilde m_{\hat θ}).
\sum_{j=1}^{k_n} N^2(t_j) \sim \chi^2_{k_n}
where P(Z > zα ) = α and Z ∼ N(0, 1). The asymptotic power of this test is
We see from the above that the binning based on the bandwidth value h pro-
vides a key role in the derivation of the asymptotic distributions. However, the
binning discretizes the null hypothesis and unavoidably leads to some loss of
power as shown in the simulation reported in the next section. From the point
of view of retaining power, we would like to have the size of the bins smaller
than that prescribed by the smoothing bandwidth in order to increase the res-
olution of the discretized null hypothesis to the original H0 . However, this will
create dependence between the empirical likelihood evaluated at neighbouring
bins and make the above asymptotic distributions invalid. One possibility is
to evaluate the distribution of \int_0^1 N_0^2(s)\,ds by using the approach of Wood and
Chan (1994), simulating the normal process N_0(s) under H_0 . However, this
is not our focus here and hence is not considered in this chapter.
12.8 Application
Figure 12.1 shows the daily closing value of the S&P 500 share index from
the 31st December 1976 to the 31st December 1997, which covers 5479 trading
days. In the upper panel, the index series shows a trend of exponential form
which is estimated using the method given in Härdle, Kleinow, Korostelev,
Logeay and Platen (2001). The lower panel is the residual series after removing
the exponential trend. In mathematical finance one often assumes a specific
dynamic form for this residual series, Platen (2000). More precisely, Härdle
et al. (2001) assume the following model for an index process S(t)
S(t) = S(0)\, X(t)\, \exp\left\{ \int_0^t η(s)\,ds \right\}   (12.32)
Figure 12.1. The S&P 500 Data. The upper plot shows the S&P 500
together with the exponential trend. The lower plot shows the residual
process X.
[Figure: P-values of the test]
The P-values indicate that there is insufficient evidence to reject the diffusion
model.
where {ηi } are independent and identically distributed uniform random vari-
ables in [−1, 1], ηi is independent of Xi = Yi−1 for each i, and σ(x) =
exp(−x2 /4). Note that the mean and the variance functions are both bounded
which ensures the series is asymptotically stationary. To realize the station-
arity, we pre-run the series 100 times with an initial value Y−100 = 0. The
empirical likelihood test statistic is calculated via the elmtest quantlet.
{el,p,kn,h2} = elmtest(x,y,model{,kernel{,h{,theta}}})
calculates the empirical likelihood test statistic
The first and the second parameter are the vectors of observations of X and
Y . The third parameter model is the name of a quantlet that implements the
parametric model for the null hypothesis. The optional parameter kernel is
the name of the kernel K that is used to calculate the test statistic, and h is the
bandwidth used to calculate \bar U_1 and \bar U_2 in (12.18). theta is directly forwarded
to the parametric model.
XFGelsim1.xpl
For the simulation study the sample sizes considered for each trajectory are
n = 500 and 1000, and cn , the degree of difference between H0 and H1 , takes
the values 0, 0.03 and 0.06. As the simulation shows that the two empirical
likelihood tests have very similar power performance, we will report the results
for the test based on the χ2 distribution only. To gauge the effect of the
smoothing bandwidth h on the power, ten levels of h are used for each simulated
sample to formulate the test statistic.
[Figure 12.3 panels: power of the EL test for n = 500 and n = 1000; curves for cn = 0.00, 0.03 and 0.06]
Figure 12.3. Power of the empirical likelihood test. The dotted lines
indicate the 5% level
Figure 12.3 presents the power of the empirical likelihood test based on 5000
simulations with a nominal 5% level of significance. We notice that when cn = 0
the simulated significance level of the test is very close to the nominal level for
a large range of h values, which is especially the case for the larger sample size
n = 1000. When cn increases, for each fixed h the power increases as the
distance between the null and the alternative hypotheses becomes larger. For
each fixed cn , there is a general trend of decreasing power when h increases.
This is due to the discretization of H0 by binning as discussed at the end of the
previous section. We also notice that the power curves for cn = 0.06 are a little
erratic although they maintain the same trend as in the case of cn = 0.03. This
may be due to the fact that when the difference between H0 and H1 is large, the
difference between the nonparametric and the parametric fits becomes larger
and the test procedure becomes more sensitive to the bandwidths.
In our second simulation study we consider an Ornstein-Uhlenbeck process Z
fluctuating about 0 that satisfies the stochastic differential equation
dZ(t) = aZ(t)dt + σdW (t)
where W is a standard Brownian Motion. The speed of adjustment parameter
a has to be negative to ensure stationarity. To apply the empirical likelihood
test we construct the time series X and Y as in Section 12.2, i.e.

X_i = Z^∆(t_i),              X = (X_1, …, X_n),
ε_i = W(t_{i+1}) − W(t_i),   ε = (ε_1, …, ε_n),
Y_i = X_{i+1} − X_i = a X_i ∆ + σ ε_i,   Y = (Y_1, …, Y_n).   (12.34)
It is well known that the transition probability of an Ornstein-Uhlenbeck pro-
cess is normal with conditional mean
E[Z_{t+∆} | Z_t = x] = E[X_{i+1} | X_i = x] = x e^{a∆}
and conditional variance

Var(Z_{t+∆} | Z_t = x) = Var(X_{i+1} | X_i = x) = σ² (e^{2a∆} − 1) / (2a).
To simulate the process we use the simou quantlet.
x = simou(n,a,s,delta)
simulates a discretely observed path of an Ornstein-Uhlenbeck
process via its transition probability law.
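As a rough illustration of what simou does, the following Python sketch (not the XploRe quantlet) simulates a discretely observed Ornstein–Uhlenbeck path directly from the transition law stated above; the function name and default arguments are illustrative only.

    import numpy as np

    def simulate_ou(n, a, sigma, delta, z0=0.0, rng=None):
        """Simulate a discretely observed Ornstein-Uhlenbeck path
        dZ = a Z dt + sigma dW (a < 0) via its exact transition law."""
        rng = np.random.default_rng() if rng is None else rng
        z = np.empty(n)
        z[0] = z0
        mean_factor = np.exp(a * delta)                       # E[Z_{t+delta}|Z_t] = Z_t e^{a delta}
        cond_var = sigma**2 * (np.exp(2 * a * delta) - 1) / (2 * a)   # conditional variance
        for i in range(1, n):
            z[i] = z[i - 1] * mean_factor + np.sqrt(cond_var) * rng.standard_normal()
        return z

    # example: 500 observations with a = -1, sigma = 0.5, delta = 0.1
    path = simulate_ou(500, a=-1.0, sigma=0.5, delta=0.1)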
12.10 Appendix
LEMMA 12.2 Let X, Y be standard normal random variables with covariance
Cov(X, Y ) = ρ, i.e.
(X, Y)^⊤ ∼ N( (0, 0)^⊤, [[1, ρ], [ρ, 1]] ).   (12.37)
Then we have:
Cov(X², Y²) = 2ρ².
PROOF:
Define Z ∼ N(0, 1) independent of X and set X′ def= ρX + √(1 − ρ²) Z. Then

(X, X′)^⊤ ∼ N( (0, 0)^⊤, [[1, ρ], [ρ, 1]] ),

so (X, X′) has the same joint distribution as (X, Y). Since X′² = ρ²X² + 2ρ√(1 − ρ²) XZ + (1 − ρ²)Z² and Z is independent of X, the cross terms have zero covariance with X², and therefore

Cov(X², Y²) = Cov(X², X′²) = ρ² Var(X²) = 2ρ².
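The identity of Lemma 12.2 can also be checked numerically; the following small Monte Carlo sketch in Python uses the same construction X′ = ρX + √(1 − ρ²)Z as the proof. Sample size and seed are arbitrary.

    import numpy as np

    # Monte Carlo check of Lemma 12.2: Cov(X^2, Y^2) = 2 rho^2 for bivariate
    # standard normals with correlation rho.
    rng = np.random.default_rng(0)
    rho = 0.6
    n = 1_000_000
    x = rng.standard_normal(n)
    z = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * z      # construction used in the proof
    emp_cov = np.cov(x**2, y**2)[0, 1]
    print(emp_cov, 2 * rho**2)                 # both should be close to 0.72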
13.1 Introduction
For most people, purchasing a house is a major decision. Once purchased,
the house will by far be the most important asset in the buyer’s portfolio.
The development of its price will have a major impact on the buyer’s wealth
over the life cycle. It will, for instance, affect her ability to obtain credit
from commercial banks and therefore influence her consumption and savings
decisions and opportunities. The behavior of house prices is therefore of central
interest for (potential) house buyers, sellers, developers of new houses, banks,
policy makers or, in short, the general public.
An important property of houses is that they are different from each other.
Hence, while houses in the same market (i.e., the same city, district or neigh-
borhood) will share some common movements in their price there will at all
times be idiosyncratic differences due to differences in maintenance, design or
furnishing. Thus, the average or median price will depend not only on the
general tendency of the market, but also on the composition of the sample. To
calculate a price index for real estate, one has to control explicitly for idiosyn-
cratic differences. The hedonic approach is a popular method for estimating
the impact of the characteristics of heterogeneous goods on their prices.
The statistical model used in this chapter tries to infer the common component
in the movement of prices of 1502 single-family homes sold in a district of Berlin,
Germany, between January 1980 and December 1999. It combines hedonic
regression with Kalman filtering. The Kalman filter is the standard statistical
tool for filtering out an unobservable, common component from idiosyncratic,
price index will thus exhibit some autocorrelation. This time-series-based way
of modelling the behavior of It is more parsimonious than the conventional
hedonic regressions (which need to include a separate dummy variable for each
time period) and makes forecasting straightforward.
We can rewrite our model (13.1) and (13.2) in State Space Form (SSF)
(Gourieroux and Monfort, 1997). In general, the SSF is given as:
α_t = c_t + T_t α_{t−1} + ε^s_t   (13.3a)
y_t = d_t + Z_t α_t + ε^m_t   (13.3b)
ε^s_t ∼ (0, R_t),   ε^m_t ∼ (0, H_t).   (13.3c)
The notation partially follows Harvey (1989; 1993). The first equation is the
state equation and the second is the measurement equation. The characteristic
structure of state space models relates a series of unobserved values αt to a
set of observations yt . The unobserved values αt represent the behavior of the
system over time (Durbin and Koopman, 2001).
The unobservable state vector α_t has dimension K ≥ 1, T_t is a square matrix of dimension K × K, and the vector of observable variables y_t has dimension N_t × 1. Here, N_t denotes the number of observations y_{t,n} in period t ≤ T. If the number of observations varies through the periods, we denote

N def= max_{t=1,…,T} N_t.
y_t = (p_{1,t}, …, p_{N_t,t})^⊤,   Z_t = [[1, 0, x_{1,t}^⊤], …, [1, 0, x_{N_t,t}^⊤]],   ε^m_t = (ε_{1,t}, …, ε_{N_t,t})^⊤.   (13.4b)
For our model, both ct and dt are zero vectors. The transition matrices Tt are
non time-varying. The variance matrices of the state equation Rt are identical
for all t and equal to a 6 × 6 matrix, where the first element is σ_ν² and all other elements are zeros. H_t is an N_t × N_t diagonal matrix with σ_ε² on the diagonal. The variance σ_ε² is also an unknown parameter.
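To make the structure of the system matrices concrete, here is a minimal Python sketch that builds T, R, Z_t and H_t for one period. It assumes the state stacks the AR(2) common price component on top of q hedonic coefficients (so K = 2 + q); the exact ordering of the state elements in (13.4a) is not reproduced in this text, so this layout is an assumption, and the function is illustrative rather than a replica of gkalarray.

    import numpy as np

    def build_system(phi1, phi2, sigma_nu2, sigma_eps2, X_t):
        """Build the system matrices of the house-price SSF for one period t.

        X_t : (N_t x q) matrix of hedonic characteristics of the houses sold in t.
        The state stacks the AR(2) common price component (2 elements) on top of
        the q time-constant hedonic coefficients, so K = 2 + q (an assumption)."""
        n_t, q = X_t.shape
        K = 2 + q
        # transition matrix: AR(2) block for (I_t, I_{t-1}), identity for beta
        T = np.eye(K)
        T[0, 0], T[0, 1], T[1, 0], T[1, 1] = phi1, phi2, 1.0, 0.0
        # state noise covariance: only the innovation of I_t is stochastic
        R = np.zeros((K, K))
        R[0, 0] = sigma_nu2
        # measurement matrix: row n is (1, 0, x_{n,t}^T)
        Z_t = np.hstack([np.ones((n_t, 1)), np.zeros((n_t, 1)), X_t])
        # measurement noise covariance: diagonal with sigma_eps^2
        H_t = sigma_eps2 * np.eye(n_t)
        return T, R, Z_t, H_t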
The first two elements of the state equation just resemble the process of the
common price component given in (13.2). However, we should mention that
there are other ways to put an AR(2) process into a SSF (see Harvey, 1993, p.
84). The remaining elements of the state equation are the implicit prices β of
the hedonic price equation (13.1). Multiplying the state vector α_t with row n of the matrix Z_t gives I_t + x_{t,n}^⊤ β. This is just the functional relation (13.1) for the log price without noise. The noise terms of (13.1) are collected in the SSF in the vector ε^m_t. We assume that ε^m_t and ε^s_t are uncorrelated. This is required
for identification (Schwann, 1998, p. 274).
The Kalman filter is an algorithm for sequentially updating our knowledge of the system given a new observation y_t. It calculates estimates of the state conditional on the information up to time s = t. Using our general expressions, we have
at = E[αt |Ft ]
and
Pt = E[(αt − at )(αt − at )> |Ft ] .
Here we use the standard simplified notation at and Pt for at|t and Pt|t . As a
by-product of the filter, the recursions calculate also
at|t−1 = E[αt |Ft−1 ]
and
Pt|t−1 = E[(αt − at|t−1 )(αt − at|t−1 )> |Ft−1 ] .
We give the filter recursions in detail in Subsection 13.5.3.
The Kalman smoother is an algorithm to predict the state vector α_t given the whole information up to T. Thus, with our general notation, we have s = T and

a_{t|T} = E[α_t | F_T],   P_{t|T} = E[(α_t − a_{t|T})(α_t − a_{t|T})^⊤ | F_T].
We see that the filter makes one step predictions given the information up
to t ∈ {1, . . . , T } whereas the smoother is backward looking. We give the
smoother recursions in detail in Subsection 13.5.5.
The log likelihood of the parameter vector ψ takes the prediction error decomposition form

log L(ψ) = −(1/2) Σ_{t=1}^T ( N_t log 2π + log |F_t| + v_t^⊤ F_t^{−1} v_t ).   (13.9)

Here,

v_t def= y_t − d_t − Z_t a_{t|t−1}   (13.10)
are the innovations of the filtering procedure and at|t−1 is the conditional
expectation of αt given information up to t − 1. As we have already mentioned,
these expressions are a by-product of the filter recursions. The matrix Ft
is the covariance matrix of the innovations at time t and also a by-product
of the Kalman filter. The above log likelihood is known as the prediction
error decomposition form (Harvey, 1989). Periods with no observations do not
contribute to the log likelihood function.
Starting with some initial value, one can use numerical maximization methods
to obtain an estimate of the parameter vector ψ. Under certain regularity con-
ditions, the maximum likelihood estimator ψ̃ is consistent and asymptotically
normal. One can use the information matrix to calculate standard errors of ψ̃
(Hamilton, 1994).
After fitting a SSF, one should check the appropriateness of the results by
looking at the standardized residuals
v_t^{st} = F_t^{−1/2} v_t.   (13.11)
If all parameters of the SSF were known, vtst would follow a multivariate stan-
dardized normal distribution (Harvey, 1989, see also (13.9)). We know that Ft
is a symmetric matrix and that it should be positive definite (recall that it is
just the covariance matrix of the innovations vt ). So
F_t^{−1/2} = C_t Λ_t^{−1/2} C_t^⊤,   (13.12)
where the diagonal matrix Λt contains all eigenvalues of Ft and Ct is the ma-
trix of corresponding normalized eigenvectors (Greene, 2000, p.43). The stan-
dardized residuals should be distributed normally with constant variance, and
should show no serial correlation. It is a signal for a misspecified model when
the residuals do not possess these properties. To check the properties, one
can use standard test procedures. For example, a Q-Q plot indicates if the
quantiles of the residuals deviate from the corresponding theoretical quantiles
of a normal distribution. This plot can be used to detect non-normality. The
Jarque–Bera test (Bera and Jarque, 1982) can also be used to test for non-normality of the residuals. This test is implemented in XploRe as jarber.
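A compact Python sketch of the computation in (13.11)–(13.12), assuming F_t is symmetric positive definite; it is illustrative and not the gkalresiduals quantlet.

    import numpy as np

    def standardized_residuals(F_t, v_t):
        """Compute v_t^st = F_t^{-1/2} v_t via the eigendecomposition
        F_t = C_t Lambda_t C_t^T (F_t symmetric positive definite)."""
        eigval, eigvec = np.linalg.eigh(F_t)           # Lambda_t, C_t
        F_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
        return F_inv_sqrt @ v_t

The stacked standardized residuals can then be passed to any normality test, for example scipy.stats.jarque_bera.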
In the empirical part, we combine Kalman filter techniques and maximum
likelihood to estimate the unknown parameters and coefficients of the SSF for
the house prices in a district of Berlin.
13.4 The Data

In Berlin, information on all real estate transactions is collected in a data base called Automatisierte Kaufpreissammlung.
Here, we use data for 1502 sales of detached single-family houses in a district
of Berlin for the years 1980 to 1999, stored in MD*BASE. Besides the price,
we observe the size of the lot, the floor space, and the age of the house. The
data set XFGhouseprice contains the log price observations for all 80 quarters.
There are at most N = 43 observations in any quarter. The following lines of XploRe code read the data and display parts of them:
Y = read("XFGhouseprice.dat")
Y[1:20,41:44]
X = read("XFGhousequality.dat")
X[1:6,41]’
The size of the lot for the second house is about 706 square meters (just take
the antilog). The size of the floor space is 172 square meters and the age is 13
years.
The following table shows summary statistics of our Berlin house price data.
"========================================================="
" Summary statistics for the Berlin house price data "
"========================================================="
" Sample for 80 quarters with 1502 observations "
" "
" Observations per period "
" ----------------------------------------------------"
" Minimum = 4 Average = 18.77 Maximum = 43 "
" "
" Transaction prices (in thousand DM) "
" ----------------------------------------------------"
" Minimum = 100.00 Average = 508.46 "
" Maximum = 1750.01 Std. Dev. = 197.92 "
" "
" Size of the lot (in square meters) "
" ----------------------------------------------------"
" Minimum = 168.00 Average = 626.18 "
" Maximum = 2940.00 Std. Dev. = 241.64 "
" "
" Size of the floor space (in square meters) "
" ----------------------------------------------------"
" Minimum = 46.00 Average = 144.76 "
" Maximum = 635.00 Std. Dev. = 48.72 "
" "
" Age of the building (in years) "
" ----------------------------------------------------"
" Minimum = 0 Average = 28.59 "
" Maximum = 193 Std. Dev. = 21.58 "
"========================================================="
XFGsssm1.xpl
Not surprisingly for detached houses there are large differences in the size of
the lot. Some houses were new in the period of the sale while one was 193
years old. That is a good example for the potential bias of the average price
per quarter as a price index. If we do not control explicitly for depreciation we
might obtain a low price level simply because the houses sold in a quarter were
old.
Nevertheless, the average price per quarter can give an indication of the price
level. Figure 13.1 shows the average price per quarter along with confidence
intervals at the 90% level. Instead of the average price, we could also calculate
an average adjusted price, where the most important characteristic is used for
the adjustment. Such adjustment is attained by dividing the price of every
house by—for example—the respective size of the lot. However, even in that
case we would control only for one of the observed characteristics. In our model
we will control for all of the observed characteristics.
Figure 13.1. Average price per quarter, units are Deutsche Mark (1
DM ≈ 0.511 EURO). Confidence intervals are calculated for the 90%
level.
XFGsssm2.xpl
13.5 Estimating and filtering in XploRe
13.5.1 Overview
The procedure for Kalman filtering in XploRe is as follows: first, one has
to set up the system matrices using gkalarray. The quantlet adjusts the
measurement matrices for missing observations.
After the set up of the system matrices, we calculate the Kalman filter with
gkalfilter. This quantlet also calculates the value of the log likelihood
function given in equation (13.9). That value will be used to estimate the
unknown parameters of the system matrices with numerical maximization
(Hamilton, 1994, Chapter 5). The first and second derivatives of the log like-
lihood function will also be calculated numerically. To estimate the unknown
state vectors—given the estimated parameters—we use the Kalman smoother
gkalsmoother. For diagnostic checking, we use the standardized residuals
(13.11). The quantlet gkalresiduals calculates these residuals.
gkalarrayOut = gkalarray(Y,M,IM,XM)
sets the system matrices for a time varying SSF
The Kalman filter quantlets need as arguments arrays consisting of the system
matrices. The quantlet gkalarray sets these arrays in a user-friendly way. The
routine is especially convenient if one works with time varying system matrices.
In our SSF (13.4), only the system matrix Zt is time varying. As one can see
immediately from the general SSF (13.3), possibly every system matrix can be
time varying.
The quantlet uses a three step procedure to set up the system matrices.
1. To define a system matrix all constant entries must be set to their re-
spective values and all time varying entries must be set to an arbitrary
number (for example to 0).
2. One must define an index matrix for every system matrix. An entry is
set to 0 when its corresponding element in the system matrix is constant
and to some positive integer when it is not constant.
3. In addition, for every time varying system matrix, one also has to specify
a data matrix that contains the time varying entries.
gkalarray uses the following notation: Y denotes the matrix of all observations
[y1 , . . . , yT ], M denotes the system matrix, IM denotes the corresponding index
matrix and XM the data matrix.
If all entries of a system matrix are constant over time, then the parameters
have already been put directly into the system matrix. In this case, one should
set the index and the data matrix to 0.
For every time varying system matrix, only constant parameters—if there are
any—have already been specified with the system matrix. The time-varying
coefficients have to be specified in the index and the data matrix.
In our example, only the matrices Z_t are time varying. We have

Z def= the (N × 6) matrix whose rows are all equal to (1, 0, 1, 0, 0, 0),

IZ def= the (N × 6) matrix whose n-th row is (0, 0, 0, 3n − 2, 3n − 1, 3n), n = 1, …, N,

XZ def= XFGhousequality.
The system matrix Z_t has the dimension (N × 6). The non-zero entries in the index matrix IZ prescribe the rows of XFGhousequality which contain the time varying elements.

The output of the quantlet is an array that stacks the system matrices one after the other. For example, the first two rows of the system matrix Z_41 are filled with the characteristics of the first two houses sold in quarter 41.
{gkalfilOut,loglike} = gkalfilter(Y,mu,Sig,ca,Ta,Ra,
da,Za,Ha,l)
Kalman filters a time-varying SSF
We assume that the initial state vector at t = 0 has mean µ and covariance
matrix Σ. Recall, that Rt and Ht denote the covariance matrix of the state noise
and—respectively—of the measurement noise. The general filter recursions are
as follows:
Start at t = 1: use the initial guess for µ and Σ to calculate

a_{1|0} = c_1 + T_1 µ
P_{1|0} = T_1 Σ T_1^⊤ + R_1
F_1 = Z_1 P_{1|0} Z_1^⊤ + H_1

and the updates

a_1 = a_{1|0} + P_{1|0} Z_1^⊤ F_1^{−1} (y_1 − d_1 − Z_1 a_{1|0}),
P_1 = P_{1|0} − P_{1|0} Z_1^⊤ F_1^{−1} Z_1 P_{1|0}.

Step at t ≤ T: using a_{t−1} and P_{t−1} from the previous step, calculate

a_{t|t−1} = c_t + T_t a_{t−1}
P_{t|t−1} = T_t P_{t−1} T_t^⊤ + R_t
F_t = Z_t P_{t|t−1} Z_t^⊤ + H_t

and the updates

a_t = a_{t|t−1} + P_{t|t−1} Z_t^⊤ F_t^{−1} v_t,
P_t = P_{t|t−1} − P_{t|t−1} Z_t^⊤ F_t^{−1} Z_t P_{t|t−1},

with the innovations v_t from (13.10).
The output gkalfilOut is an array of [a_t P_t] matrices. If one chooses l = 1 the value of the log likelihood
function (13.9) is calculated.
Once again, the T + 1 matrices are stacked “behind each other”, with the t = 0
matrix at the front and the t = T matrix at the end of the array. The first
entry is [µ Σ].
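The recursions above translate directly into code. The following Python sketch (illustrative, not gkalfilter) runs the filter for a time-varying SSF with c, T and R held constant and accumulates the standard prediction-error-decomposition log likelihood; the exact presentation of (13.9) in the book is not reproduced here, so the likelihood term is stated as the usual Gaussian form.

    import numpy as np

    def kalman_filter(y, c, T, R, d, Z, H, mu, Sigma):
        """Kalman filter for the SSF (13.3); y, d, Z, H are lists indexed by t
        (so that N_t may vary), while c, T, R are held constant here.
        Returns filtered means/covariances and the Gaussian log likelihood."""
        a, P = mu, Sigma
        a_filt, P_filt, loglik = [], [], 0.0
        for t in range(len(y)):
            # prediction step
            a_pred = c + T @ a
            P_pred = T @ P @ T.T + R
            v = y[t] - d[t] - Z[t] @ a_pred                 # innovation (13.10)
            F = Z[t] @ P_pred @ Z[t].T + H[t]               # innovation covariance
            F_inv = np.linalg.inv(F)
            # update step (13.13)
            a = a_pred + P_pred @ Z[t].T @ F_inv @ v
            P = P_pred - P_pred @ Z[t].T @ F_inv @ Z[t] @ P_pred
            loglik += -0.5 * (len(v) * np.log(2 * np.pi)
                              + np.log(np.linalg.det(F)) + v @ F_inv @ v)
            a_filt.append(a)
            P_filt.append(P)
        return a_filt, P_filt, loglik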
How can we provide initial values for the filtering procedure? If the state
matrices are non time-varying and the transition matrix T satisfies some sta-
bility condition, we should set the initial values to the unconditional mean and
variance of the state vector. Σ is given implicitly by Σ = TΣT^⊤ + R, which can be solved as vec(Σ) = (I − T ⊗ T)^{−1} vec(R).
Here, vec denotes the vec-operator that places the columns of a matrix below
each other and ⊗ denotes the Kronecker product. Our model is time-invariant.
But does our transition matrix fulfill the stability condition? The necessary and
sufficient condition for stability is that the characteristic roots of the transition
matrix T should have modulus less than one (Harvey, 1989, p. 114). It is easy
to check that the characteristic roots λ_j of our transition matrix (13.4a) are given as

λ_{1,2} = ( φ_1 ± √(φ_1² + 4φ_2) ) / 2.
For example, if φ1 and φ2 are both positive, then φ1 + φ2 < 1 guarantees real
characteristic roots that are smaller than one (Baumol, 1959, p. 221). However,
when the AR(2) process of the common price component It has a unit root,
the stability conditions are not fulfilled. If we inspect Figure 13.1, a unit root
seems quite plausible. Thus we can not use this method to derive the initial
values.
If we have some preliminary estimates of µ, along with preliminary measures of uncertainty—that is, an estimate of Σ—we can use these preliminary estimates
as initial values. A standard way to derive such preliminary estimates is to
use OLS. If we have no information at all, we must take diffuse priors about
the initial conditions. A method adopted by Koopman, Shephard and Doornik
(1999) is setting µ = 0 and Σ = κI, where κ is a large number. The large
variances on the diagonal of Σ reflect our uncertainty about the true µ.
We will use the second approach for providing some preliminary estimates as
initial values. Given the hedonic equation (13.1), we use OLS to estimate It ,
β, and σ_m² by regressing log prices on lot size, floor space, age and quarterly
time dummies. The estimated coefficients of lot size, floor space and age are
Regression diagnostics
  R²            0.9997     Number of observations    1502
  adjusted R²   0.9997     F-statistic               64021.67
  σ̂²_ε          0.4688     Prob(F-statistic)         0.0000
reported in Table 13.1. They are highly significant and reasonable in sign and
magnitude. Whereas lot size and floor space increase the price on average, age
has the opposite effect. According to (13.1), the common price component It
is a time-varying constant term and is therefore estimated by the coefficients
of the quarterly time dummies, denoted by {Î_t}_{t=1}^{80}. As suggested by (13.2),
these estimates are regressed on their lagged values to obtain estimates of the
unknown parameters φ1 , φ2 , and σs2 . Table 13.2 presents the results for an
AR(2) for the Iˆt series. The residuals of this regression behave like white noise.
Regression diagnostics
  R²            0.8780     Number of observations    78
  adjusted R²   0.8747     F-statistic               269.81
  σ̂²_ν          0.0063     Prob(F-statistic)         0.0000
{V,Vs} = gkalresiduals(Y,Ta,Ra,da,Za,Ha,gkalfilOut)
calculates innovations and standardized residuals
The output of the quantlet consists of two N × T matrices V and Vs. V contains the innovations (13.10) and Vs contains the standardized residuals (13.11).
The Q-Q plot of the standardized residuals in Figure 13.2 shows deviations
from normality at both tails of the distribution.
Figure 13.2. Q–Q plot of the standardized residuals. Deviations of the dotted line from the straight line are evidence for a non-normal error distribution.
XFGsssm5.xpl
This is evidence that the true error distribution might be a unimodal distribution with heavier tails than the normal, such as the t-distribution. In
this case the projections calculated by the Kalman filter no longer provide the
conditional expectations of the state vector but rather its best linear predic-
tion. Moreover the estimates of ψ calculated from the likelihood (13.9) can be
interpreted as pseudo-likelihood estimates.
gkalsmoothOut = gkalsmoother(Y,Ta,Ra,gkalfilOut)
provides Kalman smoothing of a time-varying SSF
The Kalman filter is a convenient tool for calculating the conditional expecta-
tions and covariances of our SSF (13.4). We have used the innovations of this
filtering technique and its covariance matrix for calculating the log likelihood.
However, for estimating the unknown state vectors, we should use in every step
the whole sample information up to period T . For this task, we use the Kalman
smoother.
The quantlet gkalsmoother needs as argument the output of gkalfilter. The
output of the smoother is an array with [at|T Pt|T ] matrices. This array of
dimension T + 1 starts with the t = 0 matrix and ends with the matrix for
t = T . For the smoother recursions, one needs at , Pt and Pt|t−1 for t = 1 . . . T .
Then the calculation procedure is as follows:

Start at t = T:
  a_{T|T} = a_T
  P_{T|T} = P_T

Step at t < T:
  P*_t = P_t T_{t+1}^⊤ P_{t+1|t}^{−1}
  a_{t|T} = a_t + P*_t (a_{t+1|T} − T_{t+1} a_t)
  P_{t|T} = P_t + P*_t (P_{t+1|T} − P_{t+1|t}) P*_t^⊤
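A matching Python sketch of the smoother recursions, assuming the filtered quantities a_t, P_t and the one-step-ahead covariances P_{t+1|t} (for instance from the filter sketch above) are available as lists; it is illustrative, not gkalsmoother.

    import numpy as np

    def kalman_smoother(a_filt, P_filt, P_pred, T):
        """Fixed-interval smoother following the recursions above.
        a_filt[t], P_filt[t] hold a_t, P_t; P_pred[t] holds P_{t+1|t}."""
        n = len(a_filt)
        a_sm, P_sm = [None] * n, [None] * n
        a_sm[-1], P_sm[-1] = a_filt[-1], P_filt[-1]          # start at t = T
        for t in range(n - 2, -1, -1):
            P_star = P_filt[t] @ T.T @ np.linalg.inv(P_pred[t])
            a_sm[t] = a_filt[t] + P_star @ (a_sm[t + 1] - T @ a_filt[t])
            P_sm[t] = P_filt[t] + P_star @ (P_sm[t + 1] - P_pred[t]) @ P_star.T
        return a_sm, P_sm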
The next program calculates the smoothed state vectors for our SSF form,
given the estimated parameters ψ̃. The smoothed series of the common price
component is given in Figure 13.3. The confidence intervals are calculated
using the variance of the first element of the state vector.
Comparison with the average prices given in Figure 13.1 reveals that the com-
mon price component is less volatile than the simple average. Furthermore,
a table for the estimated hedonic coefficients—that is β—is generated, Table
13.4.
Recall that these coefficients are just the last three entries in the state vector αt .
According to our state space model, the variances for these state variables are
[1,] "==========================================="
[2,] " Estimated hedonic coefficients "
[3,] "==========================================="
[4,] " Variable coeff. t-Stat. p-value "
[5,] " ----------------------------------------- "
[6,] " log lot size 0.2664 21.59 0.0000 "
[7,] " log floor area 0.4690 34.33 0.0000 "
[8,] " age -0.0061 -29.43 0.0000 "
[9,] "==========================================="
zero. Thus, it is not surprising that the Kalman smoother produces constant
estimates through time for these coefficients. In the Appendix 13.6.2 we give
a formal proof of this intuitive result.
The estimated coefficient of log lot size implies that, as expected, the size of the lot has a positive influence on the price. The estimated relative price increase for a one percent increase in the lot size is about 0.27%. The estimated effect of an increase in the floor space is even larger: a one percent increase in the floor space raises the price by about 0.47%. Finally, note that the price of a house is estimated to decrease with age.
13.6 Appendix
We show that our treatment of missing values delivers the same results as the
procedure proposed by Shumway and Stoffer (1982; 2000). For this task, let us
assume that the (N × 1) vector of observations y_t has missing values; say, observations 2 and 4 are missing. Thus, we have only
Nt < N observations. For Kalman filtering in XploRe, all missing values in yt
and the corresponding rows and columns in the measurement matrices dt , Zt ,
and Ht , are deleted. Thus, the adjusted vector of observations is
y_{t,1} = ( y_{1,t}, y_{3,t}, y_{5,t}, …, y_{N,t} )^⊤,
where the subscript 1 indicates that this is the vector of observations used in the
XploRe routines. The procedure of Shumway and Stoffer instead rearranges the
vectors in such a way that the first Nt entries are the observations—and thus
given by yt,1 —and the last (N − Nt ) entries are the missing values. However,
all missing values must be replaced with zeros.
For our proof, we use the following generalized formulation of the measurement equation,

(y_{t,1}; y_{t,2}) = (d_{t,1}; d_{t,2}) + (Z_{t,1}; Z_{t,2}) α_t + (ε^m_{t,1}; ε^m_{t,2})

and

Cov(ε^m_{t,1}; ε^m_{t,2}) = [[H_{t,11}, H_{t,12}], [H_{t,12}^⊤, H_{t,22}]],

where the semicolon denotes vertical stacking.
yt,1 contains the observations and yt,2 the missing values. The procedure of
Shumway and Stoffer employs the generalized formulation given above and sets
yt,2 = 0, dt,2 = 0, Zt,2 = 0, and Ht,12 = 0 (Shumway and Stoffer, 2000, p. 330).
We should remark that the dimensions of these matrices also depend on t via
(N − N_t). However, to keep the notation simple, we do not make this time dependency
explicit. It is important to mention that matrices with subscript 1 and 11 are
equivalent to the adjusted matrices of XploRe’s filtering routines.
First, we show by induction that both procedures deliver the same results for
the Kalman filter. Once this equivalence is established, we can conclude that
the smoother also delivers identical results.
PROOF:
Given µ and Σ, the terms a1|0 and P1|0 are the same for both procedures. This
follows from the simple fact that the first two steps of the Kalman filter do not
depend on the vector of observations (see Subsection 13.5.3).
Now, given at|t−1 and Pt|t−1 , we have to show that also the filter recursions
a_t = a_{t|t−1} + P_{t|t−1} Z_t^⊤ F_t^{−1} v_t,   P_t = P_{t|t−1} − P_{t|t−1} Z_t^⊤ F_t^{−1} Z_t P_{t|t−1}   (13.13)
deliver the same results. Using ss to label the results of the Shumway and
Stoffer procedure, we obtain by using

Z_{t,ss} def= (Z_{t,1}; 0)

that

F_{t,ss} = [[Z_{t,1} P_{t|t−1} Z_{t,1}^⊤, 0], [0, 0]] + [[H_{t,11}, 0], [0, H_{t,22}]].

The inverse is given by (Sydsæter, Strøm and Berck, 2000, 19.49)

F_{t,ss}^{−1} = [[F_{t,1}^{−1}, 0], [0, H_{t,22}^{−1}]]   (13.14)

where F_{t,1} is just the covariance matrix of the innovations of XploRe's procedure. With (13.14) we obtain that

Z_{t,ss}^⊤ F_{t,ss}^{−1} = ( Z_{t,1}^⊤ F_{t,1}^{−1}, 0 )

and immediately

Z_{t,ss}^⊤ F_{t,ss}^{−1} v_{t,ss} = Z_{t,1}^⊤ F_{t,1}^{−1} v_{t,1}.
Plugging this expression into (13.13)—taking into account that a_{t|t−1} and P_{t|t−1} are identical—delivers the same a_t and P_t for both procedures.

For the second result, suppose the transition matrix is partitioned as

T_{t+1} = [[T_{11,t+1}, T_{12,t+1}], [0, I]]

with the k × k identity matrix I. Furthermore, we define with the same partition

P̃_t def= T_{t+1} P_t T_{t+1}^⊤ = [[P̃_{11,t}, P̃_{12,t}], [P̃_{12,t}^⊤, P̃_{22,t}]]

and note that P_{t+1|t} = P̃_t + R_{t+1},
where the upper left part of Rt+1 contains the covariance matrix of the dis-
turbances for the stochastic state variables. We see immediately that only the
upper left part of P_{t+1|t} is different from P̃_t.
Our goal is to show that for the recursions of the smoother it holds that

P*_t = [[M_{11,t}, M_{12,t}], [0, I]],   (13.16)
where both M s stand for some complicated matrices. With this result at hand,
we obtain immediately
a^k_{t|T} = a^k_{t+1|T} = a^k_T   (13.17)
for all t, where akt|T contains the last k elements of the smoothed state at|T .
Furthermore, it is possible to show with the same result that the lower right
partition of Pt|T is equal to the lower right partition of PT for all t. This lower
right partition is just the covariance matrix of akt|T . Just write the smoother
recursion
P_{t|T} = P_t (I − T_{t+1}^⊤ P*_t^⊤) + P*_t P_{t+1|T} P*_t^⊤.
Then check with (13.15) and (13.16) that the lower-right partition of the first
matrix on the right hand side is a k×k matrix of zeros. The lower-right partition
of the second matrix is given by the lower-right partition of P_{t+1|T}.
PROOF:
Now we derive (13.16): We assume that the inverse of Tt+1 and T11,t+1 exist.
The inverses for our model exist because we assume that φ2 6= 0. For the
partitioned transition matrix (Sydsæter, Strøm and Berck, 2000, 19.48) we
derive

T_{t+1}^{−1} = [[T_{11,t+1}^{−1}, −T_{11,t+1}^{−1} T_{12,t+1}], [0, I]].   (13.18)
Now, it is easy to see that

P*_t = T_{t+1}^{−1} P̃_t P_{t+1|t}^{−1}.   (13.19)
14.1 Introduction
Long range dependence is widespread in nature and has been extensively doc-
umented in economics and finance, as well as in hydrology, meteorology, and
geophysics by authors such as Heyman, Tabatabai and Lakshman (1991), Hurst
(1951), Jones and Briffa (1992), Leland, Taqqu, Willinger and Wilson (1993)
and Peters (1994). It has a long history in economics and finance, and has
remained a topic of active research in the study of financial time series, Beran
(1994).
Historical records of financial data typically exhibit distinct nonperiodical cycli-
cal patterns that are indicative of the presence of significant power at low fre-
quencies (i.e. long range dependencies). However, the statistical investigations
that have been performed to test for the presence of long range dependence in
economic time series representing returns of common stocks have often become
sources of major controversies. Asset returns exhibiting long range dependen-
cies are inconsistent with the efficient market hypothesis, and cause havoc on
stochastic analysis techniques that have formed the basis of a broad part of
modern finance theory and its applications, Lo (1991). In this chapter, we
examine the methods used in Hurst analysis, present a process exhibiting long
memory features, and give market evidence by applying Hurst’s R/S analysis
and finally sketch a trading strategy for German voting and non–voting stocks.
Figure: plot of X(t) − (t/n){X(an) − X((a−1)n)}.
If the process Y is stationary then correction for scale is not strictly necessary,
and we may take each Sa to be the constant 1. In that case the R–S statistic
Ĥ is a version of the box-counting estimator that is widely used in physical
science applications, Carter, Cawley and Mauldin (1988), Sullivan and Hunt
(1988) and Hunt (1990). The box-counting estimator is related to the capacity
definition of fractal dimension, Barnsley (1988) p. 172ff, and the R–S estimator
may be interpreted in the same way. Statistical properties of the box-counting
estimator have been discussed by Hall and Wood (1993).
A more detailed analysis, exploiting dependence among the errors in the regres-
sion of log(R/S)n on log n, may be undertaken in place of R–S analysis. See
Kent and Wood (1997) for a version of this approach in the case where scale
correction is unnecessary. However, as Kent and Wood show, the advantages
of the approach tend to be asymptotic in character, and sample sizes may need
to be extremely large before real improvements are obtained.
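For readers who want to reproduce the analysis outside XploRe, the following Python sketch estimates H by regressing log(R/S)_n on log n over a grid of block lengths; the choice of grid and the blockwise averaging are simplifying assumptions, so the routine is a stand-in for, not a replica of, the R/S analysis used in this chapter.

    import numpy as np

    def hurst_rs(x, block_sizes=None):
        """Estimate the Hurst coefficient of the increment series x by R/S analysis:
        regress log(R/S)_n on log n over a range of block lengths n."""
        x = np.asarray(x, dtype=float)
        N = len(x)
        if block_sizes is None:
            block_sizes = np.unique(
                np.logspace(np.log10(10), np.log10(N // 2), 15).astype(int))
        log_n, log_rs = [], []
        for n in block_sizes:
            rs_vals = []
            for start in range(0, N - n + 1, n):
                block = x[start:start + n]
                dev = block - block.mean()
                z = np.cumsum(dev)                 # adjusted partial sums
                r = z.max() - z.min()              # range
                s = block.std(ddof=0)              # scale
                if s > 0:
                    rs_vals.append(r / s)
            if rs_vals:
                log_n.append(np.log(n))
                log_rs.append(np.log(np.mean(rs_vals)))
        slope, _ = np.polyfit(log_n, log_rs, 1)    # slope is the estimate of H
        return slope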
Hurst used the coefficient H as an index for the persistence of the time series considered. For 0.5 < H < 1, it is positively persistent and characterized by 'long memory' effects, as described in the next section. A rather informal interpretation of H used by practitioners is this: H may be interpreted as the chance of movements with the same sign, Peters (1994). For H > 0.5, it is more likely that an upward movement is followed by a movement of the same (positive) sign, and a downward movement is more likely to be followed by a further downward movement.
That is, the autocorrelations decay to zero so slowly that their sum does not
converge, Beran (1994).
With respect to (14.1), note that the classical expression for the variance of the sample mean X̄ def= n^{−1} Σ_{i=1}^n X_i of independent and identically distributed X_1, …, X_n,

Var(X̄) = σ²/n   with   σ² = Var(X_i),   (14.2)

is no longer valid. If the correlations are neither zero nor so small as to be negligible, the variance of X̄ is equal to

Var(X̄) = (σ²/n) { 1 + 2 Σ_{k=1}^{n−1} (1 − k/n) ρ(k) }.   (14.3)
Thus, for long memory processes the variance of the sample mean converges to zero at a slower rate than n^{−1}, Beran (1994). Note that long memory implies positive long range correlations. It is essential to understand that long range dependence is characterized by slowly decaying correlations, although nothing is said about the size of a particular correlation at lag k. Due to the slow decay it is sometimes difficult to detect non-zero but very small correlations by looking at the ±2/√n confidence band. Beran (1994) gives an example where the correct correlations are slowly decaying but lie within the ±2/√n band, so even if estimated correctly we would consider them as not significant.
Note that (14.1) holds in particular if the autocorrelation ρ(k) is approximately c|k|^{−α} with a constant c and a parameter α ∈ (0, 1). If we know this structure of the autocorrelation, it implies that the spectral density is approximately of the form c_f |λ|^{α−1} with a constant c_f as λ → 0. Thus the spectral density has a pole at 0.
To connect the long memory property with the Hurst coefficient, we introduce self similar processes. A stochastic process Y_t is called self similar with self similarity parameter H if, for any positive stretching factor c, the rescaled process c^{−H} Y_{ct} has the same distribution as the original process Y_t. If the increments X_t = Y_t − Y_{t−1} are stationary, their autocorrelation function is given by

ρ(k) = ½ ( |k + 1|^{2H} − 2|k|^{2H} + |k − 1|^{2H} ),

Beran (1994). From a Taylor expansion of ρ it follows that

ρ(k) / { H(2H − 1) k^{2H−2} } → 1   for k → ∞.

This means that for H > 0.5 the autocorrelation function ρ(k) is approximately H(2H − 1) k^{−α} with α = 2 − 2H ∈ (0, 1), and thus X_t has the long memory property.
DEFINITION 14.1 Let B_H(t) be a stochastic process with the following properties:

• B_H(t) is Gaussian,
• B_H(0) = 0,
• E{ B_H(t) − B_H(s) } = 0,
• Cov{ B_H(t), B_H(s) } = (σ²/2) ( |t|^{2H} − |t − s|^{2H} + |s|^{2H} ),

for any H ∈ (0, 1) and a variance scaling parameter σ². Then B_H(t) is called fractional Brownian motion.
Essentially, this definition is the same as for standard Brownian motion except that the covariance structure is different. For H = 0.5, Definition 14.1 contains standard Brownian motion as a special case, but in general (H ≠ 0.5) the increments B_H(t) − B_H(s) are not independent anymore. The stochastic process obtained by taking first differences of FBM is called fractional Gaussian noise (FGN) with parameter H. The covariance at lag k of FGN follows from Definition 14.1:

γ(k) = (σ²/2) ( |k + 1|^{2H} − 2|k|^{2H} + |k − 1|^{2H} ).   (14.5)

For 0.5 < H < 1 the process has long range dependence, and for 0 < H < 0.5 the process has short range dependence.
Figures 14.2 and 14.3 show two simulated paths of N = 1000 observations of
FGN with parameter H = 0.8 and H = 0.2 using an algorithm proposed by
Davies and Harte (1987). For H = 0.2, the FBM path is much more jagged
and the range of the y–axis is about ten times smaller than for H = 0.8 which
is due to the reverting behavior of the time series.
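A Python sketch of an FGN simulator in the spirit of the circulant-embedding idea behind Davies and Harte (1987); it is not the quantlet used for Figures 14.2–14.4, and the clipping of tiny negative eigenvalues is purely a numerical safeguard.

    import numpy as np

    def simulate_fgn(n, H, sigma2=1.0, rng=None):
        """Simulate fractional Gaussian noise with Hurst coefficient H via
        circulant embedding of its autocovariance function."""
        rng = np.random.default_rng() if rng is None else rng
        k = np.arange(n)
        # FGN autocovariances gamma(0), ..., gamma(n-1)
        gamma = 0.5 * sigma2 * ((k + 1.0)**(2*H) - 2*k**(2*H) + np.abs(k - 1.0)**(2*H))
        # first row of the circulant matrix and its (real) eigenvalues
        row = np.concatenate([gamma, gamma[-2:0:-1]])
        m = len(row)
        eig = np.clip(np.fft.fft(row).real, 0.0, None)
        w = rng.standard_normal(m) + 1j * rng.standard_normal(m)
        path = np.fft.fft(np.sqrt(eig / m) * w)
        return path.real[:n]

    # FGN with H = 0.8 and the corresponding fractional Brownian motion path
    fgn = simulate_fgn(1000, H=0.8)
    fbm = np.cumsum(fgn)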
The estimated autocorrelation function (ACF) for the path simulated with H = 0.8, along with the ±2/√N confidence band, is shown in Figure 14.4. For comparison the ACF used to simulate the process, given by (14.5), is superimposed (dashed line). The slow decay of correlations can be seen clearly.
Applying R/S analysis we can retrieve the Hurst coefficient used to simulate
the process. Figure 14.5 displays the estimated regression line and the data
points used in the regression. We simulate the process with H = 0.8 and the
R/S statistic yields Ĥ = 0.83.
Finally, we mention that fractional Brownian motion is not the only stationary process revealing properties of systems with long memory. Fractional ARIMA processes are an alternative to FBM, Beran (1994). There are also nonstationary processes with infinite second moments that can be used to model long range dependence, Samorodnitsky and Taqqu (1994).
Figure 14.4. Estimated and true ACF of FGN simulated with H = 0.8,
N = 1000. XFGSimFBM.xpl
14.4 Data Analysis

The data set consists of daily closing prices of voting and non–voting stocks of WMF, Dyckerhoff, KSB and RWE from January 1, 1973, to December 12, 2000.

Figure 14.6 shows the performance of WMF stocks in our data period. The plot indicates an intimate relationship of both assets. Since the performance of both kinds of stocks is influenced by the same economic underlyings, their relative value should be stable over time. If this holds, the log–difference X_t of the pairs of voting (S^v_t) and non–voting stocks (S^{nv}_t),

X_t def= log S^v_t − log S^{nv}_t,   (14.6)
should exhibit a reverting behavior and therefore an R/S analysis should yield
estimates of the Hurst coefficient smaller than 0.5. In order to reduce the num-
ber of plots we show only the plot of WMF stocks. One may start the quantlet
XFGStocksPlots.xpl to see the time series for the other companies as well.
First, we perform R/S analysis on both individual stocks and the voting/non–
voting log–differences. In a second step, a trading strategy is applied to all four
voting/non–voting log–differences.
Table 14.1 gives the R/S statistic of each individual stock and of the log–
difference process of voting and non–voting stocks. While Ĥ is close to 0.5
for each time series taken separately, we find for the log differences a Hurst
coefficient indicating negative persistence, i.e. H < 0.5.
To test for the significance of the estimated Hurst coefficients we need to know
the finite sample distribution of the R/S statistic. Usually, if the probabilistic
behavior of a test statistic is unknown, it is approximated by its asymptotic
distribution when the number of observations is large. Unfortunately, as, for
example, Lo (1991) shows, such an asymptotic approximation is inaccurate in
the case of the R/S statistic. This problem may be solved by means of bootstrap
and simulation methods. A semiparametric bootstrap approach to hypothesis
testing for the Hurst coefficient has been introduced by Hall, Härdle, Kleinow and Schmidt (2000). In the spirit of this chapter we use Brownian motion (H = 0.5) to simulate under the null hypothesis. Under the null hypothesis
the log–difference process follows a standard Brownian motion and by Monte
Carlo simulation we compute 99%, 95% and 90% confidence intervals of the
R/S statistic. The results are given in Table 14.2. While the estimated Hurst
coefficients for each individual stock are at least contained in the 99% confidence
interval, we consider the R/S statistic for voting/non–voting log differences as
significant.
Table 14.2. Simulated confidence intervals for R/S statistic for Brown-
ian motion.
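The simulated confidence intervals can be reproduced along the following lines. The sketch reuses the hurst_rs function from the R/S sketch given earlier; the number of replications and the sample length are illustrative only and need not match those behind Table 14.2.

    import numpy as np

    # Monte Carlo quantiles of the R/S statistic under the null H = 0.5,
    # i.e. the increments of a standard Brownian motion.
    # hurst_rs is the function from the earlier R/S sketch.
    rng = np.random.default_rng(1)
    n_obs, n_sim = 2000, 500          # illustrative; the stock series are longer
    h_hat = np.array([hurst_rs(rng.standard_normal(n_obs)) for _ in range(n_sim)])
    print(np.quantile(h_hat, [0.005, 0.025, 0.05, 0.95, 0.975, 0.995]))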
14.5 Trading the Negative Persistence

The negative persistence of the log–differences suggests a simple trading rule: after a negative difference, a positive difference has a higher chance to appear in the future than a negative one and vice versa, implying that voting stocks will probably become relatively more expensive than their non–voting counterparts. Thus, we go long the voting and
short the non–voting stock. In case of the inverse situation, we carry out the
inverse trade (short voting and long non–voting). When initiating a trade we
take a cash neutral position. That is, we go long one share of the voting and
sell short m shares of the non–voting stock to obtain a zero cash flow from this
action.
But how do we know that a 'turning point' has been reached? What is a signal for the reversal? Naturally, one could think that the longer a negative difference has persisted, the more likely the difference is to become positive. In our simulation, we calculate the maximum and minimum difference of the preceding M trading days (for example M = 50, 100, 150). If the current difference is more negative than the minimum over the last M trading days, we proceed from the assumption that a reversal is to come and that the difference is going to be positive, thereby triggering a long voting and short non–voting position. A difference greater than the M day maximum triggers the opposite position.
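The signal rule can be summarized in a few lines of Python. The function below is an illustrative sketch of the M-day breakout logic described above (position +1 meaning long voting / short non-voting), not the code used for the reported results; cash-flow accounting and transaction costs are omitted.

    import numpy as np

    def breakout_positions(x, M=100):
        """Trading signal on the log-difference series x of (14.6): go long the
        voting / short the non-voting stock (+1) when x falls below its minimum
        over the previous M days, take the opposite position (-1) when x exceeds
        the previous M-day maximum; otherwise keep the current position."""
        pos = np.zeros(len(x))
        current = 0.0
        for t in range(M, len(x)):
            window = x[t - M:t]
            if x[t] < window.min():
                current = +1.0      # expect the difference to revert upwards
            elif x[t] > window.max():
                current = -1.0      # expect the difference to revert downwards
            pos[t] = current
        return pos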
When we take a new position, we compute the cash flow from closing the old
one. Finally, we calculate the total cash flow, i.e. we sum up all cash flows without taking interest into account. To account for transaction costs, we
compute the total net cash flow. For each share bought or sold, we calculate a
hypothetical percentage, say 0.5%, of the share price and subtract the sum of
all costs incurred from the total cash flow. In order to compare the total net
cash flows of our four pairs of stocks which have different levels of stock prices,
we normalize them by taking WMF stocks as a numeraire.
In Table 14.3 we show the total net cash flows and in Table 14.4 the number
of trade reverses are given. It is clear that for increasing transaction costs the
performance deteriorates, a feature common for all 4 pairs of stocks. Moreover,
it is quite obvious that the number of trade reverses decreases with the number
of days used to compute the signal. An interesting point to note is that for
RWE, which is in the German DAX30, the total net cash flow is worse in all
situations. A possible explanation would be that since the Hurst coefficient
is the highest, the log–differences contain less ‘reversion’. Thus, the strategy
designed to exploit the reverting behavior should perform rather poorly. WMF
and KSB have a smaller Hurst coefficient than RWE and the strategy performs
better than for RWE. Furthermore, the payoff pattern is very similar in all
situations. Dyckerhoff with a Hurst coefficient of H = 0.37 exhibits a payoff
structure that rather resembles the one of WMF/KSB.
Regarding the interpretation of the trading strategy, one has to be aware that
neither the cash flows are adjusted for risk nor did we account for interest rate
effects although the analysis spread over a period of time of about 26 years.
Bibliography

Barnsley, M. (1988). Fractals Everywhere, Academic Press, Boston, MA.

Beran, J. (1994). Statistics for Long Memory Processes, Chapman and Hall, New York.
15 Locally time homogeneous time series modeling

Consider the linear regression model

Y_t = X_t^⊤ θ + σ ε_t,   (15.1)

where Y_t is real valued, X_t = (X_{1,t}, …, X_{p,t})^⊤ and θ = (θ_1, …, θ_p)^⊤ are R^p valued, and ε_t is a standard normally distributed random variable. If the matrix Σ_{t=1}^T X_t X_t^⊤ is nonsingular with inverse W, then the least squares estimator of θ is:

θ̂ = W Σ_{t=1}^T X_t Y_t.   (15.2)
Define w_{kk} as the k-th element on the diagonal of W and let λ be a positive scalar. For nonrandom regressors, the following exponential probability bound is easy to prove:

P( |θ̂_k − θ_k| > λ σ √w_{kk} ) ≤ 2 e^{−λ²/2},   k = 1, …, p.   (15.3)

Indeed, the estimation error θ̂_k − θ_k is N(0, w_{kk}σ²) distributed, therefore:

1 = E exp{ λ(θ̂_k − θ_k)/(σ√w_{kk}) − λ²/2 }
  ≥ E[ exp{ λ(θ̂_k − θ_k)/(σ√w_{kk}) − λ²/2 } 1(θ̂_k − θ_k > λσ√w_{kk}) ]
  ≥ exp(λ²/2) P( θ̂_k − θ_k > λσ√w_{kk} ).
The result in (15.3) follows from the symmetry of the normal distribution.
Equation (15.3) has been generalized by Liptser and Spokoiny (1999) to the case of random regressors. More precisely, they allow the X_t to be only conditionally independent of ε_t, and they include lagged values of Y_t as regressors. In this case the bound reads roughly as follows:

P( |θ̂_k − θ_k| > λσ√w_{kk}; W is nonsingular ) ≤ P(λ) e^{−λ²/2}.   (15.4)
Figure 15.1. Example of a locally homogeneous process.
The procedure that we describe does not require an explicit expression for the law of the process θ_t; it only assumes that θ_t is constant on some unknown time interval I = [τ − m, τ], τ − m > 0, τ, m ∈ N. This interval is referred to as an interval of time homogeneity, and a model which is constant only on some time interval is called locally time homogeneous.
Let us now define some notation. The expression θ̂_τ will describe the (filtering) estimator of the process (θ_t)_{t∈N} at time τ; that is to say, the estimator which uses only observations up to time τ. For example, if θ is constant, the recursive estimator of the form

θ̂_τ = ( Σ_{s=1}^τ X_s X_s^⊤ )^{−1} Σ_{s=1}^τ X_s Y_s
represents the best linear estimator for θ. But if the coefficients are not constant and follow a jump process, as in the figure above, a recursive estimator cannot provide good results. Ideally, only the observations in the interval I = [τ − m, τ] should be used for the estimation of θ_τ. Actually, an estimator of θ_τ using the observations of a subinterval J ⊂ I would be less efficient, while an estimator using the observations of a larger interval K ⊃ I would be biased. The main objective is therefore to estimate the largest interval of time homogeneity. We refer to this estimate as Î = [τ − m̂, τ]. On this interval Î
occur with high probability for some sufficiently large constants λ and µ. The adaptive estimation procedure therefore roughly corresponds to a family of tests to check whether θ̂_I does not differ significantly from θ̂_J. The latter is done on the basis of the triangle inequality and of equation (15.4), which assigns a large probability to the event

|θ̂_{i,I} − θ̂_{i,J}| ≤ µσ√w_{ii,I} + λσ√w_{ii,J}
Given two constants µ and λ, the adaptive choice of the interval of homogeneity is defined by the following iterative procedure:

• Loop: If I is not rejected, then continue with the iteration step by choosing a larger interval. Otherwise, set Î = "the latest non-rejected I".
As for the variance estimation, note that the previously described procedure requires the knowledge of the variance σ² of the errors. In practical applications, σ² is typically unknown and has to be estimated from the data. The regression representation (15.1) and local time homogeneity suggest applying a residual-based estimator. Given an interval I = [τ − m, τ], we construct the parameter estimate θ̂_I. Next the pseudo-residuals ε̂_t are defined as ε̂_t = Y_t − X_t^⊤ θ̂_I. Finally the variance estimator is defined by averaging the squared pseudo-residuals:

σ̂² = (1/|I|) Σ_{t∈I} ε̂_t².
The performance of the adaptive estimator is evaluated with simulated data. The length of the sample is 300. The regressors X_2 and X_3 are two independent random walks. The regressor coefficients are constant in the first half of the
sample, then they make a jump after which they continue being constant until
the end of the sample. We simulate three models with jumps of different
magnitude. The values of the simulated models are presented in Table 15.1.
The error term ε_t is a standard Gaussian white noise, and σ = 10^{−2}. Note that the average value of σ|ε_t| equals 10^{−2} √(2/π) ≈ 0.008, therefore the small jump of magnitude 0.0005 is clearly not visible by eye. For each of the three models above 100 realizations of the white noise ε_t are generated and the adaptive estimation is performed.
In order to implement the procedure we need two parameters, µ and λ, and two sets of intervals, I and J(I). As far as the latter are concerned, the simplest proposal is to use a regular grid G = {t_k} with t_k = m_0 k for some integer m_0 and with τ = t_{k*} belonging to the grid. We next consider the intervals I_k = [t_k, t_{k*}[ = [t_k, τ[ for all t_k < t_{k*} = τ. Every interval I_k contains exactly k* − k smaller intervals J′ = [t_{k′}, t_{k*}[. So for every interval I_k = [t_k, t_{k*}[ and k′: k < k′ < k*, we define the set J(I_k) of testing subintervals J′ by taking all smaller intervals with right end point t_{k*}, J′ = [t_{k′}, t_{k*}[, and all smaller intervals with left end point t_k, J′ = [t_k, t_{k′}[; a sketch of the resulting procedure is given below. The testing interval sets I and J(I) are therefore identified by the parameter m_0, the grid step.
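The following Python sketch implements the testing idea on the grid just described: least squares on each candidate interval I_k, comparison with the testing subintervals, and enlargement until a rejection occurs. The treatment of the smallest interval, the assumed known σ, and the loop order are simplifying assumptions, so this is an illustration rather than the authors' implementation.

    import numpy as np

    def fit_ls(X, Y):
        """Least squares estimate and the diagonal of W = (X'X)^{-1} on one interval."""
        W = np.linalg.inv(X.T @ X)
        theta = W @ X.T @ Y
        return theta, np.diag(W)

    def adaptive_interval(X, Y, tau, m0=30, lam=4.0, mu=4.0, sigma=1.0):
        """Choose the interval of time homogeneity [t_hat, tau) on the grid
        t_k = m0 * k: enlarge the candidate interval I as long as theta_hat on I
        is compatible with theta_hat on every testing subinterval J.
        Assumes tau > 2*m0 and m0 at least the number of regressors."""
        grid = np.arange(tau - m0, 0, -m0)       # candidate left end points t_k < tau
        left = tau - m0                          # smallest interval is always kept
        for t_k in grid:
            I = slice(t_k, tau)
            theta_I, w_I = fit_ls(X[I], Y[I])
            rejected = False
            # testing subintervals with right end point tau or left end point t_k
            for t_j in range(t_k + m0, tau, m0):
                for J in (slice(t_j, tau), slice(t_k, t_j)):
                    theta_J, w_J = fit_ls(X[J], Y[J])
                    bound = mu * sigma * np.sqrt(w_I) + lam * sigma * np.sqrt(w_J)
                    if np.any(np.abs(theta_I - theta_J) > bound):
                        rejected = True
                        break
                if rejected:
                    break
            if rejected:
                break
            left = t_k                           # I not rejected: accept and enlarge
        return left, tau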
We are now left with the choice of three parameters: λ, µ and m_0. These parameters act as the smoothing parameters in classical nonparametric estimation. The value of m_0 determines the number of points at which the time homogeneity is tested, and it defines the minimal delay after which a jump can be detected.
Figure: estimated coefficients (panels ALPHA_1, ALPHA_2 and ALPHA_3) over the 300 observations for the three simulated models.
In this context, equations (15.8) and (15.1) can be regarded as a state space
model, where equation (15.8) is the state equation (the signal) and equation
(15.1) is the measurement equation and it plays the role of a noisy observation
of θt . A Kalman filter algorithm can be used for the estimation, see Cooley
and Prescott (1973). The Kalman filter algorithm requires the initialization of
two variables, θ̂_{0|0} and P_{0|0} = Cov(θ̂_{0|0}), and its recursions read as follows, see Chui and Chen (1998):

P_{0|0} = Cov(θ̂_{0|0})
P_{t|t−1} = P_{t−1|t−1} + Σ σ²
G_t = P_{t|t−1} X_t (X_t^⊤ P_{t|t−1} X_t + σ²)^{−1}
P_{t|t} = (I − G_t X_t^⊤) P_{t|t−1}
θ̂_{t|t−1} = θ̂_{t−1|t−1}
θ̂_{t|t} = θ̂_{t|t−1} + G_t (Y_t − X_t^⊤ θ̂_{t|t−1}).
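These recursions are easy to code. The Python sketch below filters the random-walk regression coefficients for scalar observations Y_t, with Σ, σ² and the initial values supplied by the user; it is illustrative and not the routine used for the reported results.

    import numpy as np

    def kalman_weights(Y, X, Sigma, sigma2, theta0, P0):
        """Kalman filter for time-varying regression coefficients that follow a
        random walk (state equation 15.8), following the recursions above."""
        theta, P = theta0.copy(), P0.copy()
        path = []
        for t in range(len(Y)):
            x = X[t]                                          # regressor vector at t
            P_pred = P + Sigma * sigma2                       # P_{t|t-1}
            G = P_pred @ x / (x @ P_pred @ x + sigma2)        # gain G_t
            theta = theta + G * (Y[t] - x @ theta)            # theta_{t|t}
            P = (np.eye(len(theta)) - np.outer(G, x)) @ P_pred
            path.append(theta.copy())
        return np.array(path)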
An exchange rate basket is a form of pegged exchange rate regime and it takes
place whenever the domestic currency can be expressed as a linear combination
of foreign currencies. A currency basket can be therefore expressed in the form
of equation (15.1), where: X1,t is set constantly equal to one and is taken as
numeraire, Yt represents the home currency exchange rate with respect to the
numeraire, and Xj,t is the amount of currency 1 per unit of currency j, i.e.
the cross currency exchange rate. The above relationship usually holds only on
the average, because the central bank cannot control the exchange rate exactly,
therefore the error term εt is added.
Because modern capital mobility enables the investors to exploit the interest
rate differentials which may arise between the domestic and the foreign cur-
rencies, a pegged exchange rate regime can become an incentive to speculation
and eventually lead to destabilization of the exchange rate, in spite of the fact
that its purpose is to reduce exchange rate fluctuations, see Eichengreen, Mas-
son, Savastano and Sharma (1999). Indeed, it appears that one of the causes
which led to the Asian crisis of 1997 can be found in short term capital investments.
From 1985 until its suspension on July 2, 1997 (following a speculative attack), the Baht was pegged to a basket of currencies consisting of Thailand's main
trading partners. In order to gain greater discretion in setting monetary pol-
icy, the Bank of Thailand neither disclosed the currencies in the basket nor the
weights. Unofficially, it was known that the currencies composing the basket
were: US Dollar, Japanese Yen and German Mark. The fact that the public
was not aware of the values of the basket weights, also enabled the monetary
authorities to secretly adjust their values in order to react to changes in eco-
nomic fundamentals and/or speculative pressures. Therefore one could express
the USD/THB exchange rate in the following way:
Y_{USD/THB,t} = θ_{USD,t} + θ_{DEM,t} X_{USD/DEM,t} + θ_{JPY,t} X_{USD/JPY,t} + σ ε_t.
This exchange rate policy had provided Thailand with a good stability of the
exchange rate as it can be seen in Figure 15.3. During the same period, though,
the interest rates had maintained constantly higher than the ones of the coun-
tries composing the basket, as it is shown in Figure 15.4.
These facts suggest the implementation of a speculative strategy, which consists in borrowing from the countries with a lower interest rate and lending to the ones with a higher interest rate. A formal description of the problem
can be made relying on a mean-variance hedging approach, see Musiela and
Rutkowski (1997). The optimal investment strategy ξ1∗ , . . . , ξp∗ is obtained by
the minimization of the quadratic cost function below:
E[ ( Y_{t+h} − Σ_{j=1}^p ξ_j X_{j,t+h} )² | F_t ].
Figure 15.3. Exchange rate time series: DEM/USD, JPY/USD and THB/USD.
In the empirical analysis we find that the relationship (15.9) is fulfilled during the whole period under investigation for any of the four methods that we use to estimate the basket weights. Therefore it is possible to construct a mean self-financing strategy consisting of the following steps:
Figure 15.4. Interest rates time series: German (thick dotted line),
Japanese (thin dotted line), American (thick straight line), Thai (thin
straight line). XFGbasket.xpl
• at time t
  – borrow the portfolio Σ_j (1 + r_j)^{−1} E(θ_{j,t+h}|F_t) X_{j,t} from the countries composing the basket,
  – lend (1 + r_0)^{−1} Y_t to Thailand,
  – invest the difference Σ_j (1 + r_j)^{−1} E(θ_{j,t+h}|F_t) X_{j,t} − (1 + r_0)^{−1} Y_t in the numeraire currency at the risk-free rate r_1,

• at time t + h
  – withdraw the amount Y_{t+h} from Thailand,
  – pay back the loan of Σ_j E(θ_{j,t+h}|F_t) X_{j,t+h},
  – keep the difference.
The expressions for the profit and for its expected value are:

Π_{t+h} = Y_{t+h} − Σ_{j=1}^p E(θ_{j,t+h}|F_t) X_{j,t+h}
          + (1 + r_1) { Σ_{j=1}^p (1 + r_j)^{−1} E(θ_{j,t+h}|F_t) X_{j,t} − (1 + r_0)^{−1} Y_t },

E(Π_{t+h}|F_t) = (1 + r_1) { Σ_{j=1}^p (1 + r_j)^{−1} E(θ_{j,t+h}|F_t) X_{j,t} − (1 + r_0)^{−1} Y_t }.
For the implementation of the investment strategy described above one needs
the estimate of the, possibly time-varying, basket weights. The precision of
the estimation has a direct impact on the economic result of the investment.
Therefore, we compare four different estimators of the basket weights: the
adaptive, the recursive, the window and the Kalman filter estimator using
economic criteria for a one month and for a three month investment horizon.
In particular we compute the average expected profit and the average realized
profit.
The adaptive estimation procedure requires three parameters: m_0, λ and µ. The choice of m_0 does not influence the results very much and it can be reasonably set to 30. This value represents the minimal amount of data which are used
for the estimation, and in the case of a structural break, the minimal delay
before having the chance of detecting the change point. The selection of λ and
µ is more critical. These two values determine the sensitivity of the algorithm.
Small values would imply a fast reaction to changes in the regressor coefficients, but they would also lead to the selection of intervals of homogeneity which
are possibly too small. Large values would imply a slower reaction and con-
sequently the selection of intervals which can be too large. To overcome this
problem we suggest the following approach.
The main idea is that small changes in the values of λ and µ should not affect
the estimation results. Therefore we restrict our attention to a set S of possible pairs (λ, µ). In the present context we choose all the even numbers between 2 and 8:

S = {(λ, µ) | λ, µ ∈ {2, 4, 6, 8}}.

Figure 15.5. Estimated basket weights over time: USD weight, DEM weight and JPY weight.
Then we compare the 16 pairs with the following criterion at each time t:

(λ*, µ*) = arg min_{(λ,µ)∈S} Σ_{s=t−200}^{t−1} ( Y_s − Σ_j θ̂_{j,s|s−h} X_{j,s} )².
Finally, we estimate the value of θ̂_{t+h|t} with the selected pair (λ*, µ*). The appeal of the above selection criterion lies in the fact that it leads to the choice of the pair (λ, µ) which has provided the least quadratic hedging costs over the past trading periods. Notice that in general we obtain different results depending on the length of the forecasting horizon: here one and three months.
Figure 15.5 shows the results for the three month horizon. It is interesting to
see that the adaptive estimate tends to coincide with the recursive estimate
during the first half of the sample, more or less, while during the second half
of the sample it tends to follow the rolling estimate.
We remark that the problem of selecting free parameters is not specific to the
adaptive estimator. The window estimator requires the choice of the length
of the window: k, while the Kalman filter needs the specification of the data
generating process of θt and the determination of Σ and σ. In this application
k is set equal to 250, Σ and σ are estimated recursively from the data using
OLS, while θb0|0 and P0|0 are initialized using the first 350 observations which
are then discarded. We remark that this choice is consistent with the one of
Christoffersen and Giorgianni (2000).
Table 15.2 shows the result of the simulated investment. The investments are
normalized such that at each trading day we take a short position of 100 USD
in the optimal portfolio of the hard currencies. The result refers to the period
April 9 1993 to February 12 1997 for the one month horizon investment and
June 7 1993 to February 12 1997 for the three month horizon investment. Notice
first that the average realized profits are positive and, as far as the three month
investment horizon is concerned, they are significantly larger than zero among
all methods. This provides a clear evidence for the fact that arbitrage profits
were possible with in the framework of the Thai Bath basket for the period
under study. The comparison of the estimator also show the importance of
properly accounting for the time variability of the parameters. The recursive
estimator shows modest result as far as the realized profits are concerned and
the largest bias between expected the realized profit. On one side, the bias is
reduced by the window estimator and by the Kalman filter, but on the other
side these two methods provide a worse performance as far as the realized profit
are concerned. Finally, the adaptive estimator appears to be the best one, its
bias is much smaller than the one of the recursive estimator and it delivers the
largest realized profits for both investment horizons.
15.3 Estimating the volatility of financial time series

Let S_t denote the price of a financial asset. Its log returns are defined as

R_t = ln S_t − ln S_{t−1}.
Figure 15.6. JPY/USD returns XFGretacf.xpl
The returns of financial time series are usually modeled by the following equation:

R_t = σ_t ε_t,

where σ_t is a strictly positive process which describes the dynamics of the variance of R_t, and ε_t has a standard normal distribution: ε_t ∼ N(0, 1). Standard parametric models of the volatility are of (G)ARCH type:

σ_t² = ω + α R_{t−1}² + β σ_{t−1}²,
as in Engle (1995) and Bollerslev (1995), and of stochastic volatility type:

$$\ln \sigma_t^2 = \theta_0 + \theta_1 \ln \sigma_{t-1}^2 + \nu_t,$$

as described by Harvey, Ruiz and Shephard (1995). These models have been extended in order to incorporate other characteristics of financial return time series: TARCH, EGARCH and QARCH explicitly assume an asymmetric reaction of the volatility process to the sign of the observed returns, while IGARCH and FIGARCH model the long memory structure of the autocorrelations of the squared returns.
A feature common to all the models cited above is that they completely describe the volatility process by a finite set of parameters. The availability of very large samples of financial data has made it possible to construct models with quite complicated parameterizations in order to explain all the observed stylized facts. Obviously those models rely on the assumption that the parametric structure of the process remains constant through the whole sample. This is a nontrivial and possibly dangerous assumption, in particular as far as forecasting is concerned, as pointed out in Clements and Hendry (1998). Furthermore, checking for parameter instability becomes quite difficult if the model is nonlinear and/or the number of parameters is large. Moreover, those characteristics of the returns which are often explained by the long memory and (fractionally) integrated nature of the volatility process could also be due to time varying parameters. We want to suggest an alternative approach which relies on a locally time homogeneous parameterization, i.e. we assume that the volatility σ follows a jump process and is constant over some unknown interval of time homogeneity. The adaptive algorithm, which has been presented in the previous sections, also applies in this case; its aim is the data-driven estimation of the interval of time homogeneity, after which the estimate of the volatility can simply be obtained by local averaging.
Rt = σt εt , (15.10)
Figure 15.7. Normal and power transformed densities for γ = 0.5. XFGpowtrans.xpl
regression-like equation (15.11) with the constant trend $\theta_I = C_\gamma \sigma_I^\gamma$, which can be estimated by averaging over this interval I:

$$\hat\theta_I = \frac{1}{|I|} \sum_{t \in I} |R_t|^\gamma . \qquad (15.12)$$
By (15.11),

$$\hat\theta_I = \frac{C_\gamma}{|I|} \sum_{t \in I} \sigma_t^\gamma + \frac{D_\gamma}{|I|} \sum_{t \in I} \sigma_t^\gamma \zeta_t = \frac{1}{|I|} \sum_{t \in I} \theta_t + \frac{s_\gamma}{|I|} \sum_{t \in I} \theta_t \zeta_t , \qquad (15.13)$$

so that

$$E\hat\theta_I = E\,\frac{1}{|I|} \sum_{t \in I} \theta_t , \qquad (15.14)$$

$$E\Bigl( \frac{s_\gamma}{|I|} \sum_{t \in I} \theta_t \zeta_t \Bigr)^2 = \frac{s_\gamma^2}{|I|^2}\, E \sum_{t \in I} \theta_t^2 . \qquad (15.15)$$
Define also

$$v_I^2 = \frac{s_\gamma^2}{|I|^2} \sum_{t \in I} \theta_t^2 .$$
In view of (15.15) this value is called the conditional variance of $\hat\theta_I$. Under local homogeneity $\theta_t$ is constant and equal to $\theta_I$ for $t \in I$, and hence

$$E\hat\theta_I = \theta_I, \qquad v_I^2 = \mathrm{Var}\,\hat\theta_I = \frac{s_\gamma^2 \theta_I^2}{|I|} .$$
A probability bound analogous to the one in Section 15.1 also holds in this case. Let the volatility coefficient $\sigma_t$ satisfy the condition $b \le \sigma_t^2 \le bB$ with some constants $b > 0$, $B > 1$. Then there exists $a_\gamma > 0$ such that for every $\lambda \ge 0$

$$P\bigl( |\hat\theta_I - \theta_\tau| > \lambda v_I \bigr) \le 4\sqrt{e}\, \lambda (1 + \log B) \exp\Bigl( -\frac{\lambda^2}{2 a_\gamma} \Bigr). \qquad (15.16)$$
The proof of the statement above and some related theoretical results can be
found in Mercurio and Spokoiny (2000).
For practical application one has to substitute the unknown conditional standard deviation with its estimate: $\hat v_I = s_\gamma \hat\theta_I |I|^{-1/2}$. Under the assumption of time homogeneity within an interval $I = [\tau - m, \tau]$, equation (15.16) allows us to bound $|\hat\theta_I - \hat\theta_J|$ by $\lambda \hat v_I + \mu \hat v_J$ for any $J \subset I$, provided that λ and µ are sufficiently large. Therefore we can apply the same algorithm described in Section 15.1 in order to estimate the largest interval of time homogeneity and the related value of $\hat\theta_\tau$. Here, as in the previous section, we are faced with the choice of three tuning parameters: m0, λ, and µ. Simulation studies and repeated trials on real data by Mercurio and Spokoiny (2000) have shown that the choice of m0 is not particularly critical and that it can be selected between 10 and 50 without affecting the overall results of the procedure.
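As a rough illustration of how such an interval search can be organized, the following Python sketch enlarges the candidate interval in steps of m0 and stops as soon as some tested subinterval violates the bound $|\hat\theta_I - \hat\theta_J| \le \lambda \hat v_I + \mu \hat v_J$. It is a simplified stand-in for the algorithm of Section 15.1, not a reproduction of it; in particular the set of tested subintervals and the constant s_gamma are assumptions of this sketch.

import numpy as np

def interval_of_homogeneity(R, tau, gamma=0.5, s_gamma=1.0,
                            lam=4.0, mu=4.0, m0=20):
    absR_g = np.abs(R) ** gamma
    theta = lambda lo, hi: absR_g[lo:hi].mean()       # equation (15.12)
    m = m0
    while tau - (m + m0) >= 0:
        m_try = m + m0
        th_I = theta(tau - m_try, tau)
        v_I = s_gamma * th_I / np.sqrt(m_try)
        ok = True
        for j in range(m0, m_try, m0):                # right-end subintervals J
            th_J = theta(tau - j, tau)
            v_J = s_gamma * th_J / np.sqrt(j)
            if abs(th_I - th_J) > lam * v_I + mu * v_J:
                ok = False                            # homogeneity rejected
                break
        if not ok:
            break
        m = m_try
    return m, theta(tau - m, tau)                     # interval length and theta estimate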
As described in Section 15.2.2, the choice of λ and µ is more delicate. The influence of λ and µ is similar to that of the smoothing parameters in nonparametric regression. The likelihood of rejecting a time homogeneous interval decreases with increasing λ and/or µ, as is clear from equation (15.6). If λ and µ are too large the algorithm becomes too conservative, increasing the bias of the estimator, while too small values of λ and µ lead to frequent rejections and to a high variability of the estimate. Once again, the optimal values of λ and µ can be chosen by minimizing the squared forecast error. One has to define a finite set S of admissible pairs of λ and µ. Then for each pair belonging
to S one can compute the corresponding estimate $\hat\theta_t^{(\lambda,\mu)}$ and then select the optimal pair and the corresponding estimate by the following criterion:

$$(\hat\lambda, \hat\mu) = \arg\min_{(\lambda,\mu)\in S} \sum_{t=0}^{T} \Bigl( |R_t|^\gamma - \hat\theta_t^{(\lambda,\mu)} \Bigr)^2 .$$
Figure 15.8 shows the result of the on-line estimation of the locally time homo-
geneous volatility model for the JPY/USD exchange rate. The bottom plot, in
particular, shows the estimated length of the interval of time homogeneity, $\hat m$, at each time point.
Let $(Y_1, X_1), \dots, (Y_\tau, X_\tau)$ obey (15.1), where the regressors are possibly stochastic. Then it holds for the estimate $\hat\theta_I$:

$$P\Bigl( |\hat\theta_{i,I} - \theta_{i,\tau}| > \lambda \sqrt{w_{ii,I}};\ A_{i,I} \Bigr) \le 4e \ln(4B) \bigl( 1 + 2\rho \sqrt{r(d-1)}\, \lambda \bigr)^{p-1} \lambda \exp(-\lambda^2/2), \qquad i = 1, \dots, p.$$

A proof of this statement can be found in Liptser and Spokoiny (1999). For a further generalization, where the hypothesis of local time homogeneity holds only approximately, see Härdle et al. (2000).
Figure 15.8. From the top: returns, estimated locally time homogeneous volatility and estimated length of the interval of time homogeneity. XFGlochom.xpl
Bibliography
Bollerslev, T. (1995). Generalised autoregressive conditional heteroskedasticity,
in Engle (1995).
The idea behind randomized algorithms is that a random sample from a population (of input variables) is representative of the whole population. As a consequence, a randomized algorithm can be interpreted as a probability distribution on a set of deterministic algorithms.
We will see that there are three main advantages to randomized algorithms:

1. Performance: For many problems, it can be shown that randomized algorithms run faster than the best known deterministic algorithm.
2. Simplicity: Randomized algorithms are easier to describe and implement than comparable deterministic algorithms.
3. Flexibility: Randomized algorithms can be easily adapted.
In general one distinguishes two types of randomized algorithms. Las Vegas algorithms are randomized algorithms that always give correct results; only their running time varies from one run to another. Monte Carlo algorithms are randomized algorithms that may produce an incorrect solution, for which one can bound the probability of occurrence. The quality of the solution can be seen as a random variable.
Within this chapter, we focus on Monte Carlo algorithms calculating the value of the following integral

$$\int_{[0,1]^d} f(x)\,dx . \qquad (16.1)$$

In the Black-Scholes framework the price of a European option can be written in this form:

$$e^{r(T-t)}\, C_t(S_t) = \mathrm{E}\left[ C_T(S_T) \mid S_t \right] = \int_0^\infty C_T(S_T)\, g(S_T|S_t, r, \sigma, T-t)\, dS_T \qquad (16.2)$$

$$= \int_{[0,1)} C_T\{ f(x, S_t, r, \sigma, T-t) \}\, dx \qquad (16.3)$$
where

$$g(S_T|S_t, r, \sigma, T-t) = \frac{\exp\Bigl\{ -\dfrac{\bigl(\log S_T - \log S_t - (r - 0.5\sigma^2)(T-t)\bigr)^2}{2\sigma^2 (T-t)} \Bigr\}}{\sqrt{2\pi\sigma^2 (T-t)}\; S_T}$$

is the risk neutral density function of the Black-Scholes model. The transformation

$$S_T = f(x, S_t, r, \sigma, T-t) = S_t \exp\Bigl\{ (r - \tfrac{1}{2}\sigma^2)(T-t) + \sigma\sqrt{T-t}\, F^{-1}(x) \Bigr\}$$

maps the uniformly distributed values x into underlying values $S_T$ distributed according to $g(S_T|S_t, r, \sigma, T-t)$. Here $F^{-1}(x)$ is the inverse of the cumulative normal distribution function and $C_T(y)$ is the payoff function of the option.
The Monte Carlo simulation calculates the value of the integral by averaging the payoffs $C_T\{f(x_i, S_t, r, \sigma, T-t)\}$ over independent uniform samples $x_1, \dots, x_n$.
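The following Python fragment is a hedged sketch of this estimator for a European call (the book works with XploRe quantlets; the function and variable names here are illustrative only). It draws uniforms, maps them to terminal prices with the inverse normal distribution function, and reports the discounted mean payoff together with the empirical standard error discussed below:

import numpy as np
from scipy.stats import norm

def mc_call_price(S0, K, r, sigma, T, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)
    # S_T = f(x, S_t, r, sigma, T-t) with F^{-1} the inverse normal cdf
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * norm.ppf(x))
    payoff = np.maximum(ST - K, 0.0)                 # call payoff C_T(S_T)
    C_bar = payoff.mean()                            # estimates e^{r(T-t)} C_t(S_t)
    price = np.exp(-r * T) * C_bar                   # discounted price estimate
    std_err = np.exp(-r * T) * payoff.std(ddof=1) / np.sqrt(n)
    return price, std_err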
Additionally we have to assume that the variance of the option payoffs $C_T(S_T)$ exists; it is given by

$$\mathrm{Var}[C_T(S_T)] = \int_0^\infty C_T(S_T)^2\, g(S_T|S_t, r, \sigma, T-t)\, dS_T - \mathrm{E}[C_T(S_T)]^2 . \qquad (16.4)$$

Chebyshev's inequality then yields

$$P\bigl( |\bar C - e^{r(T-t)} C_t(S_t)| \ge a \bigr) \le \frac{\mathrm{Var}[C_T(S_T)]}{n\, a^2} .$$
The bound given by this equation is rather imprecise since we do not make any assumptions on the distribution of the random variable. Only the expected value and the variance are used in the previous equation. According to the central limit theorem the distribution of $\bar C$ converges to a normal distribution for $n \to \infty$. It follows that the difference between the approximation and the integral, $\bar C - e^{r(T-t)} C_t(S_t)$, is approximately normally distributed with mean 0 and standard deviation

$$\sigma_{\bar C} = \sqrt{\frac{\mathrm{Var}[C_T(S_T)]}{n}} \qquad (16.7)$$
for large n. According to Boyle (1977) a value of n > 1000 is sufficiently large
in order to use the normal distribution for error estimation purposes.
We get the following equation if we assume that $\bar C - e^{r(T-t)} C_t(S_t)$ is normally distributed:

$$P\bigl( |\bar C - e^{r(T-t)} C_t(S_t)| \le a \bigr) = \frac{1}{\sqrt{2\pi}} \int_{-a/\sigma_{\bar C}}^{a/\sigma_{\bar C}} \exp\Bigl( -\frac{u^2}{2} \Bigr)\, du \qquad (16.8)$$

$$P\bigl( |\bar C - e^{r(T-t)} C_t(S_t)| \le k \sigma_{\bar C} \bigr) = P\Bigl( \frac{|\bar C - e^{r(T-t)} C_t(S_t)|}{\sigma_{\bar C}} \le k \Bigr) = \frac{1}{\sqrt{2\pi}} \int_{-k}^{k} \exp\Bigl( -\frac{u^2}{2} \Bigr)\, du = p \qquad (16.9)$$
Given a fixed probability level p, the error converges to zero with $O(1/\sqrt{n})$. The error interval holds for k = 1, 2, 3 with the respective probabilities p = 0.682, 0.955, 0.997. The confidence intervals for a given probability level depend on the standard deviation of the payoff function $C_T(S_T)$:

$$\sigma_{C_T} = \sqrt{\mathrm{Var}[C_T(S_T)]} . \qquad (16.10)$$
Figure 16.1 shows the evolution of the absolute error of the price for a European call option calculated by Monte Carlo methods compared with the analytic solution. One can observe that the error tends to zero with $O(1/\sqrt{n})$.
We would like to give some of the main properties of algorithms using Monte Carlo techniques. First, from (16.9) it follows that the error bound tends to zero with $O(1/\sqrt{n})$ for a fixed probability level p. Second, the probability that a fixed error bound holds converges to 1 with $O(1/\sqrt{n})$, Malvin H. Kalos (1986). Since these results hold independently of the dimension of the problem, which affects only the variance of the payoff function with respect to the Black-Scholes risk neutral density, the Monte Carlo method is especially well suited for the evaluation of option prices in multidimensional settings. Competing pricing methods, e.g. finite differences, have exponentially growing computational costs in the dimension of the problem.

Figure 16.1. Absolute error of a European Call option price calculated by Monte Carlo simulations vs. $n^{-1/2}$.

Another advantage of the Monte Carlo pricing method is the error estimate given by the empirical standard deviation, which can be computed with little additional effort.
The two most important drawbacks of the Monte Carlo simulation mentioned in the literature are its slow convergence compared to other techniques for options on few underlyings and the difficulties occurring for options with early exercise possibilities. For example, American options, which give the investor the possibility to exercise the option at any time before and at maturity, are difficult to price. To evaluate an American option means to find an optimal exercise strategy, which leads - using only basic Monte Carlo techniques - to a recursive algorithm with exponential time-complexity. But more advanced techniques using importance sampling methods show that Monte Carlo simulations can be applied to evaluate American contracts, Broadie (2000).
Path-dependent options are options whose payoff depends on underlying price outcomes $S_{t_1}, \dots, S_{t_m}$ at several time points $t_1 \le \dots \le t_m$ within the lifetime of the respective option.
Within the group of path-dependent options one can distinguish options with
a payoff function depending on a continuously defined path variable and op-
tions with a payoff function depending on a fixed number of underlying values.
The price of an option with many - usually equally spaced - exercise dates is
often approximated by the price of an option with a continuously defined path
variable and vice versa.
Examples of path-dependent options are barrier options, lookback options, and Asian options. The latter have a payoff function which is linked to the average value of the underlying on a specific set of dates during the life of the option. One distinguishes two basic forms of Asian options: options on the geometric mean (for which the price can be calculated with standard techniques) and options on the arithmetic mean (for which the price cannot be determined using standard approaches). Asian options are frequently used in commodity markets. The volatility of the underlying prices of the commodities is usually very high, so that vanilla options are more expensive than comparable Asian-style options.
In this section we show how to extend the Monte Carlo simulation technique
to higher dimensions. The problem is not only that one has to deal with higher
dimensional integrals, but also that one has to incorporate the underlying cor-
relation structure between the considered securities. In our framework we need
the covariance matrix of the log returns on an annual basis.
In general, a basket option is an option on several underlyings (a basket of
underlyings). Basket options can be European-, American or even Asian-style
options. Normally, the average of the underlying prices is taken to calculate
the price of the basket option, but sometimes other functions are used.
The advantage of using basket options instead of a series of one-dimensional options is that the correlation between securities is taken into account. This may lead to better portfolio hedges. We will look at a basket option on five underlyings where the underlying price of the best security in the basket is taken to calculate the option price.
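A minimal Python sketch of the multidimensional simulation step (an illustration only, not the quantlets discussed below): correlated terminal prices are generated with the Cholesky factor of the annualized covariance matrix of the log returns, and the payoff is taken on the best-performing underlying:

import numpy as np

def mc_best_of_call(S0, K, r, cov, T, n=100_000, seed=0):
    # S0: (m,) start prices; cov: (m, m) covariance matrix of log returns p.a.
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov)                       # imposes the correlation structure
    Z = rng.standard_normal((n, len(S0))) @ L.T * np.sqrt(T)
    drift = (r - 0.5 * np.diag(cov)) * T
    ST = S0 * np.exp(drift + Z)                       # (n, m) correlated terminal prices
    payoff = np.maximum(ST.max(axis=1) - K, 0.0)      # call on the best security
    return np.exp(-r * T) * payoff.mean()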
$$D_n^*(P) := \sup_{J} D_n(J; P) \qquad (16.13)$$

This means that the error is bounded from above by the product of the variation V(f), which in our case is model and payoff dependent, and the star-discrepancy of the sequence. The bound cannot be used for an automatic error estimation, since the variation and the star-discrepancy cannot be computed easily. It has been shown, though, that sequences exist with a star-discrepancy of the order $O(n^{-1} (\ln n)^s)$. All sequences with this asymptotic upper bound are called low-discrepancy sequences, Niederreiter (1992). One particular low-discrepancy sequence is the Halton sequence.
Note that with the above equation $n_{k,i}$ is a function of i and takes values only in {0, 1}. To illustrate the algorithm we calculate the first points in base 2 and obtain the sequence 1/2, 1/4, 3/4, 1/8, 5/8, .... The extension of this construction scheme to higher dimensions is straightforward. For every dimension j = 1, ..., d we define $x_i^j$ by

$$x_i^j = \sum_{k=0}^{\infty} n_{k,i}(j)\, p_j^{-k-1} \qquad (16.17)$$

where $p_j$ is the j-th smallest prime number and $n_{k,i}(j)$ is calculated as follows:

$$i = \sum_{k=0}^{\infty} n_{k,i}(j)\, p_j^{k}, \qquad 0 \le n_{k,i}(j) < p_j, \quad n_{k,i}(j) \in \mathbb{N} \ \forall j . \qquad (16.18)$$
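A small Python implementation of this radical-inverse construction may make the scheme concrete; it is an illustrative sketch, not the XploRe generator referred to below:

import numpy as np

def van_der_corput(i, base):
    # radical inverse of i: digits n_{k,i} from (16.18) weighted as in (16.17)
    x, denom = 0.0, base
    while i > 0:
        i, digit = divmod(i, base)
        x += digit / denom
        denom *= base
    return x

def halton(n, d):
    # first n points of the d-dimensional Halton sequence (one prime base per dimension)
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
    return np.array([[van_der_corput(i, primes[j]) for j in range(d)]
                     for i in range(1, n + 1)])

print([van_der_corput(i, 2) for i in range(1, 6)])   # 0.5, 0.25, 0.75, 0.125, 0.625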
where seqnum is the number of the random generator according to Table 16.1,
d is the dimension of the random vector and n the number of vectors generated.
where seqnum is the number of the low discrepancy sequence according to Table
16.2.
seqnum  low-discrepancy sequence
0       Halton sequence
1       Sobol sequence
2       Faure sequence
3       Niederreiter sequence

Table 16.2. Low-discrepancy sequences available in XploRe (Niederreiter, 1992).
Figure 16.2 shows that two-dimensional Halton points are much more evenly spaced than pseudo random points. This leads to a smaller error, at least for “smooth” functions.
Figure 16.2. 1000 two-dimensional pseudo random points vs. 1000 Halton points. XFGSOPRandomNumbers.xpl, XFGSOPLowDiscrepancy.xpl
The positive effect of using more evenly spread points for the simulation task is shown in Figure 16.3.

Figure 16.3. Absolute error of a random sequence and the Halton sequence for a put option.

The points of a low-discrepancy sequence are designed to fill the space evenly without any restrictions on the independence of sequence points, whereas the pseudo random points are designed to show no statistically significant deviation from the independence assumption. Because of the construction of the low-discrepancy sequences one cannot calculate an empirical standard deviation of the estimator as for Monte Carlo methods and derive an error approximation for the estimation. One possible way out
of this dilemma is the randomization of the low-discrepancy sequences using pseudo random numbers, i.e. to shift the original quasi random numbers by a pseudo random number, Tuffin (1996). If $x_1, \dots, x_n$ are scalar elements of a low-discrepancy sequence X then, for a pseudo random shift $\epsilon \in [0,1)$, we can define a new low-discrepancy sequence $W(\epsilon) = \{y_1, \dots, y_n\}$ with

$$y_i = \begin{cases} x_i + \epsilon & \text{if } x_i + \epsilon \le 1 \\ (x_i + \epsilon) - 1 & \text{otherwise} \end{cases} \qquad (16.19)$$
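In code the shift of (16.19) is a one-liner; repeating it with independent shifts gives several quasi Monte Carlo estimates whose spread can serve as an empirical error measure (a sketch under that assumption, with illustrative names):

import numpy as np

def shifted_sequence(x, eps):
    # y_i = x_i + eps if x_i + eps <= 1, and (x_i + eps) - 1 otherwise, cf. (16.19)
    y = x + eps
    return np.where(y <= 1.0, y, y - 1.0)

rng = np.random.default_rng(0)
x = np.array([0.5, 0.25, 0.75, 0.125, 0.625])   # e.g. the first Halton points in base 2
y = shifted_sequence(x, rng.uniform())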
Figure 16.4. The first 1000 and 10000 Halton points of dimension 49 and 50. XFGSOPLowDiscrepancy.xpl
However, by using the Brownian bridge path construction method we can limit the effect of the high dimensional components on a simulated underlying path and the corresponding path variable for the most common path-dependent options, Morokoff (1996). This method starts with an empty path with known start value and at each step calculates the underlying value for the time point with maximum time distance to all other time points with known underlying value, until the whole path is computed. Experimental results show that we can still get a faster convergence of the QMC simulation for options with up to 50 time points if we apply this path construction method.
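The following Python sketch implements this construction for a Brownian motion path on a given time grid; it is a hedged illustration of the idea, not the routine used in the quantlets (the start value is taken as W(0) = 0 and the terminal point is drawn first):

import numpy as np

def brownian_bridge_path(times, rng):
    times = np.asarray(times, dtype=float)
    m = len(times)
    W = np.empty(m)
    known = np.zeros(m, dtype=bool)
    W[-1] = np.sqrt(times[-1]) * rng.standard_normal()   # draw the end point first
    known[-1] = True
    while not known.all():
        # pick the unknown time point farthest from all known points (and from t=0)
        unknown = np.where(~known)[0]
        anchors = np.r_[0.0, times[known]]
        dist = [np.min(np.abs(anchors - times[i])) for i in unknown]
        i = unknown[int(np.argmax(dist))]
        # nearest known neighbours to the left (or the start value) and to the right
        left = np.where(known[:i])[0]
        right = np.where(known[i + 1:])[0] + i + 1
        tl, Wl = (times[left[-1]], W[left[-1]]) if len(left) else (0.0, 0.0)
        tr, Wr = times[right[0]], W[right[0]]
        t = times[i]
        mean = Wl + (t - tl) / (tr - tl) * (Wr - Wl)     # Brownian bridge mean
        var = (t - tl) * (tr - t) / (tr - tl)            # and variance
        W[i] = mean + np.sqrt(var) * rng.standard_normal()
        known[i] = True
    return W

# example: a path with 50 equally spaced time points over one year
rng = np.random.default_rng(0)
W = brownian_bridge_path(np.linspace(0.02, 1.0, 50), rng)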
After defining the payoff function in XploRe we can start to calculate a price estimate with the help of the appropriate simulation routine. In the one dimensional case we just have to call the corresponding quantlet to get a price estimate and, for the Monte Carlo case, an empirical standard deviation with respect to a start price s0, a continuous risk free interest rate r, a volatility vola, a time to maturity of dt years, the payoff function opt, the sample size itr and the random/low-discrepancy generator with number gen. Table 16.1 shows the random number generators and Table 16.2 the low-discrepancy generators that can be used. An application of these routines for a put option can be found in XFGSOP1DPut.xpl.
Pricing path-dependent options is only slightly more complicated. Here we have to define the vector of time points for which underlying prices have to be generated. This vector replaces the time to maturity used to price path-independent options. Then we can apply one of the following methods to compute a price estimate for the path-dependent option
with respect to the start price s0, the continuous risk free interest rate r, the
volatility vola, the time scheme times, the payoff function opt, sample size
itr and the random/low-discrepancy generator with number gen, as given in
Tables 16.1 and 16.2. Using the above quantlets, we calculate the price of an
Asian call option in XFGSOP1DAsian.xpl.
with respect to the m-dimensional start price vector s0, the continuous risk free
interest rate r, the m×m covariance matrix vola, the time to maturity dt, the
payoff function opt, the number of iterations itr and the generator number
gen according to the generators in Tables 16.1 and 16.2. Both quantlets are
illustrated in XFGSOPMD.xpl.
If in addition a dividend is paid during the time to maturity, we can use the
following two quantlets to calculate the option prices.
Monte Carlo based option pricing methods are not applicable to all types of payoff functions. There is one theoretical limitation and there are some practical limitations of the method. Let us look at the theoretical limitation first. In the derivation of the probabilistic error bounds we have to assume the existence of the payoff variance with respect to the risk neutral distribution. It follows that we are no longer able to derive the presented error bounds if this variance does not exist. However, for most payoff functions occurring in practice and for the Black-Scholes model, the difference between the payoff samples and the price can be bounded from above by a polynomial function of the difference between the underlying estimate and the start price, for which the integral with respect to the risk neutral density exists. Consequently the variance of these payoff functions must be finite.
Much more important than the theoretical limitations are the practical limitations. In the first place Monte Carlo simulation relies on the quality of the pseudo random number generator used to generate the uniformly distributed samples. All generators used are widely tested, but it cannot be guaranteed that the samples generated for a specific price estimation exhibit all assumed statistical properties. It is also important to know that all generators produce the same samples in a fixed length cycle. For example, if we use the random number generator from Park and Miller with Bays-Durham shuffle, we will get the same samples after approximately $10^8$ method invocations. Another possible error source is the transformation function which converts the uniformly distributed random numbers into normally distributed numbers. The approximation to the inverse of the normal distribution used in our case has a maximum absolute error of $10^{-15}$, which is sufficiently good.
The most problematic cases for Monte Carlo based option pricing are options for which the probability of a strictly positive payoff is very small. Then we will get either price and variance estimates based on a few positive samples, if we hit the payoff region, or a zero payoff and variance, if this improbable event does not occur. In both cases we will get a very high relative error. More accurate results may be calculated by applying importance sampling to these options.
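As a hedged sketch of what importance sampling can look like in this setting (names and the choice of the drift shift are assumptions of the example, not prescriptions from the text): for a deep out-of-the-money call the standard normal driver is sampled with a shifted mean so that more paths end in the payoff region, and each sample is reweighted by the corresponding likelihood ratio.

import numpy as np

def mc_call_importance(S0, K, r, sigma, T, shift, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n) + shift               # sample Z ~ N(shift, 1)
    weight = np.exp(-shift * Z + 0.5 * shift**2)     # likelihood ratio N(0,1)/N(shift,1)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    payoff = np.maximum(ST - K, 0.0) * weight
    return np.exp(-r * T) * payoff.mean()

# one possible shift: push the median of S_T onto the strike
# shift = (np.log(K / S0) - (r - 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))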
Bibliography
Bauer, H. (1991). Wahrscheinlichkeitstheorie, W. de Gruyter.

Bosch, K. (1993). Elementare Einführung in die Wahrscheinlichkeitsrechnung, Vieweg.

Boyle, P. P. (1977). Options: a Monte Carlo approach, Journal of Financial Economics 4: 323–338.

Broadie, M., Glasserman, P. and Ha, Z. (2000). Pricing American options by simulation using a stochastic mesh with optimized weights, in S. Uryasev (ed.), Probabilistic Constrained Optimization: Methodology and Applications, Kluwer.

Joy, C., Boyle, P. and Tan, K. S. (1996). Quasi-Monte Carlo methods in numerical finance, Management Science 42(6): 926–936.

Kalos, M. H. (1986). Monte Carlo Methods, Wiley.

Marsaglia, G. (1993). Monkey tests for random number generators, Computers & Mathematics with Applications 9: 1–10.

Morokoff, W. J. and Caflisch, R. E. (1996). Quasi-Monte Carlo simulation of random walks in finance, in Monte Carlo and Quasi-Monte Carlo Methods, 340–352, University of Salzburg, Springer.

Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods, 1 edn, Capital City Press, Montpelier, Vermont.

Tuffin, B. (1996). On the use of low discrepancy sequences in Monte Carlo methods, Technical Report 1060, IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires.
17 Nonparametric Estimators of
GARCH Processes
Jürgen Franke, Harriet Holzberger and Marlene Müller
Xk = ξk + ηk , k = 1, . . . , N.
We can therefore try to estimate $p_x(x)$ by a common kernel estimate and extract an estimate for $p_\xi(x)$ out of it. This kind of deconvolution operation is preferably performed in the frequency domain, i.e. after applying a Fourier transform. As the subsequent inverse Fourier transform already includes a smoothing part, we can start with the empirical distribution of $X_1, \dots, X_N$ instead of a smoothed version of it. In detail, we calculate the Fourier transform or characteristic function of the empirical law of $X_1, \dots, X_N$, i.e. the sample characteristic function

$$\hat\phi_x(\omega) = \frac{1}{N} \sum_{k=1}^{N} e^{i\omega X_k} .$$
Let

$$\phi_\eta(\omega) = \mathrm{E}(e^{i\omega \eta_k}) = \int_{-\infty}^{\infty} e^{i\omega u}\, p_\eta(u)\, du$$

denote the characteristic function of the noise $\eta_k$.
The name of this estimate is explained by the fact that it may be written equivalently as a kernel density estimate

$$\hat p_h(x) = \frac{1}{Nh} \sum_{k=1}^{N} K^h\Bigl( \frac{x - X_k}{h} \Bigr)$$
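To make the frequency-domain step concrete, the following Python fragment computes the sample characteristic function and divides it by the characteristic function of the noise, here assumed to be Gaussian with known standard deviation (an illustrative sketch only; the kernel-smoothed inversion of the text is not reproduced):

import numpy as np

def empirical_cf(X, omega):
    # sample characteristic function (1/N) sum_k exp(i * omega * X_k)
    return np.mean(np.exp(1j * np.outer(omega, np.asarray(X))), axis=1)

def deconvolved_cf(X, omega, sigma_eta):
    # divide by the characteristic function of N(0, sigma_eta^2) noise
    phi_eta = np.exp(-0.5 * (sigma_eta * omega) ** 2)
    return empirical_cf(X, omega) / phi_eta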
Yk = m(ξk ) + Wk , Xk = ξk + ηk , k = 1, . . . , N,
for some constants c, β, γ > 0, β0. The nonlinear ARMA process (17.3) has to be stationary and strongly mixing with exponentially decaying mixing coefficients. Let p(x) denote the stationary marginal density of $X_t$.
The smoothing kernel $K^x$ in x-direction is a common kernel function with compact support [−1, +1] satisfying $0 \le K^x(u) \le K^x(0)$ for all u. The kernel K which is used in the deconvolution part has a Fourier transform $\phi_K(\omega)$ which is symmetric around 0, has compact support [−1, +1] and satisfies some smoothness conditions (Holzberger, 2001). We have chosen a kernel with the following Fourier transform:
for some $a_N \sim N^{1/6}$ for $N \to \infty$. For technical reasons we have to cut off the density estimate in regions where it is still unreliable for given N. The particular choice of denominator guarantees that $\hat Q_{b,h}(a_N|x) = 1$ in practice, since $Q(v|x)$ is a cumulative distribution function.
To estimate the unconditional density q(u) of $f(X_t, e_t) = X_{t+1} - e_{t+1}$, we use a standard deconvolution density estimate with smoothing parameter $h^* = A^*/\log(N)$:

$$\hat q_{h^*}(u) = \frac{1}{N h^*} \sum_{t=1}^{N} K^{h^*}\Bigl( \frac{u - X_t}{h^*} \Bigr) .$$
Let $p_e(u|x)$ be the conditional density of $e_t$ given $X_t = x$, and let $P_e(v|x) = \int_{-\infty}^{v} p_e(u|x)\, du$ be the corresponding conditional distribution function. An estimate of it is given as

$$\hat P_{e,h^*}(v|x) = \int_{-a_N}^{v} \hat q_{h^*}(x - u)\, p_e(u)\, du \Bigm/ \int_{-a_N}^{a_N} \hat q_{h^*}(x - u)\, p_e(u)\, du .$$
library("times")
n=1000
x=genarma(0.7,0.7,normal(n))
XFGnpg01.xpl
The result is shown in Figure 17.1. The scatterplot in the right panel of Fig-
ure 17.1 defines the region where we can estimate the function f (x, v).
Figure 17.1. Simulated ARMA(1,1) process: the time series X(t) (left) and the scatterplot of X(t) versus X(t+1) (right).
proc(f)=f(x,e,c)              // ARMA mean function f(x,e) = c1 + c2*x + c3*e
f=c[1]+c[2]*x+c[3]*e
endp
proc(x,f)=myarma(n,c)         // simulate a nonlinear ARMA(1,1) path of length n
x=matrix(n+1)-1               // observations, initialized before the recursion
f=x                           // conditional mean values
e=normal(n+1)                 // innovations
t=1
while (t<n+1)
t=t+1
f[t]=f(x[t-1],e[t-1],c)       // mean function of lagged observation and innovation
x[t]=f[t]+e[t]
endo
x=x[2:(n+1)]                  // drop the starting value
f=f[2:(n+1)]
endp
n=1000
{x,f}=myarma(n,0|0.7|0.7)
h=0.4
library("smoother")
dh=dcdenest(x,h) // deconvolution estimate
fh=denest(f,3*h) // kernel estimate
XFGnpg02.xpl
Figure 17.2 shows both density estimates. Note that the smoothing parameter
(bandwidth h) is different for both estimates since different kernel functions
are used.
Figure 17.2. Deconvolution density estimate (solid) and kernel density estimate (dashed) of the known mean function of an ARMA(1,1) process.

The function nparmaest computes the function f(x, v) for an ARMA process according to the algorithm described above. Let us first consider an ARMA(1,1) process with the linear mean function f(x, e) = 0.3 + 0.6 x + 1.6 e. Hence, we use myarma with c=0.3|0.6|1.6 and call the estimation routine by
f=nparmaest(x)
XFGnpg03.xpl
The optional parameters N and R are set to 50 and 250, respectively. N con-
tains the grid sizes used for x and v. R is an additional grid size for internal
computations. The resulting function is therefore computed on a grid of size
N × N. For comparison, we also calculate the true function on the same grid.
Figure 17.3 shows the resulting graphs.

[Figure 17.3: estimated and true function for the linear ARMA(1,1) example.]

The bandwidths h (corresponding to $h^*$) for the one-dimensional deconvolution kernel estimator $\hat q$ and g for the
proc(f)=f(x,e,c)              // nonlinear (logistic) ARMA mean function
f=c[2]/(1+exp(-c[3]*e))+c[1]
endp
c=-2.8|8|6
XFGnpg04.xpl
The resulting graphs for this nonlinear function are shown in Figure 17.4. The estimated surface obviously varies only in the second dimension and follows the s-shaped underlying true function. However, the sample size used and the internal grid sizes of the estimation procedure only allow for a rather imprecise reconstruction of the tails of the surface.
[Figure 17.4: estimated and true function for the nonlinear ARMA(1,1) example.]
$$\varepsilon_t = \sigma_t Z_t, \qquad \sigma_t^2 = g(\varepsilon_{t-1}^2, \sigma_{t-1}^2). \qquad (17.5)$$
Here, g denotes a smooth unknown function and the innovations $Z_t$ are chosen as in Section 17.2. This model covers the usual parametric GARCH(1,1) process (17.1) but does not allow for representing a leverage effect like the TGARCH(1,1) process. We show now how to transform (17.5) into an ARMA model. First, we define $X_t = \log(\varepsilon_t^2)$ and $e_t = \log(Z_t^2)$.
with

$$g_1(x, u) = g(e^x, e^u), \qquad f(x, v) = \log g_1(x, x - v).$$

Now, we can estimate the ARMA function f(x, v) from the logarithmic squared data $X_t = \log(\varepsilon_t^2)$ as in Section 17.2, using the nonparametric ARMA estimate $\hat f_{b,h,h^*}(x, v)$ of (17.5). Reverting the transformations, we get

$$\hat g_1(x, u) = \exp\{ \hat f_{b,h,h^*}(x, x - u) \}, \qquad \hat g_{b,h,h^*}(y, z) = \hat g_1(\log y, \log z).$$
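A compact Python sketch of this pair of transformations (illustrative only; f_hat stands for whatever estimate of the ARMA function is available):

import numpy as np

def to_arma_scale(eps):
    # X_t = log(eps_t^2): casts the GARCH recursion into ARMA form
    return np.log(np.asarray(eps) ** 2)

def g_hat(f_hat, y, z):
    # g_hat(y, z) = exp{ f_hat(log y, log y - log z) },
    # following g1(x, u) = g(e^x, e^u) and f(x, v) = log g1(x, x - v)
    x, u = np.log(y), np.log(z)
    return np.exp(f_hat(x, x - u))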
proc(f)=gf(x,e,c)             // GARCH(1,1) function g(y,z) = c1 + c2*y + c3*z
f=c[1]+c[2]*x+c[3]*e
endp
proc(e,s2)=mygarch(n,c)       // simulate a GARCH(1,1) path of length n
e=zeros(n+1)
f=e
s2=e
z=normal(n+1)
t=1
while (t<n+1)
t=t+1
s2[t]=gf(e[t-1]^2,s2[t-1],c)  // s2 holds the variance, so it is passed unsquared
e[t]=sqrt(s2[t]).*z[t]
endo
e=e[2:(n+1)]
s2=s2[2:(n+1)]
endp
The function npgarchest computes the functions f (x, v) and g(y, z) for
a GARCH process using the techniques described above. Consider a
GARCH(1,1) with
g(y, z) = 0.01 + 0.6 y + 0.2 z.
Hence, we use
n=1000
c=0.01|0.6|0.2
{e,s2}=mygarch(n,c)
g=npgarchest(e)
XFGnpg05.xpl
Figure 17.5 shows the resulting graph for the estimator of f (x, v) together with
the true function (decreasing in v) and the data (Xt+1 versus Xt ). As in the
ARMA case, the estimated function shows the underlying structure only for a
part of the range of the true function.
Finally, we remark how the general case of nonparametric GARCH models could be estimated. Consider

$$\varepsilon_t = \sigma_t Z_t, \qquad \sigma_t^2 = g(\varepsilon_{t-1}, \sigma_{t-1}^2), \qquad (17.6)$$

where $\sigma_t^2$ may depend asymmetrically on $\varepsilon_{t-1}$. We write

$$g(x, z) = g^+(x^2, z)\, \mathbf{1}(x \ge 0) + g^-(x^2, z)\, \mathbf{1}(x < 0).$$
[Figure 17.5: estimated and true function f(x, v) for the nonparametric GARCH(1,1) example, together with the data.]
$\hat Q^+_{b,h}(v|x)$ and $\hat P^+_{e,h^*}(v|x)$ are defined as in Section 17.2 with $\hat q^+_{b,h}$ and $\hat p^+_b$ replacing $\hat q_{b,h}$ and $\hat p_b$, and, using both estimates of conditional distribution functions, we get an ARMA function estimate $\hat f^+_{b,h,h^*}(x, v)$. Reversing the transformation from the ARMA scale yields the corresponding estimate of $g^+$; the estimate of $g^-$ is obtained in the same way, where, in the derivation of $\hat f^-_{b,h,h^*}$, $N_+$ and $\mathbf{1}(\varepsilon_t \ge 0)$ are replaced by $N_-$ and $\mathbf{1}(\varepsilon_t < 0)$.
Bibliography
Bollerslev, T.P. (1986). Generalized autoregressive conditional heteroscedas-
ticity, Journal of Econometrics 31: 307-327.
Bühlmann, P. and McNeil, A.J. (1999). Nonparametric GARCH-models, Manuscript, ETH Zürich, http://www.math.ethz.ch/~mcneil.
Carroll, R.J. and Hall, P. (1988). Optimal rates of convergence for deconvolut-
ing a density, J. Amer. Statist. Assoc. 83: 1184-1186.
18.1 Introduction
Modern risk management requires accurate, fast and flexible computing environments. To meet this demand a vast number of software packages have evolved over the last decade, accompanied by a huge variety of programming languages, interfaces, configuration and output possibilities. One solution especially designed for large scale explorative data analysis is XploRe, a procedural programming environment equipped with a modern client/server architecture (Härdle et al. (1999) and Härdle et al. (2000)). As far as flexibility in the sense of openness and accuracy is concerned, XploRe has a lot to offer that a risk analyst may wish for. On the other hand, its matrix oriented programming language (Fickel, 2001) might be seen as a drawback with respect to other computational approaches. In terms of learning curve effects and total cost of ownership an alternative solution seems desirable.
This chapter will present and demonstrate the net based spreadsheet solution MD*ReX designed for modern statistical and econometric analysis. We concentrate on examples of Value-at-Risk (VaR) with copulas and on means of quantifying implied volatilities, presented in Chapters 2 and 6. All results will be shown in Microsoft Excel. Recent research work suggests that the rationale for spreadsheet based statistical computing is manifold. Ours is to bring state-of-the-art quantitative methods to the fingertips of spreadsheet users. Throughout this chapter we will give a short introduction to our underlying technology and briefly explain
%Systemroot%\%Program Files%\MDTech\ReX\ReX.
In the latter case the client is available every time Excel is started. In either case the client can be accessed via the Excel menu bar (Figure 18.1) and exposes its full functionality after clicking on the ReX menu item. In order to work with MD*ReX the user first has to connect to a running XploRe Quantlet Server. This can either be a local server, which by default is triggered if MD*ReX is started via the Programs shortcut, or any other XQS somewhere on the Internet. Evidently for the latter option a connection to the Internet is required. The Connect dialogue offers some pre-configured XQS's. After the connection has been successfully established the user can start right away to work with MD*ReX.
In contrast to XploRe, the user has the option to perform statistical analysis by using implemented dialogues, e.g. the Time Series dialogue in Figure 18.3. Via this dialogue a researcher is able to apply standard time series analysis techniques as well as more refined nonlinear approaches, like ARCH tests based on neural networks. These interfaces encapsulate XploRe code while using the standard Excel GUI elements; hence undue learning overhead is minimized. Alternatively one can directly write XploRe commands into the spreadsheet cells
and then let these run either via the menu button or with the context menu,
by right clicking the highlighted cell range (Figure 18.2). Furthermore it is
now much easier to get data to the XploRe Quantlet Server. Simply marking
an appropriate data range within Excel and clicking the Put button is enough
to transfer any kind of numerical data to the server. We will show this in the
next section. A further virtue of using a spreadsheet application is the com-
monly built-in database connectivity. Excel for example allows for various data
retrieval mechanisms via the Open Database Connectivity (ODBC) standard,
which is supported by most of the database systems available nowadays.
18.5 Applications
In the following paragraph we want to show how MD*ReX might be used in order to analyze the VaR using copulas, as described in Chapter 2 of this book. Subsequently we will demonstrate the analysis of implied volatility shown in Chapter 6. All examples are taken from this book and have been modified accordingly. The aim is to make the reader aware of the need for this modification and to give an idea of how this client may be used for other fields of statistical research as well.
to create custom dialogues and menus for this client. Thus no further knowledge
of the XploRe Quantlet syntax is required. An example is the aforementioned
Time Series dialogue, Figure 18.3.
The first step is rather trivial: copy and paste the example Quantlet
XFGrexcopula1.xpl from any text editor or browser into an Excel work-
sheet.
Next mark the range containing the Quantlet and apply the Run command.
Then switch to any empty cell of the worksheet and click Get to receive the
Of course the steps 1-4 could easily be wrapped into a VBA macro with suitable dialogues. This is exactly what we refer to as the change from the raw mode of MD*ReX into the "Windows"-like embedded mode. Embedded here means that XploRe commands (quantlets) are integrated into the macro language of Excel. The Monte Carlo simulations are obtained correspondingly and are depicted in Figure 18.5. The corresponding Quantlet is XFGrexmccopula.xpl. This Quantlet again functions analogously to XFGaccvar2.xpl. The graphical output is then constructed along the same lines: paste the corresponding results z11 through z22 into cell areas and let Excel draw a scatter-plot.
be modified for the appropriate data set. In contrast to the above examples
where Quantlets could be adopted without any further modification, in this
case we need some redesign of the XploRe code. This is achieved with suitable
reshape operations of the output matrices. The graphical output is then ob-
tained by arranging the two output vectors x2 and y2 and the output matrix
z1.
The advantage of measuring implied volatilities is obviously an expressive vi-
sualization. Especially the well known volatility smile and the corresponding
time structure can be excellently illustrated in a movable cubic space. Further-
more this approach will enable real-time calculation of implied volatilities in
future applications. Excel can be used as a data retrieval front end for real-
time market data providers as Datastream or Bloomberg. It is imaginable then
to analyze tick-data which are fed online into such an spreadsheet system to
evaluate contemporaneous volatility surfaces.
Bibliography
Aydınlı, G., Härdle, W., Kleinow, T. and Sofyan, H. (2002). ReX: Linking XploRe to Excel, forthcoming Computational Statistics Special Issue, Springer Verlag, Heidelberg.

Bouyé, E., Durrleman, V., Nikeghbali, A., Riboulet, G. and Roncalli, T. (2000). pp. 119-134.
Fickel, N. (2001). Book Review: XploRe - Learning Guide, Allgemeines Statis-
tisches Archiv, Vol. 85/1, p. 93.
Härdle, W., Klinke, S. and Müller, M. (1999). XploRe Learning Guide, Springer
Verlag, Heidelberg.
Härdle, W., Hlavka, Z. and Klinke, S. (2000). XploRe Application Guide,
Springer Verlag, Heidelberg.
Kleinow, T. and Lehmann, H. (2002). Client/Server based Statistical Com-
puting, forthcoming in Computational Statistics Special Issue, Springer
Verlag, Heidelberg.
McCullough, B.D. and Wilson, B. (1999). On the Accuracy of Statistical Pro-
cedures in Microsoft Excel, Computational Statistics & Data Analysis,
Vol. 31, p. 27-37.
Neuwirth, E. and Baier, T. (2001). Embedding R in Standard Software, and the
other way round, DSC 2001 Proceedings of the 2nd International Work-
shop on Distributed Statistical Computing.
Przasnyski, L.L. and Seal, K.C. (1996). Spreadsheets and OR/MS models: an
end-user perspective, Interfaces, Vol. 26, pp. 92-104.
Rachev, S. (2001). Company Overview, Bravo Consulting, http://www.bravo-group.com/inside/bravo-consulting company-overview.pdf.
Ragsdale, C.T. and Plane, D.R. (2000). On modeling time series data using
spreadsheets, The International Journal of Management Science, Vol. 28,
pp. 215-221.
Sofyan, H. and Werwatz, A. (2001). Analysing XploRe Download Profiles with
Intelligent Miner, Computational Statistics, Vol. 16, pp. 465-479.
Index
USTF data, 55
Value at Risk,
→ XFGVAR
Value-at-Risk, 35
value-at-risk, 367
VaR, 367
VaR library, 41, 42
volatility, 323
volsurf01 data, 141
volsurf02 data, 141
volsurf03 data, 141
volsurfdata2 data, 130, 393