The Gibbs Sampler: Function
The Gibbs Sampler: Function
Suppose we want to generate data from a random vector X1 , . . , Xn with joint density f (x1 , . . , xn ) .
Unfortunately we know f only up to a constant, that is
f (x1 , . . , xn ) = c ⋅ u(x1 , . . , xn )
∞
∫−∞
u(x1 , . . , xn ) = 1
(X, Y) ∼ N(μ,
μ ΣΣ)
where μ = (μx , μy ) and
Recall
## x y x y
## -0.003 -0.007 1.003 1.002 0.011
## x y x y
## 1.024 2.058 0.996 2.987 0.899
It can be shown that the Gibbs sampler is actually a special case of the Metropolis-Hastings
algorithm:
Let X = (X1 , . . , Xn ) be a r.v. with probability mass function f (x)
x that need only be specified up to
a multiplicative constant, and suppose that we want to generate a r.v. whose distribution is that of
X . That is we want to generate a r.v. with density f (x)
x = c ⋅ u(x) x where u is known but c is not.
Using the Gibbs sampler assumes that for any i and values xj , j ≠ i , we can generate a r.v. X with
density
y = (x1 , . . , xi−1 , x, x i + 1, . . , xn )
is considered as the next candidate state. In other words, with x and y given the Gibbs sampler
uses the Metropolis-Hastings algorithm with
Because we want the limiting distribution to be p the vector y is then accepted as the new state
with probability
So in the Gibbs sampler the candidate state is always accepted as the next state!
Example Here is one of the standard models used in the actuarial sciences (Insurance) to model the
number of claims that might have to be paid on a certain type of policy:
The idea is this: there is a random number N of policies of the same type (car insurance, health ins,
etc.) Obviously N > 0 otherwise it’s to boring. Each insurance has a probability Y to be claimed, so
X is the number of policies that get claimed.
We want to generate data for X . In order to use the Gibbs sampler we need all the conditional
distributions. We already have
X|Y = y, N = n ∼ Bin(n, y)
It can be shown that
Y|X = x, N = n ∼ Beta(x + α, n − x + β)
M|X = x, Y = y ∼ Pois(λ(1 − y))
N = M+x
So the Gibbs sampler is as follows:
Example : One of the main uses of the Gibbs sampler is in Bayesian analysis. Say we have
X ∼ Bin(n, p) and p ∼ Beta(α, β) and we want a sample from the posterior distribution p|X .
Then the joint distribution of X and p is the beta-binomial distribution given by
To use the Gibbs sampler we need the conditional distributions of X|p and p|X :
X|p ∼ Bin(n, p)
p|x ∼ Beta(x + α, n − x + β)
so the Gibbs sampler is as follows:
As a specific example, say in a sample of 100 employees of a company we have 37 women and 63
men, and we want to find a 90% interval estimate for the percentage of female employees. We have
no prior knowledge of this company, so we will use U[0,1] (=beta(1,1)) as our prior.
## 2.5% 97.5%
## 0.280 0.468
Notice that the interval is very similar to the standard frequentist solution (Clopper-Pearson 1934
intervals)
round(binom.test(x=37, n=100)$conf.int, 3)
Historically Bayesian analysis suffered from the problem that prior distributions had to be chosen so
it was possible to calculate the posterior distribution (one popular choice are so-called conjugate
priors, where the posterior distribution is the same as the prior one, except for the parameters) even
though those priors were not a good description of our “prior belief”. The Gibbs sampler allows us
to be much more free in our choice of prior.
Example
say we want to generate data from the random vector (X, Y, Z) with density
f (x, y, z) = K(x + y + z), 0 < x < y < z < 1 .
To use the Gibbs Sampler we need all the conditional distributions:
y
∫0
fY,Z (y, z) = K(x + y + z)dx =
1 2
K( x + xy + xz|0y =
2
1
K( y2 + y2 + yz) =
2
3 2
K( y + yz)
2
K(x + y + z)
fX|Y=y,Z=z (x|y, z) = 3 2
=
K( 2 y + yz)
x+y+z
3 2
2
y + yz
0<x<y
Notice that the constant K vanishes in the conditional distribution. This will always happen, so we
will ignore it from now on.
Next we find
z
∫x
fX,Z (x, z) = x + y + zdy =
1 2
xy + y + yz|zx =
2
1 2 2 2 1 2
xz + z + z − x − x − xz =
2 2
3 2 3 2
z − x
2 2
x+y+z
fY|X=x,Z=z (y|x, z) = 3
z2 − 3 x2
2 2
x<y<z
and finally
1
∫y
fX,Y (x, y) = x + y + zdz =
1 2 1
xz + yz + z |y =
2
1 2 1 2
x + y + − xy − y − y =
2 2
1 3
x + y + − xy − y2
2 2
x+y+z
fZ|X=x,Y=y (z|x, y) =
x + y + 12 − xy − 32 y2
Now that we have the marginals we need to be able to generate data from them. To do this notice
that all three are linear functions of the form
∫u
G(x) = b(a + t)dt =
1 2x
b(at + t |u =
2
1 2 1 2
b(ax + x − au − u )
2 2
now we have G(v) = 1 and so
1
b=
a(v − u) + 12 (v2 − u2 )
G(x) = y
1 2 1 2
b(ax + x − au − u ) = y
2 2
1 2 1 2 y
x + ax − au − u − = 0
2 2 b
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
1 2 y‾
x1,2 = −a ± √a + 2(au + u + )
2
2 b
u <- 0.25
v <- 0.75
a <- 1
b <- 1/(a*(v-u)+(v^2-u^2)/2)
x <- (-a)+sqrt(a^2+2*(a*u+u^2/2+runif(1e4)/b))
hist(x, 50, freq=FALSE, main="")
curve(b*(a+x), u,v, add=TRUE, lwd=2)
n <- 1e4
x <- rep(0, n)
y <- rep(1/3, n)
z <- rep(2/3, n)
for(i in 2:n) {
u <- 0
v <- y[i-1]
a <- y[i-1]+z[i-1]
b <- 1/(a*(v-u)+(v^2-u^2)/2)
x[i] <- (-a)+sqrt(a^2+2*(a*u+u^2/2+runif(1)/b))
u <- x[i]
v <- z[i-1]
a <- x[i]+z[i-1]
b <- 1/(a*(v-u)+(v^2-u^2)/2)
y[i] <- (-a)+sqrt(a^2+2*(a*u+u^2/2+runif(1)/b))
u <- y[i]
v <- 1
a <- x[i]+y[i]
b <- 1/(a*(v-u)+(v^2-u^2)/2)
z[i] <- (-a)+sqrt(a^2+2*(a*u+u^2/2+runif(1)/b))
}
3 2
f (x, z) = (z − x 2 )
2
0<x<z<1
1
3 2
∫x 2
f (x) = (z − x 2 )dz =
1 3 3 2 1
z − x z|x =
2 2
1 3 2
− x + x3
2 2
1
1 3 2
∫0 2
− x + x 3 dx =
2
1 1 3 1 41
x − x + x |0 =
2 2 4
1 1 1 1
− + =
2 2 4 4
hist(x, 50, freq=FALSE, main="")
curve(4*(x^3-3/2*x^2+1/2), 0, 1, lwd=2, add=TRUE)
Looks good!