
The Gibbs Sampler

Suppose we want to generate data from a random vector $X_1, \dots, X_n$ with joint density $f(x_1, \dots, x_n)$.
Unfortunately we know $f$ only up to a constant, that is

$$f(x_1, \dots, x_n) = c \cdot u(x_1, \dots, x_n)$$

where $u$ is known but the normalizing constant $c$, defined by

$$c \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_1, \dots, x_n) \, dx_1 \cdots dx_n = 1$$

is not.

Now the conditional distribution of

$$X_k | X_1 = x_1, \dots, X_{k-1} = x_{k-1}, X_{k+1} = x_{k+1}, \dots, X_n = x_n$$

is given by

$$f_k(x_k | x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n) = \frac{f(x_1, \dots, x_n)}{\int_{-\infty}^{\infty} f(x_1, \dots, x_n) \, dx_k} = \frac{u(x_1, \dots, x_n)}{\int_{-\infty}^{\infty} u(x_1, \dots, x_n) \, dx_k}$$

so the conditional density does not depend on $c$.


The idea of the Gibbs sampler is to generate a sequence of simulated values by drawing from

$$f_k(x_k | x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n)$$

with $k$ going from 1 to $n$, and then starting all over again.
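In R the scheme looks as follows. This is a minimal sketch, not from the text: it assumes a (hypothetical) list rcond of n functions, where rcond[[k]](x) draws one value from the k-th full conditional given the current values in x.

gibbsGeneric <- function(B, x0, rcond) {
  # B: number of sweeps, x0: starting vector, rcond: list of conditional samplers
  n <- length(x0)
  out <- matrix(0, B, n)
  out[1, ] <- x0
  for(i in 2:B) {
    x <- out[i-1, ]
    for(k in 1:n)            # update one coordinate at a time,
      x[k] <- rcond[[k]](x)  # conditioning on the current values of the others
    out[i, ] <- x
  }
  out
}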
Example Say we want to generate from

$$(X, Y) \sim N(\mu, \Sigma)$$

where $\mu = (\mu_x, \mu_y)$ and

$$\Sigma = \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}$$

Recall that the full conditionals of a bivariate normal are again normal:

$$X | Y = y \sim N\left(\mu_x + \rho \frac{\sigma_x}{\sigma_y}(y - \mu_y), \ \sigma_x^2(1 - \rho^2)\right)$$

$$Y | X = x \sim N\left(\mu_y + \rho \frac{\sigma_y}{\sigma_x}(x - \mu_x), \ \sigma_y^2(1 - \rho^2)\right)$$

gibbsMN <- function(n=1e4, mu=c(0, 0), sigma=c(1, 1), rho=0){
  x <- rep(0, n)
  y <- rep(0, n)
  for(i in 2:n){
    # alternate between X | Y = y[i-1] and Y | X = x[i]
    x[i] <- rnorm(1, mu[1]+rho*sigma[1]/sigma[2]*(y[i-1]-mu[2]),
                  sigma[1]*sqrt(1-rho^2))
    y[i] <- rnorm(1, mu[2]+rho*sigma[2]/sigma[1]*(x[i]-mu[1]),
                  sigma[2]*sqrt(1-rho^2))
  }
  # discard the first 1000 pairs as burn-in
  cbind(x, y)[-c(1:1000), ]
}
xy <- gibbsMN()
plot(xy, pch=".")

round(c(apply(xy,2,mean), apply(xy,2,sd), cor(xy)[1, 2]), 3)

## x y x y
## -0.003 -0.007 1.003 1.002 0.011

xy <- gibbsMN(mu=c(1,2), sigma = c(1,3), rho = 0.9)


plot(xy, pch=".")

round(c(apply(xy,2,mean), apply(xy,2,sd), cor(xy)[1, 2]), 3)

## x y x y
## 1.024 2.058 0.996 2.987 0.899
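As an extra sanity check (not part of the original code) we can compare with direct draws from the bivariate normal via MASS::mvrnorm; the summary statistics should agree up to Monte Carlo error.

library(MASS)
Sigma <- matrix(c(1, 2.7, 2.7, 9), 2, 2)  # sigma=c(1, 3), rho=0.9: covariance 0.9*1*3
xy2 <- mvrnorm(1e4, mu=c(1, 2), Sigma=Sigma)
round(c(apply(xy2,2,mean), apply(xy2,2,sd), cor(xy2)[1, 2]), 3)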

It can be shown that the Gibbs sampler is actually a special case of the Metropolis-Hastings algorithm:
Let $\mathbf{X} = (X_1, \dots, X_n)$ be a r.v. with probability mass function $f(\mathbf{x})$ that need only be specified up to a multiplicative constant, and suppose that we want to generate a r.v. whose distribution is that of $\mathbf{X}$. That is, we want to generate a r.v. with density $f(\mathbf{x}) = c \cdot u(\mathbf{x})$ where $u$ is known but $c$ is not.
Using the Gibbs sampler assumes that for any $i$ and values $x_j$, $j \ne i$, we can generate a r.v. $X$ with

$$P(X = x) = P(X_i = x | X_j = x_j, j \ne i)$$


It operates the Metropolis-Hastings algorithm on a Markov chain with states $\mathbf{x} = (x_1, \dots, x_n)$ and with transition probabilities defined as follows:
Whenever the present state is $\mathbf{x}$, a coordinate that is equally likely to be any of $1, \dots, n$ is chosen. If coordinate $i$ is chosen, then a r.v. $X$ whose probability mass function is as above is generated, and if $X = x$ the state

$$\mathbf{y} = (x_1, \dots, x_{i-1}, x, x_{i+1}, \dots, x_n)$$

is considered as the next candidate state. In other words, with $\mathbf{x}$ and $\mathbf{y}$ given, the Gibbs sampler uses the Metropolis-Hastings algorithm with

$$q(\mathbf{x}, \mathbf{y}) = \frac{1}{n} P(X_i = x | X_j = x_j, j \ne i) = \frac{f(\mathbf{y})}{n \, P(X_j = x_j, j \ne i)}$$

Because we want the limiting distribution to be $f$, the vector $\mathbf{y}$ is then accepted as the new state with probability

$$\alpha(\mathbf{x}, \mathbf{y}) = \min\left(\frac{f(\mathbf{y}) q(\mathbf{y}, \mathbf{x})}{f(\mathbf{x}) q(\mathbf{x}, \mathbf{y})}, 1\right) = \min\left(\frac{f(\mathbf{y}) f(\mathbf{x})}{f(\mathbf{x}) f(\mathbf{y})}, 1\right) = 1$$
So in the Gibbs sampler the candidate state is always accepted as the next state!
Example Here is one of the standard models used in the actuarial sciences (insurance) to model the number of claims that might have to be paid on a certain type of policy:

$$N \sim \text{Pois}(\lambda), \quad Y \sim \text{Beta}(\alpha, \beta), \quad X | Y = y, N = n \sim \text{Bin}(n, y)$$

The idea is this: there is a random number $N$ of policies of the same type (car insurance, health insurance, etc.). Obviously $N > 0$, otherwise it's too boring. Each policy has probability $Y$ of being claimed, so $X$ is the number of policies that get claimed.
We want to generate data for $X$. In order to use the Gibbs sampler we need all the conditional distributions. We already have

$$X | Y = y, N = n \sim \text{Bin}(n, y)$$

It can be shown that

$$Y | X = x, N = n \sim \text{Beta}(x + \alpha, n - x + \beta)$$

$$M | X = x, Y = y \sim \text{Pois}(\lambda(1 - y)), \quad N = M + X$$
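For example, the conditional of $Y$ follows from a one-line calculation (added here for completeness): conditional on $N = n$ and $X = x$,

$$f(y | x, n) \propto f(x | y, n) f(y) \propto y^x (1 - y)^{n-x} \cdot y^{\alpha-1} (1 - y)^{\beta-1} = y^{x+\alpha-1} (1 - y)^{n-x+\beta-1}$$

which is the kernel of a $\text{Beta}(x + \alpha, n - x + \beta)$ density.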
So the Gibbs sampler is as follows:

gibbsIns <- function(n=1e4, alpha=1, beta=100, lambda=1000){
  X <- rep(0, n)
  Y <- rep(0, n)
  M <- rep(0, n)
  N <- rep(0, n)
  # start the chain at (roughly) the prior means
  Y[1] <- alpha/(alpha+beta)
  X[1] <- Y[1]*lambda
  M[1] <- lambda
  N[1] <- ifelse(M[1]>0, M[1], 1)
  for(i in 2:n){
    X[i] <- rbinom(1, N[i-1], Y[i-1])
    Y[i] <- rbeta(1, X[i]+alpha, N[i-1]-X[i]+beta)
    M[i] <- rpois(1, lambda*(1-Y[i]))
    N[i] <- M[i]+X[i]
  }
  # discard the first 1000 values as burn-in
  X[-c(1:1000)]
}
hist(gibbsIns(), main="")
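A quick numerical check (not in the original text): a priori $N$ and $Y$ are independent, so $E[X] = E[N]E[Y] = \lambda \frac{\alpha}{\alpha + \beta} = 1000/101 \approx 9.9$, and the simulated mean should be close to that.

x.sim <- gibbsIns()
round(c(mean(x.sim), 1000/101), 2)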

Example: One of the main uses of the Gibbs sampler is in Bayesian analysis. Say we have $X \sim \text{Bin}(n, p)$ and $p \sim \text{Beta}(\alpha, \beta)$, and we want a sample from the posterior distribution $p | X$. Then the joint distribution of $X$ and $p$ is the beta-binomial distribution given by

$$f(x, p) = \binom{n}{x} p^x (1 - p)^{n-x} \cdot \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} p^{\alpha-1} (1 - p)^{\beta-1}$$

To use the Gibbs sampler we need the conditional distributions of $X | p$ and $p | X$:

$$X | p \sim \text{Bin}(n, p)$$

$$p | x \sim \text{Beta}(x + \alpha, n - x + \beta)$$
so the Gibbs sampler is as follows:

gibbsBin <- function(x, n, B=1e4, alpha=1, beta=1){
  p <- rep(0.5, B)
  X <- rep(0, B)
  for(i in 2:B) {
    X[i] <- rbinom(1, n, p[i-1])
    # the update of p conditions on the observed count x
    p[i] <- rbeta(1, x+alpha, n-x+beta)
  }
  # discard the first 1000 values as burn-in
  p[-c(1:1000)]
}

As a specific example, say in a sample of 100 employees of a company we have 37 women and 63 men, and we want to find a 95% interval estimate for the percentage of female employees. We have no prior knowledge of this company, so we will use U[0,1] (= Beta(1,1)) as our prior.

p.x <- gibbsBin(x=37, n=100)


hist(p.x, 100, main="")

round(quantile(p.x, c(0.025, 0.975)), 3)

## 2.5% 97.5%
## 0.280 0.468

Notice that the interval is very similar to the standard frequentist solution (the Clopper-Pearson (1934) intervals):

round(binom.test(x=37, n=100)$conf.int, 3)

## [1] 0.276 0.472


## attr(,"conf.level")
## [1] 0.95
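In this conjugate case the posterior is actually available in closed form, $p | x \sim \text{Beta}(x + 1, n - x + 1) = \text{Beta}(38, 64)$, so we can also compare against the exact posterior quantiles (a check added here for illustration):

round(qbeta(c(0.025, 0.975), 38, 64), 3)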

Historically, Bayesian analysis suffered from the problem that prior distributions had to be chosen so that it was possible to calculate the posterior distribution (one popular choice is the so-called conjugate prior, where the posterior distribution belongs to the same family as the prior, except for the parameters), even though those priors were often not a good description of our "prior belief". The Gibbs sampler allows us to be much more free in our choice of prior.
Example Say we want to generate data from the random vector $(X, Y, Z)$ with density

$$f(x, y, z) = K(x + y + z), \quad 0 < x < y < z < 1$$
To use the Gibbs sampler we need all the conditional distributions:

$$f_{Y,Z}(y, z) = \int_0^y K(x + y + z) \, dx = K\left(\frac{1}{2}x^2 + xy + xz\right)\Big|_0^y = K\left(\frac{1}{2}y^2 + y^2 + yz\right) = K\left(\frac{3}{2}y^2 + yz\right)$$

$$f_{X|Y=y,Z=z}(x | y, z) = \frac{K(x + y + z)}{K\left(\frac{3}{2}y^2 + yz\right)} = \frac{x + y + z}{\frac{3}{2}y^2 + yz}, \quad 0 < x < y$$
Notice that the constant K vanishes in the conditional distribution. This will always happen, so we
will ignore it from now on.
Next we find

$$f_{X,Z}(x, z) = \int_x^z (x + y + z) \, dy = \left(xy + \frac{1}{2}y^2 + yz\right)\Big|_x^z = xz + \frac{1}{2}z^2 + z^2 - x^2 - \frac{1}{2}x^2 - xz = \frac{3}{2}z^2 - \frac{3}{2}x^2$$

$$f_{Y|X=x,Z=z}(y | x, z) = \frac{x + y + z}{\frac{3}{2}z^2 - \frac{3}{2}x^2}, \quad x < y < z$$
and finally

$$f_{X,Y}(x, y) = \int_y^1 (x + y + z) \, dz = \left(xz + yz + \frac{1}{2}z^2\right)\Big|_y^1 = x + y + \frac{1}{2} - xy - y^2 - \frac{1}{2}y^2 = x + y + \frac{1}{2} - xy - \frac{3}{2}y^2$$

$$f_{Z|X=x,Y=y}(z | x, y) = \frac{x + y + z}{x + y + \frac{1}{2} - xy - \frac{3}{2}y^2}, \quad y < z < 1$$

Now that we have the conditionals we need to be able to generate data from them. To do this notice that all three are linear functions of the form

$$g(x) = b(a + x), \quad u < x < v$$


Now

$$G(x) = \int_u^x b(a + t) \, dt = b\left(at + \frac{1}{2}t^2\right)\Big|_u^x = b\left(ax + \frac{1}{2}x^2 - au - \frac{1}{2}u^2\right)$$

Now we have $G(v) = 1$ and so

$$b = \frac{1}{a(v - u) + \frac{1}{2}(v^2 - u^2)}$$

and we can find the inverse of $G$:

$$G(x) = y$$

$$b\left(ax + \frac{1}{2}x^2 - au - \frac{1}{2}u^2\right) = y$$

$$\frac{1}{2}x^2 + ax - au - \frac{1}{2}u^2 - \frac{y}{b} = 0$$

$$x_{1,2} = -a \pm \sqrt{a^2 + 2\left(au + \frac{1}{2}u^2 + \frac{y}{b}\right)}$$

Now if $U \sim U[0, 1]$, then $G^{-1}(U)$ has this linear density (the inverse transform method).
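We can package this as a small helper function; the name rlin is introduced here just for illustration. It draws m values from $g(x) = b(a + x)$ on $(u, v)$ via the inverse transform above.

rlin <- function(m, a, u, v) {
  # normalizing constant from G(v) = 1
  b <- 1/(a*(v-u) + (v^2-u^2)/2)
  # inverse transform; the + root keeps x inside (u, v)
  (-a) + sqrt(a^2 + 2*(a*u + u^2/2 + runif(m)/b))
}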


Let’s do a quick check to see whether this works (and that it is the + in the quadratic formula!)

u <- 0.25
v <- 0.75
a <- 1
b <- 1/(a*(v-u)+(v^2-u^2)/2)
x <- (-a)+sqrt(a^2+2*(a*u+u^2/2+runif(1e4)/b))
hist(x, 50, freq=FALSE, main="")
curve(b*(a+x), u,v, add=TRUE, lwd=2)

and now we can implement the Gibbs sampler:

n <- 1e4
x <- rep(0, n)
y <- rep(1/3, n)
z <- rep(2/3, n)
for(i in 2:n) {
  # X | Y, Z: linear density on (0, y) with a = y + z
  u <- 0
  v <- y[i-1]
  a <- y[i-1]+z[i-1]
  b <- 1/(a*(v-u)+(v^2-u^2)/2)
  x[i] <- (-a)+sqrt(a^2+2*(a*u+u^2/2+runif(1)/b))
  # Y | X, Z: linear density on (x, z) with a = x + z
  u <- x[i]
  v <- z[i-1]
  a <- x[i]+z[i-1]
  b <- 1/(a*(v-u)+(v^2-u^2)/2)
  y[i] <- (-a)+sqrt(a^2+2*(a*u+u^2/2+runif(1)/b))
  # Z | X, Y: linear density on (y, 1) with a = x + y
  u <- y[i]
  v <- 1
  a <- x[i]+y[i]
  b <- 1/(a*(v-u)+(v^2-u^2)/2)
  z[i] <- (-a)+sqrt(a^2+2*(a*u+u^2/2+runif(1)/b))
}
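Using the rlin helper sketched earlier, the same sweep can be written more compactly:

for(i in 2:n) {
  x[i] <- rlin(1, a=y[i-1]+z[i-1], u=0,    v=y[i-1])
  y[i] <- rlin(1, a=x[i]+z[i-1],   u=x[i], v=z[i-1])
  z[i] <- rlin(1, a=x[i]+y[i],     u=y[i], v=1)
}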

Does this do the job? Let's check the marginal of $X$. Ignoring $K$ as before,

$$f_{X,Z}(x, z) = \frac{3}{2}(z^2 - x^2), \quad 0 < x < z < 1$$

$$f(x) = \int_x^1 \frac{3}{2}(z^2 - x^2) \, dz = \left(\frac{1}{2}z^3 - \frac{3}{2}x^2 z\right)\Big|_x^1 = \frac{1}{2} - \frac{3}{2}x^2 + x^3$$

$$\int_0^1 \left(\frac{1}{2} - \frac{3}{2}x^2 + x^3\right) dx = \left(\frac{1}{2}x - \frac{1}{2}x^3 + \frac{1}{4}x^4\right)\Big|_0^1 = \frac{1}{2} - \frac{1}{2} + \frac{1}{4} = \frac{1}{4}$$

so the normalized marginal density is $f_X(x) = 4\left(x^3 - \frac{3}{2}x^2 + \frac{1}{2}\right)$:
hist(x, 50, freq=FALSE, main="")
curve(4*(x^3-3/2*x^2+1/2), 0, 1, lwd=2, add=TRUE)

Looks good!
