
18.650 – Fundamentals of Statistics

7. Generalized linear models

Linear model

A linear model assumes

    Y | X = x ∼ N(µ(x), σ² Iₙ),

and¹

    IE(Y | X = x) = µ(x) = x^⊤β.

¹ Throughout we drop the boldface notation for vectors.

Components of a linear model

The two model components (that we are going to relax) are:

1. Random component: the response variable Y is continuous,
   and Y | X = x is Gaussian with mean µ(x).

2. Regression function: µ(x) = x^⊤β.

Kyphosis

The Kyphosis data consist of measurements on 81 children
following corrective spinal surgery. The binary response variable,
Y, indicates the presence or absence of a postoperative deformity.
The three covariates are:

- X⁽¹⁾: age of the child in months,
- X⁽²⁾: number of vertebrae involved in the operation, and
- X⁽³⁾: start of the range of the vertebrae involved.

Write X = (1, X⁽¹⁾, X⁽²⁾, X⁽³⁾)^⊤ ∈ IR⁴.

Kyphosis

- The response variable is binary, so there is no choice:
  Y | X = x is Bernoulli with expected value

      µ(x) = IE[Y | X = x] ∈ (0, 1).

- We cannot write

      µ(x) = x^⊤β,

  because the right-hand side ranges through all of IR.
- We need an invertible function f such that f(x^⊤β) ∈ (0, 1).

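A minimal numeric sketch of one such f (assuming NumPy; the inverse logit is one standard choice, not prescribed by the slide at this point):

```python
import numpy as np

def inv_logit(t):
    """Maps any real number t into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def logit(p):
    """Its inverse: maps (0, 1) back onto IR."""
    return np.log(p / (1.0 - p))

t = np.linspace(-10.0, 10.0, 5)
p = inv_logit(t)
print(p)                            # every value lies strictly in (0, 1)
print(np.allclose(logit(p), t))     # True: f is invertible
```
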
Generalization

A generalized linear model (GLM) generalizes normal linear
regression models in the following directions.

1. Random component:

       Y | X = x ∼ some distribution

   (e.g. Bernoulli, exponential, Poisson).

2. Regression function:

       g(µ(x)) = x^⊤β,

   where g is called the link function and µ(x) = IE(Y | X = x) is
   the mean of the response.

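As a hedged sketch of these two choices in software (assuming NumPy and statsmodels; the real Kyphosis data ship with R's rpart package, so the covariates and response below are simulated stand-ins):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 81
# Simulated stand-ins for the three kyphosis covariates
X = np.column_stack([
    rng.uniform(1, 200, n),    # age in months
    rng.integers(2, 8, n),     # number of vertebrae involved
    rng.integers(1, 18, n),    # start of the range
])
X = sm.add_constant(X)         # prepend the intercept column
beta_true = np.array([-2.0, 0.01, 0.4, -0.2])           # hypothetical coefficients
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))   # Bernoulli response

# Random component: Bernoulli; regression function: logit link
# (the default link of the Binomial family in statsmodels)
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.params)
```
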
Predator/Prey

Consider the following model for the number of prey Y that a
predator (hawk) catches per day, given the number X of prey
(mice) in its hunting territory.

Random component: Y ≥ 0, and the variance of the capture rate is
known to be approximately equal to its expectation, so we propose
the following model:

    Y | X = x ∼ Poisson(µ(x)),

where µ(x) = IE[Y | X = x].

Regression function: we assume

    µ(x) = m x / (h + x),   for some unknown m, h > 0,

where:
- m is the maximum expected daily number of prey the predator can cope with;
- h is the number of prey such that µ(h) = m/2.

The regression function µ(x) for m = h = 10

[Figure: plot of µ(x) = 10x/(10 + x), increasing from 0 and saturating
toward the asymptote m = 10, with µ(10) = 5 = m/2.]

Example 2: Prey Capture Rate

Obviously µ(x) is not linear, but using the reciprocal link
g(x) = 1/x, the right-hand side can be made linear in the
parameters:

    g(µ(x)) = 1/µ(x) = (h + x)/(mx) = 1/m + (h/m)·(1/x) = β₀ + β₁·(1/x).

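A quick numeric check of this linearization (assuming NumPy; the values m = h = 10 are the slide's illustrative choice):

```python
import numpy as np

m, h = 10.0, 10.0
x = np.linspace(1.0, 50.0, 200)
mu = m * x / (h + x)                 # the regression function

# Reciprocal link: 1/mu(x) = 1/m + (h/m) * (1/x) = beta0 + beta1 * (1/x)
beta0, beta1 = 1.0 / m, h / m
print(np.allclose(1.0 / mu, beta0 + beta1 / x))   # True
```
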
Exponential Family

A family of distributions {IP_θ : θ ∈ Θ}, Θ ⊂ IR^k, is said to be a
k-parameter exponential family on IR^q if there exist real-valued
functions:

- η₁, η₂, …, η_k and B of θ,
- T₁, T₂, …, T_k, and h of y ∈ IR^q,

such that the density function (pmf or pdf) of IP_θ can be written as

    f_θ(y) = exp[ Σᵢ₌₁ᵏ ηᵢ(θ) Tᵢ(y) − B(θ) ] h(y).

Normal distribution example

- Consider Y ∼ N(µ, σ²), θ = (µ, σ²). The density is

      f_θ(y) = exp( (µ/σ²) y − (1/(2σ²)) y² − µ²/(2σ²) ) · 1/(√(2π) σ),

  which forms a two-parameter exponential family with

      η₁ = µ/σ²,   η₂ = −1/(2σ²),   T₁(y) = y,   T₂(y) = y²,
      B(θ) = µ²/(2σ²) + log(√(2π) σ),   h(y) = 1.

- When σ² is known, it becomes a one-parameter exponential
  family on IR:

      η = µ/σ²,   T(y) = y,   B(θ) = µ²/(2σ²),   h(y) = e^{−y²/(2σ²)} / (√(2π) σ).

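A numeric sanity check of the two-parameter form (assuming NumPy and SciPy; the values of µ and σ² are arbitrary):

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.3, 2.0
y = np.linspace(-5.0, 5.0, 101)

# Exponential-family pieces for N(mu, sigma^2), with h(y) = 1
eta1, eta2 = mu / sigma2, -1.0 / (2.0 * sigma2)
B = mu**2 / (2.0 * sigma2) + np.log(np.sqrt(2.0 * np.pi * sigma2))
f = np.exp(eta1 * y + eta2 * y**2 - B)

print(np.allclose(f, norm.pdf(y, loc=mu, scale=np.sqrt(sigma2))))  # True
```
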
Examples of discrete distributions

The following distributions form discrete exponential families of
distributions with pmf:

- Bernoulli(p):  p^y (1 − p)^{1−y},  y ∈ {0, 1};
- Poisson(λ):  λ^y e^{−λ} / y!,  y = 0, 1, … .

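Writing both pmfs in the exponential-family form exp[η T(y) − B] h(y) and checking them numerically (assuming NumPy and SciPy):

```python
import numpy as np
from scipy.special import factorial
from scipy.stats import bernoulli, poisson

# Bernoulli(p): eta = log(p/(1-p)), T(y) = y, B = -log(1-p), h(y) = 1
p, y = 0.3, np.array([0, 1])
f_bern = np.exp(y * np.log(p / (1 - p)) + np.log(1 - p))
print(np.allclose(f_bern, bernoulli.pmf(y, p)))     # True

# Poisson(lam): eta = log(lam), T(y) = y, B = lam, h(y) = 1/y!
lam, y = 2.5, np.arange(6)
f_pois = np.exp(y * np.log(lam) - lam) / factorial(y)
print(np.allclose(f_pois, poisson.pmf(y, lam)))     # True
```
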
Examples of continuous distributions

The following distributions form continuous exponential families
of distributions with pdf:

- Gamma(a, b):  (1/(Γ(a) bᵃ)) y^{a−1} e^{−y/b};
  - above, a is the shape parameter and b the scale parameter;
  - reparametrizing with the mean parameter µ = ab gives

        (1/Γ(a)) (a/µ)ᵃ y^{a−1} e^{−a y/µ}.

- Inverse Gamma(α, β):  (β^α / Γ(α)) y^{−α−1} e^{−β/y}.

- Inverse Gaussian(µ, σ²):  √(σ²/(2π y³)) e^{−σ²(y−µ)²/(2µ² y)}.

Others: Chi-square, Beta, Binomial, Negative Binomial
distributions.

One-parameter canonical exponential family

- Canonical exponential family for k = 1, y ∈ IR:

      f_θ(y) = exp( (yθ − b(θ))/φ + c(y, φ) )

  for some known functions b(·) and c(·, ·).
- If φ is known, this is a one-parameter exponential family with
  θ being the canonical parameter.
- If φ is unknown, this may or may not be a two-parameter
  exponential family.
- φ is called the dispersion parameter.
- In this class, we always assume that φ is known.

Normal distribution example

- Consider the Normal density with known variance σ²:

      f_θ(y) = (1/(√(2π) σ)) e^{−(y−µ)²/(2σ²)}
             = exp{ (yµ − µ²/2)/σ² − (1/2)(y²/σ² + log(2πσ²)) }.

- Therefore θ = µ, φ = σ², b(θ) = θ²/2, and

      c(y, φ) = −(1/2)(y²/φ + log(2πφ)).

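A sketch checking that these pieces reproduce the Normal density (assuming NumPy and SciPy; µ and σ² are arbitrary):

```python
import numpy as np
from scipy.stats import norm

def canonical_density(y, theta, phi, b, c):
    """One-parameter canonical exponential family density."""
    return np.exp((y * theta - b(theta)) / phi + c(y, phi))

# Normal with known variance: theta = mu, phi = sigma^2, b(theta) = theta^2/2
mu, sigma2 = 0.7, 1.5
b = lambda th: th**2 / 2.0
c = lambda y, ph: -0.5 * (y**2 / ph + np.log(2.0 * np.pi * ph))

y = np.linspace(-4.0, 4.0, 81)
print(np.allclose(canonical_density(y, mu, sigma2, b, c),
                  norm.pdf(y, loc=mu, scale=np.sqrt(sigma2))))   # True
```
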
Other distributions

Table 1: Exponential Family

              Normal                        Poisson        Bernoulli
Notation      N(µ, σ²)                      P(µ)           B(p)
Range of y    (−∞, +∞)                      {0, 1, 2, …}   {0, 1}
φ             σ²                            1              1
b(θ)          θ²/2                          e^θ            log(1 + e^θ)
c(y, φ)       −(1/2)(y²/φ + log(2πφ))       −log y!        0

Likelihood

Let ℓ(θ) = log f_θ(Y) denote the log-likelihood function.

The mean IE(Y) and the variance var(Y) can be derived from the
following identities.

- First identity:

      IE( ∂ℓ/∂θ ) = 0.

- Second identity:

      IE( ∂²ℓ/∂θ² ) + IE[ (∂ℓ/∂θ)² ] = 0.

Expected value

Note that

    ℓ(θ) = (Yθ − b(θ))/φ + c(Y; φ).

Therefore

    ∂ℓ/∂θ = (Y − b'(θ))/φ.

The first identity yields

    0 = IE( ∂ℓ/∂θ ) = (IE(Y) − b'(θ))/φ,

which leads to

    IE(Y) = b'(θ).

Variance

On the other hand we have

    ∂²ℓ/∂θ² + (∂ℓ/∂θ)² = −b''(θ)/φ + ( (Y − b'(θ))/φ )²,

and from the previous result,

    (Y − b'(θ))/φ = (Y − IE(Y))/φ.

Together with the second identity, this yields

    0 = −b''(θ)/φ + var(Y)/φ²,

which leads to

    var(Y) = φ b''(θ).

Example: Poisson distribution

Consider a Poisson likelihood,

    f(y) = (µ^y / y!) e^{−µ} = exp( y log µ − µ − log(y!) ).

Thus,

    θ = log µ,   b(θ) = e^θ,   φ = 1,   c(y, φ) = −log(y!),

so

    µ = e^θ,   b'(θ) = b''(θ) = e^θ = µ.

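A Monte Carlo check that IE(Y) = b'(θ) and var(Y) = φ b''(θ) in the Poisson case (assuming NumPy; the value µ = 3 is arbitrary):

```python
import numpy as np

# Poisson in canonical form: theta = log(mu), b(theta) = e^theta, phi = 1
mu = 3.0
theta = np.log(mu)
b_prime = np.exp(theta)     # b'(theta)  = e^theta = mu
b_second = np.exp(theta)    # b''(theta) = e^theta = mu

rng = np.random.default_rng(1)
y = rng.poisson(mu, size=200_000)
print(b_prime, y.mean())    # IE(Y) = b'(theta): both close to 3.0
print(b_second, y.var())    # var(Y) = phi * b''(theta): both close to 3.0
```
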
Link function

- β is the parameter of interest, and needs to appear somehow in
  the likelihood function to use maximum likelihood.
- A link function g relates the linear predictor X^⊤β to the mean
  parameter µ:

      X^⊤β = g(µ).

- g is required to be monotone increasing and differentiable, so that

      µ = g⁻¹(X^⊤β).

Examples of link functions

- For the LM, g(·) = identity.
- Poisson data. Suppose Y | X ∼ Poisson(µ(X)).
  - µ(X) > 0;
  - log(µ(X)) = X^⊤β;
  - in general, a link function for count data should map
    (0, +∞) to IR;
  - the log link is a natural one.
- Bernoulli/Binomial data.
  - 0 < µ < 1;
  - g should map (0, 1) to IR;
  - 3 choices:
    1. logit:  log( µ(X)/(1 − µ(X)) ) = X^⊤β;
    2. probit:  Φ⁻¹(µ(X)) = X^⊤β, where Φ(·) is the standard normal cdf;
    3. complementary log-log:  log( −log(1 − µ(X)) ) = X^⊤β.
  - The logit link is the natural choice.

Examples of link functions for Bernoulli response

[Figure: the two link functions plotted on (0, 1), with values ranging
over (−5, 5). In blue: g₁(x) = f₁⁻¹(x) = log( x/(1 − x) ) (logit link).
In red: g₂(x) = f₂⁻¹(x) = Φ⁻¹(x) (probit link).]

Examples of link functions for Bernoulli response

[Figure: the two inverse links plotted on (−5, 5), with values in (0, 1).
In blue: f₁(x) = eˣ/(1 + eˣ). In red: f₂(x) = Φ(x) (Gaussian CDF).]

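A sketch that reproduces both figures (assuming NumPy, SciPy, and matplotlib; the curves are fully determined by the formulas above):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: the links g on (0, 1) -> IR
x = np.linspace(0.001, 0.999, 500)
ax1.plot(x, np.log(x / (1 - x)), "b", label="logit")
ax1.plot(x, norm.ppf(x), "r", label="probit")   # norm.ppf is the inverse CDF
ax1.set_ylim(-5, 5)
ax1.legend()
ax1.set_title("links g")

# Right: their inverses f = g^{-1} on IR -> (0, 1)
t = np.linspace(-5, 5, 500)
ax2.plot(t, np.exp(t) / (1 + np.exp(t)), "b", label="inverse logit")
ax2.plot(t, norm.cdf(t), "r", label="Gaussian CDF")
ax2.legend()
ax2.set_title("inverse links")

plt.show()
```
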
Canonical Link

- The function g that links the mean µ to the canonical parameter
  θ is called the canonical link:

      g(µ) = θ.

- Since µ = b'(θ), the canonical link is given by

      g(µ) = (b')⁻¹(µ).

- If φ > 0, the canonical link function is strictly increasing.
  Why? Because b''(θ) = var(Y)/φ > 0, so b' is strictly increasing,
  and hence so is its inverse (b')⁻¹.

Example: the Bernoulli distribution

- We can check that

      b(θ) = log(1 + e^θ).

- Hence we solve

      b'(θ) = e^θ/(1 + e^θ) = µ   ⟺   θ = log( µ/(1 − µ) ).

- The canonical link for the Bernoulli distribution is therefore
  the logit link.

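A sketch that recovers the canonical link by numerically inverting b' (assuming NumPy and SciPy; the bracket [−50, 50] is an arbitrary search interval):

```python
import numpy as np
from scipy.optimize import brentq

# Bernoulli: b(theta) = log(1 + e^theta), so b'(theta) = e^theta/(1 + e^theta)
b_prime = lambda th: np.exp(th) / (1.0 + np.exp(th))

def canonical_link(mu):
    """g(mu) = (b')^{-1}(mu), found by root-finding on b'(theta) - mu."""
    return brentq(lambda th: b_prime(th) - mu, -50.0, 50.0)

for mu in (0.1, 0.5, 0.9):
    print(canonical_link(mu), np.log(mu / (1 - mu)))   # the two columns agree
```
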
Other examples

            b(θ)             g(µ)
Normal      θ²/2             µ
Poisson     e^θ              log µ
Bernoulli   log(1 + e^θ)     log( µ/(1 − µ) )
Gamma       −log(−θ)         −1/µ

Model and notation

- Let (Xᵢ, Yᵢ) ∈ IR^p × IR, i = 1, …, n, be independent random
  pairs such that the conditional distribution of Yᵢ given
  Xᵢ = xᵢ has density in the canonical exponential family:

      f_{θᵢ}(yᵢ) = exp{ (yᵢθᵢ − b(θᵢ))/φ + c(yᵢ, φ) }.

- Y = (Y₁, …, Yₙ)^⊤, X = (X₁, …, Xₙ)^⊤.
- Here the mean µᵢ = IE[Yᵢ | Xᵢ] is related to the canonical
  parameter θᵢ via

      µᵢ = b'(θᵢ),

- and µᵢ depends linearly on the covariates through a link
  function g:

      g(µᵢ) = Xᵢ^⊤β.

Back to β

- Given a link function g, note the following relationship between
  β and θ:

      θᵢ = (b')⁻¹(µᵢ) = (b')⁻¹( g⁻¹(Xᵢ^⊤β) ) ≡ h(Xᵢ^⊤β),

  where h is defined as

      h = (b')⁻¹ ∘ g⁻¹ = (g ∘ b')⁻¹.

- Remark: if g is the canonical link function, then h is the
  identity.

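A quick numeric check of the remark in the Bernoulli case (assuming NumPy): if g is the logit link, then g ∘ b' is the identity, so h = (g ∘ b')⁻¹ is the identity too.

```python
import numpy as np

b_prime = lambda th: np.exp(th) / (1.0 + np.exp(th))   # mu = b'(theta)
g = lambda mu: np.log(mu / (1.0 - mu))                 # canonical (logit) link

t = np.linspace(-4.0, 4.0, 9)          # candidate values of x^T beta
print(np.allclose(g(b_prime(t)), t))   # True: g o b' = id, hence h = id
```
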
Log-likelihood

- The log-likelihood is given by

      ℓₙ(Y, X, β) = Σᵢ (Yᵢθᵢ − b(θᵢ))/φ
                  = Σᵢ (Yᵢ h(Xᵢ^⊤β) − b(h(Xᵢ^⊤β)))/φ,

  up to a constant term.
- Note that when we use the canonical link function, we obtain the
  simpler expression

      ℓₙ(Y, X, β) = Σᵢ (Yᵢ Xᵢ^⊤β − b(Xᵢ^⊤β))/φ.

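A sketch maximizing this canonical-link log-likelihood directly for Bernoulli data (assuming NumPy and SciPy; the design and coefficients are simulated, and φ = 1):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, -1.0, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def neg_log_lik(beta):
    """-l_n for Bernoulli with canonical link: b(theta) = log(1 + e^theta)."""
    eta = X @ beta
    return -np.sum(y * eta - np.log1p(np.exp(eta)))

beta_hat = minimize(neg_log_lik, np.zeros(3), method="BFGS").x
print(beta_hat)     # close to beta_true; the optimum is unique (next slide)
```
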
Strict concavity

- The log-likelihood ℓₙ(β) is strictly concave under the canonical
  link when φ > 0. Why? Its Hessian is −Σᵢ b''(Xᵢ^⊤β) Xᵢ Xᵢ^⊤ / φ,
  which is negative definite since b'' > 0 (provided X has full
  column rank).
- As a consequence, the maximum likelihood estimator is unique.
- On the other hand, if another parameterization is used, the
  likelihood function may not be strictly concave, leading to
  several local maxima.

Concluding remarks

- Maximum likelihood for Bernoulli Y and the logit link is called
  logistic regression.
- In general, there is no closed form for the MLE and we have to
  use iterative numerical methods (e.g. Newton–Raphson or Fisher
  scoring).
- The asymptotic normality of the MLE also applies to GLMs.
