
Generalized Linear Model

Badr Missaoui
Logistic Regression

Outline

- Generalized linear models
- Deviance
- Logistic regression
Generalized Linear Model

- All models we have seen so far deal with continuous outcome
  variables with no restriction on their expectations, and (most)
  have assumed that mean and variance are unrelated (i.e. the
  variance is constant).
- Many outcomes of interest do not satisfy this.
- Examples: binary outcomes, Poisson count outcomes.
- A Generalized Linear Model (GLM) is a model with two
  ingredients: a link function and a variance function (see the R
  illustration below).
- The link function relates the means of the observations to the
  predictors: linearization.
- The variance function relates the means to the variances.
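In R, these two ingredients are bundled in a family object. A quick
illustration with the standard binomial() and poisson() families:

fam <- binomial()        # logit link by default
fam$linkfun(0.25)        # link g: mean -> linear predictor, logit(0.25)
fam$variance(0.25)       # variance function V(mu) = mu * (1 - mu)
poisson()$variance(3)    # for Poisson counts, V(mu) = mu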
Generalized Linear Model

The data involve 462 males between the ages of 15 and 64 (the
South African heart disease data, SAheart). The outcome Y is the
presence (Y = 1) or absence (Y = 0) of heart disease.
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.9207616 1.3265724 -4.463 8.07e-06 ***
sbp 0.0076602 0.0058574 1.308 0.190942
tobacco 0.0777962 0.0266602 2.918 0.003522 **
ldl 0.1701708 0.0597998 2.846 0.004432 **
adiposity 0.0209609 0.0294496 0.712 0.476617
famhistPresent 0.9385467 0.2287202 4.103 4.07e-05 ***
typea 0.0376529 0.0124706 3.019 0.002533 **
obesity -0.0661926 0.0443180 -1.494 0.135285
alcohol 0.0004222 0.0045053 0.094 0.925346
age 0.0441808 0.0121784 3.628 0.000286 ***
Generalized Linear Model

Motivation
- Classical linear model:

      Y = Xβ + ε

  where ε ∼ N(0, σ²). That means

      Y ∼ N(Xβ, σ²)

- In the GLM, we specify instead that

      Y ∼ P(Xβ)

  for some distribution P taken from an exponential family.
Generalized Linear Model

We write the GLM as

    E(Yi) = µi

and

    ηi = g(µi) = Xi β

where g is called the link function and the distribution of Yi
belongs to an exponential family.
Generalized Linear Model

- The exponential family density is specified by two components:
  the canonical parameter θ and the dispersion parameter φ.
- Let Y = (Yi)i=1,...,n be a sequence of random variables. Yi has
  an exponential family density if

      fYi(yi; θi, φ) = exp{ [yi θi − b(θi)] / ai(φ) + c(yi, φ) }

  where the functions b, c are specific to each distribution and
  ai(φ) = φ/wi.
Generalized Linear Model

Law          Density                                   µ     σ²
B(m, p)      C(m, y) p^y (1 − p)^(m−y)                 mp    mp(1 − p)
P(µ)         µ^y e^(−µ) / y!                           µ     µ
N(µ, σ²)     (2πσ²)^(−1/2) exp{−(y − µ)²/(2σ²)}        µ     σ²
IG(µ, λ)     √(λ/(2πy³)) exp{−λ(y − µ)²/(2µ²y)}        µ     µ³/λ
Generalized Linear Model

- We write
      ℓ(y; θ, φ) = log f(y; θ, φ)
  for the log-likelihood function of Y.
- Using the facts that

      E(∂ℓ/∂θ) = 0   and   Var(∂ℓ/∂θ) = −E(∂²ℓ/∂θ²)

  together with ∂ℓ/∂θ = [y − b′(θ)]/a(φ) and ∂²ℓ/∂θ² = −b″(θ)/a(φ),
- We have
      E(y) = b′(θ)
  and
      Var(y) = b″(θ) a(φ)
Generalized Linear Model
- Gaussian case:

      f(y; θ, φ) = [1/(σ√(2π))] exp{−(y − µ)²/(2σ²)}
                 = exp{ (yµ − µ²/2)/σ² − (1/2)[y²/σ² + log(2πσ²)] }

  We can write θ = µ, φ = σ², a(φ) = φ, b(θ) = θ²/2 and
  c(y, φ) = −(1/2)[y²/σ² + log(2πσ²)].
- Binomial case:

      f(y; θ, φ) = C(n, y) µ^y (1 − µ)^(n−y)
                 = exp{ y log[µ/(1 − µ)] + n log(1 − µ) + log C(n, y) }

  We can write θ = log[µ/(1 − µ)], b(θ) = −n log(1 − µ) = n log(1 + e^θ)
  and c(y, φ) = log C(n, y).
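- Poisson case (a further worked example along the same lines):

      f(y; θ, φ) = µ^y e^(−µ) / y! = exp{ y log µ − µ − log y! }

  so θ = log µ, b(θ) = e^θ, a(φ) = 1 and c(y, φ) = −log y!. The
  identities above then give E(y) = b′(θ) = µ and Var(y) = b″(θ) = µ,
  as in the table.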
Generalized Linear Model

Recall that in ordinary linear models, the MLE of β satisfies

    β̂ = (X^T X)^(−1) X^T Y

if X has full rank.

In a GLM, the MLE β̂ does not exist in closed form and is
approximated via iteratively reweighted least squares.
Generalized Linear Model
- For n observations, the log-likelihood function is

      L(β) = Σ_{i=1}^n ℓ(yi; θi, φ)

- By the chain rule,

      ∂ℓi/∂βj = (∂ℓi/∂θi)(∂θi/∂µi)(∂µi/∂ηi)(∂ηi/∂βj)
              = [(yi − µi)/(φ/wi)] · [1/b″(θi)] · [1/g′(µi)] · xij

- Since Var(yi) = b″(θi) φ/wi, the likelihood equations are

      ∂L/∂βj = Σ_{i=1}^n xij (yi − µi)/[g′(µi) Var(yi)] = 0,   j = 1, ..., p

- Put

      W = diag{ g′(µi)² Var(yi) }_{i=1,...,n}

  and

      ∂µ/∂η = diag{ ∂µi/∂ηi }_{i=1,...,n} = diag{ 1/g′(µi) }_{i=1,...,n}
Generalized Linear Model

- In matrix form, these likelihood equations read

      X^T W^(−1) (∂µ/∂η)^(−1) (Y − µ) = 0

- These equations are non-linear in β and require an iterative
  method (e.g. Newton-Raphson).
- The Fisher information matrix is

      I(β) = X^T W^(−1) X

  and in general terms

      [I(β)]jk = −E[ ∂²L(β) / ∂βj ∂βk ] = Σ_{i=1}^n [xij xik / Var(yi)] (∂µi/∂ηi)²
Generalized Linear Model
Let µ̂^0 = Y be the initial estimate. Then set η̂^0 = g(µ̂^0), and
form the adjusted variable

    Z^0 = η̂^0 + (Y − µ̂^0) (∂η/∂µ)|_{µ=µ̂^0}

Calculate β̂^1 by the weighted least squares regression of Z^0 on X,
that means

    β̂^1 = argmin_β (Z^0 − Xβ)^T W0^(−1) (Z^0 − Xβ)

So,

    β̂^1 = (X^T W0^(−1) X)^(−1) X^T W0^(−1) Z^0

Set

    η̂^1 = X β̂^1,   µ̂^1 = g^(−1)(η̂^1)

Repeat until the changes in β̂^m are sufficiently small.
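A minimal R sketch of this scheme, using the link, inverse link,
∂µ/∂η and variance function stored in a standard R family object
(the function name irls and the crude initialization are ours, for
illustration only):

irls <- function(X, y, family = poisson(), tol = 1e-8, maxit = 25) {
  mu   <- (y + mean(y)) / 2           # crude initial estimate mu^0
  eta  <- family$linkfun(mu)          # eta^0 = g(mu^0)
  beta <- rep(0, ncol(X))
  for (m in seq_len(maxit)) {
    z <- eta + (y - mu) / family$mu.eta(eta)          # adjusted variable Z^m
    w <- family$mu.eta(eta)^2 / family$variance(mu)   # = 1 / (g'(mu)^2 V(mu))
    beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * z))  # weighted LS step
    eta <- drop(X %*% beta_new)
    mu  <- family$linkinv(eta)
    if (sum(abs(beta_new - beta)) < tol) break        # stop on small changes
    beta <- beta_new
  }
  drop(beta_new)
}

For a design matrix X that already contains a column of ones, the
result should agree with coef(glm(y ~ X - 1, family = poisson()))
up to the convergence tolerance.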
Generalized Linear Model
Estimation
- In theory, β̂^m → β̂ as m → ∞, but in practice, the algorithm
  may fail to converge.
- Under some conditions, asymptotically

      β̂ ≈ N(β, I^(−1)(β))

- In practice, the asymptotic covariance matrix of β̂ is
  estimated by

      φ (X^T Wm^(−1) X)^(−1)

  where Wm is the weight matrix from the mth (final) iteration.
- If φ is unknown, it is estimated by

      φ̂ = [1/(n − p)] Σ_{i=1}^n wi (yi − µ̂i)² / V(µ̂i)

  where V(µ̂i) = Var(yi)/a(φ) = wi Var(yi)/φ.
Generalized Linear Model
- Confidence interval:

      CIα(βj) = [ β̂j − u_{1−α/2} σ̂βj/√n ,  β̂j + u_{1−α/2} σ̂βj/√n ]

  where u_{1−α/2} is the 1 − α/2 quantile of N(0, 1) and
  σ̂²βj = [ (1/n) I(β̂) ]^(−1)_{jj}.
- To test the hypothesis H0: βj = 0 against H1: βj ≠ 0, use the
  statistic

      β̂j / √[ φ (X^T Wm^(−1) X)^(−1)_{jj} ]  ∼  N(0, 1) under H0

  or, if φ is unknown,

      β̂j / √[ φ̂ (X^T Wm^(−1) X)^(−1)_{jj} ]  ∼  t_{n−p}
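This Wald test is exactly what the z value and Pr(>|z|) columns of
summary() report; a by-hand check in R (assuming fit is the SAheart
glm fit shown later):

est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]
z   <- est / se               # beta_hat_j / sigma_hat_j
p   <- 2 * pnorm(-abs(z))     # two-sided normal p-value
## confint.default(fit) gives the matching Wald confidence intervals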
Generalized Linear Model

Goodness-of-Fit

H0: the true model is M versus H1: the true model is Msat (the
saturated model, with one parameter per observation).

- The likelihood ratio statistic for this hypothesis is called
  the deviance.
- For any submodel M,

      dev(M) = 2(ℓ̂sat − ℓ̂M)

- Under H0, dev(M) → χ²_{psat − p}.
Generalized Linear Model

Goodness-of-Fit
- The scaled deviance for a GLM is

      D(y, µ̂) = 2 [ ℓ(µ̂sat, φ; y) − ℓ(µ̂, φ; y) ]
               = Σ_{i=1}^n 2wi { yi [θ(µ̂i^sat) − θ(µ̂i)] − b(θ(µ̂i^sat)) + b(θ(µ̂i)) } / φ
               = Σ_{i=1}^n D*(yi; µ̂i) / φ
               = D*(y; µ̂) / φ

  where D*(y; µ̂) is the (unscaled) deviance.
Generalized Linear Model

Tests
- We use the deviance to compare two nested models having p1 and
  p2 parameters respectively, where p1 < p2. Let µ̂1 and µ̂2
  denote the corresponding MLEs.
- Under the smaller model,

      D(y, µ̂1) − D(y, µ̂2) ∼ χ²_{p2−p1}

- If φ is unknown,

      [D*(y, µ̂1) − D*(y, µ̂2)] / [(p2 − p1) φ̂] ∼ F_{p2−p1, n−p2}

  and we reject the smaller model when this statistic exceeds the
  quantile F_{1−α, p2−p1, n−p2}.
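In R, these comparisons are available through anova() on fitted
glm objects (fit1, fit2 are hypothetical nested fits, fit1 being
the smaller model):

anova(fit1, fit2, test = "Chisq")  # deviance difference vs chi-square
anova(fit1, fit2, test = "F")      # F test when phi must be estimated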
Generalized Linear Model

Goodness-of-Fit
- The deviance residuals for a given model are

      di = sign(yi − µ̂i) √D*(yi; µ̂i)

- A poorly fitting point makes a large contribution to the
  deviance, so |di| will be large.
Generalized Linear Model

Diagnostics
- The Pearson residuals are defined by

      ri = (yi − µ̂i) / √[ (1 − hii) V(µ̂i) ]

  where hii is the ith diagonal element of

      H = X (X^T Wm^(−1) X)^(−1) X^T Wm^(−1)

- The standardized deviance residuals are

      ε̂i = sign(yi − µ̂i) √[ D*(yi; µ̂i) / (1 − hii) ]
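All of these are built into R for a fitted glm object (here called
fit):

r_pearson  <- residuals(fit, type = "pearson")
r_deviance <- residuals(fit, type = "deviance")
h          <- hatvalues(fit)   # leverages h_ii
rstandard(fit)                 # deviance residuals standardized by sqrt(1 - h_ii)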
Generalized Linear Model

Diagnostics
- The Anscombe residuals are defined as a transformation of the
  Pearson residuals:

      ri^A = [t(yi) − t(µ̂i)] / [ t′(µ̂i) √(φ V(µ̂i)(1 − hii)) ]

- The aim in introducing the function t is to make the residuals
  as Gaussian as possible. We consider

      t(x) = ∫_0^x V(µ)^(−1/3) dµ
Generalized Linear Model

Diagnostics
- Influential points, using Cook's distance:

      Ci = (1/p) (β̂(i) − β̂)^T X^T Wm^(−1) X (β̂(i) − β̂) ≈ ri² hii / [p (1 − hii)²]

  where β̂(i) is the estimate obtained without the ith observation.
- High-leverage points: if hii > 2p/n (or the stricter threshold
  3p/n), the ith point is flagged as a high-leverage, potentially
  influential, point.
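Both quantities can be extracted directly in R from a fitted glm
(here fit):

cooks.distance(fit)  # Cook's distances C_i
hatvalues(fit)       # leverages h_ii, to compare with 2p/n or 3p/n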
Generalized Linear Model

Model Selection
- Model selection can be done using the AIC and BIC criteria.
- Forward, backward and stepwise approaches can be used, as in
  the sketch below.
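In R, stepwise selection by AIC is provided by step(); this is
what produces the output shown later for the SAheart model
(assuming fit is that full model; k = log(n) would select by BIC
instead):

step(fit, direction = "backward")     # AIC-based backward selection
## step(fit, direction = "both")      # stepwise selection
## step(fit, k = log(nrow(SAheart)))  # BIC-based selection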
Generalized Linear Model

Logistic regression
- Logistic regression is a generalization of regression that is
  used when the outcome Y is binary (0 or 1).
- As an example, with a single covariate we assume that

      P(Yi = 1 | Xi) = e^(β0 + β1 Xi) / (1 + e^(β0 + β1 Xi))

- Note that

      E(Yi | Xi) = P(Yi = 1 | Xi)
Generalized Linear Model

Logistic regression
- Define the logit function

      logit(z) = log[ z / (1 − z) ]

- We can write

      logit(πi) = β0 + β1 Xi

  where πi = P(Yi = 1 | Xi).
- The extension to several covariates is

      logit(πi) = β0 + Σ_{j=1}^p βj xij
Generalized Linear Model

How do we estimate the parameters?

- The model can be fit using maximum likelihood.
- The likelihood function is

      L(β) = Π_{i=1}^n f(yi | Xi; β) = Π_{i=1}^n πi^yi (1 − πi)^(1−yi)

- The estimator β̂ has to be found numerically.
Generalized Linear Model

Usually, we use iteratively reweighted least squares:

- First set a starting value β^(0).
- Compute

      π̂i = e^(Xi β^(k)) / (1 + e^(Xi β^(k)))

- Define the weight matrix W whose ith diagonal entry is
  π̂i (1 − π̂i).
- Define the adjusted response vector

      Z = X β^(k) + W^(−1) (Y − π̂)

- Take

      β̂^(k+1) = (X^T W X)^(−1) X^T W Z

  which is the weighted linear regression of Z on X (see the
  sketch below).
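A direct R transcription of these steps, specializing the generic
IRLS sketch given earlier (the function name and zero starting
value are ours; X is assumed to contain a column of ones):

logit_irls <- function(X, y, maxit = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))                  # starting value beta^(0)
  for (k in seq_len(maxit)) {
    eta    <- drop(X %*% beta)
    pi_hat <- exp(eta) / (1 + exp(eta))    # pi_i at beta^(k)
    w      <- pi_hat * (1 - pi_hat)        # ith diagonal of W
    z      <- eta + (y - pi_hat) / w       # Z = X beta + W^{-1} (Y - pi_hat)
    beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * z))  # (X'WX)^{-1} X'WZ
    if (sum(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  drop(beta)
}

The result should match coef(glm(y ~ X - 1, family = binomial))
up to the convergence tolerance.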
Generalized Linear Model

Model selection and diagnostics

- Diagnostics: the Pearson residuals

      ri = (Yi − π̂i) / √[ π̂i (1 − π̂i) ]

- The deviance residuals

      di = sign(Yi − π̂i) √( 2 [ Yi log(Yi/π̂i) + (1 − Yi) log((1 − Yi)/(1 − π̂i)) ] )
Generalized Linear Model
- To fit this model, we use the glm command:
Call:
glm(formula = chd ~ ., family = binomial, data = SAheart)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.8320 -0.8250 -0.4354 0.8747 2.5503

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.9207616 1.3265724 -4.463 8.07e-06 ***
row.names -0.0008844 0.0008950 -0.988 0.323042
sbp 0.0076602 0.0058574 1.308 0.190942
tobacco 0.0777962 0.0266602 2.918 0.003522 **
ldl 0.1701708 0.0597998 2.846 0.004432 **
adiposity 0.0209609 0.0294496 0.712 0.476617
famhistPresent 0.9385467 0.2287202 4.103 4.07e-05 ***
typea 0.0376529 0.0124706 3.019 0.002533 **
obesity -0.0661926 0.0443180 -1.494 0.135285
alcohol 0.0004222 0.0045053 0.094 0.925346
age 0.0441808 0.0121784 3.628 0.000286 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 596.11 on 461 degrees of freedom


Residual deviance: 471.16 on 451 degrees of freedom
AIC: 493.16

Number of Fisher Scoring iterations: 5


Generalized Linear Model
- To simplify this model, we use the step command:
Start: AIC=493.16
chd ~ row.names + sbp + tobacco + ldl + adiposity + famhist +
typea + obesity + alcohol + age

Df Deviance AIC
- alcohol 1 471.17 491.17
- adiposity 1 471.67 491.67
- row.names 1 472.14 492.14
- sbp 1 472.88 492.88
<none> 471.16 493.16
- obesity 1 473.47 493.47
- ldl 1 479.65 499.65
- tobacco 1 480.27 500.27
- typea 1 480.75 500.75
- age 1 484.76 504.76
- famhist 1 488.29 508.29

etc...

Step: AIC=487.69
chd ~ tobacco + ldl + famhist + typea + age

Df Deviance AIC
<none> 475.69 487.69
- ldl 1 484.71 494.71
- typea 1 485.44 495.44
- tobacco 1 486.03 496.03
- famhist 1 492.09 502.09
- age 1 502.38 512.38
Generalized Linear Model

- Suppose Yi ∼ Binomial(ni, πi).
- We can fit the logistic model as before:

      logit(πi) = Xi β

- Pearson residuals:

      ri = (Yi − ni π̂i) / √[ ni π̂i (1 − π̂i) ]

- Deviance residuals, with µ̂i = ni π̂i:

      di = sign(Yi − µ̂i) √( 2 [ Yi log(Yi/µ̂i) + (ni − Yi) log((ni − Yi)/(ni − µ̂i)) ] )
Generalized Linear Model

Goodness-of-Fit test
- The Pearson statistic

      χ² = Σ_i ri²

- and the deviance

      D = Σ_i di²

- both have approximately a χ²_{n−p} distribution if the model is
  correct.


Generalized Linear Model

- To fit this model, we use the glm command:


Call:
glm(formula = cbind(y, n - y) ~ x, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max
-0.70832 -0.29814 0.02996 0.64070 0.91132

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.73119 1.83018 -8.049 8.35e-16 ***
x 0.24785 0.03031 8.178 2.89e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 137.7204 on 7 degrees of freedom


Residual deviance: 2.6558 on 6 degrees of freedom
AIC: 28.233

Number of Fisher Scoring iterations: 4


Generalized Linear Model

To test the correctness of the model (out denotes the fitted glm
object from the previous slide):


> pvalue = 1-pchisq(out$dev,out$df.residual)
> print(pvalue)
[1] 0.8506433
> r=resid(out,type="deviance")
> p=out$linear.predictors
> plot(p,r,pch=19,xlab="linear predictor", ylab="deviance residuals")
> print(sum(r^2))
[1] 2.655771
> cooks.distance(out)
1 2 3 4 5
0.0004817501 0.3596628502 0.0248918197 0.1034462077 0.0242941942
6 7 8
0.0688081629 0.0014847981 0.0309767612

Note that the squared deviance residuals give back the deviance
statistic, and the p-value is large, indicating no evidence of
lack of fit.
