
Covariance and Correlation

Definition of covariance: The covariance of X and Y is

Cov(X, Y) = E[(X − EX)(Y − EY)].


We can also write Cov(X, Y) = σX,Y.

Two special cases: Cov(X, X) = Var(X) and Cov(X, c) = 0 for any constant c.

Definition of correlation: The correlation of X and Y is

ρX,Y = Cov(X, Y) / √(Var(X) Var(Y)).

By the definition we see ρ(X, X) = 1 and ρ(X, −X) = −1.

We can show |ρ| ≤ 1.


A list of properties of Covariance

(1). Cov(X, Y) = E(XY) − EX · EY

(2). Cov(X, Y) = Cov(Y, X)

(3). Cov(aX, bY) = ab Cov(X, Y) for constants a and b

(4). Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)

Remarks:

(a) We often use property (1) to compute Cov(X, Y).

(b) Property (2) says the covariance operation is symmetric in X and Y.

(c) By (3) we know Cov(X, −Y) = −Cov(X, Y).

(d) By (4) and (2) we know Cov(Z, X + Y) = Cov(Z, X) + Cov(Z, Y).
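
These properties are easy to check numerically. Below is a minimal sketch in Python with NumPy; the helper cov and the simulated normal samples are illustrative choices, not part of the original notes.

import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 100_000))

def cov(u, v):
    # Sample version of property (1): Cov(U, V) = E(UV) - EU * EV.
    return np.mean(u * v) - np.mean(u) * np.mean(v)

a, b = 2.0, -3.0
print(np.isclose(cov(x, y), cov(y, x)))                  # (2) symmetry
print(np.isclose(cov(a * x, b * y), a * b * cov(x, y)))  # (3) scaling
print(np.isclose(cov(x + y, z), cov(x, z) + cov(y, z)))  # (4) additivity
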
Ex1

If X and Y are independent, then

Cov(X, Y) = 0 and ρX,Y = 0.

Proof: Cov(X, Y) = E(XY) − EX · EY. By independence, we know E(XY) = EX · EY, so Cov(X, Y) = 0.

By the definition of correlation,

ρX,Y = Cov(X, Y) / √(Var(X) Var(Y)) = 0.

Therefore, if Cov(X, Y) ≠ 0, then X and Y are not independent.
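
A minimal simulation of Ex1 in Python with NumPy (the normal samples are an arbitrary choice). The sample covariance is only an estimate, so it comes out near 0 rather than exactly 0; also note the implication only runs one way, since zero covariance by itself does not prove independence.

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)
y = rng.standard_normal(200_000)  # drawn independently of x

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
rho_xy = cov_xy / np.sqrt(np.var(x) * np.var(y))
print(cov_xy, rho_xy)  # both near 0 (sample estimates, not exactly 0)
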
Ex2

Suppose X and Z are independent, Var(X) = 1 and Var(Z) = 0.01, and Y = X + Z. Compute the correlation between X and Y.

Solution: Use the definition:

ρX,Y = Cov(X, Y) / √(Var(X) Var(Y)).

Cov(X, Y) = Cov(X, X + Z)
= Cov(X, X) + Cov(X, Z) = 1 + 0 = 1.

Var(Y) = Var(X + Z)
= Var(X) + Var(Z) (by the independence assumption)
= 1 + 0.01 = 1.01.
So

ρX,Y = 1/√1.01 ≈ 0.995.
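
Ex2 is easy to reproduce by simulation. A sketch in Python with NumPy; the normal distributions are our choice, since Ex2 only fixes the variances.

import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(0.0, 1.0, n)  # Var(X) = 1
z = rng.normal(0.0, 0.1, n)  # Var(Z) = 0.1^2 = 0.01, independent of x
y = x + z

print(np.corrcoef(x, y)[0, 1])  # about 0.995 = 1/sqrt(1.01)
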
A useful variance formula

Var(aX + bY) = a²Var(X) + 2ab Cov(X, Y) + b²Var(Y).

Proof:
Var(aX + bY)
= Cov(aX + bY, aX + bY)
= Cov(aX, aX + bY) + Cov(bY, aX + bY)
= Cov(aX, aX) + Cov(aX, bY) + Cov(bY, aX) + Cov(bY, bY)
= a²Cov(X, X) + 2ab Cov(X, Y) + b²Cov(Y, Y)
= a²Var(X) + 2ab Cov(X, Y) + b²Var(Y).

A more general formula: If X1, X2, . . . , Xn are n random variables and a1, a2, . . . , an are constants, then

Var(a1X1 + · · · + anXn) = Σi ai² Var(Xi) + 2 Σi<j ai aj Cov(Xi, Xj),

where the first sum runs over i = 1, . . . , n and the second over all pairs i < j.

The proof is left as an optional homework problem.
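
The two-variable formula can be checked numerically. A sketch in Python with NumPy; the construction y = 0.5x + noise is an arbitrary way to get a nonzero covariance.

import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(500_000)
y = 0.5 * x + rng.standard_normal(500_000)  # correlated with x by construction

a, b = 2.0, -1.0
lhs = np.var(a * x + b * y)
rhs = (a**2 * np.var(x) + 2 * a * b * np.cov(x, y, ddof=0)[0, 1]
       + b**2 * np.var(y))
print(lhs, rhs)  # agree up to floating-point error
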
Ex3 Suppose Var(X) = 1, Var(Y) = 2 and Cov(X, Y) = −1. Let U = 3X − 2Y and V = X + 2Y. Find Var(U), Var(V) and Cov(U, V).

Var(U) = Var(3X − 2Y)
= 3²Var(X) + 2 · 3 · (−2)Cov(X, Y) + (−2)²Var(Y)
= 9 · 1 + 2 · 3 · (−2) · (−1) + 4 · 2 = 29.

Var(V) = Var(X + 2Y)
= 1²Var(X) + 2 · 1 · 2 Cov(X, Y) + 2²Var(Y)
= 1 − 4 + 8 = 5.

Cov(U, V) = Cov(3X − 2Y, X + 2Y)
= Cov(3X − 2Y, X) + Cov(3X − 2Y, 2Y)
= Cov(3X, X) + Cov(−2Y, X) + Cov(3X, 2Y) + Cov(−2Y, 2Y)
= 3Cov(X, X) + (−2)Cov(Y, X) + 3 · 2 Cov(X, Y) + (−2) · 2 Cov(Y, Y)
= 3Var(X) + 4Cov(X, Y) − 4Var(Y)
= 3 − 4 − 4 · 2 = −9.
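
Ex3 can be verified by simulation. A sketch in Python with NumPy; the bivariate normal distribution is an assumption of ours, since Ex3 only specifies the second moments.

import numpy as np

rng = np.random.default_rng(4)
cov_matrix = [[1.0, -1.0], [-1.0, 2.0]]  # Var(X) = 1, Var(Y) = 2, Cov(X, Y) = -1
x, y = rng.multivariate_normal([0.0, 0.0], cov_matrix, size=1_000_000).T

u = 3 * x - 2 * y
v = x + 2 * y
print(np.var(u), np.var(v), np.cov(u, v, ddof=0)[0, 1])  # about 29, 5, -9
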
Proof of |ρ| ≤ 1

We want to show for any X, Y , |ρ(X, Y )| ≤ 1.

Let Zt = X − tY, where t is a real number. Then we have

Var(Zt) = Var(X) − 2t Cov(X, Y) + t²Var(Y).

Let g(t) = Var(Zt), viewed as a function of t. Consider t = t0 = Cov(X, Y)/Var(Y), the minimizer of the quadratic g. We find

g(t0) = Var(X) − [Cov(X, Y)]²/Var(Y).

Since g(t) = Var(Zt) ≥ 0 for all t, we must have

Var(X) − [Cov(X, Y)]²/Var(Y) ≥ 0,

which means

1 ≥ [Cov(X, Y)]²/(Var(X)Var(Y)).

Thus by the definition of ρ(X, Y) we see 1 ≥ ρ²(X, Y).
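
The quadratic g(t) = Var(X − tY) from the proof can be inspected numerically. A sketch in Python with NumPy, using an arbitrary correlated pair of our choosing; it confirms that g stays non-negative and is minimized near t0.

import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(200_000)
y = 0.8 * x + 0.6 * rng.standard_normal(200_000)  # an arbitrary correlated pair

cov_xy = np.cov(x, y, ddof=0)[0, 1]
t0 = cov_xy / np.var(y)  # the minimizer t0 from the proof

ts = np.linspace(t0 - 2.0, t0 + 2.0, 401)
g = np.array([np.var(x - t * y) for t in ts])  # g(t) = Var(X - tY)
print(g.min() >= 0)        # variances are never negative
print(ts[g.argmin()], t0)  # the grid minimizer sits at (about) t0
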
Markov Inequality

Let X be a positive random variable with E[X] < ∞. Then for every positive real number a, we have

Pr(X > a) ≤ E[X]/a.

Proof: We note that

Y = X − aI(X > a) ≥ 0.

Why? If X ≤ a, then I(X > a) = 0, so Y = X − 0 = X > 0; and if X > a, then Y = X − a > 0. Since Y is a non-negative random variable, by the definition of expectation its mean is greater than or equal to zero, so E[Y] ≥ 0.

E[Y] = E[X − aI(X > a)] = E[X] − aE[I(X > a)].

Thus we end up with

E[X] ≥ aE[I(X > a)] = a Pr(X > a),

and dividing both sides by a gives the inequality. We used the fact E[I(X > a)] = Pr(X > a).
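
A quick empirical check of Markov inequality in Python with NumPy; the Exponential(1) distribution, which is non-negative with E[X] = 1, is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=1.0, size=1_000_000)  # non-negative, E[X] = 1

for a in (2.0, 5.0, 10.0):
    print(a, np.mean(x > a), x.mean() / a)  # empirical Pr(X > a) vs. bound E[X]/a
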
Chebyshev Inequality

Let X be a random variable with mean µ and variance σ². Then for any a > 0 we have

Pr(|X − µ| > a) ≤ Var(X)/a².

Proof: We use Markov inequality. Observe the following fact:

Pr(|X − µ| > a) = Pr((X − µ)² > a²).

By Markov inequality, we know

Pr((X − µ)² > a²) ≤ E[(X − µ)²]/a²,

where we treat (X − µ)² as the X in Markov inequality. Note that

E[(X − µ)²] = Var(X).

Hence we end up with

Pr(|X − µ| > a) ≤ Var(X)/a².
Let a = kσ in Chebyshev inequality; we then have

Pr(|X − µ| > kσ) ≤ Var(X)/(k²σ²) = 1/k².

Suppose k = 3; then

Pr(|X − µ| > 3σ) ≤ 1/9 ≈ 0.11.

Equivalently, we can conclude that

Pr(µ − 3σ < X < µ + 3σ) ≥ 1 − 1/9 ≈ 0.89.

With at least an 89% chance, the value of X will fall into the interval (µ − 3σ, µ + 3σ).

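The Chebyshev bound is usually far from tight. A sketch in Python with NumPy comparing the empirical tail probability to 1/k² for a normal sample (the normal distribution is an illustrative choice of ours).

import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 5.0, 2.0
x = rng.normal(mu, sigma, 1_000_000)

for k in (2, 3, 4):
    print(k, np.mean(np.abs(x - mu) > k * sigma), 1 / k**2)  # tail vs. 1/k^2 bound
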
An important application of Chebyshev
inequality

Let X1, . . . , Xn be iid random variables. We assume they have a common mean µ and variance σ². Let

X̄ = (X1 + · · · + Xn)/n.

Then for a > 0,

Pr(|X̄ − µ| > a) ≤ σ²/(na²).

Why? Use Chebyshev inequality on X̄. By the iid assumption,

Var(X̄) = (1/n)Var(X1) = σ²/n.

Chebyshev inequality says

Pr(|X̄ − µ| > a) ≤ Var(X̄)/a² = σ²/(na²).

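A sketch checking this bound on the sample mean, in Python with NumPy; the Uniform(0, 1) population, with µ = 1/2 and σ² = 1/12, is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(8)
n, reps, a = 100, 50_000, 0.05
mu, sigma2 = 0.5, 1.0 / 12.0  # mean and variance of Uniform(0, 1)

xbar = rng.uniform(0.0, 1.0, (reps, n)).mean(axis=1)  # 50,000 sample means
print(np.mean(np.abs(xbar - mu) > a))  # empirical Pr(|Xbar - mu| > a)
print(sigma2 / (n * a**2))             # Chebyshev bound sigma^2 / (n a^2)
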
The Law of Large Numbers

Because for any a > 0 (no matter how small it is),

Pr(|X̄ − µ| > a) ≤ σ²/(na²),

we can conclude

lim_{n→∞} Pr(|X̄ − µ| > a) = 0.

The above equation tells us the sample mean converges to the true mean in probability. This is called the weak law of large numbers.

There is also the strong law of large numbers. The mathematical expression is

Pr(lim_{n→∞} X̄ = µ) = 1.

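A sketch of the weak law in action, in Python with NumPy (again with an arbitrary Uniform(0, 1) population): for fixed a, the deviation probability shrinks as n grows.

import numpy as np

rng = np.random.default_rng(9)
mu, a, reps = 0.5, 0.05, 1_000

for n in (10, 100, 1_000, 10_000):
    xbar = rng.uniform(0.0, 1.0, (reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) > a))  # Pr(|Xbar - mu| > a) shrinks toward 0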
