Statistics for Data Science - 2

Week 3 Notes
Expected value

• Expected value of a random variable

Definition: Suppose X is a discrete random variable with range $T_X$ and PMF $f_X$. The
expected value of X, denoted E[X], is defined as

$$E[X] = \sum_{t \in T_X} t \, P(X = t),$$

assuming the above sum exists.

The expected value represents the “center” of a random variable.

1. Consider a constant c as a random variable X with P(X = c) = 1. Then,

$$E[c] = c \times 1 = c$$

2. If X takes only non-negative values, i.e. P(X ≥ 0) = 1, then

$$E[X] \ge 0$$
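
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the helper name `expected_value` and the fair-die PMF are assumptions chosen for the example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return E[X], the sum of t * P(X = t) over the range of X.

    `pmf` maps each value t in the range of X to P(X = t).
    """
    return sum(t * p for t, p in pmf.items())

# Hypothetical example: a fair six-sided die.
die = {t: Fraction(1, 6) for t in range(1, 7)}
print(expected_value(die))        # 7/2

# A constant c is a random variable with P(X = c) = 1, so E[c] = c.
constant = {5: Fraction(1)}
print(expected_value(constant))   # 5
```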

• Expected value of a function of random variables

Suppose $X_1, \ldots, X_n$ have joint PMF $f_{X_1 \ldots X_n}$, with the range of $X_i$ denoted as $T_{X_i}$. Let

$$g : T_{X_1} \times \cdots \times T_{X_n} \to \mathbb{R}$$

be a function, and let $Y = g(X_1, \ldots, X_n)$ have range $T_Y$ and PMF $f_Y$. Then,

$$E[g(X_1, \ldots, X_n)] = \sum_{t \in T_Y} t \, f_Y(t) = \sum_{t_i \in T_{X_i}} g(t_1, \ldots, t_n) \, f_{X_1 \ldots X_n}(t_1, \ldots, t_n)$$
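
The two sums in the formula above can be compared numerically. The sketch below assumes a hypothetical joint PMF (two independent fair dice) and a hypothetical function g = max; neither comes from the notes. It computes the expectation once directly from the joint PMF and once through the PMF of Y.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical joint PMF: two independent fair dice (assumed for illustration).
joint = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

def g(a, b):
    """Hypothetical function of the two variables: the larger of the two rolls."""
    return max(a, b)

# Route 1: sum g(t1, t2) * f(t1, t2) over the joint range.
direct = sum(g(a, b) * p for (a, b), p in joint.items())

# Route 2: build the PMF of Y = g(X1, X2), then sum t * f_Y(t).
pmf_y = defaultdict(Fraction)
for (a, b), p in joint.items():
    pmf_y[g(a, b)] += p
via_pmf = sum(t * p for t, p in pmf_y.items())

print(direct, via_pmf)   # 161/36 161/36
```

The direct route is usually more convenient because it avoids deriving the PMF of Y first.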

• Linearity of Expected value:

1. E[cX] = cE[X] for a random variable X and a constant c.

2. E[X + Y ] = E[X] + E[Y ] for any two random variables X, Y .
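
Linearity holds even when X and Y are dependent. The sketch below checks both properties on a small, hypothetical joint PMF in which X and Y are deliberately dependent; the table and the helper `expect` are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical joint PMF of (X, Y); X and Y are dependent here,
# since P(X = 0, Y = 1) = 0 while P(X = 0) * P(Y = 1) = 1/8.
joint = {
    (0, 0): Fraction(1, 2),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
}

def expect(g):
    """E[g(X, Y)] computed directly from the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_sum = expect(lambda x, y: x + y)
e_3x = expect(lambda x, y: 3 * x)

print(e_sum == e_x + e_y)   # True
print(e_3x == 3 * e_x)      # True
```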

• Zero mean Random variable:

A random variable X with E[X] = 0 is said to be a zero-mean random variable.

• Variance and Standard deviation:

Definition: The variance of a random variable X, denoted by Var(X), is defined as

$$\mathrm{Var}(X) = E[(X - E[X])^2]$$

Variance measures the spread about the expected value.
The variance of X is also given by $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.

The standard deviation of X, denoted by SD(X), is defined as

$$\mathrm{SD}(X) = +\sqrt{\mathrm{Var}(X)}$$

The units of SD(X) are the same as the units of X.
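
As a rough sketch, the two variance formulas and the standard deviation can be computed from a PMF as follows. The helper names and the fair-die example are assumptions for illustration, not part of the notes.

```python
import math
from fractions import Fraction

def expected_value(pmf):
    return sum(t * p for t, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E[X])^2], straight from the definition."""
    mu = expected_value(pmf)
    return sum((t - mu) ** 2 * p for t, p in pmf.items())

def variance_shortcut(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    mu = expected_value(pmf)
    return sum(t * t * p for t, p in pmf.items()) - mu ** 2

# Hypothetical example: a fair six-sided die.
die = {t: Fraction(1, 6) for t in range(1, 7)}

var_x = variance(die)
sd_x = math.sqrt(var_x)      # SD(X) is the positive square root of Var(X)

print(var_x)                 # 35/12
print(variance_shortcut(die) == var_x)   # True
```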

• Properties: Scaling and translation

Let X be a random variable. Let a be a constant real number.

1. $\mathrm{Var}(aX) = a^2 \, \mathrm{Var}(X)$
2. $\mathrm{SD}(aX) = |a| \, \mathrm{SD}(X)$
3. Var(X + a) = Var(X)
4. SD(X + a) = SD(X)
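
The scaling and translation properties can be checked numerically. In the sketch below, the helper `transform`, the fair-die PMF, and the constant a = -3 are all assumptions chosen for illustration; a negative constant is used so that the absolute value in property 2 matters.

```python
import math
from fractions import Fraction

def mean(pmf):
    return sum(t * p for t, p in pmf.items())

def var(pmf):
    mu = mean(pmf)
    return sum((t - mu) ** 2 * p for t, p in pmf.items())

def transform(pmf, h):
    """PMF of h(X), assuming h is one-to-one on the range of X."""
    return {h(t): p for t, p in pmf.items()}

die = {t: Fraction(1, 6) for t in range(1, 7)}
a = -3  # arbitrary constant, assumed for illustration

var_x = var(die)
var_scaled = var(transform(die, lambda t: a * t))    # Var(aX)
var_shifted = var(transform(die, lambda t: t + a))   # Var(X + a)

print(var_scaled == a ** 2 * var_x)    # True
print(var_shifted == var_x)            # True
print(math.isclose(math.sqrt(var_scaled), abs(a) * math.sqrt(var_x)))   # True
```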

• Sum and product of independent random variables

1. For any two random variables X and Y (independent or dependent), E[X + Y] = E[X] + E[Y].
2. If X and Y are independent random variables,
(a) E[XY ] = E[X]E[Y ]
(b) Var(X + Y ) = Var(X) + Var(Y )
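
A small check of the product and sum rules for independent variables. The sketch assumes a hypothetical pair: X is a fair die and Y is a fair coin taking values 0 and 1, with the joint PMF built as the product of the marginals (which is what independence means).

```python
from fractions import Fraction

die = {t: Fraction(1, 6) for t in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Independence: the joint PMF factorises into the product of the marginals.
joint = {(x, y): px * py for x, px in die.items() for y, py in coin.items()}

def expect(g):
    """E[g(X, Y)] from the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def var(g):
    """Var(g(X, Y)) from the definition."""
    m = expect(g)
    return expect(lambda x, y: (g(x, y) - m) ** 2)

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
var_sum = var(lambda x, y: x + y)

print(e_xy == e_x * e_y)           # True
print(var_sum == var_x + var_y)    # True
```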

• Standardised random variables:

1. Definition: A random variable X is said to be standardised if E[X] = 0 and Var(X) = 1.

2. Let X be a random variable. Then, $Y = \dfrac{X - E[X]}{\mathrm{SD}(X)}$ is a standardised random variable.
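
Standardisation subtracts the mean and divides by the standard deviation. The sketch below applies that to a PMF and checks that the result has mean 0 and variance 1. The helper `standardise` and the fair-die PMF are assumptions for illustration; floating-point numbers are used, so the checks use tolerances.

```python
import math

def mean(pmf):
    return sum(t * p for t, p in pmf.items())

def var(pmf):
    mu = mean(pmf)
    return sum((t - mu) ** 2 * p for t, p in pmf.items())

def standardise(pmf):
    """PMF of (X - E[X]) / SD(X); requires SD(X) > 0."""
    mu = mean(pmf)
    sd = math.sqrt(var(pmf))
    return {(t - mu) / sd: p for t, p in pmf.items()}

# Hypothetical example: a fair six-sided die.
die = {t: 1 / 6 for t in range(1, 7)}
z = standardise(die)

print(math.isclose(mean(z), 0.0, abs_tol=1e-12))   # True
print(math.isclose(var(z), 1.0))                   # True
```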

• Covariance:
Definition: Suppose X and Y are random variables on the same probability space. The
covariance of X and Y , denoted as Cov(X, Y ), is defined as

Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]

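It summarizes how two random variables vary together: it is positive when X and Y tend to lie on the same side of their means, and negative when they tend to lie on opposite sides.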


1. Cov(X, X) = Var(X)
2. Cov(X, Y ) = E[XY ] − E[X]E[Y ]

3. Covariance is symmetric, i.e. Cov(X, Y) = Cov(Y, X)
4. Covariance is a “linear” quantity.
(a) Cov(X, aY + bZ) = aCov(X, Y ) + bCov(X, Z)
(b) Cov(aX + bY, Z) = aCov(X, Z) + bCov(Y, Z)
5. Independence: If X and Y are independent, then X and Y are uncorrelated, i.e.
Cov(X, Y ) = 0
6. If X and Y are uncorrelated, they may still be dependent; zero covariance does not imply independence.
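
A standard illustration of the last point, sketched under assumed values: take X uniform on {-1, 0, 1} and Y = X². Y is completely determined by X, yet the covariance is zero. The example is not from the notes.

```python
from fractions import Fraction

# Hypothetical joint PMF of (X, Y) with X uniform on {-1, 0, 1} and Y = X^2.
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

def expect(g):
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - e_x * e_y   # Cov(X, Y) = E[XY] - E[X]E[Y]

# Independence would require P(X = 0, Y = 0) = P(X = 0) * P(Y = 0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_both = joint.get((0, 0), Fraction(0))

print(cov)                      # 0
print(p_both == p_x0 * p_y0)    # False, so X and Y are dependent
```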
• Correlation coefficient:
Definition: The correlation coefficient or correlation of two random variables X and Y,
denoted by ρ(X, Y), is defined as

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\mathrm{SD}(X) \, \mathrm{SD}(Y)}$$
1. −1 ≤ ρ(X, Y ) ≤ 1.
2. ρ(X, Y ) summarizes the trend between random variables.
3. ρ(X, Y ) is a dimensionless quantity.
4. If ρ(X, Y ) is close to zero, there is no clear linear trend between X and Y .
5. If ρ(X, Y ) = 1 or ρ(X, Y ) = −1, Y is a linear function of X.
6. If | ρ(X, Y ) | is close to one, X and Y are strongly correlated.
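
The sketch below computes the correlation coefficient as Cov(X, Y) divided by SD(X) SD(Y) for three hypothetical joint PMFs: an increasing linear relation, a decreasing linear relation, and a symmetric quadratic relation. The helper `correlation` and all three tables are assumptions for illustration.

```python
import math

def correlation(joint):
    """rho(X, Y) from a joint PMF mapping (x, y) to probability.

    Assumes SD(X) > 0 and SD(Y) > 0.
    """
    def expect(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())

    mx = expect(lambda x, y: x)
    my = expect(lambda x, y: y)
    cov = expect(lambda x, y: (x - mx) * (y - my))
    sd_x = math.sqrt(expect(lambda x, y: (x - mx) ** 2))
    sd_y = math.sqrt(expect(lambda x, y: (y - my) ** 2))
    return cov / (sd_x * sd_y)

xs = [-1, 0, 1, 2]
increasing = {(x, 2 * x + 1): 0.25 for x in xs}      # Y = 2X + 1
decreasing = {(x, -x): 0.25 for x in xs}             # Y = -X
quadratic = {(x, x * x): 1 / 3 for x in (-1, 0, 1)}  # Y = X^2, symmetric X

rho_inc = correlation(increasing)   # close to +1
rho_dec = correlation(decreasing)   # close to -1
rho_quad = correlation(quadratic)   # close to 0: no linear trend
print(rho_inc, rho_dec, rho_quad)
```

The quadratic case shows why a correlation near zero only rules out a linear trend, not dependence in general.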
• Bounds on probabilities using mean and variance
1. Markov’s inequality: Let X be a discrete random variable taking non-negative
values with a finite mean µ. Then, for any constant c > 0,
$$P(X \ge c) \le \frac{\mu}{c}$$
Through Markov’s inequality, the mean µ bounds the probability that a non-negative
random variable takes values much larger than the mean.
2. Chebyshev’s inequality: Let X be a discrete random variable with a finite mean
µ and a finite variance $\sigma^2$. Then, for any k > 0,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
Other forms:
(a) $P(|X - \mu| \ge c) \le \dfrac{\sigma^2}{c^2}$, $\quad P((X - \mu)^2 > k^2 \sigma^2) \le \dfrac{1}{k^2}$
(b) $P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \dfrac{1}{k^2}$
Through Chebyshev’s inequality, the mean µ and standard deviation σ bound the
probability that X is at least kσ away from µ.
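
Both bounds can be compared against exact probabilities. The sketch below assumes a fair six-sided die and hypothetical thresholds (c = 5 for Markov, c = 2 for Chebyshev). It uses Markov in the form P(X ≥ c) ≤ µ/c and Chebyshev in the form P(|X − µ| ≥ c) ≤ σ²/c².

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die (non-negative, so Markov applies).
die = {t: Fraction(1, 6) for t in range(1, 7)}

mu = sum(t * p for t, p in die.items())                # 7/2
var = sum((t - mu) ** 2 * p for t, p in die.items())   # 35/12

# Markov: P(X >= c) <= mu / c.
c_markov = 5
p_tail = sum(p for t, p in die.items() if t >= c_markov)
markov_bound = mu / c_markov

# Chebyshev: P(|X - mu| >= c) <= var / c^2.
c_cheb = 2
p_dev = sum(p for t, p in die.items() if abs(t - mu) >= c_cheb)
chebyshev_bound = var / c_cheb ** 2

print(p_tail, markov_bound)      # 1/3 7/10
print(p_dev, chebyshev_bound)    # 1/3 35/48
```

In both cases the exact probability is well below the bound, which is typical: the inequalities use only the mean and variance, so they hold for every distribution with those values and are rarely tight.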

