Math 556: 11. Modes of Convergence
The following definitions are stated in terms of scalar random variables, but extend naturally to vector random variables defined on the same probability space with measure $P$. For example, some results are stated in terms of the Euclidean distance, in one dimension
$$|X_n - X| = \sqrt{(X_n - X)^2},$$
or, for sequences of $k$-dimensional random variables $\mathbf{X}_n = (X_{n1}, \dots, X_{nk})^\top$,
$$\|\mathbf{X}_n - \mathbf{X}\| = \left\{ \sum_{j=1}^{k} (X_{nj} - X_j)^2 \right\}^{1/2}.$$
and $F_X$ is the limiting distribution. Convergence of a sequence of mgfs or cfs also indicates convergence in distribution; that is, for all $t$ at which $M_X(t)$ is defined, as $n \longrightarrow \infty$,
$$M_{X_n}(t) \longrightarrow M_X(t) \iff X_n \xrightarrow{d} X.$$
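As a worked illustration of this criterion (a standard example, added here; it is not part of the notes above): if $X_n \sim \text{Binomial}(n, \lambda/n)$ for fixed $\lambda > 0$, then
$$M_{X_n}(t) = \left( 1 - \frac{\lambda}{n} + \frac{\lambda}{n} e^t \right)^n = \left( 1 + \frac{\lambda(e^t - 1)}{n} \right)^n \longrightarrow \exp\{\lambda(e^t - 1)\},$$
which is the mgf of the $\text{Poisson}(\lambda)$ distribution, so $X_n \xrightarrow{d} X \sim \text{Poisson}(\lambda)$.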
Interpretation: A special case of convergence in distribution occurs when the limiting distribution is discrete, with the probability mass function non-zero at a single value only; that is, the limiting random variable $X$ satisfies $P[X = c] = 1$ for some constant $c$. We say that the sequence of random variables $X_1, X_2, \dots$ converges in distribution to $c$ if and only if, for all $\epsilon > 0$,
$$\lim_{n \to \infty} P[\,|X_n - c| < \epsilon\,] = 1.$$
This definition indicates that convergence in distribution to a constant $c$ occurs if and only if the probability becomes increasingly concentrated around $c$ as $n \longrightarrow \infty$.
Example: for $\epsilon > 0$, let
$$F_\epsilon(x) = \begin{cases} 0 & x < \epsilon \\ 1 & x \geq \epsilon \end{cases}$$
be the cdf of a degenerate distribution with probability mass 1 at $x = \epsilon$. Now consider a sequence $\{\epsilon_n\}$ of real values converging to $\epsilon$ from below. Then, as $\epsilon_n < \epsilon$, we have
$$F_{\epsilon_n}(x) = \begin{cases} 0 & x < \epsilon_n \\ 1 & x \geq \epsilon_n \end{cases}$$
which converges to $F_\epsilon(x)$ at every real value $x$. However, if instead $\{\epsilon_n\}$ converges to $\epsilon$ from above, then $F_{\epsilon_n}(\epsilon) = 0$ for each finite $n$, as $\epsilon_n > \epsilon$. Hence, as $n \longrightarrow \infty$,
$$F_{\epsilon_n}(\epsilon) \longrightarrow 0 \neq 1 = F_\epsilon(\epsilon).$$
Thus the limiting function in this case is
$$\lim_{n \to \infty} F_{\epsilon_n}(x) = \begin{cases} 0 & x \leq \epsilon \\ 1 & x > \epsilon \end{cases}$$
which is not a cdf, as it is not right-continuous. However, if $\{X_n\}$ and $X$ are random variables with distributions $\{F_{\epsilon_n}\}$ and $F_\epsilon$ respectively, then $P[X_n = \epsilon_n] = 1$ converges to $P[X = \epsilon] = 1$ however we take the limit, so $F_\epsilon$ should describe the limiting distribution of the sequence $\{F_{\epsilon_n}\}$. Thus, to avoid such right-continuity problems, the definition of convergence in distribution ignores points of discontinuity of the limiting function.
Proof. Using the properties of expectation, it can be shown that $Y_n$ (the sample mean of $n$ i.i.d. variables with expectation $\mu$ and variance $\sigma^2$) has expectation $\mu$ and variance $\sigma^2/n$, and hence, by the Chebychev Inequality,
$$P[\,|Y_n - \mu| \geq \epsilon\,] \leq \frac{\sigma^2}{n\epsilon^2} \longrightarrow 0 \quad \text{as } n \longrightarrow \infty$$
for all $\epsilon > 0$. Hence
$$P[\,|Y_n - \mu| < \epsilon\,] \longrightarrow 1 \quad \text{as } n \longrightarrow \infty$$
and $Y_n \xrightarrow{p} \mu$.
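A quick numerical illustration of this result (a minimal simulation sketch added here, not part of the notes; the Normal distribution and all parameter values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0   # true mean and standard deviation (arbitrary choices)
eps = 0.1              # tolerance in the convergence-in-probability statement
n_reps = 1000          # Monte Carlo replications per sample size

for n in [10, 100, 1000, 10000]:
    # draw n_reps independent sample means Y_n, each from n i.i.d. N(mu, sigma^2) values
    sample_means = rng.normal(mu, sigma, size=(n_reps, n)).mean(axis=1)
    # estimate P[|Y_n - mu| >= eps] and compare with the Chebychev bound sigma^2/(n eps^2)
    p_hat = np.mean(np.abs(sample_means - mu) >= eps)
    bound = sigma**2 / (n * eps**2)
    print(f"n={n:6d}  estimated P[|Y_n - mu| >= eps] = {p_hat:.3f}  (Chebychev bound: {bound:.3f})")
```

The estimated probability should shrink towards zero as $n$ grows, in line with the bound $\sigma^2/(n\epsilon^2)$.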
Definition : CONVERGENCE IN PROBABILITY TO A RANDOM VARIABLE
The sequence of random variables $X_1, X_2, \dots$ converges in probability to the random variable $X$, denoted $X_n \xrightarrow{p} X$, if, for all $\epsilon > 0$,
$$\lim_{n \to \infty} P[\,|X_n - X| < \epsilon\,] = 1 \quad \text{or equivalently} \quad \lim_{n \to \infty} P[\,|X_n - X| \geq \epsilon\,] = 0.$$
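For instance (a standard illustration added here, not from the notes): if $X_n \sim \text{Uniform}(0, 1/n)$, then for any $\epsilon > 0$ we have $P[\,|X_n - 0| \geq \epsilon\,] = 0$ whenever $n \geq 1/\epsilon$, so $X_n \xrightarrow{p} 0$.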
Alternative characterization:
• Let $\epsilon > 0$, and let the sets $A_n(\epsilon)$ and $B_m(\epsilon)$ be defined for $n, m \geq 0$ by
$$A_n(\epsilon) \equiv \{\omega : |X_n(\omega) - X(\omega)| > \epsilon\}, \qquad B_m(\epsilon) \equiv \bigcup_{n=m}^{\infty} A_n(\epsilon).$$
Then $X_n \xrightarrow{a.s.} X$ if and only if $P(B_m(\epsilon)) \longrightarrow 0$ as $m \longrightarrow \infty$, for every $\epsilon > 0$.
Interpretation:
– The event $A_n(\epsilon)$ corresponds to the set of $\omega$ for which $X_n(\omega)$ is more than $\epsilon$ away from $X(\omega)$.
– The event $B_m(\epsilon)$ corresponds to the set of $\omega$ for which $X_n(\omega)$ is more than $\epsilon$ away from $X(\omega)$ for at least one $n \geq m$; that is, $B_m(\epsilon)$ occurs if there exists an $n \geq m$ such that $|X_n - X| > \epsilon$.
– $X_n \xrightarrow{a.s.} X$ if and only if $P(B_m(\epsilon)) \longrightarrow 0$ as $m \longrightarrow \infty$.
• $X_n \xrightarrow{a.s.} X$ if and only if
$$P[\,|X_n - X| > \epsilon \text{ infinitely often}\,] = 0 \quad \text{for every } \epsilon > 0;$$
that is, $X_n \xrightarrow{a.s.} X$ if and only if, for each $\epsilon > 0$, there are only finitely many $n$ for which
$$|X_n(\omega) - X(\omega)| > \epsilon,$$
except possibly for $\omega$ lying in a set of probability zero.
• Note that $X_n \xrightarrow{a.s.} X$ if and only if
$$\lim_{m \to \infty} P(B_m(\epsilon)) = \lim_{m \to \infty} P\left( \bigcup_{n=m}^{\infty} A_n(\epsilon) \right) = 0,$$
in contrast with the definition of convergence in probability, where $X_n \xrightarrow{p} X$ if
$$\lim_{m \to \infty} P(A_m(\epsilon)) = 0.$$
Clearly
$$A_m(\epsilon) \subseteq \bigcup_{n=m}^{\infty} A_n(\epsilon) = B_m(\epsilon),$$
so almost sure convergence is the more demanding requirement.
Alternative terminology:
• $X_n \longrightarrow X$ almost everywhere: $X_n \xrightarrow{a.e.} X$
• $X_n \longrightarrow X$ with probability 1: $X_n \xrightarrow{w.p.1} X$
Interpretation: A random variable is a real-valued (measurable) function from a sample space $\Omega$, equipped with a sigma-algebra, to $\mathbb{R}$. The sequence of random variables $X_1, X_2, \dots$ therefore corresponds to a sequence of functions defined on the elements of $\Omega$. Almost sure convergence requires that the sequence of real numbers $X_n(\omega)$ converges to $X(\omega)$ (as a real sequence) as $n \longrightarrow \infty$ for all $\omega \in \Omega$, except perhaps for $\omega$ in a set having probability zero under $P$.
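A standard illustration (added here, not from the notes): take $\Omega = [0,1]$ with the uniform probability measure, and let $X_n(\omega) = \omega^n$. For every $\omega \in [0,1)$, $X_n(\omega) \longrightarrow 0$, while $X_n(1) = 1$ for all $n$; since $P(\{1\}) = 0$, we have $X_n \xrightarrow{a.s.} 0$ even though pointwise convergence fails at $\omega = 1$.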
For convergence in $r$-th mean, the requirement is that $\mathbb{E}[\,|X_n - X|^r\,] \longrightarrow 0$ as $n \longrightarrow \infty$. For example, if
$$\lim_{n \to \infty} \mathbb{E}\left[ (X_n - X)^2 \right] = 0,$$
then we write $X_n \xrightarrow{r=2} X$. In this case, we say that $\{X_n\}$ converges to $X$ in mean-square or in quadratic mean.
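A short example tying this to the sample mean considered earlier (an addition for illustration): if $\bar{Y}_n$ is the sample mean of $n$ i.i.d. variables with expectation $\mu$ and variance $\sigma^2$, then
$$\mathbb{E}\left[ (\bar{Y}_n - \mu)^2 \right] = \text{Var}[\bar{Y}_n] = \frac{\sigma^2}{n} \longrightarrow 0,$$
so $\bar{Y}_n \xrightarrow{r=2} \mu$.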
THEOREM
For $r_1 > r_2 \geq 1$,
$$X_n \xrightarrow{r=r_1} X \implies X_n \xrightarrow{r=r_2} X.$$
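The proof is not included in these notes; the standard route (an addition here) is via Lyapunov's inequality, a consequence of Jensen's inequality:
$$\left( \mathbb{E}[\,|X_n - X|^{r_2}\,] \right)^{1/r_2} \leq \left( \mathbb{E}[\,|X_n - X|^{r_1}\,] \right)^{1/r_1},$$
so if the right-hand side tends to zero, so does the left.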
THEOREM (Partial converses)
(i) If
$$\sum_{n=1}^{\infty} P[\,|X_n - X| > \epsilon\,] < \infty$$
for every $\epsilon > 0$, then $X_n \xrightarrow{a.s.} X$.
(ii) If, for some positive integer $r$,
$$\sum_{n=1}^{\infty} \mathbb{E}[\,|X_n - X|^r\,] < \infty,$$
then $X_n \xrightarrow{a.s.} X$.
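As a worked instance of criterion (i) (an example added here, not from the notes): suppose $X_n = X + Z_n$, where $P[Z_n = 1] = 1/n^2$ and $P[Z_n = 0] = 1 - 1/n^2$. Then for any $0 < \epsilon < 1$,
$$\sum_{n=1}^{\infty} P[\,|X_n - X| > \epsilon\,] = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} < \infty,$$
so $X_n \xrightarrow{a.s.} X$ by part (i).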
THEOREM (Slutsky's Theorem)
Suppose that
$$X_n \xrightarrow{d} X \quad \text{and} \quad Y_n \xrightarrow{p} c.$$
Then
(i) $X_n + Y_n \xrightarrow{d} X + c$
(ii) $X_n Y_n \xrightarrow{d} cX$
(iii) $X_n / Y_n \xrightarrow{d} X/c$, provided $c \neq 0$.
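A standard application (an illustration added here, using the Central Limit Theorem stated in the next section): if $\sqrt{n}(\bar{X}_n - \mu)/\sigma \xrightarrow{d} N(0,1)$ and the sample standard deviation satisfies $S_n \xrightarrow{p} \sigma > 0$, then $S_n/\sigma \xrightarrow{p} 1$ and, by part (iii),
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} = \frac{\sqrt{n}(\bar{X}_n - \mu)/\sigma}{S_n/\sigma} \xrightarrow{d} N(0,1).$$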
5.5 The Central Limit Theorem
THEOREM (THE LINDEBERG-LÉVY CENTRAL LIMIT THEOREM)
Suppose $X_1, \dots, X_n$ are i.i.d. random variables with mgf $M_X$, with expectation $\mu$ and variance $\sigma^2$, both finite. Let the random variable $Z_n$ be defined by
$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma},$$
where
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
Then, as $n \longrightarrow \infty$, $Z_n \xrightarrow{d} Z \sim N(0,1)$.
Proof. First, let $Y_i = (X_i - \mu)/\sigma$ for $i = 1, \dots, n$. Then $Y_1, \dots, Y_n$ are i.i.d. with mgf $M_Y$, say, and $\mathbb{E}[Y_i] = 0$, $\text{Var}[Y_i] = 1$ for each $i$. Using a Taylor series expansion, we have that for $t$ in a neighbourhood of zero,
$$M_Y(t) = 1 + t\,\mathbb{E}[Y] + \frac{t^2}{2!}\mathbb{E}[Y^2] + \frac{t^3}{3!}\mathbb{E}[Y^3] + \dots = 1 + \frac{t^2}{2} + O(t^3),$$
using the $O(t^3)$ notation to capture all terms involving $t^3$ and higher powers. Re-writing $Z_n$ as
$$Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i,$$
we have, by independence,
$$M_{Z_n}(t) = \left\{ M_Y\!\left( \frac{t}{\sqrt{n}} \right) \right\}^n = \left\{ 1 + \frac{t^2}{2n} + O(n^{-3/2}) \right\}^n \longrightarrow e^{t^2/2} \quad \text{as } n \longrightarrow \infty,$$
which is the mgf of the standard Normal distribution, so that
$$Z_n \xrightarrow{d} Z \sim N(0, 1).$$
Equivalently, $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$, and $\sigma^2$ is termed the asymptotic variance.
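A minimal simulation sketch of this convergence (added here, not part of the notes; the Exponential distribution and the values of $n$ and the replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0     # mean and standard deviation of Exponential(1) (arbitrary choice)
n, n_reps = 50, 100_000  # sample size and number of Monte Carlo replications

# simulate Z_n = sqrt(n) * (sample mean - mu) / sigma for each replication
x = rng.exponential(scale=1.0, size=(n_reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma

# compare empirical moments and a tail probability with the N(0,1) limit
print(f"mean(Z_n) = {z.mean():+.3f}   (N(0,1) value: 0)")
print(f"var(Z_n)  = {z.var():.3f}    (N(0,1) value: 1)")
print(f"P[Z_n > 1.96] ~ {np.mean(z > 1.96):.4f}  (N(0,1) value: 0.0250)")
```

Even for a strongly skewed parent distribution such as the Exponential, the standardized mean is close to standard Normal at moderate $n$.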
Notes :
(ii) The theorem holds for the i.i.d. case, but there are similar theorems for non-identically distributed and for dependent random variables.
(iii) The theorem allows the construction of asymptotic normal approximations. For example, for large but finite $n$, by using the properties of the Normal distribution,
$$\bar{X}_n \sim AN(\mu, \sigma^2/n), \qquad S_n = \sum_{i=1}^{n} X_i \sim AN(n\mu, n\sigma^2).$$
The notation
$$\bar{X}_n \,\dot{\sim}\, N(\mu, \sigma^2/n)$$
is sometimes used. (A numerical illustration of this approximation is given after these notes.)
(iv) The multivariate version of this theorem can be stated as follows: Suppose $\mathbf{X}_1, \dots, \mathbf{X}_n$ are i.i.d. $k$-dimensional random variables with mgf $M_{\mathbf{X}}$, with expectation vector $\boldsymbol{\mu}$ and variance-covariance matrix $\Sigma$, both finite. Let
$$\mathbf{Z}_n = \sqrt{n}\,(\bar{\mathbf{X}}_n - \boldsymbol{\mu}),$$
where
$$\bar{\mathbf{X}}_n = \frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_i.$$
Then
$$\mathbf{Z}_n \xrightarrow{d} \mathbf{Z} \sim N(\mathbf{0}, \Sigma)$$
as $n \longrightarrow \infty$.
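As a concrete instance of the approximation in note (iii) (a standard example added here for illustration): if $X_i \sim \text{Bernoulli}(p)$ i.i.d., then $\mu = p$ and $\sigma^2 = p(1-p)$, so
$$S_n = \sum_{i=1}^{n} X_i \sim AN(np, np(1-p));$$
for example, with $n = 100$ and $p = 0.5$, $S_n$ is approximately $N(50, 25)$.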
Appendix (NOT EXAMINABLE)
Proof. Relating the modes of convergence.
(a) $X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{p} X$. Suppose $X_n \xrightarrow{a.s.} X$, and let $\epsilon > 0$. With $A_n(\epsilon)$ and $B_m(\epsilon)$ as defined earlier, $A_n(\epsilon) \subseteq B_n(\epsilon)$, so
$$P[\,|X_n - X| > \epsilon\,] = P(A_n(\epsilon)) \leq P(B_n(\epsilon)) \longrightarrow 0 \quad \text{as } n \longrightarrow \infty,$$
and so
$$\lim_{n \to \infty} P[\,|X_n - X| < \epsilon\,] = 1 \quad \therefore \quad X_n \xrightarrow{p} X.$$
(b) $X_n \xrightarrow{r} X \implies X_n \xrightarrow{p} X$. Suppose $X_n \xrightarrow{r} X$, and let $\epsilon > 0$. Then, using an argument similar to the Chebychev Lemma (the Markov inequality applied to $|X_n - X|^r$),
$$P[\,|X_n - X| > \epsilon\,] \leq \frac{\mathbb{E}[\,|X_n - X|^r\,]}{\epsilon^r} \longrightarrow 0 \quad \text{as } n \longrightarrow \infty \quad \therefore \quad X_n \xrightarrow{p} X.$$
(c) $X_n \xrightarrow{p} X \implies X_n \xrightarrow{d} X$. Suppose $X_n \xrightarrow{p} X$, and let $\epsilon > 0$. Denote, in the usual way, $F_{X_n}(x) = P[X_n \leq x]$. Then
$$F_{X_n}(x) = P[X_n \leq x, X \leq x+\epsilon] + P[X_n \leq x, X > x+\epsilon] \leq F_X(x+\epsilon) + P[\,|X_n - X| > \epsilon\,]$$
and similarly
$$F_X(x-\epsilon) = P[X \leq x-\epsilon, X_n \leq x] + P[X \leq x-\epsilon, X_n > x] \leq F_{X_n}(x) + P[\,|X_n - X| > \epsilon\,],$$
as $A \subseteq B \implies P(A) \leq P(B)$. Thus
$$F_X(x-\epsilon) - P[\,|X_n - X| > \epsilon\,] \leq F_{X_n}(x) \leq F_X(x+\epsilon) + P[\,|X_n - X| > \epsilon\,],$$
and taking limits as $n \longrightarrow \infty$ (with care; we cannot yet write $\lim_{n \to \infty} F_{X_n}(x)$, as we do not know that this limit exists), recalling that $X_n \xrightarrow{p} X$,
$$F_X(x-\epsilon) \leq \liminf_{n \to \infty} F_{X_n}(x) \leq \limsup_{n \to \infty} F_{X_n}(x) \leq F_X(x+\epsilon).$$
Letting $\epsilon \longrightarrow 0$, if $x$ is a point of continuity of $F_X$, both outer bounds converge to $F_X(x)$, so
$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x) \quad \therefore \quad X_n \xrightarrow{d} X.$$
Proof. (Partial converses)
(i) Let $\epsilon > 0$. For each $n$,
$$P[\,|X_m - X| > \epsilon \text{ for some } m \geq n\,] = P\left( \bigcup_{m=n}^{\infty} A_m(\epsilon) \right) \leq \sum_{m=n}^{\infty} P[\,|X_m - X| > \epsilon\,],$$
as, by elementary probability theory, $P(A \cup B) \leq P(A) + P(B)$. But the right-hand side is the tail sum of a convergent series (by assumption), so it follows that
$$\lim_{n \to \infty} \sum_{m=n}^{\infty} P[\,|X_m - X| > \epsilon\,] = 0.$$
Hence
$$\lim_{n \to \infty} P[\,|X_m - X| > \epsilon, \text{ for some } m \geq n\,] = 0,$$
and $X_n \xrightarrow{a.s.} X$.
(ii) Identical in structure to part (i): by the Markov-type bound used in part (b) of the previous theorem, $P[\,|X_n - X| > \epsilon\,] \leq \mathbb{E}[\,|X_n - X|^r\,]/\epsilon^r$, so the assumption implies $\sum_{n} P[\,|X_n - X| > \epsilon\,] < \infty$ for every $\epsilon > 0$, and the result follows from part (i).