MP274 1991
K. R. MATTHEWS
LaTeXed by Chris Fama
DEPARTMENT OF MATHEMATICS
UNIVERSITY OF QUEENSLAND
1991
Contents (extract)
3 Invariant subspaces
3.1 T-cyclic subspaces
3.1.1 A nice proof of the Cayley-Hamilton theorem
3.2 An Algorithm for Finding mT
3.3 Primary Decomposition Theorem
4.10.2 Determining the real Jordan form
4.10.3 A real algorithm for finding the real Jordan form
1 Linear Transformations
We will study mainly finite-dimensional vector spaces over an arbitrary field F, i.e. vector spaces with a finite basis. (Recall that the dimension of a vector space V (dim V) is the number of elements in a basis of V.)
DEFINITION 1.1
(Linear transformation)
Given vector spaces U and V, T : U → V is a linear transformation (LT)
if
T (λu + µv) = λT (u) + µT (v)
for all λ, µ ∈ F , and u, v ∈ U . Then T (u+v) = T (u)+T (v), T (λu) = λT (u)
and
T (λ1 u1 + · · · + λn un) = λ1 T (u1) + · · · + λn T (un).
EXAMPLES 1.1
Consider the linear transformation
T = TA : Vn(F) → Vm(F), TA(X) = AX,
where A ∈ Mm×n(F) and Vn(F) denotes the space of all column vectors [x1, . . . , xn]t with entries from F, sometimes written F^n.
Note that if T : Vn(F) → Vm(F) is a linear transformation, then T = TA, where A = [T(E1)| · · · |T(En)] and
E1 = [1, 0, . . . , 0]t, . . . , En = [0, . . . , 0, 1]t.
Note: if
v ∈ Vn(F), v = [x1, . . . , xn]t = x1 E1 + · · · + xn En,
then T(v) = x1 T(E1) + · · · + xn T(En) = [T(E1)| · · · |T(En)] v = Av.
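As a quick sanity check of the claim that any linear map Vn(F) → Vm(F) is some TA, here is a minimal numeric sketch over F = R (the particular map T below is an invented illustration, not one from the notes):

```python
import numpy as np

# A sample linear map T : V3(R) -> V2(R); this particular T is an
# invented illustration, not one from the notes.
def T(v):
    x1, x2, x3 = v
    return np.array([2 * x1 - x3, x2 + 4 * x3], dtype=float)

# Build A column by column: A = [T(E1) | T(E2) | T(E3)].
E = np.eye(3)
A = np.column_stack([T(E[:, j]) for j in range(3)])

# T agrees with v -> Av on an arbitrary vector.
v = np.array([5.0, -1.0, 2.0])
assert np.allclose(T(v), A @ v)
```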
If V is the vector space of all infinitely differentiable functions on R, then T : V → V defined by
T(f) = a0 D^n f + a1 D^{n−1} f + · · · + an−1 Df + an f
(where a0, . . . , an ∈ R and D denotes differentiation) is a linear transformation.
DEFINITIONS 1.1
(Kernel of a linear transformation)
Ker T = {u ∈ U | T (u) = 0}
(Image of T)
Im T = {v ∈ V | v = T(u) for some u ∈ U}.
EXAMPLE 1.1
Generally, if U = hu1 , . . . , un i, then Im T = hT (u1 ), . . . , T (un )i.
Note: Even if u1 , . . . , un form a basis for U , T (u1 ), . . . , T (un ) may not
form a basis for Im T . I.e. it may happen that T (u1 ), . . . , T (un ) are linearly
dependent.
THEOREM (Rank + Nullity theorem)
rank T + nullity T = dim U, where rank T = dim Im T and nullity T = dim Ker T.
PROOF. There are three cases:
1. Ker T = {0}.
Then nullity T = 0.
We first show that the vectors T (u1 ), . . . , T (un ), where u1 , . . . , un are
a basis for U , are LI (linearly independent):
Suppose x1 T(u1) + · · · + xn T(un) = 0, where x1, . . . , xn ∈ F.
Then T(x1 u1 + · · · + xn un) = 0, so x1 u1 + · · · + xn un ∈ Ker T = {0}; as u1, . . . , un are a basis, x1 = · · · = xn = 0. Thus rank T = n and rank T + nullity T = n + 0 = dim U.
2. Ker T = U.
So nullity T = dim U. Hence Im T = {0}, so rank T = 0 and again rank T + nullity T = dim U.
3. 0 < nullity T = r < n = dim U.
Let u1, . . . , ur be a basis for Ker T and extend it to a basis u1, . . . , un of U (refer to last year's notes to show that this can be done). Then T(ur+1), . . . , T(un) form a basis for Im T: they span Im T, and an argument like that of case 1 shows they are LI. So
rank T + nullity T = (n − r) + r = n = dim U.
DEFINITION 1.2
If U and V are any two vector spaces, then the direct sum is
U ⊕ V = {(u, v) | u ∈ U, v ∈ V }
(i.e. the cartesian product of U and V ) made into a vector space by the
component-wise definitions:
1. (u1 , v1 ) + (u2 , v2 ) = (u1 + u2 , v1 + v2 ),
2. λ(u, v) = (λu, λv), and
3. (0, 0) is an identity for U ⊕ V and (−u, −v) is an additive inverse for
(u, v).
We need the following result:
THEOREM 1.3
dim(U ⊕ V) = dim U + dim V.
PROOF. Let u1, . . . , um be a basis for U and v1, . . . , vn a basis for V. We assert that (u1, 0), . . . , (um, 0), (0, v1), . . . , (0, vn) form a basis for U ⊕ V.
Firstly, spanning:
Let (u, v) ∈ U ⊕ V , say u = x1 u1 + · · · + xm um and v = y1 v1 + · · · + yn vn .
Then
(u, v) = (u, 0) + (0, v)
= (x1 u1 + · · · + xm um , 0) + (0, y1 v1 + · · · + yn vn )
= x1 (u1 , 0) + · · · + xm (um , 0) + y1 (0, v1 ) + · · · + yn (0, vn )
So U ⊕ V = h(u1 , 0), . . . , (um , 0), (0, v1 ), . . . , (0, vn )i
Secondly, independence: assume x1 (u1 , 0) + · · · + xm (um , 0) + y1 (0, v1 ) +
· · · + yn (0, vn ) = (0, 0). Then
(x1 u1 + · · · + xm um, y1 v1 + · · · + yn vn) = (0, 0)
⇒ x1 u1 + · · · + xm um = 0 and y1 v1 + · · · + yn vn = 0
⇒ xi = 0 ∀i and yi = 0 ∀i.
Hence the assertion is true and the result follows.
THEOREM
dim(U + V) + dim(U ∩ V) = dim U + dim V.
PROOF.
Let T : U ⊕ V → U + V, where U and V are subspaces of some W, be defined by T(u, v) = u + v.
Thus Im T = U + V , and
Ker T = {(u, v) | u ∈ U, v ∈ V, and u + v = 0}
= {(t, −t) | t ∈ U ∩ V }
Clearly then, dim Ker T = dim(U ∩ V )1 and so
rank T + nullity T = dim(U ⊕ V )
⇒ dim(U + V ) + dim(U ∩ V ) = dim U + dim V.
DEFINITION (Coordinate vector)
If β : u1, . . . , un is a basis for U and u = x1 u1 + · · · + xn un, then
[u]β = [x1, . . . , xn]t,
the coordinate vector of u relative to β.
¹True if U ∩ V = {0}; if not, let S = Ker T and u1, . . . , ur be a basis for U ∩ V. Then (u1, −u1), . . . , (ur, −ur) form a basis for S and hence dim Ker T = dim S = dim(U ∩ V).
EXAMPLE 1.2
Let A = [a b; c d] ∈ M2×2(F) and let T : M2×2(F) → M2×2(F) be defined by
T(X) = AX − XA.
Then T is linear, and Ker T consists of all 2 × 2 matrices X such that AX = XA.
Take β to be the basis E11, E12, E21, E22, defined by
E11 = [1 0; 0 0], E12 = [0 1; 0 0], E21 = [0 0; 1 0], E22 = [0 0; 0 1]
(so that we can write down a matrix for the transformation, consider the images of these henceforth as column vectors of four elements).
Calculate [T]ββ = B: for example
T(E11) = AE11 − E11A = [0 −b; c 0] = −bE12 + cE21,
and similar calculations for the images of the other basis vectors show that
B = [ 0    −c     b    0
     −b   a−d     0    b
      c     0   d−a   −c
      0     c    −b    0 ].
If A is not a scalar matrix, B turns out to have rank 2, so
nullity T = 4 − 2 = 2.
Note: I2 , A ∈ Ker T which has dimension 2. Hence if A is not a scalar
matrix, since I2 and A are LI they form a basis for Ker T . Hence
AX = XA ⇒ X = αI2 + βA.
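The rank/nullity claim can be checked numerically; the sketch below picks sample values for a, b, c, d (an assumption for illustration) and verifies that B has rank 2 and that combinations of I2 and A commute with A:

```python
import numpy as np

a, b, c, d = 1.0, 2.0, 3.0, 4.0        # sample entries; A is not scalar
A = np.array([[a, b], [c, d]])

# Matrix of T(X) = AX - XA with respect to E11, E12, E21, E22.
B = np.array([[ 0,    -c,     b,  0],
              [-b, a - d,     0,  b],
              [ c,     0, d - a, -c],
              [ 0,     c,    -b,  0]])

# nullity T = 4 - rank B = 2 ...
assert np.linalg.matrix_rank(B) == 2

# ... and I2, A (and their combinations) do commute with A.
for X in (np.eye(2), A, 3 * np.eye(2) + 2 * A):
    assert np.allclose(A @ X - X @ A, 0)
```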
DEFINITIONS 1.2
Let T1 and T2 be LTs mapping U to V. Then T1 + T2 : U → V is defined by
(T1 + T2)(u) = T1(u) + T2(u),
and for λ ∈ F, λT1 : U → V is defined by (λT1)(u) = λT1(u); both are again LTs.
DEFINITION 1.4
([T]γβ, the matrix of T relative to the bases β and γ)
If β : u1, . . . , un is a basis for U and γ : v1, . . . , vm is a basis for V, then [T]γβ is the m × n matrix whose jth column is [T(uj)]γ. Note that
[0]γβ = 0
and [−T]γβ = −[T]γβ.
THEOREM 1.4
[T(u)]γ = [T]γβ [u]β ∀u ∈ U.
PROOF.
Let A = [T]γβ, where β is the basis u1, . . . , un, γ is the basis v1, . . . , vm, and
T(uj) = a1j v1 + · · · + amj vm, j = 1, . . . , n.
Also let [u]β = [x1, . . . , xn]t. Then u = x1 u1 + · · · + xn un, so
T(u) = x1 T(u1) + · · · + xn T(un)
= x1 (a11 v1 + · · · + am1 vm) + · · · + xn (a1n v1 + · · · + amn vm)
= (a11 x1 + · · · + a1n xn) v1 + · · · + (am1 x1 + · · · + amn xn) vm
⇒ [T(u)]γ = [a11 x1 + · · · + a1n xn, . . . , am1 x1 + · · · + amn xn]t = A[u]β.
DEFINITION 1.5
(Composition of LTs)
If T1 : U → V and T2 : V → W are LTs, then T2 T1 : U → W defined by
(T2 T1)(u) = T2(T1(u))
is a LT.
THEOREM 1.5
If β, γ and δ are bases for U, V and W, then
[T2 T1]δβ = [T2]δγ [T1]γβ.
PROOF. Let u ∈ U. Then, by Theorem 1.4,
[(T2 T1)(u)]δ = [T2(T1(u))]δ = [T2]δγ [T1(u)]γ = [T2]δγ [T1]γβ [u]β,
while also [(T2 T1)(u)]δ = [T2 T1]δβ [u]β. Hence
[T2 T1]δβ [u]β = [T2]δγ [T1]γβ [u]β (1)
(note that we can't just "cancel off" the [u]β to obtain the desired result!).
Finally, if β is u1, . . . , un, note that [uj]β = Ej (since uj = 0u1 + · · · + 0uj−1 + 1uj + 0uj+1 + · · · + 0un); and for an appropriately sized matrix B, BEj is the jth column of B. So taking u = uj in (1) shows that the two matrices agree column by column.
EXAMPLE 1.3
If A is m × n and B is n × p, then
TA TB = TAB .
DEFINITION 1.6
(the identity transformation)
Let U be a vector space. Then the identity transformation IU : U → U
defined by
IU (x) = x ∀x ∈ U
is a linear transformation, and
THEOREM 1.6
Let T : U 7→ V be a LT. Then
IV T = T IU = T.
Taking U = Vn(F), V = Vm(F) and T = TA, this reads
TIm TA = TIm A = TA = TAIn = TA TIn,
and consequently we have the familiar result
Im A = A = AIn.
DEFINITION 1.7
(Invertible LTs)
Let T : U → V be a LT. If ∃S : V → U such that S is linear and satisfies
ST = IU and TS = IV,
then T is called invertible, with inverse S = T −1.
Evidently
THEOREM 1.7
TA : Vn(F) → Vm(F) is invertible iff A is invertible, i.e. iff ∃B = A−1 with
AB = Im and BA = In;
then (TA)−1 = TA−1.
THEOREM 1.8
If u1 , . . . , un is a basis for U and v1 , . . . , vn are vectors in V , then there
is one and only one linear transformation T : U → V satisfying
T (u1 ) = v1 , . . . , T (un ) = vn ,
namely T (x1 u1 + · · · + xn un ) = x1 v1 + · · · + xn vn .
1.3 Isomorphisms
DEFINITION 1.8
A linear map T : U → V is called an isomorphism if T is 1-1 and onto.
Note that T is 1-1 iff Ker T = {0}: for if Ker T = {0} and T(x) = T(y), then
T(x − y) = T(x) − T(y) = 0
⇒ x − y ∈ Ker T
⇒ x − y = 0 ⇒ x = y.
THEOREM 1.9
Let A ∈ Mm×n (F ). Then TA : Vn (F ) → Vm (F ) is
EXAMPLE 1.4
Let TA : Vn(F) → Vn(F) with A invertible; so TA(X) = AX.
We will show this to be an isomorphism.
1. Ker TA = {0}: for if TA(X) = AX = 0, then X = A−1(AX) = A−1 0 = 0.
2. Let Y ∈ Vn(F): then
TA(A−1 Y) = A(A−1 Y) = In Y = Y,
so Im TA = Vn(F).
THEOREM 1.10
If T is an isomorphism between U and V , then
dim U = dim V
PROOF.
Let u1, . . . , un be a basis for U. Then
T(u1), . . . , T(un)
form a basis for V: they are LI because T is 1-1, and they span Im T = V because T is onto. Hence dim V = n = dim U.
THEOREM 1.11
THEOREM 1.12
T : U → V is invertible
⇔ T is an isomorphism between U and V .
PROOF.
⇒ Assume T is invertible. Then
T −1 T = IU and T T −1 = IV
⇒ T −1(T(x)) = x ∀x ∈ U and T(T −1(y)) = y ∀y ∈ V.
1. We prove Ker T = {0}.
Let T(x) = 0. Then
x = T −1(T(x)) = T −1(0) = 0,
so Ker T = {0} and T is 1-1.
2. We show Im T = V .
Let y ∈ V . Now T (T −1 (y)) = y, so taking x = T −1 (y) gives
T (x) = y.
Hence Im T = V .
⇐ Assume T is an isomorphism, and let S be the inverse map of T,
S : V → U
(well-defined since T is 1-1 and onto). One checks that S is linear and that ST = IU, TS = IV, so T is invertible.
COROLLARY 1.1
If A ∈ Mm×n (F ) is invertible, then m = n.
PROOF.
Suppose A is invertible. Then TA is invertible and thus an isomorphism
between Vn (F ) and Vm (F ).
Hence dim Vn (F ) = dim Vm (F ) and hence m = n.
THEOREM 1.13
If dim U = dim V and T : U → V is a LT, then
T is 1-1 ⇔ T is onto.
PROOF.
⇒ Suppose T is 1-1.
Then Ker T = {0} and we have to show that Im T = V: indeed
rank T = dim U − nullity T = dim U = dim V,
so Im T is a subspace of V of full dimension, i.e. Im T = V and T is onto.
⇐ Suppose T is onto.
Then Im T = V and we must show that Ker T = {0}. The above
argument is reversible:
Im T = V
⇒ rank T = dim V = dim U = rank T + nullity T
⇒ nullity T = 0, i.e. Ker T = {0}.
COROLLARY 1.2
Let A, B ∈ Mn×n(F). Then
AB = In ⇒ BA = In.
PROOF. First, TB is 1-1: for
BX = 0 ⇒ A(BX) = A0 = 0 ⇒ In X = 0 ⇒ X = 0.
Hence, by Theorem 1.13, TB is also onto and so invertible: there exists C ∈ Mn×n(F) with
TB TC = IVn(F) = TC TB ⇒ BC = In = CB.
Now, knowing AB = In,
A(BC) = A
⇒ (AB)C = A
⇒ In C = A
⇒ C = A
⇒ BA = In.
DEFINITION 1.9
Another standard isomorphism: Let dim V = m, with basis γ = v1 , . . . , vm .
Then φγ : V → Vm(F) is the isomorphism defined by
φγ (v) = [v]γ
THEOREM 1.14
If A = [T]γβ, then rank T = rank A.
PROOF
          T
     U -------> V
     |          |
  φβ |          | φγ
     v          v
  Vn(F) -----> Vm(F)
          TA
With
β : u1, . . . , un a basis for U and γ : v1, . . . , vm a basis for V,
let A = [T]γβ. Then the commutative diagram is an abbreviation for the equation
φγ T = TA φβ. (2)
Equivalently
φγ T(u) = TA φβ(u) ∀u ∈ U, or [T(u)]γ = A[u]β,
which we saw in Theorem 1.4.
But rank(ST) = rank T if S is invertible and rank(TR) = rank T if R is invertible. Hence, since φβ and φγ are both invertible, (2) gives
rank T = rank(φγ T) = rank(TA φβ) = rank TA = rank A,
and the result is proven.
Note:
Observe that φγ (T (uj )) = A∗j , the jth column of A. So Im T is mapped
under φγ into C(A). Also Ker T is mapped by φβ into N (A). Consequently
we get bases for Im T and Ker T from bases for C(A) and N (A), respectively.
THEOREM 1.15
Let β and γ be bases for some vector space V. Then, with n = dim V,
[IV]γβ is non-singular and ([IV]γβ)−1 = [IV]βγ.
PROOF
IV IV = IV
⇒ [IV IV]ββ = [IV]ββ = In = [IV]βγ [IV]γβ,
using Theorem 1.5; similarly [IV]γβ [IV]βγ = In.
The matrix P = [IV ]γβ = [pij ] is called the change of basis matrix. For if
β : u1 , . . . , un and γ : v1 , . . . , vn then
uj = IV (uj )
= p1j v1 + · · · + pnj vn for j = 1, . . . , n.
i.e. if
v = x1 u1 + · · · + xn un
= y1 v1 + · · · + yn vn
then
[y1, . . . , yn]t = P [x1, . . . , xn]t,
or, more explicitly,
y1 = p11 x1 + · · · + p1n xn
⋮
yn = pn1 x1 + · · · + pnn xn.
THEOREM 1.16 (Change of basis for a LT)
[T]γγ = P [T]ββ P −1, where P = [IV]γβ.
PROOF
IV T = T = T IV
⇒ [IV T]γβ = [T IV]γβ
⇒ [IV]γβ [T]ββ = [T]γγ [IV]γβ,
and multiplying on the right by P −1 = ([IV]γβ)−1 gives the result.
DEFINITION 1.10
(Similar matrices)
If A and B are two matrices in Mn×n(F), then if there exists a non-singular matrix P ∈ Mn×n(F) such that
B = P −1 AP
we say that A and B are similar over F.
THEOREM 1.17
Let A ∈ Mn×n (F ) and suppose that v1 , . . . , vn ∈ Vn (F ) form a basis β
for Vn (F ). Then if P = [v1 | · · · |vn ] we have
P −1 AP = [TA ]ββ .
PROOF. Let γ be the standard basis for Vn(F) consisting of the unit vectors E1, . . . , En. The change of basis theorem applied to T = TA gives
[TA]ββ = Q−1 [TA]γγ Q, where Q = [IVn(F)]γβ = [[v1]γ | · · · |[vn]γ] = [v1| · · · |vn] = P.
Since [TA]γγ = A, this says P −1 AP = [TA]ββ.
2 Polynomials over a field
A polynomial over a field F is a sequence
(a0, a1, a2, . . . , an, . . .), where ai ∈ F ∀i,
with only finitely many ai non-zero. In particular
0 = (0, 0, 0, . . .), 1 = (1, 0, 0, . . .), x = (0, 1, 0, . . .).
DEFINITION 2.1
(Multiplication of polynomials)
Let f = (a0, a1, . . .) and g = (b0, b1, . . .). Then fg = (c0, c1, . . .) where
cn = a0 bn + a1 bn−1 + · · · + an b0 = Σ_{i=0}^{n} ai bn−i = Σ_{i+j=n} ai bj.
EXAMPLE 2.1
If deg f = n, we have f = a0 1 + a1 x + · · · + an xn .
THEOREM 2.1 (Associative Law)
f (gh) = (f g)h
PROOF Take f, g as above and h = (c0, c1, . . .). Then (fg)h = (d0, d1, . . .), where
dn = Σ_{i+j=n} (fg)i hj = Σ_{i+j=n} ( Σ_{u+v=i} fu gv ) hj = Σ_{u+v+j=n} fu gv hj,
an expression symmetric in f, g and h; the same calculation gives the same value for the nth term of f(gh).
Further properties (f, g, h ∈ F[x]):
fg = gf, 0f = 0, 1f = f, f(g + h) = fg + fh;
f ≠ 0 and g ≠ 0 ⇒ fg ≠ 0, and deg(fg) = deg f + deg g;
fg = 0 ⇒ f = 0 or g = 0;
fh = fg and f ≠ 0 ⇒ h = g.
If f ∈ Pn [F ] and c ∈ F , we write
f (c) = a0 + a1 c + · · · + an cn .
DEFINITION 2.2
Let c1, . . . , cn+1 be distinct members of F. Then the Lagrange interpolation polynomials p1, . . . , pn+1 are polynomials of degree n defined by
pi = Π_{j=1, j≠i}^{n+1} (x − cj)/(ci − cj), 1 ≤ i ≤ n + 1.
EXAMPLE 2.2
p1 = (x − c2)(x − c3) · · · (x − cn+1) / ((c1 − c2)(c1 − c3) · · · (c1 − cn+1)),
p2 = (x − c1)(x − c3) · · · (x − cn+1) / ((c2 − c1)(c2 − c3) · · · (c2 − cn+1)),
etc. . .
We now show that the Lagrange polynomials also form a basis for Pn [F ].
PROOF Noting that there are n + 1 elements in the ‘standard’ basis, above,
we see that dim Pn [F ] = n + 1 and so it suffices to show that p1 , . . . , pn+1
are LI.
We use the following property of the polynomials pi :
pi(cj) = δij = { 1 if i = j; 0 if i ≠ j }.
Assume that
a1 p1 + · · · + an+1 pn+1 = 0
where ai ∈ F, 1 ≤ i ≤ n + 1. Evaluating both sides at c1, . . . , cn+1 gives
a1 · 1 + a2 · 0 + · · · + an+1 · 0 = 0
a1 · 0 + a2 · 1 + · · · + an+1 · 0 = 0
⋮
a1 · 0 + a2 · 0 + · · · + an+1 · 1 = 0.
Hence ai = 0 ∀i, as required.
COROLLARY 2.1
If f ∈ Pn[F], then
f = f(c1)p1 + · · · + f(cn+1)pn+1.
For, writing f = λ1 p1 + · · · + λn+1 pn+1 (possible, as the pi form a basis of Pn[F]) and evaluating both sides at c1, . . . , cn+1, we get
f(c1) = λ1, . . . , f(cn+1) = λn+1,
as required.
COROLLARY 2.2
If f ∈ Pn [F ] and f (c1 ) = 0, . . . , f (cn+1 ) = 0 where c1 , . . . , cn+1 are dis-
tinct, then f = 0. (I.e. a non-zero polynomial of degree n can have at most
n roots.)
COROLLARY 2.3
If b1, . . . , bn+1 are any scalars in F, and c1, . . . , cn+1 are again distinct, then there exists a unique polynomial f ∈ Pn[F] such that
f(c1) = b1, . . . , f(cn+1) = bn+1,
namely
f = b1 p1 + · · · + bn+1 pn+1.
EXAMPLE 2.3
Find the quadratic polynomial
f = a0 + a1 x + a2 x2 ∈ P2 [R]
such that
f (1) = 8, f (2) = 5, f (3) = 4.
Solution: f = 8p1 + 5p2 + 4p3, where
p1 = (x − 2)(x − 3)/((1 − 2)(1 − 3)),
p2 = (x − 1)(x − 3)/((2 − 1)(2 − 3)),
p3 = (x − 1)(x − 2)/((3 − 1)(3 − 2)),
which simplifies to f = x² − 6x + 13.
DEFINITION (Divisibility)
We say that f divides g (f, g ∈ F[x]) if there exists h ∈ F[x] such that
g = fh.
For this we write "f | g", and "f ∤ g" denotes the negation "f does not divide g".
Some properties: f | g and g | h ⇒ f | h; f | g and f | h ⇒ f | (ug + vh) ∀u, v ∈ F[x].
THEOREM (Division algorithm)
If g ≠ 0, there exist q, r ∈ F[x] such that
f = qg + r, (3)
where r = 0 or deg r < deg g.
PROOF. If f = 0 or deg f < deg g, (3) is trivially true (taking q = 0 and r = f).
So assume deg f ≥ deg g, where
f = am x^m + am−1 x^{m−1} + · · · + a0, g = bn x^n + · · · + b0, am ≠ 0 ≠ bn.
The first step of the usual long division subtracts (am bn^{−1} x^{m−n}) g from f, cancelling the leading term am x^m and leaving a polynomial of degree less than m; repeating this step on what remains eventually leaves a remainder r which is 0 or has degree less than n.
DEFINITION (Greatest common divisor)
d is a greatest common divisor (gcd) of f and g if
1. d | f and d | g, and
2. ∀e ∈ F[x], e | f and e | g ⇒ e | d.
A gcd can be computed by Euclid's algorithm: the last non-zero remainder rn in the chain of divisions satisfies
rn = gcd(f, g) = uf + vg
for some u, v ∈ F[x], found by back-substitution:
r1 = f + (−q1)g
r2 = g + (−q2)r1
= g + (−q2)(f + (−q1)g)
= (−q2)f + (1 + q1 q2)g
⋮
rn = uf + vg.
In general rk = sk f + tk g, where
sk = −qk sk−1 + sk−2, tk = −qk tk−1 + tk−2
for 1 ≤ k ≤ n. (Proof by induction.)
The special case gcd(f, g) = 1 (i.e. f and g are relatively prime) is of
great importance: here ∃u, v ∈ F [x] such that
uf + vg = 1.
EXERCISE 2.1
Find gcd(3x2 + 2x + 4, 2x4 + 5x + 1) in Q[x] and express it as uf + vg
for two polynomials u and v.
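A sketch of Euclid's algorithm with back-substitution over Q[x], which can be used to attack Exercise 2.1 (polynomials are coefficient lists with constant term first; all function names are my own):

```python
from fractions import Fraction

# Polynomials over Q as lists of Fractions, constant term first.

def trim(p):
    # drop trailing zero coefficients (but keep at least one entry)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    r = [Fraction(0)] * max(len(p), len(q))
    for i, x in enumerate(p):
        r[i] += x
    for i, x in enumerate(q):
        r[i] += x
    return trim(r)

def psub(p, q):
    return padd(p, [-x for x in q])

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            r[i + j] += x * y
    return trim(r)

def pdivmod(f, g):
    """Division algorithm: f = q*g + r with r = 0 or deg r < deg g."""
    f = [Fraction(x) for x in f]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g) and f != [Fraction(0)]:
        shift = len(f) - len(g)
        c = f[-1] / g[-1]
        q[shift] = c
        for i, gi in enumerate(g):
            f[shift + i] -= c * gi
        f = trim(f)
    return trim(q), f

def ext_gcd(f, g):
    """Euclid with back-substitution: monic d = gcd(f, g) = u*f + v*g."""
    r0, r1 = [Fraction(x) for x in f], [Fraction(x) for x in g]
    s0, s1 = [Fraction(1)], [Fraction(0)]
    t0, t1 = [Fraction(0)], [Fraction(1)]
    while r1 != [Fraction(0)]:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, psub(s0, pmul(q, s1))
        t0, t1 = t1, psub(t0, pmul(q, t1))
    lead = r0[-1]                      # normalize the gcd to be monic
    return ([x / lead for x in r0],
            [x / lead for x in s0],
            [x / lead for x in t0])

# Exercise 2.1: f = 3x^2 + 2x + 4, g = 2x^4 + 5x + 1 in Q[x].
f, g = [4, 2, 3], [1, 5, 0, 0, 2]
d, u, v = ext_gcd(f, g)
assert d == [1]                        # f and g turn out relatively prime
assert padd(pmul(u, [Fraction(c) for c in f]),
            pmul(v, [Fraction(c) for c in g])) == [1]
```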
DEFINITION (Irreducible polynomial)
A non-constant polynomial f ∈ F[x] is irreducible if its only divisors are the trivial ones:
g | f ⇒ g is a constant or g = constant × f.
EXAMPLE 2.4
f(x) = x² + x + 1 ∈ Z2[x] is irreducible, for f(0) = f(1) = 1 ≠ 0, and hence there are no polynomials of degree 1 which divide f.
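Irreducibility of a quadratic over Z2 can also be confirmed by brute force, since there are only two monic polynomials of degree 1; a small sketch (the helper pmul2 is my own name):

```python
def pmul2(p, q):
    """Multiply two polynomials over Z2 (coefficient tuples, constant first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] ^= pi & qj            # addition mod 2 is XOR
    return tuple(r)

# f = x^2 + x + 1 is not a product of two degree-1 polynomials over Z2:
f = (1, 1, 1)
linear = [(0, 1), (1, 1)]                  # the monic degree-1 polys x, x + 1
assert all(pmul2(g, h) != f for g in linear for h in linear)

# Equivalently, f has no root in Z2: f(0) = f(1) = 1.
assert all((1 + c + c * c) % 2 == 1 for c in (0, 1))
```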
THEOREM 2.2
Let f be irreducible. Then if f ∤ g, gcd(f, g) = 1 and ∃u, v ∈ F[x] such that
uf + vg = 1.
PROOF. Let d = gcd(f, g), so
d | f and d | g.
Since f is irreducible, d is a constant or d = constant × f. In the latter case
f | d and d | g ⇒ f | g, a contradiction.
So d = 1 as required, and u, v are produced by Euclid's algorithm.
COROLLARY 2.4
If f is irreducible and f | gh, then f | g or f | h.
Proof: Suppose f is irreducible, f | gh and f ∤ g. We show that f | h.
By the above theorem, ∃u, v such that
uf + vg = 1
⇒ ufh + vgh = h
⇒ f | h,
since f divides both ufh and vgh.
THEOREM 2.3
Any non-constant polynomial is expressible as a product of irreducible
polynomials where representation is unique up to the order of the irreducible
factors.
Some examples:
(x + 1)² = x² + 2x + 1 = x² + 1 in Z2[x];
(x² + x + 1)² = x⁴ + x² + 1 in Z2[x];
(2x² + x + 1)(2x + 1) = x³ + x² + 1 in Z3[x]
= (x² + 2x + 2)(x + 2) in Z3[x].
PROOF
Existence of factorization: If f ∈ F [x] is not a constant polynomial, then
f being irreducible implies the result.
Otherwise, f = f1 F1 , with 0 < deg f1 , deg F1 < deg f . If f1 and F1 are
irreducible, stop. Otherwise, keep going.
Eventually we end with a decomposition of f into irreducible poly-
nomials.
Uniqueness: Let
cf1 f2 · · · fm = dg1 g2 · · · gn
be two decompositions into products of constants (c and d) and monic
irreducibles (fi , gj ). Now
f1 | f1 f2 · · · fm =⇒ f1 | g1 g2 · · · gn
and since the fi, gj are irreducible and monic, f1 must equal some gj, which we cancel against f1.
Repeating this for f2, . . . , fm, we eventually obtain m = n and c = d; in other words, each expression is simply a rearrangement of the factors of the other, as required.
THEOREM 2.4
Let Fq be a field with q elements. Then if n ∈ N, there exists an irreducible polynomial of degree n in Fq[x].
PROOF First we introduce the idea of the Riemann zeta function:
ζ(s) = Σ_{n=1}^{∞} 1/n^s = Π_{p prime} 1/(1 − p^{−s}).
To see the product formula, expand each factor as a geometric series:
R.H.S. = Π_{p prime} ( Σ_{i=0}^{∞} 1/p^{is} )
= (1 + 1/2^s + 1/2^{2s} + · · ·)(1 + 1/3^s + 1/3^{2s} + · · ·) · · ·
= 1 + 1/2^s + 1/3^s + 1/4^s + · · ·
Note for the last step that the terms of the expanded product are of the form
1/(p1^{a1} · · · pR^{aR})^s,
and by unique prime factorization each 1/n^s occurs exactly once.
There is an analogue for monic polynomials over Fq: writing Nn for the number of monic irreducible polynomials of degree n in Fq[x] and using unique factorization in Fq[x] in place of unique prime factorization, one obtains
1/(1 − 1/q^{s−1}) = Π_{n=1}^{∞} 1/(1 − 1/q^{ns})^{Nn}. (4)
(The left side arises because there are exactly q^k monic polynomials of degree k, so summing 1/q^{s deg f} over all monic f gives Σ_k q^k/q^{ks} = 1/(1 − q^{1−s}).)
We now take logs of both sides, and then use the fact that
log 1/(1 − x) = Σ_{n=1}^{∞} x^n/n if |x| < 1;
so (4) becomes
log 1/(1 − q^{−(s−1)}) = Σ_{n=1}^{∞} Nn log 1/(1 − 1/q^{ns})
⇒ Σ_{k=1}^{∞} 1/(k q^{(s−1)k}) = − Σ_{n=1}^{∞} Nn log(1 − 1/q^{ns}) = Σ_{n=1}^{∞} Σ_{m=1}^{∞} Nn/(m q^{mns}),
so
Σ_{k=1}^{∞} q^k/(k q^{sk}) = Σ_{n=1}^{∞} Σ_{m=1}^{∞} nNn/(mn q^{mns}) = Σ_{k=1}^{∞} ( Σ_{mn=k} nNn )/(k q^{ks}).
Putting x = 1/q^s, we have
Σ_{k=1}^{∞} (q^k/k) x^k = Σ_{k=1}^{∞} x^k (1/k) Σ_{mn=k} nNn,
and since both sides are power series, we may equate coefficients of x^k to obtain
q^k = Σ_{mn=k} nNn = Σ_{n|k} nNn. (5)
Taking k = p, a prime, in (5) gives q^p = 1·N1 + p·Np = q + pNp (since N1 = q), so Np = (q^p − q)/p > 0. This proves the theorem for n = p, a prime.
But what if k is not prime? Equation (5) also tells us that
q^k ≥ kNk
and nNn ≤ q^n for every n. Splitting off the n = k term and noting that every proper divisor n of k satisfies n ≤ ⌊k/2⌋,
q^k = kNk + Σ_{n|k, n<k} nNn ≤ kNk + (1 + q + · · · + q^{⌊k/2⌋}).
But
1 + q + · · · + q^t = (q^{t+1} − 1)/(q − 1) < q^{t+1} if q ≥ 2,
so
q^k < kNk + q^{⌊k/2⌋+1}.
Since q > 1 (we cannot have a field with a single element, since the additive and multiplicative identities cannot be equal by one of the axioms), we have q^{⌊k/2⌋+1} ≤ q^k, the latter condition being equivalent to
⌊k/2⌋ + 1 ≤ k,
which holds for all k ≥ 1. Hence kNk > q^k − q^{⌊k/2⌋+1} ≥ 0 and Nk > 0, as required.
2.4 Minimum Polynomial of a (Square) Matrix
Let A ∈ Mn×n (F ), and g = chA . Then g(A) = 0 by the Cayley–Hamilton
theorem.
DEFINITION 2.5
Any non–zero polynomial g of minimum degree and satisfying g(A) = 0
is called a minimum polynomial of A.
Note: If f is a minimum polynomial of A, then f cannot be a constant
polynomial. For if f = c, a constant, then 0 = f (A) = cIn implies c = 0.
THEOREM 2.5
If f is a minimum polynomial of A and g(A) = 0, then f | g. (In partic-
ular, f | chA .)
PROOF. By the division algorithm,
g = qf + r,
where r = 0 or deg r < deg f. Then r(A) = g(A) − q(A)f(A) = 0. So if r ≠ 0, the inequality deg r < deg f would contradict the definition of f. Consequently r = 0 and f | g.
Note: It follows that if f and g are minimum polynomials of A, then f |g
and g|f and consequently f = cg, where c is a scalar. Hence there is a
unique monic minimum polynomial and we denote it by mA .
EXAMPLES (of minimum polynomials):
1. A = 0 ⇔ mA = x
2. A = In ⇔ mA = x − 1
3. A = cIn ⇔ mA = x − c
4. A² = A, A ≠ 0 and A ≠ In ⇔ mA = x² − x.
EXAMPLE 2.5
F = Q and
A = [ 5 −6 −6
     −1  4  2
      3 −6 −4 ].
Now
A ≠ c0 I3 for any c0 ∈ Q, so mA ≠ x − c0; but
A² = 3A − 2I3,
so mA = x² − 3x + 2.
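A quick numeric confirmation of this example (numpy used purely as a calculator):

```python
import numpy as np

A = np.array([[ 5, -6, -6],
              [-1,  4,  2],
              [ 3, -6, -4]])

# A is not a scalar matrix, so mA has degree at least 2 ...
assert not np.array_equal(A, A[0, 0] * np.eye(3, dtype=int))

# ... and A^2 - 3A + 2I = 0, confirming mA = x^2 - 3x + 2.
assert np.allclose(A @ A - 3 * A + 2 * np.eye(3), 0)
```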
THEOREM 2.6
If f = x^n + an−1 x^{n−1} + · · · + a1 x + a0 ∈ F[x], then mC(f) = f, where
C(f) = [ 0 0 · · · 0 −a0
         1 0 · · · 0 −a1
         0 1 · · · 0 −a2
         ⋮ ⋮  ⋱  ⋮  ⋮
         0 0 · · · 1 −an−1 ].
PROOF. Let A = C(f). Then
AE1 = E2
AE2 = E3 ⇒ A²E1 = E3
⋮
AEn−1 = En ⇒ A^{n−1}E1 = En,
and
AEn = −a0 E1 − a1 E2 − · · · − an−1 En,
so
A^n E1 = AEn = −a0 E1 − a1 AE1 − · · · − an−1 A^{n−1}E1,
that is,
A^n E1 + an−1 A^{n−1}E1 + · · · + a0 E1 = 0, i.e. f(A)E1 = 0,
so the first column of f(A) is zero. Thus
f(A)E2 = f(A)AE1 = Af(A)E1 = 0,
and similarly f(A)Ej = 0 for every j (as Ej = A^{j−1}E1), so f(A) = 0 and thus mA | f.
To show mA = f, we assume deg mA = t < n; say
mA = x^t + bt−1 x^{t−1} + · · · + b0.
Now
mA(A) = 0
⇒ A^t + bt−1 A^{t−1} + · · · + b0 In = 0
⇒ (A^t + bt−1 A^{t−1} + · · · + b0 In)E1 = 0,
that is,
Et+1 + bt−1 Et + · · · + b1 E2 + b0 E1 = 0,
contradicting the linear independence of E1, . . . , Et+1. Hence mA = f.
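The theorem can be spot-checked numerically; the sketch below builds C(f) for a sample cubic (my own choice of f) and verifies both that f(C(f)) = 0 and that no lower-degree polynomial annihilates it:

```python
import numpy as np

def companion(coeffs):
    """C(f) for monic f = x^n + a_{n-1}x^{n-1} + ... + a_0,
    given the list [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)          # 1's on the subdiagonal
    C[:, -1] = [-a for a in coeffs]     # last column: -a_0, ..., -a_{n-1}
    return C

# Sample choice: f = x^3 - 2x^2 - 5x + 6, so f(C(f)) should vanish.
C = companion([6, -5, -2])
fC = (np.linalg.matrix_power(C, 3) - 2 * np.linalg.matrix_power(C, 2)
      - 5 * C + 6 * np.eye(3))
assert np.allclose(fC, 0)

# No polynomial of smaller degree kills C: I, C, C^2 are linearly independent.
M = np.column_stack([np.eye(3).ravel(), C.ravel(), (C @ C).ravel()])
assert np.linalg.matrix_rank(M) == 3
```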
EXERCISE 2.2
If A = Jn (a) for a ∈ F , an elementary Jordan matrix of size n, show
that mA = (x − a)^n, where
A = Jn(a) = [ a 0 · · · 0 0
              1 a · · · 0 0
              0 1 · · · 0 0
              ⋮ ⋮  ⋱  ⋮ ⋮
              0 0 · · · 1 a ]
(i.e. A is an n × n matrix with a's on the diagonal and 1's on the subdiagonal).
Note: Again, the minimum polynomial happens to equal the characteristic
polynomial here.
DEFINITION 2.6
(Direct Sum of Matrices)
Let A1 , . . . , At be matrices over F . Then the direct sum of these matrices
is defined as follows:
A1 ⊕ A2 ⊕ · · · ⊕ At = [ A1 0 · · · 0
                         0 A2 · · · 0
                         ⋮  ⋮  ⋱  ⋮
                         0  0 · · · At ].
Properties:
1. (A1 ⊕ · · · ⊕ At) + (B1 ⊕ · · · ⊕ Bt) = (A1 + B1) ⊕ · · · ⊕ (At + Bt) (for blocks of matching sizes).
2. If λ ∈ F ,
λ(A1 ⊕ · · · ⊕ At ) = (λA1 ) ⊕ · · · ⊕ (λAt )
3.
(A1 ⊕ · · · ⊕ At )(B1 ⊕ · · · ⊕ Bt ) = (A1 B1 ) ⊕ · · · ⊕ (At Bt )
DEFINITION 2.7
If f1 , . . . , ft ∈ F [x], we call f ∈ F [x] a least common multiple ( lcm ) of
f1 , . . . , ft if
1. f1 | f, . . . ft | f , and
2. f1 | e, . . . ft | e ⇒ f | e.
This uniquely defines the lcm up to a constant multiple and so we set “the”
lcm to be the monic lcm .
EXAMPLES 2.1
If fg ≠ 0, then lcm(f, g) | fg.
(Recursive property) lcm(f1, . . . , ft) = lcm(lcm(f1, . . . , ft−1), ft).
THEOREM 2.7
mA1⊕···⊕At = lcm(mA1, . . . , mAt).
PROOF. Let f = mA1⊕···⊕At and g = lcm(mA1, . . . , mAt). Then
f(A1 ⊕ · · · ⊕ At) = 0
⇒ f(A1) ⊕ · · · ⊕ f(At) = 0 ⊕ · · · ⊕ 0
⇒ f(A1) = 0, . . . , f(At) = 0
⇒ mA1 | f, . . . , mAt | f
⇒ g | f.
Conversely,
mA1 | g, . . . , mAt | g
⇒ g(A1 ) = 0, . . . , g(At ) = 0
⇒ g(A1 ) ⊕ · · · ⊕ g(At ) = 0 ⊕ · · · ⊕ 0
⇒ g(A1 ⊕ · · · ⊕ At ) = 0
⇒ f = mA1 ⊕···⊕At | g.
Thus f = g.
EXAMPLE 2.6
Let A = C(f ) and B = C(g).
Then mA⊕B = lcm (f, g).
Note: If
f = c p1^{a1} · · · pt^{at}, g = d p1^{b1} · · · pt^{bt}
(c, d constants, pi distinct monic irreducibles, ai, bi ≥ 0), then
gcd(f, g) = p1^{min(a1,b1)} · · · pt^{min(at,bt)} and lcm(f, g) = p1^{max(a1,b1)} · · · pt^{max(at,bt)}.
Noting that
min(ai, bi) + max(ai, bi) = ai + bi,
we get
gcd(f, g) lcm(f, g) = fg (up to a constant multiple).
EXAMPLE 2.7
If A = diag (λ1 , . . . , λn ), then mA = (x − c1 ) · · · (x − ct ), where c1 , . . . , ct
are the distinct members of the sequence λ1 , . . . , λn .
For mA = lcm(x − λ1, . . . , x − λn) = (x − c1) · · · (x − ct).
More generally, if chA = p1^{a1} · · · pt^{at} with the pi distinct monic irreducibles, then since mA | chA,
mA = p1^{b1} · · · pt^{bt},
where 0 ≤ bi ≤ ai, ∀i = 1, . . . , t.
We soon show that each bi > 0, i.e. if p | chA and p is irreducible then p | mA.
2.5 Construction of a field of p^n elements (where p is prime and n ∈ N)
Take f to be a monic irreducible polynomial of degree n in Zp[x]; for example
n = 2, p = 2 ⇒ f = x² + x + 1,
n = 3, p = 2 ⇒ f = x³ + x + 1 or f = x³ + x² + 1.
Let A = C(f), so that mA = f. The field is to consist of all matrices
b0 In + b1 A + · · · + bn−1 A^{n−1},
where b0, . . . , bn−1 ∈ Zp.
We need only show existence of a multiplicative inverse for each element except 0 (the additive identity), as the remaining field axioms clearly hold.
So let g ∈ Zp[x] be such that g(A) ≠ 0. We have to find h ∈ Zp[x] satisfying
g(A)h(A) = In.
First note that f ∤ g: for
f | g ⇒ g = f f1
and hence
g(A) = f(A)f1(A) = 0f1(A) = 0.
Then, since f is irreducible and f ∤ g, there exist u, v ∈ Zp[x] such that
uf + vg = 1
⇒ u(A)f(A) + v(A)g(A) = In
⇒ g(A)v(A) = In,
so h = v works.
Next, every element of the field is a linear combination of In, A, . . . , A^{n−1}: write
g = fq + r
where q, r ∈ Zp[x] and r = 0 or deg r < deg f = n. So let
r = r0 + r1 x + · · · + rn−1 x^{n−1}
where r0, . . . , rn−1 ∈ Zp. Then
g(A) = f(A)q(A) + r(A)
= 0q(A) + r(A)
= r(A)
= r0 In + r1 A + · · · + rn−1 A^{n−1}.
Secondly, linear independence over Zp : Suppose that
r0 In + r1 A + · · · + rn−1 An−1 = 0,
where r0 , r1 , . . . , rn−1 ∈ Zp . Then r(A) = 0, where
r = r0 + r1 x + · · · + rn−1 xn−1 .
Hence mA = f divides r. Consequently r = 0, as deg f = n whereas
deg r < n if r 6= 0.
Consequently, there are pn such matrices g(A) in the field we have con-
structed.
Numerical Examples
EXAMPLE 2.8
Let p = 2, n = 2, f = x2 + x + 1 ∈ Z2 [x], and A = C(f ). Then
A = [ 0 −1; 1 −1 ] = [ 0 1; 1 1 ] (entries reduced mod 2),
and
F4 = { a0 I2 + a1 A | a0 , a1 ∈ Z2 }
= { 0, I2 , A, I2 + A }.
We construct addition and multiplication tables for this field, with B =
I2 + A (as an exercise, check these):
⊕  | 0   I2  A   B        ⊗  | 0   I2  A   B
0  | 0   I2  A   B        0  | 0   0   0   0
I2 | I2  0   B   A        I2 | 0   I2  A   B
A  | A   B   0   I2       A  | 0   A   B   I2
B  | B   A   I2  0        B  | 0   B   I2  A
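The tables can be verified mechanically by matrix arithmetic mod 2:

```python
import numpy as np

A = np.array([[0, 1], [1, 1]])          # companion matrix of x^2 + x + 1
I2 = np.eye(2, dtype=int)
B = (I2 + A) % 2

# A satisfies x^2 + x + 1 = 0 over Z2 ...
assert np.array_equal((A @ A + A + I2) % 2, np.zeros((2, 2), dtype=int))

# ... and spot-check the multiplication table: A*A = B, A*B = I2, B*B = A.
assert np.array_equal((A @ A) % 2, B)
assert np.array_equal((A @ B) % 2, I2)
assert np.array_equal((B @ B) % 2, A)
```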
EXAMPLE 2.9
Let p = 2, n = 3, f = x³ + x + 1 ∈ Z2[x]. Then
A = C(f) = [ 0 0 −1
             1 0 −1
             0 1  0 ]
         = [ 0 0 1
             1 0 1
             0 1 0 ] (entries reduced mod 2),
F8 = { a0 I3 + a1 A + a2 A² | a0, a1, a2 ∈ Z2 }
   = { 0, I3, A, A², I3 + A, I3 + A², A + A², I3 + A + A² }.
Now find (A² + A)−1.
Solution: use Euclid's algorithm:
x³ + x + 1 = (x + 1)(x² + x) + 1.
Hence, working mod 2,
(x³ + x + 1) + (x + 1)(x² + x) = 1
⇒ (A³ + A + I3) + (A + I3)(A² + A) = I3
⇒ (A + I3)(A² + A) = I3,
so (A² + A)−1 = A + I3.
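Checking the computed inverse by arithmetic mod 2:

```python
import numpy as np

A = np.array([[0, 0, 1],
              [1, 0, 1],
              [0, 1, 0]])              # C(x^3 + x + 1), entries mod 2
I3 = np.eye(3, dtype=int)

# Euclid gave (A + I3)(A^2 + A) = I3 in F8; verify the product mod 2.
X = (A @ A + A) % 2
assert np.array_equal(((A + I3) @ X) % 2, I3)
```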
THEOREM 2.8
Every finite field has precisely p^n elements for some prime p, the least positive integer with the property that
1 + 1 + · · · + 1 (p summands) = 0.
2.6 Characteristic and Minimum Polynomial of a Transformation
DEFINITION 2.8
(Characteristic polynomial of T : V → V)
Let β be a basis for V and A = [T ]ββ .
Then we define chT = chA. This polynomial is independent of the basis β: if γ is another basis, B = [T]γγ and P = [IV]γβ, then A = P −1 BP and
chP −1 BP = det(xIn − P −1 BP), where n = dim V,
= det(P −1 (xIn)P − P −1 BP)
= det(P −1 (xIn − B)P)
= det P −1 chB det P
= chB.
DEFINITION 2.9
If f = a0 + · · · + at xt , where a0 , . . . , at ∈ F , we define
f (T ) = a0 IV + · · · + at T t .
LEMMA 2.1
f ∈ F [x] ⇒ [f (T )]ββ = f [T ]ββ .
Note: The Cayley-Hamilton theorem for matrices says that chA(A) = 0. Then if A = [T]ββ, we have by the lemma
[chT(T)]ββ = chT(A) = chA(A) = 0,
so chT(T) = 0V.
DEFINITION 2.10
Let T : V → V be a linear transformation over F . Then any polynomial
of least positive degree such that
f (T ) = 0V
is called a minimum polynomial of T .
We have results for polynomials in a transformation T corresponding to those for polynomials in a square matrix A; for example
g = qf + r ⇒ g(T) = q(T)f(T) + r(T).
Again, there is a unique monic minimum polynomial of T; it is denoted by mT and called "the" minimum polynomial of T.
Also note that, because of the lemma,
mT = m[T]ββ.
2.6.2 Mn×n (F )[y]—Ring of Matrix Polynomials
This consists of all polynomials in y with coefficients in Mn×n (F ).
Example:
[0 1; 0 0] y⁵ + [1 0; 0 0] y² + [0 5; 1 0] y + [2 1; 3 1] ∈ M2×2(F)[y].
THEOREM 2.9
The mapping
Φ : Mn×n(F)[y] → Mn×n(F[x])
given by
Φ(A0 + A1 y + · · · + Am y^m) = A0 + xA1 + · · · + x^m Am
is 1-1 and onto, and preserves addition and multiplication. Also
Φ(In y − A) = xIn − A ∀A ∈ Mn×n(F).
THEOREM 2.10 (A remainder theorem)
Let A, B0, . . . , Bm ∈ Mn×n(F). Then there exist C0, . . . , Cm−1, R ∈ Mn×n(F) such that
B0 + B1 y + · · · + Bm y^m = (In y − A)Q + R,
where
R = Am Bm + · · · + AB1 + B0
and Q = Cm−1 y^{m−1} + · · · + C0, the Ck being determined by
Bm = Cm−1
Bm−1 = −ACm−1 + Cm−2
⋮
B1 = −AC1 + C0.
PROOF. First we verify that B0 = −AC0 + R, i.e. R = B0 + AC0: substituting the equations above into R,
R = Am Bm + Am−1 Bm−1 + · · · + AB1 + B0
= Am Cm−1 + Am−1(−ACm−1 + Cm−2) + · · · + A(−AC1 + C0) + B0,
and the sum telescopes, leaving
R = B0 + AC0.
Comparing coefficients of y^k then shows that (In y − A)Q + R has coefficients B0, . . . , Bm, as claimed.
Applying Theorem 2.10 with Bk = bk In, where mA = x^t + bt−1 x^{t−1} + · · · + b0 (so that the polynomial is mA(y)In), the remainder is R = mA(A) = 0, and hence
mA In = (xIn − A)Φ(Q).
THEOREM 2.11
If p is an irreducible polynomial dividing chA, then p | mA.
PROOF. Taking determinants of the equation above,
(mA)^n = det(xIn − A) det Φ(Q) = chA det Φ(Q).
So letting p be an irreducible polynomial dividing chA, we have p | (mA)^n and hence p | mA, by Corollary 2.4.
Alternative simpler proof (MacDuffee):
mA(x) − mA(y) = (x − y)k(x, y), where k(x, y) ∈ F[x, y]. Hence, substituting the commuting matrices xIn and A for x and y,
mA(x)In = mA(x)In − mA(A) = (xIn − A)k(xIn, A),
again exhibiting xIn − A as a factor of mA(x)In.
Exercise: If ∆(x) is the gcd of the elements of adj(xIn − A), use the equation (xIn − A)adj(xIn − A) = chA(x)In and an above equation to deduce that mA(x) = chA(x)/∆(x).
EXAMPLES 2.3
With A = 0 ∈ Mn×n(F), we have chA = x^n and mA = x.
With A = diag(1, 1, 2, 2, 2) ∈ M5×5(Q), chA = (x − 1)²(x − 2)³ while mA = (x − 1)(x − 2).
DEFINITION 2.11
A matrix A ∈ Mn×n (F ) is called diagonable over F if there exists a
non–singular matrix P ∈ Mn×n (F ) such that
P −1 AP = diag (λ1 , . . . , λn ),
where λ1 , . . . , λn belong to F .
THEOREM 2.12
If A is diagonable, then mA is a product of distinct linear factors.
PROOF
If P −1 AP = diag(λ1, . . . , λn) (with λ1, . . . , λn ∈ F), then
mA = mP −1 AP = mdiag(λ1 ,...,λn) = (x − c1)(x − c2) · · · (x − ct),
where c1, . . . , ct are the distinct members of λ1, . . . , λn (see Example 2.7).
EXAMPLE 2.10
A = Jn(a) with n ≥ 2 is not diagonable: mA = (x − a)^n is not a product of distinct linear factors.
DEFINITION 2.12
(Diagonable LTs)
T : V → V is called diagonable over F if there exists a basis β for V such that [T]ββ is diagonal.
THEOREM 2.13
A is diagonable ⇔ TA is diagonable.
PROOF (Sketch)
If P −1 AP = diag(λ1, . . . , λn) with P = [P1| · · · |Pn], then AP = P diag(λ1, . . . , λn), i.e.
TA(P1) = AP1 = λ1 P1
⋮
TA(Pn) = APn = λn Pn,
so [TA]ββ is diagonal for the basis β : P1, . . . , Pn of Vn(F); the converse is obtained by reversing the steps.
THEOREM 2.14
Let A ∈ Mn×n (F ). Then if λ is an eigenvalue of A with multiplicity m,
(that is (x − λ)m is the exact power of x − λ which divides chA ), we have
nullity (A − λIn ) ≤ m.
REMARKS. (1) If m = 1, we deduce that nullity (A − λIn ) = 1. For the
inequality
1 ≤ nullity (A − λIn )
always holds.
(2) The integer nullity (A − λIn ) is called the geometric multiplicity of
the eigenvalue λ, while m is referred to as the algebraic multiplicity of λ.
PROOF. Let v1 , . . . , vr be a basis for N (A − λIn ), where λ is an eigenvalue
of A having multiplicity m. Extend this linearly independent family to a
basis v1 , . . . , vr , vr+1 , . . . , vn of Vn (F ). Then the following equations hold:
Av1 = λv1
..
.
Avr = λvr
Avr+1 = b11 v1 + · · · + bn1 vn
⋮
Avn = b1,n−r v1 + · · · + bn,n−r vn.
Hence, with P = [v1| · · · |vn], in block form
P −1 AP = [ λIr B1
            0   B2 ].
Then
chA = chP −1 AP = chλIr · chB2 = (x − λ)^r chB2
and because (x − λ)^m is the exact power of x − λ dividing chA, it follows that
nullity(A − λIn) = r ≤ m.
THEOREM 2.15
Suppose that chT = (x − c1 )a1 · · · (x − ct )at . Then T is diagonable if
nullity(T − ci IV) = ai for 1 ≤ i ≤ t.
PROOF. We first prove that the subspaces Vi = Ker(T − ci IV) are independent.
(Subspaces V1, . . . , Vt are called independent if
v1 + · · · + vt = 0, vi ∈ Vi, i = 1, . . . , t, ⇒ v1 = 0, . . . , vt = 0.)
So suppose v1 + · · · + vt = 0 with vi ∈ Vi. Applying T repeatedly and using T(vi) = ci vi:
c1 v1 + · · · + ct vt = 0
c1² v1 + · · · + ct² vt = 0
⋮
c1^{t−1} v1 + · · · + ct^{t−1} vt = 0.
The coefficient matrix of this system (together with the original equation) is the Vandermonde matrix [cj^i], non-singular because c1, . . . , ct are distinct; hence v1 = 0, . . . , vt = 0.
Hence, since nullity(T − ci IV) = ai and a1 + · · · + at = deg chT = dim V, the independent subspaces Vi together have total dimension dim V, so
V = V1 + · · · + Vt.
Then if βi is a basis for Vi for 1 ≤ i ≤ t and β = β1 ∪ · · · ∪ βt, it follows that β is a basis for V. Moreover
[T]ββ = (c1 Ia1) ⊕ · · · ⊕ (ct Iat)
and T is diagonable.
EXAMPLE. Let
A = [ 5  2 −2
      2  5 −2
     −2 −2  5 ].
(a) We find that chA = (x − 3)2 (x − 9). Next we find bases for each of the
eigenspaces N (A − 9I3 ) and N (A − 3I3 ):
First we solve (A − 3I3 )X = 0. We have
A − 3I3 = [ 2  2 −2
            2  2 −2
           −2 −2  2 ]
        → [ 1 1 −1
            0 0  0
            0 0  0 ],
so X11 = [−1, 1, 0]t and X12 = [1, 0, 1]t form a basis for the eigenspace
corresponding to the eigenvalue 3.
Next we solve (A − 9I3 )X = 0. We have
A − 9I3 = [ −4  2 −2
             2 −4 −2
            −2 −2 −4 ]
        → [ 1 0 1
            0 1 1
            0 0 0 ],
and we can take X21 = [−1, −1, 1]t as a basis for the eigenspace corresponding to the eigenvalue 9.
Then P = [X11 |X12 |X21 ] is non–singular and
P −1 AP = [ 3 0 0
            0 3 0
            0 0 9 ].
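A numeric confirmation of the diagonalization (P has the eigenvectors X11, X12, X21 found above as columns):

```python
import numpy as np

A = np.array([[ 5,  2, -2],
              [ 2,  5, -2],
              [-2, -2,  5]], dtype=float)

# Columns: the eigenvector bases X11, X12 (eigenvalue 3) and X21 (eigenvalue 9).
P = np.array([[-1,  1, -1],
              [ 1,  0, -1],
              [ 0,  1,  1]], dtype=float)

assert np.allclose(np.linalg.inv(P) @ A @ P, np.diag([3.0, 3.0, 9.0]))
```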
THEOREM 2.16
If
mT = (x − c1 ) . . . (x − ct )
for c1 , . . . , ct distinct in F , then T is diagonable and conversely. Moreover
there exist unique linear transformations T1 , . . . , Tt satisfying
IV = T1 + · · · + Tt ,
T = c1 T1 + · · · + ct Tt ,
Ti Tj = 0V if i ≠ j,
Ti² = Ti, 1 ≤ i ≤ t.
Remarks. For any m ≥ 1,
T^m = c1^m T1 + · · · + ct^m Tt,
and if T is invertible (so that each ci ≠ 0),
T −1 = c1^{−1} T1 + · · · + ct^{−1} Tt.
PROOF. Let p1, . . . , pt be the Lagrange interpolation polynomials based on c1, . . . , ct. Then (by Corollary 2.1, applied in Pt−1[F])
g ∈ Pt−1[F] ⇒ g = g(c1)p1 + · · · + g(ct)pt.
In particular,
g = 1 ⇒ 1 = p1 + · · · + pt
and
g = x ⇒ x = c1 p1 + · · · + ct pt.
Hence with Ti = pi (T ),
IV = T1 + · · · + Tt
T = c1 T1 + · · · + ct Tt .
Next,
mT = (x − c1) . . . (x − ct) | pi pj if i ≠ j
(each factor x − ck divides pi or pj)
⇒ (pi pj)(T) = 0V if i ≠ j
⇒ pi(T)pj(T) = 0V, or Ti Tj = 0V, if i ≠ j.
dim V − ai ≤ nullity pi (T )
For uniqueness, suppose S1, . . . , St also satisfy the four equations. From the equations,
Ti T = T Ti = ci Ti and Sj T = T Sj = cj Sj,
so
Ti(T Sj) = Ti(cj Sj) = cj Ti Sj = (Ti T)Sj = ci Ti Sj,
so (cj − ci)Ti Sj = 0V and Ti Sj = 0V if i ≠ j. Hence
Ti = Ti IV = Ti(S1 + · · · + St) = Ti Si,
Si = IV Si = (T1 + · · · + Tt)Si = Ti Si.
Hence Ti = Si.
Conversely, suppose that T is diagonable and let β be a basis of V such
that
A = [T ]ββ = diag (λ1 , . . . , λn ).
Then mT = mA = (x − c1 ) · · · (x − ct ), where c1 , . . . , ct are the distinct
members of the sequence λ1 , . . . , λn .
COROLLARY 2.5
If
chT = (x − c1 ) . . . (x − ct )
with ci distinct members of F , then T is diagonable.
Proof: Here mT = chT and we use Theorem 2.16.
EXAMPLE 2.11
Let
A = [ 0 a; b 0 ], a, b ∈ F, ab ≠ 0, 1 + 1 ≠ 0.
Then A is diagonable if and only if ab = y² for some y ∈ F.
For chA = x² − ab, so if ab = y²,
chA = x² − y² = (x + y)(x − y),
which is a product of distinct linear factors, as y 6= −y here.
Conversely suppose that A is diagonable. Then as A is not a scalar
matrix, it follows that mA is not linear and hence
mA = (x − c1)(x − c2),
where c1 ≠ c2. Also chA = mA, so chA(c1) = 0. Hence
c1² − ab = 0, or ab = c1².
For example, take F = Z7 and let a = 1 and b = 3. Then ab = 3 is not a square in Z7 (the squares mod 7 are 0, 1, 2, 4), and consequently A is not diagonable.
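The quadratic-residue claim for Z7 is easy to enumerate:

```python
# The squares in Z7 are {0, 1, 2, 4}, so ab = 1 * 3 = 3 is not a square
# and the matrix A = [0 1; 3 0] over Z7 is not diagonable.
squares = {y * y % 7 for y in range(7)}
assert squares == {0, 1, 2, 4}
assert 3 not in squares
```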
3 Invariant subspaces
DEFINITIONS 3.1
(a) Subspaces V1, . . . , Vt of V are called independent if
v1 + · · · + vt = 0 ⇒ v1 = 0, . . . , vt = 0
∀v1 ∈ V1, . . . , vt ∈ Vt.
(b) V = V1 + · · · + Vt.
If both (a) and (b) hold, then each v ∈ V is uniquely expressible as
v = v1 + · · · + vt
with vi ∈ Vi. Then V is isomorphic to the (external) direct sum V1 ⊕ · · · ⊕ Vt under the isomorphism v → (v1, . . . , vt), and we write V = V1 ⊕ · · · ⊕ Vt (an internal direct sum).
THEOREM 3.1
If V = V1 ⊕ · · · ⊕ Vt (an internal direct sum) and β1, . . . , βt are bases for V1, . . . , Vt respectively, then
β = β1 ∪ · · · ∪ βt
is a basis for V.
DEFINITION (T-invariant subspace)
A subspace W of V is T-invariant if
w ∈ W ⇒ T(w) ∈ W;
the restriction TW : W → W is then the LT defined by
TW(w) = T(w) ∀w ∈ W.
If β′ is a basis for W, {0} ⊂ W ⊂ V, and β is an extension of β′ to a basis of V, then, in block form,
[T]ββ = [ [TW]β′β′ B1
          0        B2 ].
A situation of great interest is when we have T-invariant subspaces W1, . . . , Wt and V = W1 ⊕ · · · ⊕ Wt. For if β = β1 ∪ · · · ∪ βt, where βi is a basis for Wi, we see that
[T]ββ = [TW1]β1β1 ⊕ · · · ⊕ [TWt]βtβt.
DEFINITION (Minimum polynomial of a vector)
For v ∈ V, a non-zero polynomial f of least degree satisfying
f(T)(v) = 0
is called a minimum polynomial of v; as before there is a unique monic one, denoted mT,v.
DEFINITION 3.2
(T -cyclic subspace generated by v.)
If v ∈ V , the set of all vectors of the form f (T )(v), f ∈ F [x], forms a
subspace of V called the T -cyclic subspace generated by v. It is denoted by
CT,v .
PROOF. Exercise.
Also, CT,v is a T -invariant subspace of V . For
w ∈ CT,v ⇒ w = f (T )(v)
⇒ T (w) = T (f (T )(v)) = (T f (T ))(v) = ((xf )(T ))(v) ∈ CT,v .
We see that v = 0 if and only if CT,v = {0}.
THEOREM 3.2
Let v ≠ 0, v ∈ V. Then CT,v has the basis β :
v, T(v), T²(v), . . . , T^{k−1}(v),
where k = deg mT,v. (β is called the T-cyclic basis generated by v.) Note that dim CT,v = deg mT,v.
Finally,
[TCT,v]ββ = C(mT,v),
the companion matrix of the minimum polynomial of v.
PROOF.
1. The T -cyclic basis is a basis for CT,v :
Spanning:
Let w ∈ hv, T (v), . . . , T k−1 (v)i, so
w = w0 v + w1 T (v) + · · · + wk−1 T k−1 (v)
= (w0 IV + · · · + wk−1 T k−1 )(v)
= g(T )(v),
where g = w0 + · · · + wk−1 xk−1 , so w ∈ CT,v . Hence
hv, T (v), . . . , T k−1 (v)i ⊆ CT,v .
Conversely, suppose that w ∈ CT,v so
w = f (T )(v)
and
f = q m_{T,v} + r,
where deg r < k, say r = a_0 + a_1 x + · · · + a_{k−1} x^{k−1} with a_0, ..., a_{k−1} ∈ F. So
f (T )(v) = q(T )mT,v (T )(v) + r(T )(v)
= q(T )mT,v (T )(v) + a0 v + a1 T (v) + · · · + ak−1 T k−1 (v)
= a0 v + a1 T (v) + · · · + ak−1 T k−1 (v)
∈ hv, T (v), . . . , T k−1 (v)i.
55
Independence:
Assume
a_0 v + a_1 T(v) + · · · + a_{k−1} T^{k−1}(v) = 0
and let
f = a_0 + a_1 x + · · · + a_{k−1} x^{k−1}.
Then f(T)(v) = 0, so m_{T,v} | f. But deg f < k = deg m_{T,v}, so f = 0 and hence a_0 = · · · = a_{k−1} = 0.
2. Write m_{T,v} = a_0 + a_1 x + · · · + a_{k−1} x^{k−1} + x^k and L = T_{C_{T,v}}. Then L(T^{k−1}(v)) = T^k(v) = −a_0 v − a_1 T(v) − · · · − a_{k−1} T^{k−1}(v), so
[L]_β^β =
[ 0 0 · · · 0 −a_0     ]
[ 1 0 · · · 0 −a_1     ]
[ ⋮   ⋱     ⋮          ]
[ 0 0 · · · 1 −a_{k−1} ]
= C(m_{T,v}),
as required.
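Theorem 3.2 can be illustrated numerically. In the following minimal sketch the matrix A and vector v are illustrative choices (not taken from the text): A has distinct eigenvalues, so v = (1, 1, 1)^t is a cyclic vector, m_{T,v} = ch_A, and the matrix of T_A in the T-cyclic basis is the companion matrix of ch_A, in the convention used in these notes (1's on the subdiagonal, negated coefficients in the last column).

```python
import numpy as np

# A has distinct eigenvalues 2, 3, 5; v = (1,1,1) is then cyclic,
# so m_{T,v} = ch_A = x^3 - 10x^2 + 31x - 30.
A = np.array([[2, 0, 0], [0, 3, 0], [0, 0, 5]], dtype=float)
v = np.array([1.0, 1.0, 1.0])

# T-cyclic basis beta: v, Av, A^2 v (as columns of P)
P = np.column_stack([v, A @ v, A @ A @ v])

# Matrix of T_A with respect to beta: P^{-1} A P
M = np.linalg.solve(P, A @ P)

# Companion matrix of x^3 - 10x^2 + 31x - 30
C = np.array([[0, 0, 30], [1, 0, -31], [0, 1, 10]], dtype=float)
print(np.allclose(M, C))  # True
```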
56
THEOREM 3.3
Suppose that mT, v = (x − c)k . Then the vectors
v, (T − cIV )(v), . . . , (T − cIV )k−1 (v)
form a basis β for W = CT, v which we call the elementary Jordan basis.
Also
[TW ]ββ = Jk (c).
More generally suppose mT,v = pk , where p is a monic irreducible polynomial
in F [x], with n = deg p. Then the vectors
v, T (v), ..., T n−1 (v)
p(T )(v), T p(T )(v), ..., T n−1 p(T )(v)
.. .. .. ..
. . . .
pk−1 (T )(v), T pk−1 (T )(v), . . . , T n−1 pk−1 (T )(v),
form a basis for W = CT,v , which reduces to the elementary Jordan basis
when p = x − c. Also
[TW ]ββ = H(pk ),
where H(pk ) is a hypercompanion matrix, which reduces to the elemen-
tary Jordan matrix Jk (c) when p = x − c:
H(p^k) =
[ C(p)              ]
[ N    C(p)         ]
[ 0    N    C(p)    ]
[ ⋮         ⋱    ⋱  ]
[ 0    · · ·  N  C(p) ],
where there are k blocks on the diagonal and N is a square matrix of the same size as C(p) which is everywhere zero, except in the top right-hand corner, where there is a 1. The overall effect is an unbroken subdiagonal of 1's.
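The construction of H(p^k) can be sketched in code. This is a minimal illustration, with helper names of my own choosing; the companion-matrix convention follows the notes (1's on the subdiagonal, negated coefficients of the monic polynomial in the last column), and integer arithmetic stands in for an arbitrary field F.

```python
import numpy as np

def companion(coeffs):
    # coeffs = [a0, a1, ..., a_{n-1}] of the monic p = x^n + a_{n-1}x^{n-1} + ... + a0
    n = len(coeffs)
    C = np.zeros((n, n), dtype=int)
    C[1:, :-1] = np.eye(n - 1, dtype=int)   # 1's on the subdiagonal
    C[:, -1] = -np.array(coeffs)            # last column: -a0, ..., -a_{n-1}
    return C

def hypercompanion(coeffs, k):
    # H(p^k): k copies of C(p) on the diagonal, N (1 in top-right corner) below
    n = len(coeffs)
    C = companion(coeffs)
    N = np.zeros((n, n), dtype=int)
    N[0, -1] = 1
    H = np.zeros((k * n, k * n), dtype=int)
    for i in range(k):
        H[i*n:(i+1)*n, i*n:(i+1)*n] = C
        if i > 0:
            H[i*n:(i+1)*n, (i-1)*n:i*n] = N
    return H

# For p = x - c the construction collapses to the elementary Jordan matrix J_k(c):
print(hypercompanion([-5], 3))   # J_3(5)
```

For instance, hypercompanion([2, 1], 2) taken mod 3 gives H(p^2) for p = x^2 + x + 2 over Z_3, with an unbroken subdiagonal of 1's.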
57
3.1.1 A nice proof of the Cayley-Hamilton theorem
Let v ∈ V and extend a basis for W = C_{T,v} to a basis β of V. Then [T]_β^β has block triangular form with top left-hand block C(m_{T,v}), and
ch_T = ch_{T_W} · ch_{B_2}. So ch_{T_W} | ch_T and since we know that
ch_{T_W} = m_{T,v},
we have m_{T,v} | ch_T.
Hence ch_T = g m_{T,v} and
ch_T(T)(v) = (g(T) m_{T,v}(T))(v) = g(T)(m_{T,v}(T)(v)) = g(T)(0) = 0.
As v was arbitrary, ch_T(T) = 0.
THEOREM 3.4
Suppose T : V → V and
m_T = p_1^{b_1} · · · p_t^{b_t},
where b1 , . . . , bt ≥ 1, and p1 , . . . , pt are distinct monic irreducibles.
Then for i = 1, . . . , t we have
(a)
V ⊃ Im p_i(T) ⊃ · · · ⊃ Im p_i^{b_i−1}(T) ⊃ Im p_i^{b_i}(T) = Im p_i^{b_i+1}(T) = · · ·
(b)
{0} ⊂ Ker p_i(T) ⊂ · · · ⊂ Ker p_i^{b_i−1}(T) ⊂ Ker p_i^{b_i}(T) = Ker p_i^{b_i+1}(T) = · · · .
58
1.
(f + g)v = f v + gv ∀f, g ∈ F [x], v ∈ V ;
2.
f (v + w) = f v + f w ∀f ∈ F [x], v, w ∈ V ;
3.
(f g)v = f (gv) ∀f, g ∈ F [x], v ∈ V ;
4.
1v = v ∀v ∈ V.
These axioms, together with the four axioms for addition on V , turn V into
what is called a “left F [x]-module”. (So there are deeper considerations
lurking in the background—ideas of greater generality which make the algo-
rithm we unravel for the rational canonical form also apply to other things
such as the theorem that any finite abelian group is a direct product of cyclic
prime power subgroups.)
The containment
Ker p^b(T) ⊆ Ker p^{b+1}(T)
is obvious, so we need only show that
Ker p^{b+1}(T) ⊆ Ker p^b(T).
Write
p^b = u p^{2b} + v m_T
for suitable u, v ∈ F[x]. Hence if w ∈ Ker p^{b+1}(T), then
p^b w = u p^{2b} w + v m_T w = u p^{b−1}(p^{b+1} w) = u p^{b−1} 0 = 0
and w ∈ Ker p^b(T), as required.
(iii)
(iv)
Ker p^{b−1}(T) ⊂ Ker p^b(T)
and this forces a chain of proper inclusions:
qv ∉ Ker p^{b−1}(T),
but qv ∈ Ker p^b(T) as
p^b q v = m_T v = 0.
60
3.3 Primary Decomposition Theorem
THEOREM 3.5 (Primary Decomposition)
If T : V → V is a LT with m_T = p_1^{b_1} · · · p_t^{b_t}, where p_1, ..., p_t are distinct monic irreducibles, then
V = Ker p_1^{b_1}(T) ⊕ · · · ⊕ Ker p_t^{b_t}(T).
PROOF. Let q_i = m_T / p_i^{b_i}. Then
(q_i q_j)(T) = 0_V if i ≠ j, as m_T | q_i q_j if i ≠ j.
Also gcd(q_1, ..., q_t) = 1, so there exist f_1, ..., f_t ∈ F[x] with
1 = f_1 q_1 + · · · + f_t q_t.
Setting T_i = (f_i q_i)(T), we get
I_V = T_1 + · · · + T_t. (6)
Then
V = ⊕_{i=1}^{t} Im T_i.
61
For T_i T_j = 0_V if i ≠ j, so
T_i = T_i I_V = T_i(T_1 + · · · + T_t) = T_i^2.
Next, V = Im T_1 + · · · + Im T_t, by equation (6). To prove independence, suppose
T_1(u_1) + · · · + T_t(u_t) = 0.
Applying T_i and using T_i T_j = 0_V for i ≠ j gives
T_i T_i(u_i) = 0,
so v_i = T_i(u_i) = T_i^2(u_i) = 0.
Also Im T_i ⊆ Ker p_i^{b_i}(T). For if v ∈ Im T_i, say
v = f_i q_i w,
then
p_i^{b_i} v = p_i^{b_i} f_i q_i w = f_i (p_i^{b_i} q_i) w = f_i m_T w = 0.
Conversely, if v ∈ Ker p_i^{b_i}(T) and j ≠ i, then p_i^{b_i} | f_j q_j, so
T_j(v) = f_j q_j v = 0.
So v = I_V(v) = T_1(v) + · · · + T_t(v) = T_i(v) ∈ Im T_i, and Im T_i = Ker p_i^{b_i}(T),
as required.
Finally, let V_i = Ker p_i^{b_i}(T) and L_i = T_{V_i}. Then because V_1, ..., V_t are T-invariant subspaces of V, we have
ch_T = ch_{L_1} · · · ch_{L_t}.
Now p_i^{b_i}(T)(v) = 0 if v ∈ V_i, so p_i^{b_i}(L_i) = 0_{V_i}. Hence m_{L_i} has the form m_{L_i} = p_i^{e_i}, and ch_{L_i} has the form ch_{L_i} = p_i^{d_i}. Hence, writing ch_T = p_1^{a_1} · · · p_t^{a_t} and comparing factorizations, we get d_i = a_i.
Finally,
dim V_i = deg ch_{L_i} = deg p_i^{a_i} = a_i deg p_i.
(Incidentally, we mention that m_T = lcm(m_{L_1}, ..., m_{L_t}). Hence
m_T = p_1^{b_1} · · · p_t^{b_t} = p_1^{e_1} · · · p_t^{e_t}
THEOREM 3.6
If T_1, ..., T_m are commuting diagonable linear transformations of V, then they are simultaneously diagonable: there is a basis β for V in which each [T_i]_β^β is diagonal.
PROOF. (From Samelson, page 158.) We prove the result when m = 2; the general case follows by an easy iteration. Suppose T_1 and T_2 are commuting
diagonable linear transformations on V . Because mT1 splits as a product
of distinct linear factors, the primary decomposition theorem gives a direct
sum decomposition as a sum of the T1 –eigenspaces:
V = U1 ⊕ · · · ⊕ Ut .
It turns out that not only are the subspaces Ui T1 –invariant, they are T2 –
invariant. For if Ui = Ker (T1 − cIV ), then
v ∈ Ui ⇒ T1 (v) = cv
⇒ T2 (T1 (v)) = cT2 (v)
⇒ T1 (T2 (v)) = cT2 (v)
⇒ T2 (v) ∈ Ui .
Now because T2 is diagonable, V has a basis consisting of T2 –eigenvectors
and it is an easy exercise to show that in a direct sum of T2 –invariant
subspaces, each non-zero "component" of a T_2-eigenvector is itself a T_2-eigenvector; moreover each non-zero component is a T_1-eigenvector. Hence
V is spanned by a family of vectors which are simultaneously T1 –eigenvectors
and T2 –eigenvectors. If β is a subfamily which forms a basis for V , then [T1 ]ββ
and [T2 ]ββ are diagonal.
63
THEOREM 3.7 (Fitting’s lemma)
Suppose T : V → V is a linear transformation over F, where n = dim V.
Then V = Im T^n ⊕ Ker T^n.
COROLLARY 3.1
If T : V → V is an indecomposable linear transformation (that is the
only T –invariant subspaces of V are {0} and V ), then T is either nilpotent
(that is T n = 0V for some n ≥ 1) or T is an isomorphism.
64
4 The Jordan Canonical Form
The following subspaces are central for our treatment of the Jordan and
rational canonical forms of a linear transformation T : V → V .
DEFINITION 4.1
With m_T = p_1^{b_1} · · · p_t^{b_t} as before and p = p_i, b = b_i for brevity, we define
N_{h, p} = Ker p(T) ∩ Im p^{h−1}(T) and ν_{h, p} = dim N_{h, p}.
Hence
N_{1, p} ⊇ N_{2, p} ⊇ · · · ,
as
Im L^{h−1} ⊇ Im L^h
with L = p(T).
THEOREM 4.1
ν_{h, p} = ν(p^h(T)) − ν(p^{h−1}(T)).
The fact that N_{b, p} ≠ {0} and that N_{b+1, p} = {0} follows directly from the formula
dim N_{h, p} = ν(p^h(T)) − ν(p^{h−1}(T)).
For simplicity, assume that p is linear, that is that p = x − c. The general story (when deg p > 1) is similar, but more complicated; it is delayed until the next section.
Telescopic cancellation then gives
THEOREM 4.2
ν_{1, p} + ν_{2, p} + · · · + ν_{b, p} = ν(p^b(T)).
EXAMPLE 4.1
Suppose T : V → V is a LT such that p^4 ‖ m_T, p = x − c, and
ν(p(T)) = 3, ν(p^2(T)) = 6,
ν(p^3(T)) = 8, ν(p^4(T)) = 10.
Then
ν_{1,p} = 3, ν_{2,p} = 6 − 3 = 3,
ν_{3,p} = 8 − 6 = 2, ν_{4,p} = 10 − 8 = 2,
so
N_{1,p} = N_{2,p} ⊃ N_{3,p} = N_{4,p} ≠ {0}.
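The passage from the nullities ν(p^h(T)) to the Weyr numbers ν_{h,p} and then to the Jordan block sizes (the conjugate partition) is purely arithmetic, and can be sketched in a few lines; the data below are those of Example 4.1, and the variable names are mine.

```python
# nullities[h-1] = nu(p^h(T)); here p = x - c and p^4 || m_T (Example 4.1)
nullities = [3, 6, 8, 10]

# Weyr characteristic: nu_{h,p} = nu(p^h(T)) - nu(p^{h-1}(T))
weyr = [nullities[0]] + [nullities[h] - nullities[h - 1]
                         for h in range(1, len(nullities))]

# Segre characteristic (Jordan block sizes) = conjugate partition of weyr:
# e_j = number of rows of the dot diagram containing at least j dots
gamma = weyr[0]
segre = [sum(1 for w in weyr if w >= j) for j in range(1, gamma + 1)]

print(weyr)   # [3, 3, 2, 2]
print(segre)  # [4, 4, 2] -- blocks J_4(c), J_4(c), J_2(c)
```

The total number of dots, sum(segre), equals ν(p^4(T)) = 10, as Theorem 4.2 predicts.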
66
In general, the dot diagram for p has ν_{h,p} dots in the h-th row from the bottom (ν_{1,p} dots in the bottom row, ν_{b,p} in the top row), height b, and γ = ν_{1,p} columns. The column lengths, read from left to right, form the conjugate partition
e_1 ≥ e_2 ≥ · · · ≥ e_γ.
Finally, note that the total number of dots in the dot diagram is ν(p^b(T)), by Theorem 4.2.
THEOREM 4.3
There exist v_1, ..., v_γ ∈ V such that
(i) m_{T, v_i} = p^{e_i};
(ii) Ker p^b(T) = C_{T, v_1} ⊕ · · · ⊕ C_{T, v_γ}.
PROOF.
67
(i) We have p^{e_i−1} v_i ∈ Ker p(T), so p^{e_i} v_i = 0 and hence m_{T,v_i} | p^{e_i}. Hence
m_{T,v_i} = p^f, where 0 ≤ f ≤ e_i.
But p^{e_i−1} v_i ≠ 0, as it is part of a basis. Hence f ≥ e_i and f = e_i, as required.
(ii) (a)
C_{T,v_i} ⊆ Ker p^b(T).
For p^{e_i} v_i = 0 and so p^{e_i}(f v_i) = 0 ∀f ∈ F[x]. Hence as e_i ≤ b, we have
p^b(f v_i) = p^{b−e_i}(p^{e_i} f v_i) = p^{b−e_i} 0 = 0
and f v_i ∈ Ker p^b(T). Consequently C_{T,v_i} ⊆ Ker p^b(T), and hence
dim C_{T,v_1} + · · · + dim C_{T,v_γ} = e_1 + · · · + e_γ
= ν(p^b(T))
= dim Ker p^b(T).
Hence it remains only to prove independence, which follows from the implication
f_1 v_1 + · · · + f_γ v_γ = 0, f_1, ..., f_γ ∈ F[x]
⇒ p^{e_j} | f_j, 1 ≤ j ≤ γ.
Proof: (induction on e_1)
68
Firstly, consider e_1 = 1. Then
e_1 = e_2 = · · · = e_γ = 1
and the vectors
p^{e_1−1} v_1, ..., p^{e_γ−1} v_γ
are just v_1, ..., v_γ, which are linearly independent. Suppose
f_1 v_1 + · · · + f_γ v_γ = 0, f_1, ..., f_γ ∈ F[x]. (7)
Writing f_j = q_j(x − c) + f_j(c), we have
f_j v_j = q_j(x − c)v_j + f_j(c)v_j
= f_j(c)v_j,
as (x − c)v_j = p v_j = 0. So (7) implies
f_1(c)v_1 + · · · + f_γ(c)v_γ = 0,
whence f_j(c) = 0 and p | f_j for each j.
Next suppose e_1 > 1 and that the result holds for smaller values of e_1. With
m_{T,v_j} = p^{e_j};
p^{e_1−1} v_1, ..., p^{e_γ−1} v_γ are LI,
and f_1 v_1 + · · · + f_γ v_γ = 0 (9)
as before, we have
69
we obtain
p^{e_j−1} | f_j ∀j = 1, ..., δ,
so we may write
f_j = p^{e_j−1} g_j
(where δ is the number of e_j exceeding 1, and g_j = f_j if j > δ). Now substituting in (9),
g_1 p^{e_1−1} v_1 + · · · + g_γ p^{e_γ−1} v_γ = 0. (11)
But
m_{T, p^{e_j−1} v_j} = p,
so (11) and the case e_1 = 1 give
p | g_j ∀j,
as required.
A summary:
If m_T = (x − c_1)^{b_1} · · · (x − c_t)^{b_t} = p_1^{b_1} · · · p_t^{b_t}, then there exist vectors v_{ij} and positive integers e_{ij} (1 ≤ i ≤ t, 1 ≤ j ≤ γ_i), where γ_i = ν(T − c_i I_V), satisfying
b_i = e_{i1} ≥ · · · ≥ e_{iγ_i},  m_{T,v_{ij}} = p_i^{e_{ij}},
and
V = ⊕_{i=1}^{t} ⊕_{j=1}^{γ_i} C_{T,v_{ij}}.
We choose the elementary Jordan bases
β_{ij} : v_{ij}, (T − c_i I_V)(v_{ij}), ..., (T − c_i I_V)^{e_{ij}−1}(v_{ij})
for C_{T,v_{ij}}. Then if
β = ∪_{i=1}^{t} ∪_{j=1}^{γ_i} β_{ij},
β is a basis for V and we have
[T]_β^β = ⊕_{i=1}^{t} ⊕_{j=1}^{γ_i} J_{e_{ij}}(c_i) = J.
70
4.2 Two Jordan Canonical Form Examples
4.2.1 Example (a):
Let
A =
[ 4 0 1 0 ]
[ 2 2 3 0 ]
[ −1 0 2 0 ]
[ 4 0 1 2 ]
∈ M_{4×4}(Q).
We find ch_A = (x − 2)^2 (x − 3)^2 = p_1^2 p_2^2, where p_1 = x − 2, p_2 = x − 3.
CASE 1, p_1 = x − 2:
p_1(A) = A − 2I_4 =
[ 2 0 1 0 ]
[ 2 0 3 0 ]
[ −1 0 0 0 ]
[ 4 0 1 0 ]
→
[ 1 0 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 0 ]
[ 0 0 0 0 ],
so ν(p_1(A)) = 2 and the dot diagram for p_1 is a single row:
· ·   N_{1, x−2}
We find v_{11} = [0, 1, 0, 0]^t and v_{12} = [0, 0, 0, 1]^t form a basis for Ker p_1(T_A) = N(A − 2I_4) and m_{T_A, v_{11}} = m_{T_A, v_{12}} = x − 2. Also
Ker(p_1^{b_1}(T_A)) = N(p_1(A)) = N(A − 2I_4) = C_{T_A, v_{11}} ⊕ C_{T_A, v_{12}}.
Note that C_{T_A, v_{11}} and C_{T_A, v_{12}} have Jordan bases β_{11} : v_{11} and β_{12} : v_{12}, respectively.
CASE 2, p2 = x − 3:
0 − 31
1 0 1 0 1 0
2 −1 1
3 0 0 1 0 3 ,
p2 (A) = A − 3I4 =
−1
→ 1
0 −1 0 0 0 1 3
4 0 1 −1 0 0 0 0
71
We have to find a basis of the form p2 (TA )(v21 ) = (A−3I4 )v21 for Ker p2 (TA ) =
N (A − 3I4 ).
To find v_{21} we first get a basis for N((A − 3I_4)^2). We have
p_2^2(A) = (A − 3I_4)^2 =
[ 0 0 0 0 ]
[ −3 1 −4 0 ]
[ 0 0 0 0 ]
[ −1 0 2 1 ]
→
[ 1 0 −2 −1 ]
[ 0 1 −10 −3 ]
[ 0 0 0 0 ]
[ 0 0 0 0 ]
and we find X_1 = [2, 10, 1, 0]^t and X_2 = [1, 3, 0, 1]^t is such a basis. Then we have
N_{2, p_2} = ⟨p_2(A)X_1, p_2(A)X_2⟩
= ⟨(A − 3I_4)X_1, (A − 3I_4)X_2⟩
= ⟨[3, −3, −3, 9]^t, [1, −1, −1, 3]^t⟩ = ⟨[3, −3, −3, 9]^t⟩,
so we may take v_{21} = X_1, with p_2(A)v_{21} = [3, −3, −3, 9]^t.
Moreover CTA , v21 has Jordan basis β21 : v21 , (A − 3I4 )v21 .
Finally we have V_4(Q) = C_{T_A, v_{11}} ⊕ C_{T_A, v_{12}} ⊕ C_{T_A, v_{21}} and β = β_{11} ∪ β_{12} ∪ β_{21} is a basis for V_4(Q). Then with
P = [v_{11}|v_{12}|v_{21}|(A − 3I_4)v_{21}] =
[ 0 0 2 3 ]
[ 1 0 10 −3 ]
[ 0 0 1 −3 ]
[ 0 1 0 9 ]
we have
P^{−1}AP = [T_A]_β^β = J_1(2) ⊕ J_1(2) ⊕ J_2(3) =
[ 2 0 0 0 ]
[ 0 2 0 0 ]
[ 0 0 3 0 ]
[ 0 0 1 3 ].
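The conclusion of Example (a) is easy to confirm numerically; a minimal numpy sketch:

```python
import numpy as np

A = np.array([[4, 0, 1, 0],
              [2, 2, 3, 0],
              [-1, 0, 2, 0],
              [4, 0, 1, 2]], dtype=float)
P = np.array([[0, 0, 2, 3],
              [1, 0, 10, -3],
              [0, 0, 1, -3],
              [0, 1, 0, 9]], dtype=float)
J = np.array([[2, 0, 0, 0],
              [0, 2, 0, 0],
              [0, 0, 3, 0],
              [0, 0, 1, 3]], dtype=float)

# P^{-1} A P computed without forming the inverse explicitly
print(np.allclose(np.linalg.solve(P, A @ P), J))  # True
```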
72
4.2.2 Example (b):
Let A ∈ M6×6 (F ) have the property that chA = x6 , mA = x3 and
ν1, x = ν(A) = 3 = γ1 ;
ν2, x = ν(A2 ) − ν(A) = 5 − 3 = 2;
ν3, x = ν(A3 ) − ν(A2 ) = 6 − 5 = 1.
Hence the dot diagram corresponding to the (only) monic irreducible factor
x of mA is
· N3,x
· · N2,x
· · · N1,x
Hence we read off that ∃ a non-singular P ∈ M6×6 (F ) such that P −1 AP =
J3 (0) ⊕ J2 (0) ⊕ J1 (0). To find such a matrix P we proceed as follows:
(i) First find a basis for N3, x . We do this by first finding a basis for
N (A3 ): X1 , X2 , X3 , X4 , X5 , X6 . Then
N_{3, x} = ⟨A^2 X_1, A^2 X_2, A^2 X_3, A^2 X_4, A^2 X_5, A^2 X_6⟩.
Applying the LRA to this spanning family selects a basis A^2 v_{11} for N_{3, x} (here ν_{3, x} = 1).
(ii) Next find a basis Y_1, ..., Y_5 for N(A^2), place A^2 v_{11} at the head of the spanning family
A^2 v_{11}, AY_1, ..., AY_5
and apply the LRA to find a basis for N_{2, x} which includes A^2 v_{11}. This will have the form A^2 v_{11}, Av_{12}, where Av_{12} is the first vector in the list AY_1, ..., AY_5 which is not a linear combination of A^2 v_{11}.
(iii) Now extend the linearly independent family A2 v11 , Av12 to a basis
for N1, x = N (A). We do this by first finding a basis Z1 , Z2 , Z3 for N (A).
73
Then place the linearly independent family A^2 v_{11}, Av_{12} at the head of this spanning family:
A^2 v_{11}, Av_{12}, Z_1, Z_2, Z_3.
The LRA is then applied to the above spanning family and selects a basis of the form A^2 v_{11}, Av_{12}, v_{13}, where v_{13} is the first vector among Z_1, Z_2, Z_3 which is not a linear combination of A^2 v_{11} and Av_{12}.
Then m_{T_A, v_{11}} = x^3, m_{T_A, v_{12}} = x^2, m_{T_A, v_{13}} = x. Also
Ker p_1^{b_1}(T_A) = N(A^3) = C_{T_A, v_{11}} ⊕ C_{T_A, v_{12}} ⊕ C_{T_A, v_{13}}.
Taking the Jordan bases
β_{11} : v_{11}, Av_{11}, A^2 v_{11};  β_{12} : v_{12}, Av_{12};  β_{13} : v_{13}
for the three T-cyclic subspaces C_{T_A, v_{11}}, C_{T_A, v_{12}}, C_{T_A, v_{13}}, respectively, we then get the basis
β = β_{11} ∪ β_{12} ∪ β_{13}
for V_6(F). Then if
P = [v_{11}|Av_{11}|A^2 v_{11}|v_{12}|Av_{12}|v_{13}],
we have
P^{−1}AP = [T_A]_β^β = J_3(0) ⊕ J_2(0) ⊕ J_1(0).
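The invariants of Example (b) can be reproduced on a concrete matrix. In this sketch A is a hypothetical instance with the stated nullities, namely the direct sum J_3(0) ⊕ J_2(0) ⊕ J_1(0) itself (not the matrix of the text, which is left unspecified there):

```python
import numpy as np

def jordan_block(n):
    # nilpotent elementary Jordan matrix J_n(0): 1's on the subdiagonal
    N = np.zeros((n, n))
    for i in range(1, n):
        N[i, i - 1] = 1.0
    return N

# A hypothetical A realizing nu(A) = 3, nu(A^2) = 5, nu(A^3) = 6
blocks = [jordan_block(3), jordan_block(2), jordan_block(1)]
A = np.zeros((6, 6))
pos = 0
for B in blocks:
    n = B.shape[0]
    A[pos:pos + n, pos:pos + n] = B
    pos += n

def nullity(M):
    return M.shape[0] - np.linalg.matrix_rank(M)

nus = [nullity(np.linalg.matrix_power(A, h)) for h in (1, 2, 3)]
print(nus)  # [3, 5, 6]
```

From these nullities the dot diagram (3, 2, 1 dots per row) and hence the block sizes 3, 2, 1 follow exactly as in the text.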
74
4.3 Uniqueness of the Jordan form
Let β be a basis for V for which [T ]ββ is in Jordan canonical form
J = Je1 (λ1 ) ⊕ · · · Jes (λs ).
If we change the order of the basis vectors in β, we produce a corresponding
change in the order of the elementary Jordan matrices. It is customary to
assume our Jordan forms arranged so as to group together into a block those
elementary Jordan matrices having the same eigenvalue ci :
J = J1 ⊕ · · · ⊕ Jt ,
where
γi
M
Ji = Jeij (ci ).
j=1
Moreover within this i–th block Ji , we assume the sizes ei1 , . . . , eiγi of the
elementary Jordan matrices decrease monotonically:
ei1 ≥ . . . ≥ eiγi .
We prove that with this convention, the above sequence is uniquely deter-
mined by T and the eigenvalue ci .
We next observe that
ch_T = ch_J = ∏_{i=1}^{t} ch_{J_i} = ∏_{i=1}^{t} ∏_{j=1}^{γ_i} (x − c_i)^{e_{ij}} = ∏_{i=1}^{t} (x − c_i)^{e_{i1} + · · · + e_{iγ_i}}.
LEMMA 4.1
Let
A = J_e(0) =
[ 0 0 · · · 0 0 ]
[ 1 0 · · · 0 0 ]
[ 0 1 · · · 0 0 ]
[ ⋮   ⋱     ⋮  ]
[ 0 0 · · · 1 0 ].
Then
ν(A^h) = h if 1 ≤ h ≤ e − 1, and ν(A^h) = e if e ≤ h.
We can now prove that the sequence e_{i1} ≥ · · · ≥ e_{iγ_i} is determined uniquely by T and the eigenvalue c_i.
Let p_k = x − c_k and
A = [T]_β^β = ⊕_{i=1}^{t} ⊕_{j=1}^{γ_i} J_{e_{ij}}(c_i).
Then
ν(J_{e_{ij}}^h(c_i − c_k)) = 0
if i ≠ k. Hence
ν(p_k^h(T)) = Σ_{j=1}^{γ_k} ν(J_{e_{kj}}^h(0)).
76
Hence
ν_{h, x−c_k} = ν(p_k^h(T)) − ν(p_k^{h−1}(T)) = Σ_{j=1}^{γ_k} (ν(J_{e_{kj}}^h(0)) − ν(J_{e_{kj}}^{h−1}(0))) = Σ_{j : h ≤ e_{kj}} 1,
by Lemma 4.1.
Consequently ν_{h, x−c_k} − ν_{h+1, x−c_k} is the number of e_{kj} which are equal to h. Hence by taking h = 1, 2, ..., we see that the sequence e_{k1}, ..., e_{kγ_k} is determined by T and c_k and is in fact the contribution of the eigenvalue c_k to the Segre characteristic of T.
REMARK. If A and B are similar matrices over F, then B = P^{−1}AP, say. Also A and B have the same characteristic polynomials. Then if c_k is an eigenvalue of A and B and p_k = x − c_k, we have
p_k^h(B) = P^{−1} p_k^h(A) P
and hence
ν(p_k^h(T_B)) = ν(p_k^h(T_A))
for all h ≥ 1.
Consequently the Weyr characteristics of TA and TB will be identical.
Hence the corresponding dot diagrams and so the Segre characteristics will
also be identical. Hence TA and TB have the same Jordan form.
EXAMPLE 4.2
Let A = J_2(0) ⊕ J_2(0) and B = J_2(0) ⊕ J_1(0) ⊕ J_1(0). Then
ch_A = ch_B = x^4 and m_A = m_B = x^2.
However A is not similar to B. For both matrices are in Jordan form and the Segre characteristics for T_A and T_B are 2, 2 and 2, 1, 1, respectively.
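Example 4.2 can be checked directly: A and B share characteristic and minimum polynomials, yet a rank computation already separates them. A minimal sketch (helper names are mine):

```python
import numpy as np

def J(n):
    # nilpotent Jordan block J_n(0)
    M = np.zeros((n, n))
    for i in range(1, n):
        M[i, i - 1] = 1.0
    return M

def blockdiag(*Ms):
    N = sum(M.shape[0] for M in Ms)
    out = np.zeros((N, N))
    pos = 0
    for M in Ms:
        n = M.shape[0]
        out[pos:pos + n, pos:pos + n] = M
        pos += n
    return out

A = blockdiag(J(2), J(2))
B = blockdiag(J(2), J(1), J(1))

# Same characteristic polynomial (x^4) and minimum polynomial (x^2): both square to 0
print(np.allclose(A @ A, 0), np.allclose(B @ B, 0))  # True True
# ... but different ranks, so A and B cannot be similar
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))  # 2 1
```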
77
We now present some interesting applications of the Jordan canonical
form.
THEOREM 4.5
Suppose that chT splits completely in F [x]. Then chT = mT ⇔ ∃ a
basis β for V such that
[T ]ββ = Jb1 (c1 ) ⊕ . . . ⊕ Jbt (ct ),
where c1 , . . . , ct are distinct elements of F .
PROOF.
⇐:
ch_T = ∏_{i=1}^{t} ch_{J_{b_i}(c_i)} = ∏_{i=1}^{t} (x − c_i)^{b_i},
m_T = lcm((x − c_1)^{b_1}, ..., (x − c_t)^{b_t}) = (x − c_1)^{b_1} · · · (x − c_t)^{b_t} = ch_T.
78
⇒: If ch_T = m_T, then γ_i = 1 for each p_i = x − c_i, and we get the secondary decomposition
Ker(x − c_i)^{b_i}(T) = C_{T, v_{i1}},
so [T]_β^β = J_{b_1}(c_1) ⊕ · · · ⊕ J_{b_t}(c_t), as required.
THEOREM 4.6
For m ≥ 1, J_n^m(c) is the lower triangular n × n matrix with (m k) c^{m−k} on the k-th subdiagonal, 0 ≤ k ≤ min(m, n − 1), and zeros elsewhere. Explicitly:
(a) if 1 ≤ m ≤ n − 1,
J_n^m(c) =
[ c^m                                      ]
[ (m 1) c^{m−1}   c^m                      ]
[ (m 2) c^{m−2}   (m 1) c^{m−1}   c^m      ]
[ ⋮                        ⋱               ]
[ 0   · · ·   (m 1) c^{m−1}   c^m          ],
with zeros below the m-th subdiagonal;
(b) if n − 1 ≤ m,
J_n^m(c) =
[ c^m                                                      ]
[ (m 1) c^{m−1}   c^m                                      ]
[ ⋮                     ⋱                                  ]
[ (m n−1) c^{m−n+1}   (m n−2) c^{m−n+2}   · · ·   (m 1) c^{m−1}   c^m ],
where (m k) is the binomial coefficient
(m k) = m!/(k!(m − k)!) = m(m − 1) · · · (m − k + 1)/k!.
PROOF. J_n(c) = cI_n + N, where N has the special property that N^k has 1's on the k-th subdiagonal and 0 elsewhere, for 0 ≤ k ≤ n − 1.
Then because cI_n and N commute, we can use the binomial theorem:
J_n^m(c) = (cI_n + N)^m = Σ_{k=0}^{m} (m k) c^{m−k} N^k,
as N^k = 0 if n ≤ k. Hence J_n^m(c) is an n × n matrix having (m k) c^{m−k} on the k-th subdiagonal, 0 ≤ k ≤ n − 1, and 0 elsewhere.
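The formula for J_n^m(c) can be confirmed numerically; a minimal sketch with an arbitrary choice of n, c and m:

```python
import numpy as np
from math import comb

n, c, m = 4, 2.0, 6
J = c * np.eye(n) + np.diag(np.ones(n - 1), -1)   # J_4(2), 1's on the subdiagonal

Jm = np.linalg.matrix_power(J, m)

# Formula: (m choose k) c^{m-k} on the k-th subdiagonal
F = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1):
        k = i - j
        F[i, j] = comb(m, k) * c ** (m - k)

print(np.allclose(Jm, F))  # True
```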
COROLLARY 4.1
Let F = C. Then J_n^m(c) → 0 as m → ∞ if |c| < 1. For each entry of J_n^m(c) has the form (m k) c^{m−k}, where (m k) is a polynomial in m of degree k, and such an expression tends to 0 as m → ∞ when |c| < 1.
COROLLARY 4.2
Let A ∈ Mn×n (C) and suppose that all the eigenvalues of A are less than
1 in absolute value. Then
lim Am = 0.
m→∞
PROOF. Let P^{−1}AP = J, the Jordan form of A. Hence
P^{−1}A^m P = (P^{−1}AP)^m = J^m = ⊕_{i=1}^{t} ⊕_{j=1}^{γ_i} J_{e_{ij}}^m(c_i) → 0
as m → ∞, by Corollary 4.1, since each |c_i| < 1. Hence A^m = PJ^mP^{−1} → 0.
To justify this definition, we let A^m = [a_{ij}^{(m)}]. We have to show that the (i, j) element of
I_n + A + (1/2!)A^2 + · · · + (1/M!)A^M,
namely
a_{ij}^{(0)} + a_{ij}^{(1)}/1! + · · · + a_{ij}^{(M)}/M!,
tends to a limit as M → ∞; in other words, we have to show that the series
Σ_{m=0}^{∞} a_{ij}^{(m)}/m!
converges. Choose ρ so that
|a_{ij}| ≤ ρ ∀i, j.
An easy induction then gives |a_{ij}^{(m)}| ≤ n^{m−1}ρ^m, so the series converges by comparison with Σ (nρ)^m/m!.
(i) e0 = In ;
82
(viii) eA is non–singular and
(eA )−1 = e−A ;
(x)
e^{J_n(c)} =
[ e^c                                                  ]
[ e^c/1!        e^c                                    ]
[ e^c/2!        e^c/1!        e^c                      ]
[ ⋮                    ⋱        ⋱                      ]
[ e^c/(n−1)!   e^c/(n−2)!   · · ·   e^c/1!   e^c       ],
i.e. e^{J_n(c)} has e^c/k! on the k-th subdiagonal, 0 ≤ k ≤ n − 1;
(xi)
e^{tJ_n(c)} =
[ e^{tc}                                                              ]
[ te^{tc}/1!            e^{tc}                                        ]
[ t^2 e^{tc}/2!         te^{tc}/1!         e^{tc}                     ]
[ ⋮                             ⋱            ⋱                        ]
[ t^{n−1}e^{tc}/(n−1)!   t^{n−2}e^{tc}/(n−2)!   · · ·   te^{tc}/1!   e^{tc} ],
i.e. e^{tJ_n(c)} has t^k e^{tc}/k! on the k-th subdiagonal.
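Formula (x) can be checked numerically. The sketch below uses a truncated Taylor series for the exponential (a hand-rolled helper, adequate for small matrices) rather than a library routine:

```python
import numpy as np
from math import exp, factorial

def expm_taylor(M, terms=60):
    # truncated exponential series sum_{m<terms} M^m / m!
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for m in range(1, terms):
        term = term @ M / m
        out = out + term
    return out

n, c = 4, 0.5
J = c * np.eye(n) + np.diag(np.ones(n - 1), -1)   # J_4(1/2)

E = expm_taylor(J)

# Formula (x): e^c / k! on the k-th subdiagonal
F = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1):
        F[i, j] = exp(c) / factorial(i - j)

print(np.allclose(E, F))  # True
```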
(xii) If
γi
t M
M
P −1 AP = J = Jeij (ci ).
i=1 j=1
then
γi
t M
M
P −1 eA P = J = eJeij (ci ) .
i=1 j=1
PROOF.
(i)
e^0 = Σ_{m=0}^{∞} (1/m!) 0^m = I_n
(with the convention 0^0 = I_n);
83
(ii) Let A = diag(λ_1, ..., λ_n). Then A^m = diag(λ_1^m, ..., λ_n^m), so
e^A = Σ_{m=0}^{∞} (1/m!) A^m = diag(Σ_{m=0}^{∞} λ_1^m/m!, ..., Σ_{m=0}^{∞} λ_n^m/m!)
= diag(e^{λ_1}, ..., e^{λ_n}).
(iii)
e^{P^{−1}AP} = Σ_{m=0}^{∞} (1/m!)(P^{−1}AP)^m
= Σ_{m=0}^{∞} (1/m!) P^{−1}A^m P
= P^{−1}(Σ_{m=0}^{∞} (1/m!) A^m) P
= P^{−1} e^A P.
84
Differentiating the series for e^{tA} term by term,
(d/dt)e^{tA} = Σ_{m=0}^{∞} (t^m/m!) A^{m+1}
= Ae^{tA}.
(vii) Let deg mA = r. Then the matrices In , A, . . . , Ar−1 are linearly inde-
pendent over C, as if
mA = xr − ar−1 xr−1 − · · · − a0 ,
then
mA (A) = 0 ⇒ Ar = a0 In + a1 A + · · · + ar−1 Ar−1 .
Consequently for each m ≥ 1, we can express Am as a linear combina-
tion over C of In , A, . . . , Ar−1 :
A^m = a_0^{(m)} I_n + a_1^{(m)} A + · · · + a_{r−1}^{(m)} A^{r−1}
and hence
Σ_{m=0}^{M} (1/m!)A^m = (Σ_{m=0}^{M} a_0^{(m)}/m!) I_n + (Σ_{m=0}^{M} a_1^{(m)}/m!) A + · · · + (Σ_{m=0}^{M} a_{r−1}^{(m)}/m!) A^{r−1},
or
[t_{ij}^{(M)}] = s_{0M} I_n + s_{1M} A + · · · + s_{r−1,M} A^{r−1},
say.
Now [t_{ij}^{(M)}] → e^A as M → ∞.
Also the above matrix equation can be regarded as n2 equations in
s0M , s1M , . . . , sr−1, M .
Also the linear independence of I_n, A, ..., A^{r−1} implies that this system has a unique solution. Consequently we can express s_{0M}, s_{1M}, ..., s_{r−1,M} as linear combinations, with coefficients independent of M, of the sequences t_{ij}^{(M)}. Hence, because each of the latter sequences converges, it follows that each of the sequences s_{0M}, s_{1M}, ..., s_{r−1,M} converges, to s_0, s_1, ..., s_{r−1}, respectively. Consequently
Σ_{k=0}^{r−1} s_{kM} A^k → Σ_{k=0}^{r−1} s_k A^k
and
e^A = s_0 I_n + s_1 A + · · · + s_{r−1} A^{r−1},
a polynomial in A.
85
(viii)–(ix) Suppose that AB = BA. Then e^{tB} is a polynomial in B and hence A commutes with e^{tB}. Similarly, A and B commute with e^{t(A+B)}. Now let
C(t) = e^{t(A+B)} e^{−tB} e^{−tA}, t ∈ R.
Then C'(t) = 0 for all t, so C(t) = C(0) = I_n and e^{t(A+B)} = e^{tA}e^{tB}, which proves (ix). Taking B = −A and t = 1 then gives
I_n = e^0 = e^{A−A} = e^A e^{−A},
which proves (viii).
86
4.8 Systems of differential equations
THEOREM 4.8
If X = X(t) satisfies the system of differential equations
Ẋ = AX,
then
X = e^{(t−t_0)A} X(t_0).
PROOF. (d/dt)(e^{−tA}X) = −Ae^{−tA}X + e^{−tA}Ẋ = e^{−tA}(Ẋ − AX) = 0, so e^{−tA}X is constant:
e^{−tA}X = e^{−t_0 A}X(t_0)
and
X = e^{tA} e^{−t_0 A} X(t_0) = e^{(t−t_0)A} X(t_0).
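Theorem 4.8 can be illustrated numerically. The system below is an arbitrary illustration (not the text's Example 4.3), chosen because its exact solution is known in closed form; the Taylor-series exponential is a hand-rolled helper adequate for small matrices:

```python
import numpy as np
from math import cos, sin

def expm_taylor(M, terms=60):
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for m in range(1, terms):
        term = term @ M / m
        out = out + term
    return out

# Xdot = AX with A = [[0, 1], [-1, 0]] has solution X(t) = e^{tA} X(0);
# for X(0) = (1, 0)^t this is (cos t, -sin t)^t.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
X0 = np.array([1.0, 0.0])
t = 0.5
X = expm_taylor(t * A) @ X0
print(np.allclose(X, [cos(t), -sin(t)]))  # True
```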
EXAMPLE 4.3
Solve Ẋ = AX, where
A =
[ 0 4 −2 ]
[ −1 −5 3 ]
[ −1 −4 2 ].
Solution: ∃P with
P^{−1}AP = J_2(−1) ⊕ J_1(−1) =
[ −1 0 0 ]
[ 1 −1 0 ]
[ 0 0 −1 ]
and
P^{−1}(tA)P =
[ −t 0 0 ]
[ t −t 0 ]
[ 0 0 −t ].
87
Thus
88
4.9 Markov matrices
DEFINITION 4.3
A real n × n matrix A = [a_{ij}] is called a Markov matrix, or row-stochastic matrix, if
(i) a_{ij} ≥ 0 ∀i, j;
(ii) Σ_{j=1}^{n} a_{ij} = 1 for i = 1, ..., n.
EXERCISE 4.1
If A and B are n × n Markov matrices, prove that AB is also a Markov
matrix.
THEOREM 4.9
Every eigenvalue λ of a Markov matrix A satisfies |λ| ≤ 1.
PROOF. Let X = [x_1, ..., x_n]^t ≠ 0 satisfy
AX = λX. (13)
Let k be such that |x_j| ≤ |x_k|, ∀j, 1 ≤ j ≤ n. Then equating the k-th component of each side of equation (13) gives
Σ_{j=1}^{n} a_{kj} x_j = λx_k. (14)
Hence
|λx_k| = |λ| · |x_k| = |Σ_{j=1}^{n} a_{kj} x_j| ≤ Σ_{j=1}^{n} a_{kj}|x_j| (15)
≤ Σ_{j=1}^{n} a_{kj}|x_k| = |x_k|. (16)
Hence |λ| ≤ 1.
89
DEFINITION 4.4
A positive Markov matrix is one with all positive elements (i.e.
strictly greater than zero). For such a matrix A we may write “A > 0”.
THEOREM 4.10
If A is a positive Markov matrix, then 1 is the only eigenvalue of modulus
1. Moreover nullity (A − In ) = 1.
so λ = 1.
COROLLARY 4.3
If A is a positive Markov matrix, then At has 1 as the only eigenvalue
of modulus 1. Also nullity (At − In ) = 1.
90
THEOREM 4.11
If A is a positive Markov matrix, then
(i) (x − 1) ‖ m_A;
(ii) lim_{m→∞} A^m = B exists, where every row of B equals X^t;
(iii) X is uniquely defined as the (positive) vector satisfying A^t X = X whose components sum to 1.
Remark: In view of part (i) and the equation ν(A − I_n) = 1, it follows that (x − 1) ‖ ch_A.
PROOF. As ν(A − I_n) = 1, the Jordan form of A has the form J_b(1) ⊕ K, where (x − 1)^b ‖ m_A. Here K is the direct sum of all Jordan blocks corresponding to all the eigenvalues of A other than 1, and hence K^m → 0.
Now suppose that b > 1; then J_b(1) has size b > 1. Then ∃P such that
P^{−1}AP = J_b(1) ⊕ K,
so
P^{−1}A^m P = J_b^m(1) ⊕ K^m.
But J_b^m(1) has m on its first subdiagonal, which is unbounded as m → ∞, while the entries of A^m all lie in [0, 1]; this contradiction gives b = 1. It follows that A^m converges to a matrix of the form
B =
[ t_1 X^t ]
[ ⋮       ]
[ t_n X^t ].
91
Now A^m → B, so A^{m+1} = A^m · A → BA. Hence B = BA and
A^t B^t = B^t, (20)
and hence A^t X = X.
However X ≥ 0 and A^t > 0, so X = A^t X > 0.
DEFINITION 4.5
We have thus proved that there is a positive eigenvector X of At corre-
sponding to the eigenvalue 1, where the components of X sum to 1. Then
because we know that the eigenspace N (At − In ) is one–dimensional, it
follows that this vector is unique.
This vector is called the stationary vector of the Markov matrix A.
EXAMPLE 4.4
Let
A =
[ 1/2 1/4 1/4 ]
[ 1/6 1/6 2/3 ]
[ 1/3 1/3 1/3 ].
Then
A^t − I_3 row-reduces to
[ 1 0 −8/9 ]
[ 0 1 −2/3 ]
[ 0 0 0 ].
Hence N(A^t − I_3) = ⟨[8/9, 2/3, 1]^t⟩ = ⟨[8/23, 6/23, 9/23]^t⟩ and
lim_{m→∞} A^m = (1/23)
[ 8 6 9 ]
[ 8 6 9 ]
[ 8 6 9 ].
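Both the stationary vector and the limit of the powers of this A can be checked numerically:

```python
import numpy as np

A = np.array([[1/2, 1/4, 1/4],
              [1/6, 1/6, 2/3],
              [1/3, 1/3, 1/3]])

pi = np.array([8/23, 6/23, 9/23])   # stationary vector: pi A = pi
print(np.allclose(pi @ A, pi))       # True

# A^m converges to the rank-1 matrix with every row equal to pi
Am = np.linalg.matrix_power(A, 100)
print(np.allclose(Am, np.tile(pi, (3, 1))))  # True
```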
DEFINITION 4.6
A Markov Matrix is called regular or primitive if ∃k ≥ 1 such that
k
A > 0.
92
THEOREM 4.12
If A is a primitive Markov matrix, then A satisfies the same properties
enunciated in the last two theorems for positive Markov matrices.
PROOF. Suppose A^k > 0. Then (x − 1) ‖ ch_{A^k} and hence (x − 1) ‖ ch_A, as the eigenvalues of A^k are the k-th powers of those of A.
EXAMPLE 4.5
The following Markov matrix is primitive (its fourth power is positive) and is related to the 5x + 1 problem:
A =
[ 0 0 1 0 ]
[ 1/2 0 1/2 0 ]
[ 0 0 1/2 1/2 ]
[ 0 1/2 1/2 0 ].
Its stationary vector is [1/15, 2/15, 8/15, 4/15]^t.
We remark that chA = (x − 1)(x + 1/2)(x2 + 1/4).
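Primitivity and the stated stationary vector are both quick to verify:

```python
import numpy as np

A = np.array([[0, 0, 1, 0],
              [1/2, 0, 1/2, 0],
              [0, 0, 1/2, 1/2],
              [0, 1/2, 1/2, 0]])

# Primitivity: the fourth power is strictly positive
print((np.linalg.matrix_power(A, 4) > 0).all())  # True

pi = np.array([1, 2, 8, 4]) / 15   # claimed stationary vector
print(np.allclose(pi @ A, pi))      # True
```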
93
4.10 The Real Jordan Form
4.10.1 Motivation
If A is a real n × n matrix, the characteristic polynomial of A will in general have real roots and complex roots, the latter occurring in complex-conjugate pairs.
In this section we show how to derive a canonical form B for A which has
real entries. It turns out that there is a simple formula for eB and this is
useful in solving Ẋ = AX, as it allows one to directly express the complete
solution of the system of differential equations in terms of real exponentials
and sines and cosines.
We first introduce a real analogue of J_n(a + ib). It's the matrix K_n(a, b) ∈ M_{2n×2n}(R), defined as follows:
Let
D = [ a b ; −b a ] = aI_2 + bJ, where J = [ 0 1 ; −1 0 ] and J^2 = −I_2
(J is a matrix version of i = √−1, while D corresponds to the complex number a + ib). Then
e^D = e^{aI_2 + bJ} = e^{aI_2} e^{bJ}
= e^a (I_2 + bJ/1! + (bJ)^2/2! + · · ·)
= e^a [(1 − b^2/2! + b^4/4! − · · ·)I_2 + (b/1! − b^3/3! + · · ·)J]
= e^a [(cos b)I_2 + (sin b)J]
= e^a [ cos b  sin b ; −sin b  cos b ].
DEFINITION 4.7
Let a and b be real numbers and K_n(a, b) ∈ M_{2n×2n}(R) be defined by
K_n(a, b) =
[ D              ]
[ I_2  D         ]
[ 0    I_2  D    ]
[ ⋮        ⋱  ⋱  ]
[ 0   · · · I_2  D ],
where D = [ a b ; −b a ]. Then it is easy to prove that
e^{K_n(a, b)} =
[ e^D                                             ]
[ e^D/1!        e^D                               ]
[ e^D/2!        e^D/1!        e^D                 ]
[ ⋮                    ⋱        ⋱                 ]
[ e^D/(n−1)!   · · ·   e^D/2!   e^D/1!   e^D      ].
EXAMPLE 4.6
K_2(0, 1) =
[ 0 1 0 0 ]
[ −1 0 0 0 ]
[ 1 0 0 1 ]
[ 0 1 −1 0 ]
and
e^{tK_2(0, 1)} =
[ cos t     sin t     0       0     ]
[ −sin t    cos t     0       0     ]
[ t cos t   t sin t   cos t   sin t ]
[ −t sin t  t cos t   −sin t  cos t ].
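The closed form for e^{tK_2(0,1)} can be confirmed against a truncated Taylor series (a hand-rolled helper, adequate for small matrices):

```python
import numpy as np
from math import cos, sin

def expm_taylor(M, terms=60):
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for m in range(1, terms):
        term = term @ M / m
        out = out + term
    return out

K = np.array([[0, 1, 0, 0],
              [-1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 1, -1, 0]], dtype=float)

t = 0.7
F = np.array([[cos(t), sin(t), 0, 0],
              [-sin(t), cos(t), 0, 0],
              [t*cos(t), t*sin(t), cos(t), sin(t)],
              [-t*sin(t), t*cos(t), -sin(t), cos(t)]])

print(np.allclose(expm_taylor(t * K), F))  # True
```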
2. (A ± B)¯ = Ā ± B̄, (cA)¯ = c̄ Ā for c ∈ C, (AB)¯ = Ā · B̄.
3. (a_0 I_n + · · · + a_r A^r)¯ = ā_0 I_n + · · · + ā_r Ā^r.
4. If W = ⟨w_1, ..., w_r⟩, then W̄ = ⟨w̄_1, ..., w̄_r⟩.
95
5. Let A be a real n × n matrix and c ∈ C. Then
(a)
(b)
W = W1 ⊕ · · · ⊕ Wr ⇒ W = W 1 ⊕ · · · ⊕ W r .
(c)
W = CTA , v ⇒ W = CTA , v .
(d)
r
M r
M
W = CTA , vi ⇒ W = CTA , vi .
i=1 i=1
(e)
Let A ∈ Mn×n (R). Then mA ∈ R[x] and so any complex roots will occur in
conjugate pairs.
Suppose that c1 , . . . , cr are the distinct real eigenvalues and cr+1 , . . . , cr+s ,
c̄r+1 , . . . , c̄r+s are the distinct non-real roots and
96
For brevity, let c = c_i = a + ib, v = v_{ij}, e = e_{ij}. Let
P_h = (A − cI_n)^{h−1} v, h = 1, ..., e,
and write
P_h = X_h + iY_h, X_h, Y_h ∈ V_n(R).
The large "real Jordan form" matrix is the 2e × 2e matrix K_e(a, b).
Note: If e = 1, no I_2 block is present in this matrix.
The spaces C_{T_A, v} and C_{T_A, v̄} are independent and have bases P_1, ..., P_e and P̄_1, ..., P̄_e, respectively.
Consequently the vectors
P1 , . . . , Pe , P̄1 , . . . , P̄e
form a basis for CTA , v + CTA , v̄ . It is then an easy exercise to deduce that the
real vectors X1 , Y1 , . . . , Xe , Ye form a basis β for the T –invariant subspace
W = CTA , v + CTA , v̄ .
Writing T = TA for brevity, the above right hand batch of equations tells
us that [TW ]ββ = Ke (a, b). There will be s such real bases corresponding to
each of the complex eigenvalues cr+1 . . . , cr+s .
97
Joining together these bases with the real elementary Jordan bases aris-
ing from any real eigenvalues c1 , . . . , cr gives a basis β for Vn (C) such that
if P is the non–singular real matrix formed by these basis vectors, then
P −1 AP = [TA ]ββ = J ⊕ K,
where
J = ⊕_{i=1}^{r} ⊕_{j=1}^{γ_i} J_{e_{ij}}(c_i),  K = ⊕_{i=r+1}^{r+s} ⊕_{j=1}^{γ_i} K_{e_{ij}}(a_i, b_i),
with c_i = a_i + ib_i for r + 1 ≤ i ≤ r + s.
EXAMPLE 4.7
A =
[ 1 1 0 0 ]
[ −2 0 1 0 ]
[ 2 0 0 1 ]
[ −2 −1 −1 −1 ],
so m_A = (x^2 + 1)^2 = (x − i)^2 (x + i)^2.
The dot diagram for p_1 = x − i is
·   N_{2, p_1}
·   N_{1, p_1} = N(A − iI_4),
yielding
AX_{11} = −Y_{11} + X_{12},
AY_{11} = X_{11} + Y_{12}. (22)
Now we know
98
Writing the four real equations (22) and (23) in matrix form, with
99
4.10.3 A real algorithm for finding the real Jordan form
Referring to the last example, if we write
Z = [ A  I_4 ; −I_4  A ],
then
Z [X_{11}; Y_{11}] = [X_{12}; Y_{12}],
Z [−Y_{11}; X_{11}] = [−Y_{12}; X_{12}],
Z [X_{12}; Y_{12}] = [0; 0],
Z [−Y_{12}; X_{12}] = [0; 0].
Then the vectors
[X_{11}; Y_{11}], [−Y_{11}; X_{11}], Z [X_{11}; Y_{11}], Z [−Y_{11}; X_{11}]
actually form an R-basis for N(Z^2). This leads to a method for finding the
real Jordan canonical form using real matrices. (I am indebted to Dr. B.D.
Jones for introducing me to the Z matrix approach.)
More generally, we observe that a collection of equations of the form
AX_{ij1} = a_i X_{ij1} − b_i Y_{ij1} + X_{ij2}
AY_{ij1} = b_i X_{ij1} + a_i Y_{ij1} + Y_{ij2}
⋮
AX_{ij e_{ij}} = a_i X_{ij e_{ij}} − b_i Y_{ij e_{ij}}
AY_{ij e_{ij}} = b_i X_{ij e_{ij}} + a_i Y_{ij e_{ij}}
can be written concisely in real matrix form, giving rise to an elementary Jordan basis corresponding to an elementary divisor x^{e_{ij}} for the following real matrix: Let
Z_i = [ A − a_i I_n   b_i I_n ; −b_i I_n   A − a_i I_n ].
Then
Z_i [X_{ij1}; Y_{ij1}] = [X_{ij2}; Y_{ij2}],
⋮
Z_i [X_{ij e_{ij}}; Y_{ij e_{ij}}] = [0; 0].
100
LEMMA 4.2
If V is a C–vector space with basis v1 , . . . , vn , then V is also an R–vector
space with basis
v1 , iv1 , . . . , vn , ivn .
Hence
dimR V = 2 dimC V.
DEFINITION 4.8
Let A ∈ M_{n×n}(R) and c = a + ib be a complex eigenvalue of A with b ≠ 0. Let Z ∈ M_{2n×2n}(R) be defined by
Z = [ A − aI_n   bI_n ; −bI_n   A − aI_n ].
Also let p = x − c.
LEMMA 4.3
Let Φ : V_{2n}(R) → V_n(C) be the mapping defined by
Φ [X; Y] = X + iY, X, Y ∈ V_n(R).
Then
(i) Φ is an R-isomorphism;
(ii) Φ [−Y; X] = i(X + iY);
(iii) Φ (Z^h [X; Y]) = p^h(A)(X + iY);
COROLLARY 4.4
If
p^{e_1−1}(A)(X_1 + iY_1), ..., p^{e_γ−1}(A)(X_γ + iY_γ)
form a C-basis for N(p(A)), then
Z^{e_1−1}[X_1; Y_1], Z^{e_1−1}[−Y_1; X_1], ..., Z^{e_γ−1}[X_γ; Y_γ], Z^{e_γ−1}[−Y_γ; X_γ]
form an R-basis for N(Z).
101
Remark: Consequently the dot diagram for the eigenvalue 0 for the matrix
Z has the same height as that for the eigenvalue c of A, with each row
expanded to twice the length.
To find suitable vectors X_1, Y_1, ..., X_γ, Y_γ, we employ the usual algorithm for finding the Jordan blocks corresponding to the eigenvalue 0 of the matrix Z, with the extra proviso that we always ensure that the basis for N_{h, x} is chosen to have the form
Z^{h−1}[X_1; Y_1], Z^{h−1}[−Y_1; X_1], ..., Z^{h−1}[X_r; Y_r], Z^{h−1}[−Y_r; X_r],
and the basis for N(Z^h) to have the form
[X_1; Y_1], [−Y_1; X_1], ..., [X_{ν(Z^h)}; Y_{ν(Z^h)}], [−Y_{ν(Z^h)}; X_{ν(Z^h)}].
EXAMPLE 4.8
1 1 0 0
−2 0 1 0
A= ∈ M4×4 (R) has mA = (x2 + 1)2 . Find a real
2 0 0 1
−2 −1 −1 −1
non–singular matrix P such that P −1 AP is in real Jordan form.
Solution:
Z = [ A  I_4 ; −I_4  A ] =
[ 1 1 0 0 | 1 0 0 0 ]
[ −2 0 1 0 | 0 1 0 0 ]
[ 2 0 0 1 | 0 0 1 0 ]
[ −2 −1 −1 −1 | 0 0 0 1 ]
[ −1 0 0 0 | 1 1 0 0 ]
[ 0 −1 0 0 | −2 0 1 0 ]
[ 0 0 −1 0 | 2 0 0 1 ]
[ 0 0 0 −1 | −2 −1 −1 −1 ].
102
A basis for N(Z^2) is found first (here ν(Z^2) = 4); each basis vector is multiplied by Z and the left-to-right algorithm is applied to the resulting spanning family, always keeping the paired form described above. This yields
Z [X_{11}; Y_{11}] = [X_{12}; Y_{12}],
where
X_{11} = [1, −1/2, 1/2, −3/2]^t, Y_{11} = [0, 1, 0, 0]^t,
X_{12} = [1/2, −1/2, 1/2, −1/2]^t, Y_{12} = [0, 1/2, −1/2, 1/2]^t,
and N_{2, x} = N_{1, x} = N(Z). Then
P = [X_{11}|Y_{11}|X_{12}|Y_{12}] =
[ 1 0 1/2 0 ]
[ −1/2 1 −1/2 1/2 ]
[ 1/2 0 1/2 −1/2 ]
[ −3/2 0 −1/2 1/2 ].
Then
P^{−1}AP =
[ 0 1 0 0 ]
[ −1 0 0 0 ]
[ 1 0 0 1 ]
[ 0 1 −1 0 ]
= K_2(0, 1),
which is in real Jordan form.
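The conclusion of Example 4.8 can be verified numerically; a minimal numpy sketch:

```python
import numpy as np

A = np.array([[1, 1, 0, 0],
              [-2, 0, 1, 0],
              [2, 0, 0, 1],
              [-2, -1, -1, -1]], dtype=float)
P = np.array([[1, 0, 1/2, 0],
              [-1/2, 1, -1/2, 1/2],
              [1/2, 0, 1/2, -1/2],
              [-3/2, 0, -1/2, 1/2]])
K = np.array([[0, 1, 0, 0],
              [-1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 1, -1, 0]], dtype=float)

# P^{-1} A P should be the real Jordan form K_2(0, 1)
print(np.allclose(np.linalg.solve(P, A @ P), K))  # True
```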
104
5 The Rational Canonical Form
Here p is a monic irreducible factor of the minimum polynomial mT and is
not necessarily of degree one.
Let F_p denote the field constructed earlier in the course, consisting of all matrices of the form f(B), f ∈ F[x], where B = C(p), the companion matrix of p. (We saw that if deg p = n, then every element of F_p is uniquely expressible in the form
a_0 I_n + a_1 B + · · · + a_{n−1} B^{n−1}, a_i ∈ F.)
Let f̄ = f(B), where f ∈ F[x]. Then this new symbol has the following properties:
(i) f̄ + ḡ = (f + g)¯; f̄ ḡ = (fg)¯;
(ii) f̄ = 0 ⇔ p | f;
(iii) f̄ = ḡ ⇔ p | (f − g);
(iv) f̄^{−1} exists ⇔ p does not divide f.
Note: If p = x − c, then Fp = F .
THEOREM 5.1
N_{h, p} becomes a vector space over F_p if we define
f̄ v = f v = f(T)(v).
First we must verify that the above definition is well-defined, that is, independent of the particular polynomial f used to define the field element f̄. So suppose f̄ = ḡ. Then f = g + kp, k ∈ F[x]. Hence for v ∈ N_{h, p} ⊆ Ker p(T),
f v = g v + k(p v) = g v,
as p v = p(T)(v) = 0.
(ii) f̄(v + w) = f(v + w) = f v + f w = f̄ v + f̄ w;
(iii) f̄(ḡ v) = f̄(g v) = f(g v) = (fg)v = (f̄ ḡ)v;
(iv) 1̄ v = 1v = v.
Remark: An F –basis for Nh, p will be an Fp –spanning family for Nh, p , but
will not, in general, be an Fp –basis for Nh, p . The precise connection between
F –independence and Fp –independence is given by the following theorem:
THEOREM 5.2
Vectors v1 , . . . , vr form an Fp –basis for Nh, p if and only if the vectors
COROLLARY 5.1
ν_{h, p} = dim_{F_p} N_{h, p} = (1/deg p) dim_F N_{h, p} = (ν(p^h(T)) − ν(p^{h−1}(T)))/deg p.
ν_{1, p} ≥ · · · ≥ ν_{b, p} ≥ 1,
where ν_{1, p} = dim_{F_p} Ker p(T) = ν(p(T))/deg p.
Also
ν_{1, p} + · · · + ν_{b, p} = ν(p^b(T))/deg p, (24)
where p^b ‖ m_T.
There is a corresponding dot diagram where the number of dots in the h-th row from the bottom represents the integer ν_{h, p}. We also have a similar theorem to an earlier one, in terms of the conjugate partition
e_1 ≥ · · · ≥ e_γ ≥ 1
of the partition (24) above, where γ = ν_{1, p} = dim_{F_p} Ker p(T) = ν(p(T))/deg p.
106
THEOREM 5.3
Vectors v_1, ..., v_γ ∈ V can be found with the property that
p^{e_1−1} v_1, ..., p^{e_γ−1} v_γ
form an F_p-basis for N_{1, p}, and then
Ker p^b(T) = C_{T, v_1} ⊕ · · · ⊕ C_{T, v_γ}, with m_{T, v_i} = p^{e_i}.
Applying this to each p_i gives vectors v_{ij} such that
e_{i1} = b_i ≥ · · · ≥ e_{iγ_i}
form the conjugate partition for the dot diagram corresponding to p_i. Here
γ_i = ν(p_i(T))/deg p_i.
Taking T-cyclic bases β_{ij} for C_{T, v_{ij}} then gives a basis
β = ∪_{i=1}^{t} ∪_{j=1}^{γ_i} β_{ij}
for V. Moreover
[T]_β^β = ⊕_{i=1}^{t} ⊕_{j=1}^{γ_i} C(p_i^{e_{ij}}).
If we instead use the bases
β'_{ij} :
v_{ij}, T(v_{ij}), ..., T^{n−1}(v_{ij}),
p_i(T)(v_{ij}), T p_i(T)(v_{ij}), ..., T^{n−1} p_i(T)(v_{ij}),
⋮
p_i^{e_{ij}−1}(T)(v_{ij}), T p_i^{e_{ij}−1}(T)(v_{ij}), ..., T^{n−1} p_i^{e_{ij}−1}(T)(v_{ij})
107
(with n = deg pi ) which reduces to the Jordan basis when pi = x − ci , it is
e
not difficult to verify that we get a corresponding matrix H(pi ij ) called a
hypercompanion matrix, which reduces to the elementary Jordan matrix
Jeij (ci ) when pi = x − ci :
H(p_i^{e_{ij}}) =
[ C(p_i)                    ]
[ N     C(p_i)              ]
[ 0     N      C(p_i)       ]
[ ⋮            ⋱      ⋱     ]
[ 0    · · ·   N     C(p_i) ],
where there are e_{ij} blocks on the diagonal and N is a square matrix of the same size as C(p_i) which is everywhere zero, except in the top right-hand corner, where there is a 1. The overall effect is an unbroken subdiagonal of 1's.
We then get the corresponding rational canonical form:
[T]_{β'}^{β'} = ⊕_{i=1}^{t} ⊕_{j=1}^{γ_i} H(p_i^{e_{ij}}).
Computational Remark:
We can do our computations completely over F , without going into Fp ,
as follows. Suppose v_1, ..., v_r form an F-spanning family for N_{h, p}. Then we could, in principle, perform the LRA over F_p on this spanning family and find an F_p-basis v_{c_1}, ..., v_{c_R}. A little thought reveals that if we had instead applied the LRA over F to the expanded sequence:
108
p(A) =
[ 0 0 0 0 0 0 ]
[ 0 2 0 2 1 0 ]
[ 0 1 2 2 0 1 ]
[ 0 1 1 0 1 2 ]
[ 0 0 1 2 2 2 ]
[ 0 0 0 0 0 0 ],
ν(p(A)) = 4, ν_{1, p} = ν(p(A))/deg p = 2.
· N2, p
· · N1, p
We have to find an Fp –basis p(A)v11 for N2, p and extend this to an Fp –basis
p(A)v11 , v12 for N (p(A)).
An F-basis for N(p^2(A)) is E_1, ..., E_6. Then
N_{2, p} = ⟨p(A)E_1, ..., p(A)E_6⟩
and the LRA gives p(A)E_2 as an F_p-basis for N_{2, p}, so we can take v_{11} = E_2.
We find the columns of the following matrix form an F –basis for N (p(A)):
1 0 0 0
0 2 1 0
0 1 1 1
0 1 0 0 .
0 0 1 0
0 0 0 1
We place p(A)E_2 in front and then pad the resulting matrix to get
[ 0 0 1 1 0 0 0 0 0 2 ]
[ 2 0 0 1 2 0 1 2 0 1 ]
[ 1 2 0 0 1 2 1 0 1 2 ]
[ 1 1 0 2 1 1 0 2 0 0 ]
[ 0 1 0 0 0 1 1 1 0 1 ]
[ 0 0 0 1 0 0 0 0 1 1 ].
The first four columns p(A)E_2, Ap(A)E_2, E_1, AE_1 of this matrix form an LR F–basis for N(p(A)), and hence p(A)E_2, E_1 form an F_p–basis for N(p(A)). So we can take v_12 = E_1.
Then V_6(Z_3) = N(p²(A)) = C_{T_A,v_11} ⊕ C_{T_A,v_12}.
Then joining hypercompanion bases for C_{T_A,v_11} and C_{T_A,v_12} gives a basis
v_11, Av_11, p(A)v_11, Ap(A)v_11;  v_12, Av_12
for V_6(Z_3). Finally, if P is the non–singular matrix whose columns are these vectors, we transform A into a direct sum of hypercompanion matrices:
P^{−1}AP = H(p²) ⊕ H(p) =
[ 0 1 0 0 0 0 ]
[ 1 2 0 0 0 0 ]
[ 0 1 0 1 0 0 ]
[ 0 0 1 2 0 0 ]
[ 0 0 0 0 0 1 ]
[ 0 0 0 0 1 2 ].
Explicitly, we have
P =
[ 0 0 0 0 1 1 ]
[ 1 0 2 0 0 1 ]
[ 0 1 1 2 0 0 ]
[ 0 0 1 1 0 2 ]
[ 0 0 0 1 0 0 ]
[ 0 0 0 0 0 1 ].
where
e_{i1} ≥ · · · ≥ e_{iγ_i} ≥ 1        (26)
and p_1, . . . , p_t are distinct monic irreducible polynomials.
We show that the polynomials pi and the sequences (26) are determined
by the transformation T .
First, it is not difficult to show that
β = ∪_{i=1}^{t} ∪_{j=1}^{γ_i} β_ij,
where
β_ij :  v_ij, T(v_ij), . . . , T^{n_ij−1}(v_ij)
and n_ij = deg p_i^{e_ij} and m_{T,v_ij} = p_i^{e_ij}. Then we have the direct sum decomposition

V = ⊕_{i=1}^{t} ⊕_{j=1}^{γ_i} C_{T,v_ij},
and hence

V = ⊕_{i=1}^{t} Ker p_i^{b_i}(T).
p_i^{e_i1−1}(T)(v_i1), . . . , p_i^{e_ij_h−1}(T)(v_ij_h),

ν(p_i^{b_i}(T)) / deg p_i = a_i
Note that this determines b_i: we may evaluate
ν(p_i^h(T)) / deg p_i
So

ch_T = ∏_{i=1}^{t} ∏_{j=1}^{γ_i} ch_{B_ij},

where, for brevity, we write B_ij = C(p_i^{e_ij}). Hence

ch_T = ∏_{i=1}^{t} ∏_{j=1}^{γ_i} p_i^{e_ij}
     = ∏_{i=1}^{t} p_i^{Σ_{j=1}^{γ_i} e_ij}
     = ∏_{i=1}^{t} p_i^{ν(p_i^{b_i}(T)) / deg p_i},

as required.
THEOREM 5.5
ch_T = m_T ⇔ ∃ a basis β for V such that

[T]_β^β = C(p_1^{b_1}) ⊕ · · · ⊕ C(p_t^{b_t}).
⇐

ch_T = ∏_{i=1}^{t} ch_{C(p_i^{b_i})} = ∏_{i=1}^{t} p_i^{b_i},

m_T = lcm(p_1^{b_1}, . . . , p_t^{b_t}) = p_1^{b_1} · · · p_t^{b_t} = ch_T.
and we get the secondary decomposition
Ker p_i^{b_i}(T) = C_{T,v_i1}.
Further, if β = β_11 ∪ · · · ∪ β_t1, where β_i1 is the T–cyclic basis for C_{T,v_i1}, then

[T]_β^β = ⊕_{i=1}^{t} ⊕_{j=1}^{γ_i} C(p_i^{e_ij})
        = ⊕_{i=1}^{t} C(p_i^{b_i})
        = C(p_1^{b_1}) ⊕ · · · ⊕ C(p_t^{b_t}),
as required.
THEOREM 5.6
m_T = p_1 p_2 · · · p_t, a product of distinct monic irreducibles, if and only if ∃ a basis β for V such that

[T]_β^β = C(p_1) ⊕ · · · ⊕ C(p_1) ⊕ · · · · · · ⊕ C(p_t) ⊕ · · · ⊕ C(p_t),        (27)

with γ_1 copies of C(p_1), . . . , γ_t copies of C(p_t), as e_ij = 1 ∀ i, j.
5.3 Elementary divisors and invariant factors
5.3.1 Elementary Divisors
DEFINITION 5.1
The polynomials p_i^{e_ij} occurring in the rational canonical form of T are called the elementary divisors of T. Similarly the elementary divisors of a matrix A ∈ M_{n×n}(F) are the polynomials p_i^{e_ij} occurring in the rational canonical form of A.
THEOREM 5.7
Linear transformations T_1, T_2 : V → V have the same elementary divisors if and only if there exists an isomorphism L : V → V such that T_2 = L^{−1} T_1 L.
PROOF
“only if”. Suppose that T_1 and T_2 have the same elementary divisors. Then ∃ bases β, γ for V such that

φ_β T_1 = T_A φ_β,    φ_γ T_2 = T_A φ_γ.
Hence
φ_β T_1 φ_β^{−1} = T_A = φ_γ T_2 φ_γ^{−1},
so
φ_γ^{−1} φ_β T_1 φ_β^{−1} φ_γ = T_2,
or
L^{−1} T_1 L = T_2,
where L = φ_β^{−1} φ_γ is an isomorphism.
“if”. Suppose that L^{−1} T_1 L = T_2. Then p_i^h(T_2) = L^{−1} p_i^h(T_1) L for all i and h, so we have

ν(p_i^h(T_2)) = ν(p_i^h(T_1)).
Hence for each pi , the corresponding dot diagrams for T1 and T2 are identical
and consequently the elementary divisors for T1 and T2 are identical.
COROLLARY 5.2
Let A, B ∈ Mn×n (F ). Then A is similar to B if and only if A and B
have the same elementary divisors.
PROOF
THEOREM 5.8
Let T : V → V be a linear transformation over F. Then there exist non–constant monic polynomials d_1, . . . , d_s ∈ F[x] with d_1 | d_2 | · · · | d_s, and vectors v_1, . . . , v_s ∈ V, such that
V = C_{T,v_1} ⊕ · · · ⊕ C_{T,v_s},
where m_{T,v_k} = d_k.
Let s = max(γ_1, . . . , γ_t) and if 1 ≤ i ≤ t and γ_i < j ≤ s, define e_ij = 0 and v_ij = 0, the zero vector of V. Now arrange the polynomials p_i^{e_ij}, 1 ≤ i ≤ t, 1 ≤ j ≤ s, as a t × s rectangular array:
Let
d_1 = p_1^{e_1s} · · · p_t^{e_ts},  . . . ,  d_s = p_1^{e_11} · · · p_t^{e_t1}
be the products along columns of the array, from left to right. Then d_1, . . . , d_s are monic non–constant polynomials and
d_1 | d_2 | · · · | d_s.
Also C_{T,v_ij} = {0} if v_ij = 0, so V is the direct sum of the following ts T–cyclic subspaces:

C_{T,v_1s}   · · ·   C_{T,v_11}
⋮                    ⋮
C_{T,v_ts}   · · ·   C_{T,v_t1}
Then by Problem Sheet 5, Question 15(b), if we let each v_k be the sum of the generators v_ij down a column of this array, then C_{T,v_k} is the direct sum of the corresponding cyclic subspaces and m_{T,v_k} = d_k. Consequently

V = C_{T,v_1} ⊕ · · · ⊕ C_{T,v_s}.
DEFINITION 5.2
Polynomials d1 , . . . , ds satisfying the conditions of the above theorem are
called invariant factors of T .
There is a similar definition for matrices: if A ∈ M_{n×n}(F) is similar to a direct sum

⊕_{k=1}^{s} C(d_k),

where d_1, . . . , d_s are non–constant monic polynomials in F[x] such that d_k divides d_{k+1} for 1 ≤ k ≤ s − 1, then d_1, . . . , d_s are called invariant factors of A. So the invariant factors of A are the invariant factors of T_A.
THEOREM 5.9
The invariant factors of a linear transformation T : V → V are uniquely
defined by T .
PROOF
Reverse the construction in the proof of the above theorem using Question 15(a) of Problem Sheet 5, thereby recapturing the rectangular array of elementary divisors, which in turn is uniquely determined by T.
EXAMPLE 5.1
Suppose T : V → V has elementary divisors whose column products, as above, give the invariant factors
d_1 = p_3
d_2 = p_2 p_3
d_3 = p_1² p_2² p_3⁴
d_4 = p_1³ p_2² p_3⁵
d_5 = p_1³ p_2⁴ p_3⁵.
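The passage from elementary divisors to invariant factors is per-prime exponent bookkeeping: pad each prime's exponent list with zeros to a common length, sort, and multiply across. A sketch consistent with Example 5.1 (the symbolic prime names 'p1', 'p2', 'p3' and the particular list of elementary divisors are assumptions chosen to reproduce the d_k above):

```python
from collections import defaultdict

def invariant_factors(elementary_divisors):
    """Combine elementary divisors, given as (prime, exponent) pairs, into
    invariant factors d_1 | d_2 | ... | d_s, returned as exponent dicts."""
    exps = defaultdict(list)
    for prime, e in elementary_divisors:
        exps[prime].append(e)
    s = max(len(v) for v in exps.values())   # number of invariant factors
    for v in exps.values():
        v.sort()
        v[:0] = [0] * (s - len(v))           # pad missing exponents with 0
    # d_k collects the k-th smallest exponent of every prime
    return [{p: v[k] for p, v in exps.items() if v[k] > 0} for k in range(s)]

eds = [('p1', 2), ('p1', 3), ('p1', 3),
       ('p2', 1), ('p2', 2), ('p2', 2), ('p2', 4),
       ('p3', 1), ('p3', 1), ('p3', 4), ('p3', 5), ('p3', 5)]
ds = invariant_factors(eds)
```

Here `ds` reproduces d_1 = p_3 up through d_5 = p_1³ p_2⁴ p_3⁵, and the divisibility chain d_1 | d_2 | · · · | d_5 is automatic because each prime's exponents are sorted.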
THEOREM 5.10
If d1 , . . . , ds are the invariant factors of T : V → V , then
(i) mT = ds ;
(ii) chT = d1 · · · ds .
PROOF
Suppose B = [T]_β^β = ⊕_{k=1}^{s} C(d_k) is the canonical form corresponding to the invariant factors d_1, . . . , d_s of T. Then m_T = lcm(d_1, . . . , d_s) = d_s, as d_k | d_s for each k. Also

ch_T = ch_B = ∏_{k=1}^{s} ch_{C(d_k)} = ∏_{k=1}^{s} d_k.
6 The Smith Canonical Form
6.1 Equivalence of Polynomial Matrices
DEFINITION 6.1
A matrix P ∈ Mn×n (F [x]) is called a unit in Mn×n (F [x]) if ∃ Q ∈
Mn×n (F [x]) such that
P Q = In .
Clearly if P and Q are units, so is P Q.
THEOREM 6.1
A matrix P ∈ M_{n×n}(F[x]) is a unit in M_{n×n}(F[x]) if and only if det P = c, where c ∈ F and c ≠ 0.
proof
“only if”. Suppose P is a unit. Then P Q = I_n and
det P · det Q = det I_n = 1.
However det P and det Q belong to F[x], so both are in fact non–zero elements of F.
“if”. Suppose P ∈ M_{n×n}(F[x]) satisfies det P = c, where c ∈ F and c ≠ 0. Then
P adj P = (det P )In = cIn .
Hence P Q = In , where Q = c−1 adj P ∈ Mn×n (F [x]). Hence P is a unit in
Mn×n (F [x]).
EXAMPLE 6.1
P = [ 1 + x   −x    ]
    [ x       1 − x ]  ∈ M_{2×2}(F[x]) is a unit, as det P = 1.
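Theorem 6.1's “if” direction is constructive: Q = c^{−1} adj P. A quick sympy check on Example 6.1 (use of sympy, rather than the notes' Cmat package, is an assumption of this sketch):

```python
from sympy import Matrix, symbols, expand, eye

x = symbols('x')

P = Matrix([[1 + x, -x],
            [x, 1 - x]])      # the unit of Example 6.1
c = expand(P.det())           # det P = 1, a nonzero element of F
Q = P.adjugate() / c          # Q = c^(-1) adj P has entries in F[x]
```

Here Q = [[1 − x, x], [−x, 1 + x]] and P Q = I_2, confirming that P is a unit.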
THEOREM 6.2
Elementary row matrices in Mn×n (F [x]) are units:
(i) Eij : interchange rows i and j of In ;
(ii) Ei(t): multiply row i of In by t ∈ F, t ≠ 0;
(iii) Eij (f ): add f times row j of In to row i, f ∈ F [x].
In fact det Eij = −1; det Ei (t) = t; det Eij (f ) = 1.
Similarly for elementary column matrices in Mn×n (F [x]):
Remark: It follows that a product of elementary matrices in Mn×n (F [x])
is a unit. Later we will be able to prove that the converse is also true.
DEFINITION 6.2
Let A, B ∈ Mm×n (F [x]). Then A is equivalent to B over F [x] if units
P ∈ Mm×m (F [x]) and Q ∈ Mn×n (F [x]) exist such that
P AQ = B.
THEOREM 6.3
Equivalence of matrices over F [x] defines an equivalence relation on
Mm×n (F [x]).
THEOREM 6.4
For 1 ≤ k ≤ ρ(A), we have d_k(A) ≠ 0. Also d_k(A) divides d_{k+1}(A) for 1 ≤ k ≤ ρ(A) − 1.
proof
Let r = ρ(A). Then there exists an r × r non–zero minor and hence
dr (A) 6= 0. Then because each r × r minor is a linear combination over F [x]
of (r − 1) × (r − 1) minors of A, it follows that some (r − 1) × (r − 1) minor of
A is also non–zero and hence dr−1 (A) 6= 0; also dr−1 (A) divides each minor
of size r − 1 and consequently divides each minor of size r; hence dr−1 (A)
divides dr (A), the gcd of all minors of size r. This argument can be repeated
with r replaced by r − 1 and so on.
THEOREM 6.5
Let A, B ∈ Mm×n (F [x]). Then if A is equivalent to B over F [x], we
have
(i) ρ(A) = ρ(B) = r, say;
(ii) d_k(A) = d_k(B) for 1 ≤ k ≤ r.
proof
Suppose P AQ = B, where P and Q are units. First consider P A. The
rows of P A are linear combinations over F [x] of the rows of A, so it follows
that each k × k minor of P A is a linear combination of the k × k minors of
A. Similarly each column of (P A)Q is a linear combination over F[x] of
the columns of P A, so it follows that each k × k minor of B = (P A)Q is a
linear combination over F [x] of the k × k minors of P A and consequently of
the k × k minors of A.
It follows that all minors of B with size k > ρ(A) must be zero and hence
ρ(B) ≤ ρ(A). However B is equivalent to A, so we deduce that ρ(A) ≤ ρ(B)
and hence ρ(A) = ρ(B).
Also dk (B) is a linear combination over F [x] of all k × k minors of B
and hence of all k × k minors of A. Hence dk (A)|dk (B) and by symmetry,
dk (B)|dk (A). Hence dk (A) = dk (B) if 1 ≤ k ≤ r.
DEFINITION 6.3
Let D ∈ M_{m×n}(F[x]) be of the form
D = diag(f_1, . . . , f_r, 0, . . . , 0),
where f_1, . . . , f_r are monic and f_1 | f_2 | · · · | f_r. The matrix D is said to be in Smith canonical form.
THEOREM 6.6
Every matrix A ∈ M_{m×n}(F[x]) is equivalent over F[x] to a matrix in Smith canonical form.
proof
This is presented in the form of an algorithm which is in fact used by Cmat to find unit matrices P and Q such that P AQ is in Smith canonical form.
Our account is based on that in the book “Rings, Modules and Linear
Algebra,” by B. Hartley and T.O. Hawkes.
We describe a sequence of elementary row and column operations over
F [x], which when applied to a matrix A with a11 6= 0 either yields a matrix
C of the form
f1 0 · · · 0
0
C= .
. C ∗
.
0
where f_1 is monic and divides every element of C*, or else yields a matrix B in which b_11 ≠ 0 and deg b_11 < deg a_11.
Case 1. ∃ a_1j in row 1 with a_11 not dividing a_1j. Then
a_1j = a_11 q + b,
by Euclid's division theorem, where b ≠ 0 and deg b < deg a_11. Subtract q times column 1 from column j and then interchange columns 1 and j. This yields a matrix of type B mentioned above.
Case 2. ∃ ai1 in column 1 with a11 not dividing ai1 . Proceed as in Case 1,
operating on rows rather than columns, again reaching a matrix of type B.
Case 3. Here a11 divides every element in the first row and first column.
Then by subtracting suitable multiples of column 1 from the other columns,
we can replace all the entries in the first row other than a11 by 0. Similarly
for the first column. We then have a matrix of the form
E =
[ e_11  0  · · ·  0 ]
[ 0                 ]
[ ⋮        E*       ]
[ 0                 ]
EXAMPLE 6.2
(of the Smith Canonical Form)

A = [ 1 + x²   x     ]
    [ x        1 + x ]

We want D = P AQ in Smith canonical form. So we construct the augmented matrix (row operations act on the left block, column operations on the right block):

[ 1  0 | 1 + x²   x      | 1  0 ]
[ 0  1 | x        1 + x  | 0  1 ]

R1 → R1 − xR2 ⇒
[ 1  −x | 1   −x²    | 1  0 ]
[ 0  1  | x   1 + x  | 0  1 ]

C2 → C2 + x²C1 ⇒
[ 1  −x | 1   0           | 1  x² ]
[ 0  1  | x   1 + x + x³  | 0  1  ]

R2 → R2 − xR1 ⇒
[ 1   −x      | 1   0           | 1  x² ]
[ −x  1 + x²  | 0   1 + x + x³  | 0  1  ]
      ↑               ↑              ↑
      P               D              Q

f_1 = d_1(A),    f_2 = d_2(A)/d_1(A).
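The tableau above can be verified directly: reading off P, D, Q from the final augmented matrix, P AQ should equal D and both P and Q should be units. A sympy sketch (sympy assumed):

```python
from sympy import Matrix, symbols, expand

x = symbols('x')

A = Matrix([[1 + x**2, x],
            [x, 1 + x]])
P = Matrix([[1, -x],
            [-x, 1 + x**2]])    # accumulated row operations
Q = Matrix([[1, x**2],
            [0, 1]])            # accumulated column operations
D = Matrix([[1, 0],
            [0, 1 + x + x**3]]) # the Smith canonical form
```

Both det P and det Q are nonzero constants (in fact 1), so P and Q are units and D = P AQ is equivalent to A over F[x].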
6.2.1 Uniqueness of the Smith Canonical Form
THEOREM 6.7
Every matrix A ∈ M_{m×n}(F[x]) is equivalent to precisely one matrix in Smith canonical form.

d_k(A) = d_k(B) = f_1 f_2 · · · f_k,

f_1 = d_1(A),
f_2 = d_2(A)/d_1(A),
⋮
f_r = d_r(A)/d_{r−1}(A).

1, . . . , 1, d_1, . . . , d_s
(n − s ones)
LEMMA 6.1
The Smith canonical form of xI_n − C(d), where d is a monic polynomial of degree n, is
diag(1, . . . , 1, d)
(n − 1 ones). Apply the row operation

R1 → R1 + xR2 + x²R3 + · · · + x^{n−1}Rn

to obtain
[ 0    0    · · ·  0    d           ]
[ −1   x    · · ·       a_1         ]
[ 0    −1               a_2         ]
[ ⋮          ⋱          ⋮           ]
[                 x     a_{n−2}     ]
[ 0    · · ·      −1    x + a_{n−1} ]
(think about it!) and then column operations, yielding
[ 0    0    . . .  0    d ]
[ −1   0           0      ]
[ 0    −1          ⋮      ]
[ ⋮          ⋱     ⋮      ]
[ 0    · · ·  −1   0      ]
Trivially, elementary operations now transform this into the matrix diag(1, . . . , 1, d).
THEOREM 6.8
Let B ∈ M_{n×n}(F). If the invariant factors of B are d_1, . . . , d_s, then the invariant factors of xI_n − B are
1, . . . , 1, d_1, d_2, . . . , d_s
(n − s ones).
Choose P ∈ M_{n×n}(F) with P^{−1}BP = ⊕_{k=1}^{s} C(d_k). Then

P^{−1}(xI_n − B)P = xI_n − ⊕_{k=1}^{s} C(d_k)
                  = ⊕_{k=1}^{s} (xI_{m_k} − C(d_k)),  where m_k = deg d_k.
EXAMPLE 6.3
Find the invariant factors of
B = [ 2   0   0   0  ]
    [ −1  1   0   0  ]
    [ 0   −1  0   −1 ]
    [ 1   1   1   2  ]  ∈ M_{4×4}(Q)
by finding the Smith canonical form of xI_4 − B.
Solution:
xI_4 − B =
[ x − 2   0       0    0     ]
[ 1       x − 1   0    0     ]
[ 0       1       x    1     ]
[ −1      −1      −1   x − 2 ]
We start off with the row operations
R1 → R1 − (x − 2)R2,   R1 ↔ R2,   R4 → R4 + R1
and get
[ 1   x − 1              0    0     ]
[ 0   −(x − 1)(x − 2)    0    0     ]
[ 0   1                  x    1     ]
[ 0   x − 2              −1   x − 2 ]

(column ops.) ⇒
[ 1   0                  0    0     ]
[ 0   −(x − 1)(x − 2)    0    0     ]
[ 0   1                  x    1     ]
[ 0   x − 2              −1   x − 2 ]

⇒
[ 1   0                  0    0     ]
[ 0   1                  x    1     ]
[ 0   −(x − 1)(x − 2)    0    0     ]
[ 0   x − 2              −1   x − 2 ]

⇒
[ 1   0   0                   0               ]
[ 0   1   x                   1               ]
[ 0   0   x(x − 1)(x − 2)     (x − 1)(x − 2)  ]
[ 0   0   −1 − x(x − 2)       0               ]
{−1 − x(x − 2) = −(x − 1)²}

⇒
[ 1   0   0                   0               ]
[ 0   1   0                   0               ]
[ 0   0   x(x − 1)(x − 2)     (x − 1)(x − 2)  ]
[ 0   0   −(x − 1)²           0               ].
Now, for brevity, we work just on the 2 × 2 block in the bottom right corner:

⇒ [ (x − 1)(x − 2)   x(x − 1)(x − 2) ]
  [ 0                −(x − 1)²       ]

C2 → C2 − xC1 ⇒ [ (x − 1)(x − 2)   0         ]
                [ 0                −(x − 1)² ]

R1 → R1 − R2 ⇒ [ (x − 1)(x − 2)   (x − 1)²  ]
               [ 0                −(x − 1)² ]

C2 → C2 − C1 ⇒ [ (x − 1)(x − 2)   x − 1     ]
               [ 0                −(x − 1)² ]

C1 ↔ C2 ⇒ [ x − 1        (x − 1)(x − 2) ]
          [ −(x − 1)²    0              ]

C2 → C2 − (x − 2)C1 ⇒ [ x − 1        0               ]
                      [ −(x − 1)²    (x − 2)(x − 1)² ]

R2 → R2 + (x − 1)R1 ⇒ [ x − 1   0               ]
                      [ 0       (x − 2)(x − 1)² ]

Hence the Smith canonical form of xI_4 − B is diag(1, 1, x − 1, (x − 1)²(x − 2)), and the invariant factors of B are x − 1 and (x − 1)²(x − 2).
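Theorem 6.5 suggests an independent check: compute d_k(xI_4 − B) directly as the monic gcd of all k × k minors and recover the invariant factors as successive quotients. A sympy sketch (the helper `d_k` is mine):

```python
from itertools import combinations
from sympy import Matrix, Poly, symbols, gcd, factor, quo, eye

x = symbols('x')

def d_k(M, k):
    """Monic gcd of all k x k minors of M over F[x] (0 if every minor vanishes)."""
    g = 0
    for rows in combinations(range(M.rows), k):
        for cols in combinations(range(M.cols), k):
            g = gcd(g, M[list(rows), list(cols)].det())
    return Poly(g, x).monic().as_expr() if g != 0 else 0

B = Matrix([[2, 0, 0, 0],
            [-1, 1, 0, 0],
            [0, -1, 0, -1],
            [1, 1, 1, 2]])
d = [d_k(x * eye(4) - B, k) for k in range(1, 5)]  # d_1, ..., d_4
```

This gives d_1 = d_2 = 1, d_3 = x − 1 and d_4 = (x − 1)³(x − 2), so the non-trivial invariant factors d_3/d_2 = x − 1 and d_4/d_3 = (x − 1)²(x − 2) agree with the Smith computation above.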
THEOREM 6.9
Let A, B ∈ M_{n×n}(F). Then A is similar to B if and only if xI_n − A and xI_n − B are equivalent over F[x].
proof
⇒ Obvious. If P^{−1}AP = B, P ∈ M_{n×n}(F), then
P^{−1}(xI_n − A)P = xI_n − B.
⇐ If xI_n − A and xI_n − B are equivalent over F[x], then they have the same invariant factors and so have the same non-trivial invariant factors. That is, A and B have the same invariant factors and hence are similar.
Note: It is possible to start from xI_n − A and find P ∈ M_{n×n}(F) such that

P^{−1}AP = ⊕_{k=1}^{s} C(d_k),

where
P_1(xI_n − B)Q_1 = diag(1, . . . , 1, d_1, . . . , d_s).
(See Perlis, Theory of Matrices, p. 144, Corollary 8–1 and p. 137, Theorem 7–9.)
THEOREM 6.10
Every unit in Mn×n (F [x]) is a product of elementary row and column
matrices.
7 Various Applications of Rational Canonical Forms
7.1 An Application to commuting transformations
THEOREM 7.1 (Cecioni 1908, Frobenius 1910)
Let L : U 7→ U and M : V 7→ V be given LTs. Then the vector space
ZL,M of all LTs N : U 7→ V satisfying
MN = NL
has dimension
Σ_{k=1}^{s} Σ_{l=1}^{t} deg gcd(d_k, D_l),
COROLLARY 7.1
Now take U = V and L = M. Then Z_{L,L}, the vector space of LTs satisfying
N L = LN,
has dimension

Σ_{k=1}^{s} (2s − 2k + 1) deg d_k.
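This count is easy to sanity-check numerically: for a matrix L with invariant factors d_1 = x − 1 and d_2 = (x − 1)², the formula gives (2·2 − 2 + 1)·1 + (2·2 − 4 + 1)·2 = 5, and the dimension of {N : NL = LN} is the nullity of L ⊗ I − I ⊗ Lᵗ. A numpy sketch (the 3 × 3 example L, built from companion blocks in one standard convention, is an assumption):

```python
import numpy as np

# L = C(x - 1) ⊕ C((x - 1)^2), so s = 2, deg d1 = 1, deg d2 = 2
L = np.array([[1, 0, 0],
              [0, 0, -1],
              [0, 1, 2]], dtype=float)

# N L = L N in matrix form: nullity of L ⊗ I_3 - I_3 ⊗ L^t
M = np.kron(L, np.eye(3)) - np.kron(np.eye(3), L.T)
dim_ZLL = 9 - np.linalg.matrix_rank(M)

# sum_{k=1}^{s} (2s - 2k + 1) deg d_k = 3*1 + 1*2
dim_formula = 3 * 1 + 1 * 2
```

Both quantities come out to 5 for this L, matching the corollary.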
Let P_L denote the vector space of all LTs of the form
f(L) : U → U,   f ∈ F[x].
THEOREM 7.2
P_L = Z_{L,L} ⇔ m_L = ch_L.
proof First note that dim P_L = deg m_L, as
I_V, L, . . . , L^{deg m_L − 1}
form a basis for P_L.

M N = N L ⇒ M^n N = N L^n ∀ n ≥ 1
        ⇒ f(M)N = N f(L) ∀ f ∈ F[x].

N ↦ (w_1, . . . , w_s)
Now let
w_k = Σ_{l=1}^{t} c_kl(M)(v_l),   k = 1, . . . , s,  where c_kl ∈ F[x].
N.B.
Then the matrices [ckl ], where ckl satisfy (30), form a vector space (call
it X) which is isomorphic to W .
Then in (31),
Clearly then,

dim X = dim Z_{L,M} = Σ_{k=1}^{s} Σ_{l=1}^{t} deg g_kl,

as required.
EXAMPLE 7.1
(of the vector space X, when s = t = 2)
Say
[deg g_kl] = [ 2  0 ]
             [ 1  3 ].
Then X consists of all matrices of the form

[c_kl] = [ (a_0 + a_1 x)·(D_1/g_11)    0·(D_2/g_12)                   ]
         [ b_0·(D_1/g_21)              (c_0 + c_1 x + c_2 x²)·(D_2/g_22) ]

       = a_0 [ D_1/g_11  0 ]  +  a_1 [ xD_1/g_11  0 ]  +  b_0 [ 0          0 ]  +  · · ·
             [ 0         0 ]         [ 0          0 ]         [ D_1/g_21   0 ]

. . . and so on.
EXAMPLE 7.2
Let A ∈ M_{3×3}(Q) such that there exists non-singular P ∈ M_{3×3}(Q) with
BA = AB,
i.e. T_B T_A = T_A T_B.
Now
[deg gcd(d_k, d_l)] = [ 1  1 ]
                      [ 1  2 ],
so
[c_kl] = [ a_0   b_0(x − 1)  ]
         [ c_0   d_0 + d_1 x ]
where a_0 etc. ∈ Q, so (32) gives

Bu_1 = a_0 u_1 + b_0 (x − 1)u_2
     = a_0 u_1 − b_0 u_2 + b_0 T(u_2)                    (33)
Bu_2 = c_0 u_1 + (d_0 + d_1 x)u_2
     = c_0 u_1 + d_0 u_2 + d_1 T(u_2).                   (34)

Noting that
m_{T,u_1} = x − 1 ⇒ T(u_1) = u_1
and m_{T,u_2} = (x − 1)² = x² − 2x + 1 ⇒ T²(u_2) = 2T(u_2) − u_2,
In terms of matrices,

B[u_1|u_2|T(u_2)] = [u_1|u_2|T(u_2)] [ a_0    c_0   c_0        ]
                                     [ −b_0   d_0   −d_1       ]
                                     [ b_0    d_1   d_0 + 2d_1 ]
i.e. BP = PK, say, or B = PKP^{−1}. Then BA = AB.
Note: BA = AB becomes
PKP^{−1} · PJP^{−1} = PJP^{−1} · PKP^{−1} ⇔ KJ = JK.
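The relation KJ = JK can be verified symbolically for all a_0, b_0, c_0, d_0, d_1 at once; here J denotes the matrix of T in the basis u_1, u_2, T(u_2), built from T(u_1) = u_1 and T²(u_2) = 2T(u_2) − u_2 (a sympy sketch):

```python
from sympy import Matrix, symbols

a0, b0, c0, d0, d1 = symbols('a0 b0 c0 d0 d1')

# J = [T]_β^β in the basis u1, u2, T(u2)
J = Matrix([[1, 0, 0],
            [0, 0, -1],
            [0, 1, 2]])
# K = the matrix of B in the same basis, from the example
K = Matrix([[a0, c0, c0],
            [-b0, d0, -d1],
            [b0, d1, d0 + 2*d1]])

commutes = (K * J - J * K).expand() == Matrix.zeros(3, 3)
```

Every entry of KJ − JK cancels identically, confirming that each choice of the five parameters gives a B commuting with A.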
DEFINITION 7.1
(Tensor or Kronecker product)
If A ∈ M_{m1×n1}(F) and B ∈ M_{m2×n2}(F) we define

A ⊗ B = [ a_11 B   a_12 B   · · · ]
        [ a_21 B   a_22 B   · · · ]  ∈ M_{m1m2×n1n2}(F).
        [ ⋮        ⋮        ⋱     ]

In terms of elements,
(A ⊗ B)_{(i,j),(k,l)} = a_ij b_kl,
the element at the intersection of the i-th row block, k-th row sub-block, and the j-th column block, l-th column sub-block.4
EXAMPLE 7.3

A ⊗ I_p = [ a_11 I_p   a_12 I_p   · · · ]
          [ a_21 I_p   a_22 I_p   · · · ]
          [ ⋮          ⋮          ⋱     ],

I_p ⊗ A = [ A   0   · · · ]
          [ 0   A   · · · ]
          [ ⋮   ⋮   ⋱     ].

(Taking tensor products is obviously far from commutative!)
(ii) A ⊗ B = 0 ⇔ A = 0 or B = 0;
(iii) A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C;
(iv) A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C);
(v) (B + C) ⊗ D = (B ⊗ D) + (C ⊗ D);
4. That is, the ((i − 1)m_2 + k, (j − 1)n_2 + l)-th element in the tensor product is a_ij b_kl.
(vi) (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD);
(vii) (B ⊕ C) ⊗ D = (B ⊗ D) ⊕ (C ⊗ D);
ch_{f(A;B)} = ∏_{k=1}^{m} ∏_{l=1}^{n} (x − f(λ_k, μ_l));
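Properties (iii) and (vi) above are easy to spot-check with numpy's `kron` (random integer matrices, with shapes chosen so the products in (vi) are defined):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, (2, 3))
B = rng.integers(-3, 4, (3, 2))
C = rng.integers(-3, 4, (3, 2))   # A C is 2 x 2
D = rng.integers(-3, 4, (2, 3))   # B D is 3 x 3

# (iii) associativity of the tensor product
assoc = np.array_equal(np.kron(A, np.kron(B, C)), np.kron(np.kron(A, B), C))
# (vi) the mixed-product rule (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
mixed = np.array_equal(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
```

Integer arithmetic keeps the comparison exact; both identities hold for every choice of compatible shapes, not just this seed.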
Remark: (ix) can be proved using the uniqueness theorem for alternating m–linear functions met in MP174; (x) follows from the equations
P^{−1}AP = J_1 and Q^{−1}BQ = J_2,
and more generally

(P ⊗ Q)^{−1} (Σ_{i=0}^{s} Σ_{j=0}^{t} c_ij (A^i ⊗ B^j)) (P ⊗ Q) = Σ_{i=0}^{s} Σ_{j=0}^{t} c_ij (J_1^i ⊗ J_2^j).

The matrix on the right–hand side is lower triangular and has diagonal elements
f(λ_k, μ_l),  1 ≤ k ≤ m,  1 ≤ l ≤ n.
THEOREM 7.3
Let β be the standard basis for M_{m×n}(F), i.e. the basis consisting of the matrices
E_11, . . . , E_mn,
and γ be the standard basis for M_{p×n}(F).
Let A be p × m, and
T_1 : M_{m×n}(F) → M_{p×n}(F)
be defined by T_1(X) = AX. Then
[T_1]_β^γ = A ⊗ I_n.
Similarly if B is n × p, and
T_2 : M_{m×n}(F) → M_{m×p}(F)
is defined by T_2(Y) = Y B, then
[T_2]_β^δ = I_m ⊗ B^t,
where δ is the standard basis for M_{m×p}(F).
COROLLARY 7.2
Let A be m × m, B be n × n, X be m × n, and
T : M_{m×n}(F) → M_{m×n}(F)
be defined by T(X) = AX − XB. Then
[T]_β^β = A ⊗ I_n − I_m ⊗ B^t,
where β is the standard basis for M_{m×n}(F).
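The corollary is easy to confirm numerically: with β ordered row-major (E_11, E_12, . . . , E_mn), the coordinate vector of X is its row-major flattening, and the matrix A ⊗ I_n − I_m ⊗ B^t sends it to the coordinates of AX − XB. A numpy sketch with random integer matrices (the sizes m = 3, n = 4 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.integers(-5, 5, (m, m)).astype(float)
B = rng.integers(-5, 5, (n, n)).astype(float)
X = rng.integers(-5, 5, (m, n)).astype(float)

# [T] in the standard row-major basis of M_{m x n}(F)
T = np.kron(A, np.eye(n)) - np.kron(np.eye(m), B.T)

lhs = T @ X.flatten()              # coordinates of T(X)
rhs = (A @ X - X @ B).flatten()    # T(X) = AX - XB, flattened row-major
```

Here `lhs` and `rhs` agree exactly, which is the content of the corollary in coordinates; note numpy's `flatten` is row-major by default, matching the ordering of β.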
DEFINITION 7.2
For brevity in the coming theorems, we define
ν_{A,B} = ν(A ⊗ I_n − I_m ⊗ B^t),
where A is m × m and B is n × n.
THEOREM 7.4

ν_{A,B} = ν(A ⊗ I_n − I_m ⊗ B^t) = Σ_{k=1}^{s} Σ_{l=1}^{t} deg gcd(d_k, D_l),

where
d_1 | d_2 | · · · | d_s and D_1 | D_2 | · · · | D_t
are the invariant factors of A and B, respectively.
proof With the transformation T from Corollary 7.2 above, we note that

ν_{A,B} = nullity T
        = dim{ X ∈ M_{m×n}(F) | AX = XB }
        = dim{ N ∈ Hom(V_n(F), V_m(F)) | T_A N = N T_B }.
Then

Σ_{k=1}^{s} Σ_{l=1}^{s} {min(m_k, m_l) + min(n_k, n_l) − 2 min(m_k, n_l)} ≥ 0.
proof
Case 1: k = l.
The terms to consider here are of the form
mk + nk − 2 min(mk , nk )
Since the sum of the diagonal terms and the sum of the pairs of sums
of off-diagonal terms are non-negative, the sum is non-negative. Also, if the
sum is zero, so must be the sum along the diagonal terms, making
mk = n k ∀k.
proof
We now extend the definitions of d_1, . . . , d_s and D_1, . . . , D_t by renaming them as follows, with N = max(s, t):

1, . . . , 1, d_1, . . . , d_s ↦ f_1, . . . , f_N   (N − s ones)
and 1, . . . , 1, D_1, . . . , D_t ↦ F_1, . . . , F_N   (N − t ones).
This is so we may rewrite the above sum of three sums as a single sum, viz:

ν_{A,A} + ν_{B,B} − 2ν_{A,B} = Σ_{k=1}^{N} Σ_{l=1}^{N} {deg gcd(f_k, f_l) + deg gcd(F_k, F_l) − 2 deg gcd(f_k, F_l)}.   (35)
We now let p_1, . . . , p_r be the distinct monic irreducibles in m_A m_B and write

f_k = p_1^{a_k1} p_2^{a_k2} · · · p_r^{a_kr},
F_k = p_1^{b_k1} p_2^{b_k2} · · · p_r^{b_kr},   1 ≤ k ≤ N,

where, for each i, the exponent sequences {a_ki}, {b_ki} are monotonic increasing in k and consist of non-negative integers. Then
gcd(f_k, F_l) = ∏_{i=1}^{r} p_i^{min(a_ki, b_li)}
⇒ deg gcd(f_k, F_l) = Σ_{i=1}^{r} deg p_i · min(a_ki, b_li),
and deg gcd(f_k, f_l) = Σ_{i=1}^{r} deg p_i · min(a_ki, a_li),
and deg gcd(F_k, F_l) = Σ_{i=1}^{r} deg p_i · min(b_ki, b_li).
Then equation (35) may be rewritten as

ν_{A,A} + ν_{B,B} − 2ν_{A,B}
  = Σ_{k=1}^{N} Σ_{l=1}^{N} Σ_{i=1}^{r} deg p_i {min(a_ki, a_li) + min(b_ki, b_li) − 2 min(a_ki, b_li)}
  = Σ_{i=1}^{r} deg p_i Σ_{k=1}^{N} Σ_{l=1}^{N} {min(a_ki, a_li) + min(b_ki, b_li) − 2 min(a_ki, b_li)}.
The latter double sum is of the form in lemma 7.1 and so, since deg pi > 0,
we have
νA,A + νB,B − 2νA,B ≥ 0,
proving the first part of the theorem.
Next we show that equality to zero in the above is equivalent to similarity
of the matrices:
EXERCISE 7.1
Show that if
P^{−1} A_1 P = A_2 and Q^{−1} B_1 Q = B_2,
then
8 Further directions in linear algebra
1. Dual space of a vector space; Tensor products of vector spaces; exte-
rior algebra of a vector space. See C.W. Curtis, Linear Algebra, an
introductory approach and T.S. Blyth, Module theory.
3. Iterative methods for finding inverses and solving linear systems. See
D.R. Hill and C.B. Moler, Experiments in Computational Matrix Al-
gebra.
[1] N.J. Pullman. Matrix Theory and its Applications, 1976. Marcel
Dekker Inc. New York.
[2] M. Pearl. Matrix Theory and Finite Mathematics, 1973. McGraw–
Hill Book Company, New York.
[3] H. Minc. Nonnegative Matrices, 1988. John Wiley and Sons,
New York.
5. There are at least two research journals devoted to linear and multilinear algebra in our Physical Sciences Library: Linear and Multilinear Algebra and Linear Algebra and its Applications.