Linear Algebra
Lina Oliveira
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected].
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
DOI: 10.1201/9781351243452
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.
To my daughters.
Contents
Preface
Biography
1 Matrices
1.1 Real and Complex Matrices
1.2 Matrix Calculus
1.3 Matrix Inverses
1.4 Elementary Matrices
1.4.1 LU and LDU factorisations
1.5 Exercises
1.6 At a Glance
2 Determinant
2.1 Axiomatic Definition
2.2 Leibniz's Formula
2.3 Laplace's Formula
2.4 Exercises
2.5 At a Glance
3 Vector Spaces
3.1 Vector Spaces
3.2 Linear Independence
3.3 Bases and Dimension
3.3.1 Matrix spaces and spaces of polynomials
3.3.2 Existence and construction of bases
3.4 Null Space, Row Space, and Column Space
3.4.1 Ax = b
3.5 Sum and Intersection of Subspaces
3.6 Change of Basis
3.7 Exercises
3.8 At a Glance
8 Appendix
8.1 Uniqueness of Reduced Row Echelon Form
8.2 Uniqueness of Determinant
8.3 Direct Sum of Subspaces
9 Solutions
9.1 Solutions to Chapter 1
9.2 Solutions to Chapter 2
9.3 Solutions to Chapter 3
9.4 Solutions to Chapter 4
9.5 Solutions to Chapter 5
9.6 Solutions to Chapter 6
9.7 Solutions to Chapter 7
Bibliography
Index
Preface
This is a first course in Linear Algebra. It grew out of several courses given
at Instituto Superior Técnico, University of Lisbon, and is naturally shaped
by the experience of teaching. But, perhaps more importantly, it also incorporates the feedback of generations of students: the way they felt challenged and the challenges they set in turn.
This is a book on Linear Algebra which could also be easily described as a book about matrices. The opening chapter defines what a matrix is and establishes the basics; the following chapters build on it to develop the theory; and the closing chapter showcases special types of matrices present in concrete applications, for example, life sciences, statistics, or the internet.
The book aims at conciseness and simplicity, for it is intended as an undergraduate textbook which also allows for self-learning. To that end, it summarises the theory throughout in ‘How to...’ text boxes and makes note of common mistakes and pitfalls. Every aspect of the content is assessed in the exercise and solution sections.
The narrative of the book is intertwined with matrices, either as the main subject or as tools to explore the theory. There is not a chapter where they are not present, be it in the forefront or the background. As it happens, each of the first five chapters is anchored on a particular number or numbers, in this order: the rank of a matrix, its determinant, the dimension of a vector space, the eigenvalues of a matrix, and the dimensions of the null space and the image of a linear transformation. The sixth chapter is about real and complex inner product spaces. Chapters 7 and 9 contain applications of the theory and solutions to the exercises, respectively, whilst Chapter 8 is an appendix consisting of the proofs of some results relegated to a later stage, so as not to impair the flow of the exposition.
Notwithstanding the simplicity goal of the presentation voiced above, the book ventures a few times into more advanced topics that, given the mostly self-contained nature of the book, call for more involved proofs. However, it is the reader's choice to skip these topics at first (or even definitively), if so wished. This will not be an impediment to the understanding of the fundamentals of Linear Algebra.
At the end of each chapter, the respective contents are briefly highlighted
in the very synthetic ‘At a Glance’ sections, which will mostly make sense
for those who read the theory, solved the corresponding exercises, and, in the
Biography
Lina Oliveira has a DPhil in Mathematics from the University of Oxford and is a faculty member of the Department of Mathematics of Instituto Superior Técnico, University of Lisbon. She has taught several graduate and undergraduate courses at Instituto Superior Técnico, where she regularly teaches Linear Algebra. Her research interests are in the areas of Functional Analysis, Operator Algebras, and Operator Theory. As of late, she has been responsible for the Linear Algebra course at the Portuguese Air Force Academy.
Chapter 1
Matrices
This first chapter is about matrices, the link binding all the subjects presented
in this book. After defining what a matrix is, we will move on to its properties by developing a toolkit to work with, namely, Gaussian and Gauss–Jordan eliminations, elementary matrices, and matrix calculus.
As said in the preface, almost every chapter has a particularly outstanding number associated with it. In this chapter, this number is the rank of a matrix: it will be used, for example, to classify systems of linear equations and to decide whether a matrix has an inverse; mostly, the whole chapter will revolve around it.
of scalars in $\mathbb{K}$ having $k$ rows and $n$ columns. Each number $a_{ij}$, for all indices $i = 1, \dots, k$ and $j = 1, \dots, n$, is called an entry of the matrix. The indices
$i$ and $j$ correspond, respectively, to the number of the row and the number of the column where the entry-$ij$, i.e., the scalar $a_{ij}$, is located.
The rows of the matrix are numbered from 1 to k, starting from the top,
and the columns of the matrix are numbered from 1 to n, starting from the
left.
A matrix whose entries are real numbers is called a real matrix or a
matrix over R, and a matrix whose entries are complex numbers is called a
complex matrix or a matrix over C.
is the scalar which is located in row 2 and column 3, that is, $a_{23} = 7$. This matrix has two rows and four columns and therefore is a $2 \times 4$ matrix.
The matrix (1.1) can also be presented as $[a_{ij}]_{\substack{i=1,\dots,k \\ j=1,\dots,n}}$, or simply as $[a_{ij}]$, whenever the matrix size is clear from the context.
The matrix is said to be
(a) a rectangular matrix if $k \neq n$;
(b) a square matrix if k = n, and in this case, the matrix is called a square
matrix of order n (or k);
(c) a column matrix or column vector if n = 1;
(d) a row matrix or row vector if k = 1.
For reasons that will become apparent in Chapter 3, in most cases, column
vectors will be referred to as vectors.
In what follows, a row of a matrix whose entries consist only of zeros will
be called a zero row. A zero column is defined similarly.
The first non-zero entry in each row of a row echelon matrix is called a
pivot.
Matrix A is in row echelon form and its pivots are 1, 4, and 6. The matrices
B and C are not row echelon matrices because matrix B does not satisfy
condition (ii) and matrix C does not satisfy condition (i) of Definition 1.
In the next section, we shall make use of matrices to solve systems of linear
equations. A key tool to be used in the process is the concept of an elementary
operation performed on the rows of a matrix, thereby obtaining a new matrix.
Definition 2 There exist three kinds of elementary row operations:
(i) exchanging two rows, i.e., exchanging row $l_i$ with row $l_j$, with $i \neq j$;
(ii) multiplying a row by a non-zero scalar, i.e., replacing row $l_i$ by $\alpha l_i$, with $\alpha \neq 0$;
(iii) adding to a row a multiple of another row, i.e., replacing row $l_i$ by $l_i + \alpha l_j$, with $i \neq j$.
The scalars α in this definition lie in the same field as the entries in the matrix.
For simplicity, in what follows elementary row operations may also be
called elementary operations.
Example 1.4 The three types of elementary operations will be illustrated with
the real matrix
$$A = \begin{bmatrix} 3 & 1 & -1 & 2 \\ 1 & -1 & 2 & 2 \\ 1 & 1 & 1 & 3 \end{bmatrix}.$$
The first elementary operation to be performed is of type (i). We exchange rows 1 and 3 of matrix $A$, obtaining matrix $B$:
$$A = \begin{bmatrix} 3 & 1 & -1 & 2 \\ 1 & -1 & 2 & 2 \\ 1 & 1 & 1 & 3 \end{bmatrix} \xrightarrow{\;l_1 \leftrightarrow l_3\;} \begin{bmatrix} 1 & 1 & 1 & 3 \\ 1 & -1 & 2 & 2 \\ 3 & 1 & -1 & 2 \end{bmatrix} = B.$$
i.e., $l_3$ of $M$ is
$$l_3 = \begin{bmatrix} a_{31} + 2a_{11} & a_{32} + 2a_{12} & a_{33} + 2a_{13} & a_{34} + 2a_{14} \end{bmatrix}.$$
$$A \xrightarrow{\;l_1 \leftrightarrow l_3\;} B, \qquad A \xrightarrow{\;-3l_2\;} C, \qquad A \xrightarrow{\;l_3 + 2l_1\;} M.$$
$$\mathbb{R}^n = \{(x_1, x_2, \dots, x_n) : x_1, x_2, \dots, x_n \in \mathbb{R}\},$$
$$\mathbb{C}^n = \{(x_1, x_2, \dots, x_n) : x_1, x_2, \dots, x_n \in \mathbb{C}\}.$$
Two systems of linear equations are said to be equivalent if they have
the same solution set. Systems are classified according to the nature of their
solution set. A system of linear equations is said to be:
In fact, such a system possesses, at least, the trivial solution, that is, the
solution where all the variables take the value 0.
The system of linear equations above is associated with the following matrices:
$$A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \dots & a_{kn} \end{bmatrix} \quad \text{Coefficient matrix,} \tag{1.3}$$
$$b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix} \quad \text{Column vector of independent terms,} \tag{1.4}$$
$$[A|b] = \left[\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{k1} & a_{k2} & \dots & a_{kn} & b_k \end{array}\right] \quad \text{Augmented matrix.} \tag{1.5}$$
The augmented matrix may also be presented without the vertical separation line, i.e.,
$$[A|b] = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{k1} & a_{k2} & \dots & a_{kn} & b_k \end{bmatrix}.$$
Consider the following system of linear equations
$$\begin{cases} x + y + z = 3 \\ x - y + 2z = 2 \\ 2x + y - z = 2 \end{cases}. \tag{1.6}$$
This fact will be used to ‘simplify’ the augmented matrix in order to ob-
tain a system equivalent to the given one but easier to solve. The aim is to
reduce the augmented matrix [A|b] to a row echelon matrix using elementary
operations according to a method called Gaussian elimination.
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 1 & -1 & 2 & 2 \\ 2 & 1 & -1 & 2 \end{array}\right] \xrightarrow[\;l_3 - 2l_1\;]{\;l_2 - l_1\;} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -2 & 1 & -1 \\ 0 & -1 & -3 & -4 \end{array}\right] \xrightarrow{\;l_2 \leftrightarrow l_3\;} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -1 & -3 & -4 \\ 0 & -2 & 1 & -1 \end{array}\right] \xrightarrow{\;l_3 - 2l_2\;} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -1 & -3 & -4 \\ 0 & 0 & 7 & 7 \end{array}\right].$$
$l_2 \leftrightarrow l_3$ indicates that rows 2 and 3 have been exchanged;
$l_3 - 2l_2$ indicates that row 2 multiplied by $-2$ has been added to row 3.
Notice that in the points above and throughout this book we use the fol-
lowing notation: when indicating an elementary operation of type (iii), the
first row to be written is the one to be modified.
The matrix
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -1 & -3 & -4 \\ 0 & 0 & 7 & 7 \end{array}\right]$$
is a row echelon matrix and the augmented matrix of the system of linear equations
$$\begin{cases} x + y + z = 3 \\ -y - 3z = -4 \\ 7z = 7 \end{cases}. \tag{1.7}$$
Moreover,
$$\begin{cases} x + y + z = 3 \\ x - y + 2z = 2 \\ 2x + y - z = 2 \end{cases} \iff \begin{cases} x + y + z = 3 \\ -y - 3z = -4 \\ 7z = 7 \end{cases}.$$
Beginning to solve the system by the simplest equation, i.e., the third equation,
we have z = 1. Substituting z by its value in the second equation, we get
−y − 3 = −4,
that is, y = 1. Finally, using the first equation and the values of y and z
already obtained, we have
x+1+1=3
and, hence, x = 1. It immediately follows that the solution set S of system
(1.7) or, equivalently, of system (1.6) is S = {(1, 1, 1)}. Hence, we conclude
that the system (1.6) is consistent and has a unique solution.
Observe that, once the row echelon augmented matrix is obtained, we
begin solving the system from the bottom equation up, and we keep as-
cending in the system until the top equation is reached. This is called back
substitution.
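The procedure just described is easy to mechanise. Below is a minimal NumPy sketch of Gaussian elimination followed by back substitution, applied to system (1.6); the function names and the pivot search are our own illustration, not the book's notation, and the row operations performed may differ from the ones displayed above.

```python
import numpy as np

def gaussian_elimination(M):
    """Reduce an augmented matrix M to row echelon form, using only row
    exchanges (type (i)) and row additions (type (iii))."""
    M = M.astype(float).copy()
    rows, cols = M.shape
    r = 0
    for c in range(cols - 1):
        # find a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, rows) if M[i, c] != 0), None)
        if pivot is None:
            continue
        M[[r, pivot]] = M[[pivot, r]]           # l_r <-> l_pivot
        for i in range(r + 1, rows):
            M[i] -= (M[i, c] / M[r, c]) * M[r]  # l_i - alpha * l_r
        r += 1
    return M

def back_substitution(R):
    """Solve [U|b] when U is upper triangular with non-zero diagonal."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (R[i, -1] - R[i, i + 1:n] @ x[i + 1:n]) / R[i, i]
    return x

# System (1.6): x + y + z = 3, x - y + 2z = 2, 2x + y - z = 2
aug = np.array([[1, 1, 1, 3],
                [1, -1, 2, 2],
                [2, 1, -1, 2]])
print(back_substitution(gaussian_elimination(aug)))   # -> [1. 1. 1.]
```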
$$[A|b] \xrightarrow{\;\text{GE}\;} R;$$
$$\cdots \xrightarrow{\;l_3 + l_2\;} \left[\begin{array}{cccc|c} 1 & -1 & -2 & 1 & 0 \\ 0 & -1 & 2 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].$$
It is obvious from (1.9) that the system (1.8) does not have a unique
solution. In fact, fixing (any) values for z and w, the values of x and y are
immediately determined. Hence, here the variables z, w can take any real
values and, once fixed, determine the values of the remaining variables. We
can say that z, w are the independent variables or free variables, whilst
x, y are the dependent variables.
As before, we used back substitution in (1.9) and expressed the variables
corresponding to the columns with pivots in terms of the variables correspond-
ing to the columns without pivots in the coefficient matrix. This will be the
rule throughout the book.
According to this rule, in (1.8) the dependent variables are x, y, and the
independent variables are z, w. The solution set S of system (1.8) is
$$S = \{(x, y, z, w) \in \mathbb{R}^4 : x = 4z + w - 3 \,\wedge\, y = 2z - 3\}.$$
Hence,
$$\begin{cases} x + y - 2z = 1 \\ -x + y = 0 \\ y - z = 0 \end{cases} \iff \begin{cases} x + y - 2z = 1 \\ 2y - 2z = 1 \\ 0 = -\tfrac{1}{2} \end{cases}.$$
Matrix A is a row echelon matrix but is not in reduced row echelon form
because it does not satisfy conditions (ii) and (iii). Matrix B is a matrix in
reduced row echelon form.
$$\cdots \xrightarrow{\;l_4 - l_3\;} \begin{bmatrix} 1 & -2 & 0 & 2 \\ 0 & -1 & 1 & 3 \\ 0 & 0 & -1 & -7 \\ 0 & 0 & 0 & 0 \end{bmatrix} \xrightarrow{\;l_2 + l_3\;} \begin{bmatrix} 1 & -2 & 0 & 2 \\ 0 & -1 & 0 & -4 \\ 0 & 0 & -1 & -7 \\ 0 & 0 & 0 & 0 \end{bmatrix} \xrightarrow{\;l_1 - 2l_2\;} \begin{bmatrix} 1 & 0 & 0 & 10 \\ 0 & -1 & 0 & -4 \\ 0 & 0 & -1 & -7 \\ 0 & 0 & 0 & 0 \end{bmatrix} \xrightarrow[\;(-1)l_3\;]{\;(-1)l_2\;} \begin{bmatrix} 1 & 0 & 0 & 10 \\ 0 & 1 & 0 & 4 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Observe that in this proposition and, for that matter, also in Proposition
1.1, it is not required that R and R′ be obtained using Gaussian elimination.
R and R′ are just row echelon matrices obtained from A applying elementary
operations in no particular order.
Proof Applying Gauss–Jordan elimination to the matrices A, R, and R′ ,
by Proposition 1.1, we obtain the same reduced row echelon matrix M . Notice
that the Gauss–Jordan elimination forces both the number of pivots in R and
the number of pivots in M to coincide. Similarly, the number of pivots in R′
and M must also coincide. Hence, the result follows.
We are now ready to make one of the crucial definitions in the book.
Hence, since the last matrix is a row echelon matrix having two pivots, we
conclude that the rank of matrix A is rank (A) = 2.
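Computationally, the rank can be found exactly as in this example: reduce the matrix to a row echelon form and count the pivots. The sketch below is our own rough illustration; for floating-point work, numpy.linalg.matrix_rank (SVD-based) is the robust choice.

```python
import numpy as np

def rank_by_pivots(A, tol=1e-12):
    """Count the pivots of a row echelon form of A."""
    M = np.array(A, dtype=complex)
    rows, cols = M.shape
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if abs(M[i, c]) > tol), None)
        if pivot is None:
            continue                     # no pivot in this column
        M[[r, pivot]] = M[[pivot, r]]
        for i in range(r + 1, rows):
            M[i] -= (M[i, c] / M[r, c]) * M[r]
        r += 1                           # one pivot found
    return r

# The augmented matrix of Example 1.8 below has rank 2.
A = [[1, 0, 0, 0],
     [0, 1 - 1j, -2j, 1 - 1j],
     [0, 1, 1 - 1j, 1]]
print(rank_by_pivots(A))                 # -> 2
```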
In the latter case, the only possibility is rank (A) < rank ([A|b]), and it follows
that the system is inconsistent. This means that a pivot appears in the column
corresponding to the vector b of independent terms. This in turn corresponds
to having an equation where the coefficients of all variables are zero whereas
the independent term is non-zero.
In fact, recalling the inconsistent system (1.10), we saw that this was precisely the case. Suppose now that the system is consistent, i.e., that $\operatorname{rank}(A) = \operatorname{rank}([A|b])$. Then either
$$\operatorname{rank}(A) = \text{number of columns of } A \tag{1.11}$$
or
$$\operatorname{rank}(A) < \text{number of columns of } A. \tag{1.12}$$
In case (1.11), the system has a unique solution. In case (1.12), the system
has infinitely many solutions and, if A is a k × n matrix, then the number of
independent variables is n − rank (A).
We summarise this discussion in the proposition below.
Proposition 1.3 Let [A|b] ∈ Mk,n+1 (K) be the augmented matrix of a system
of linear equations. Then the following assertions hold.
(i) If rank (A) < rank ([A|b]), then the system is inconsistent.
(ii) If rank $(A)$ = rank $([A|b])$, then the system is consistent. In this case, the number of independent variables coincides with $n - \operatorname{rank}(A)$.
Example 1.8 Consider the system whose coefficient matrix A and vector b
of independent terms are, respectively,
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1-i & -2i \\ 0 & 1 & 1-i \end{bmatrix}, \qquad b = \begin{bmatrix} 0 \\ 1-i \\ 1 \end{bmatrix}.$$
Hence, we see that this is the case of a system in three complex variables, say,
z1 , z2 , z3 , having three equations.
Applying Gaussian elimination to the system’s augmented matrix [A|b],
we have
$$[A|b] = \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1-i & -2i & 1-i \\ 0 & 1 & 1-i & 1 \end{array}\right] \xrightarrow{\;l_2 \leftrightarrow l_3\;} \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 1-i & 1 \\ 0 & 1-i & -2i & 1-i \end{array}\right] \xrightarrow{\;l_3 - (1-i)l_2\;} \underbrace{\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 1-i & 1 \\ 0 & 0 & 0 & 0 \end{array}\right]}_{R}.$$
Since the pivots in the matrix $R$ are the entries in positions $(1,1)$ and $(2,2)$, we conclude that $\operatorname{rank}(A) = \operatorname{rank}([A|b]) = 2$. Hence, the system is consistent and
A + 0 = A = 0 + A.
(iv) There exists a unique matrix −A ∈ Mk,n , called the additive inverse
of A, such that
A + (−A) = 0 = (−A) + A.
Proof The commutativity (i) and associativity (ii) follow easily from the
commutativity and associativity of the addition in K, and it is obvious that
the (unique) additive identity is the k × n zero matrix.
It is immediate that, given A = [aij ], its additive inverse is −A = [−aij ].
the $k \times 1$ column vector $Ab$ whose entry $(Ab)_{i1}$ is defined, for all indices $i = 1, \dots, k$, by
$$(Ab)_{i1} = a_{i1}b_{11} + a_{i2}b_{21} + \cdots + a_{ij}b_{j1} + \cdots + a_{in}b_{n1} = \sum_{j=1}^{n} a_{ij}b_{j1}. \tag{1.13}$$
To calculate row i of Ab, we need row i of matrix A and the column vector
b, as shown in (1.13).
Hence,
$$Ab = b_{11}\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{k1} \end{bmatrix} + b_{21}\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{k2} \end{bmatrix} + \cdots + b_{n1}\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{kn} \end{bmatrix} = b_{11}c_1 + b_{21}c_2 + \cdots + b_{n1}c_n. \tag{1.14}$$
Definition 9 Let $A = \begin{bmatrix} c_1 & c_2 & \dots & c_n \end{bmatrix}$ be a $k \times n$ matrix over $\mathbb{K}$. A linear combination of the columns of $A$ is any (column) vector which can be expressed as
$$\alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n,$$
where $\alpha_1, \alpha_2, \dots, \alpha_n$ are scalars in $\mathbb{K}$.
Proposition 1.6 Let $A = \begin{bmatrix} c_1 & c_2 & \dots & c_n \end{bmatrix}$ be a $k \times n$ matrix over $\mathbb{K}$ and let
$$b = \begin{bmatrix} b_{11} \\ b_{21} \\ \vdots \\ b_{n1} \end{bmatrix}$$
be an $n \times 1$ column vector. Then the product $Ab$ is a linear combination of the columns of $A$. More precisely, $Ab = b_{11}c_1 + b_{21}c_2 + \cdots + b_{n1}c_n$.
$$AB = [(AB)_{ij}]_{\substack{i=1,\dots,k \\ j=1,\dots,n}} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1l} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2l} & \dots & a_{2p} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i1} & a_{i2} & \dots & a_{il} & \dots & a_{ip} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{k1} & a_{k2} & \dots & a_{kl} & \dots & a_{kp} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1j} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2j} & \dots & b_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ b_{l1} & b_{l2} & \dots & b_{lj} & \dots & b_{ln} \\ \vdots & \vdots & & \vdots & & \vdots \\ b_{p1} & b_{p2} & \dots & b_{pj} & \dots & b_{pn} \end{bmatrix},$$
$$2 \times 0 + (-1) \times 1 + 5 \times 0 = -1.$$
(ii)
$$AB = \begin{bmatrix} a_1 B \\ a_2 B \\ \vdots \\ a_k B \end{bmatrix}.$$
Hence
$$d_{ij} = \sum_{r=1}^{m} a_{ir} \left( \sum_{s=1}^{p} b_{rs} c_{sj} \right).$$
On the other hand, $e_{ij}$ is the product of row $i$ of $AB$ and column $j$ of $C$, i.e.,
$$e_{ij} = \begin{bmatrix} a_{i1} & a_{i2} & \dots & a_{im} \end{bmatrix} B \begin{bmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{pj} \end{bmatrix} = \begin{bmatrix} \sum_{r=1}^{m} a_{ir}b_{r1} & \sum_{r=1}^{m} a_{ir}b_{r2} & \dots & \sum_{r=1}^{m} a_{ir}b_{rp} \end{bmatrix} \begin{bmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{pj} \end{bmatrix}.$$
It follows that
$$e_{ij} = \sum_{s=1}^{p} \left( \sum_{r=1}^{m} a_{ir} b_{rs} \right) c_{sj},$$
which, by the commutativity and associativity of scalar addition, yields finally
$$e_{ij} = \sum_{r=1}^{m} a_{ir} \left( \sum_{s=1}^{p} b_{rs} c_{sj} \right) = d_{ij}.$$
For example,
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix},$$
whereas
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}.$$
With this notation, this system can be presented as the matrix equation
Ax = b. (1.16)
Solving the system can now be reformulated as solving the matrix equation (1.16). As we saw, $Ax$ is a linear combination of the columns of $A$; hence the system $Ax = b$ is consistent if and only if $b$ is a linear combination of the columns of $A$. When $b = 0$, the matrix equation becomes
$$Ax = 0. \tag{1.17}$$
This homogeneous equation is always solvable since it has, at least, the trivial solution $x = 0$.
Matrices can be partitioned into blocks and, as long as the sizes of the blocks and the partitions are compatible, it is possible to devise a block multiplication. For example, let
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$
and
$$B = \begin{bmatrix} 1 & 0 & -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \end{bmatrix},$$
and it is not difficult to see that one can multiply the blocks as if they were numbers, i.e.,
$$AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} & A_{11}B_{13} + A_{12}B_{23} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} & A_{21}B_{13} + A_{22}B_{23} \end{bmatrix}.$$
Hence
$$AB = \begin{bmatrix} A_{11} - A_{12} & -A_{11} & A_{12} \\ A_{21} - A_{22} & -A_{21} & A_{22} \end{bmatrix}.$$
Finally, we have
$$AB = \begin{bmatrix} -2 & -2 & -1 & -2 & 3 & 4 \\ -2 & -2 & -1 & -2 & 3 & 4 \\ -2 & -2 & -1 & -2 & 3 & 4 \end{bmatrix}.$$
If we now divide $A$ into its columns and $B$ into its rows, then $AB$ coincides with the sum
$$AB = \begin{bmatrix} a_1 \,|\, a_2 \,|\, \dots \,|\, a_p \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix} = a_1 b_1 + a_2 b_2 + \cdots + a_p b_p.$$
Then
$$AB = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\begin{bmatrix} 3 & 4 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix}\begin{bmatrix} 5 & 6 \end{bmatrix} + \begin{bmatrix} -1 \\ -1 \end{bmatrix}\begin{bmatrix} 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 6 & 8 \end{bmatrix}.$$
The next proposition summarises the above discussion and, with respect
to Proposition 1.7, gives a third possible way of calculating the product of two
matrices other than the definition (1.15).
Proposition 1.10 Let $A \in M_{k,p}(\mathbb{K})$ and $B \in M_{p,n}(\mathbb{K})$ be matrices such that $a_1, a_2, \dots, a_p$ are the columns of $A$ and $b_1, b_2, \dots, b_p$ are the rows of $B$. Then,
$$AB = \begin{bmatrix} a_1 \,|\, a_2 \,|\, \dots \,|\, a_p \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix} = a_1 b_1 + a_2 b_2 + \cdots + a_p b_p.$$
Proof Exercise.
We end this section outlining four ways in which the product of two ma-
trices can be calculated.
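As an illustration of those four ways (ours, not the book's summary), the sketch below computes the same product entry by entry, column by column, row by row, and as a sum of outer products, using the matrices of the last example; all four agree with NumPy's built-in product.

```python
import numpy as np

A = np.array([[1., 2., -1.],
              [1., 2., -1.]])
B = np.array([[3., 4.],
              [5., 6.],
              [7., 8.]])
k, p = A.shape
n = B.shape[1]

# 1. entry by entry: (AB)_ij = sum_l a_il b_lj
C1 = np.array([[sum(A[i, l] * B[l, j] for l in range(p))
                for j in range(n)] for i in range(k)])
# 2. column by column: column j of AB is A times column j of B
C2 = np.column_stack([A @ B[:, j] for j in range(n)])
# 3. row by row: row i of AB is row i of A times B
C3 = np.vstack([A[i, :] @ B for i in range(k)])
# 4. sum of outer products: AB = a_1 b_1 + ... + a_p b_p
C4 = sum(np.outer(A[:, l], B[l, :]) for l in range(p))

assert all(np.allclose(C, A @ B) for C in (C1, C2, C3, C4))
print(A @ B)                     # [[6. 8.], [6. 8.]]
```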
$$A^0 = I_k, \qquad A^n = AA^{n-1} \quad (n \geq 1)$$
the transpose of the $4 \times 3$ matrix
$$A = \begin{bmatrix} 1 & 3 & 2 \\ 1 & -1 & 2 \\ 1 & 1 & -1 \\ 9 & 8 & 7 \end{bmatrix}$$
is the $3 \times 4$ matrix
$$A^T = \begin{bmatrix} 1 & 1 & 1 & 9 \\ 3 & -1 & 1 & 8 \\ 2 & 2 & -1 & 7 \end{bmatrix}.$$
(i) $(A^T)^T = A$
(ii) $(A + B)^T = A^T + B^T$
(iii) $(\alpha A)^T = \alpha A^T$
(iv) $(AB)^T = B^T A^T$
It is worth noticing that properties (ii) and (iv) above can be read informally as, respectively, “the transpose of a sum is the sum of the transposes” and “the transpose of a product is the product of the transposes in reverse order”.
The diagonal of an anti-symmetric matrix is null, i.e., all its entries are
equal to zero.
The diagonal of a symmetric matrix is ‘like a mirror’ reflecting the entries
on both sides of the diagonal.
When finding the transpose of a square matrix, the diagonal entries
remain unchanged.
Any square matrix $A$ can be decomposed into the sum of a symmetric matrix and an anti-symmetric matrix:
$$A = \tfrac{1}{2}(A + A^T) + \tfrac{1}{2}(A - A^T).$$
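A quick numerical check of this decomposition, sketched with a randomly chosen matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 3)).astype(float)

S = (A + A.T) / 2       # symmetric part: S.T == S
N = (A - A.T) / 2       # anti-symmetric part: N.T == -N

assert np.array_equal(S.T, S)
assert np.array_equal(N.T, -N)
assert np.array_equal(S + N, A)
print(np.diag(N))       # the diagonal of N is null: [0. 0. 0.]
```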
(ii) tr(αA) = α tr A;
(iii) tr A = tr AT ;
(iv) tr(AB) = tr(BA).
as required.
Proof Exercise.
Multiplication of any two matrices in Mn (K) is always possible, and the
resulting product is again a square matrix of order n. Moreover, in Mn (K), by
Proposition 1.14, In is the multiplicative identity, that is, for all A ∈ Mn (K),
AI = A = IA. (1.18)
AB = I = BA. (1.19)
Example 1.17 It is not difficult to find the inverses of the matrices below.
(a) Observing that the product of the identity matrix (of any order) with
itself is again the identity matrix, we have
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
This shows that the identity matrix of order 2 is invertible and is its own inverse. Clearly, the same applies to the identity matrix of any order.
(b) Consider now the matrix $A = 2I$. Since
$$(2I)\big(\tfrac{1}{2}I\big) = I = \big(\tfrac{1}{2}I\big)(2I),$$
we see that
$$A^{-1} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}^{-1} = \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix}.$$
Proposition 1.15 Any square matrix having a zero row or a zero column is
not invertible.
Proof Let $A^{-1}$ be the inverse of $A$ and consider the system $Ax = b$. Then, multiplying both members of this equality on the left by $A^{-1}$, it follows that
$$A^{-1}Ax = A^{-1}b \iff x = A^{-1}b.$$
Hence, the system is consistent and has the unique solution $x = A^{-1}b$.
In Example 1.17, we found the inverses of two exceptionally simple matri-
ces. We need however to devise a method to obtain the inverse of any matrix,
should it exist. That is precisely what we will do next, using a concrete matrix
as a model.
Consider the matrix
$$A = \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix}.$$
We aim to find whether this matrix is invertible and, in the affirmative case, find its inverse. That is, we need to find, if possible, a matrix
$$B = \begin{bmatrix} x_1 & x_2 \\ y_1 & y_2 \end{bmatrix}$$
Observing that both systems have the same matrix of coefficients, we will solve
them simultaneously using the Gauss–Jordan elimination method. Thus,
$$\left[\begin{array}{cc|cc} 1 & -2 & 1 & 0 \\ -1 & 1 & 0 & 1 \end{array}\right] \xrightarrow{\;l_2 + l_1\;} \left[\begin{array}{cc|cc} 1 & -2 & 1 & 0 \\ 0 & -1 & 1 & 1 \end{array}\right] \xrightarrow{\;l_1 - 2l_2\;} \left[\begin{array}{cc|cc} 1 & 0 & -1 & -2 \\ 0 & -1 & 1 & 1 \end{array}\right] \xrightarrow{\;(-1)l_2\;} \left[\begin{array}{cc|cc} 1 & 0 & -1 & -2 \\ 0 & 1 & -1 & -1 \end{array}\right].$$
Recall that the last column of the right-hand block corresponds to the system with variables $x_2, y_2$, whereas the adjacent column corresponds to the system with variables $x_1, y_1$.
Keeping this in mind, it follows that the unique matrix $B$ satisfying the equation $AB = I$ is
$$B = \begin{bmatrix} -1 & -2 \\ -1 & -1 \end{bmatrix}.$$
At this point, we know that, if A is invertible, then B has to be its inverse.
Hence, to end our search for the inverse of A, it only remains to show that
BA = I. We only have to calculate the product BA and see that it equals I.
Hence,
$$A^{-1} = \begin{bmatrix} -1 & -2 \\ -1 & -1 \end{bmatrix}.$$
The calculations above can be summed up as follows. In order to find the
inverse of
$$A = \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix},$$
1. we solved the systems of linear equations $AB = I$ using Gauss–Jordan elimination:
$$[A|I] = \left[\begin{array}{cc|cc} 1 & -2 & 1 & 0 \\ -1 & 1 & 0 & 1 \end{array}\right] \xrightarrow{\;\text{GJE}\;} \left[\begin{array}{cc|cc} 1 & 0 & -1 & -2 \\ 0 & 1 & -1 & -1 \end{array}\right] = [I|B];$$
2. we verified that BA = I;
3. we concluded that A−1 = B.
Point 2. above can be avoided, as is shown in the proposition below.
Proposition 1.17 Let A, B be square matrices of order n over K and let I
be the identity matrix of order n. Then, AB = I if and only if BA = I.
Proof This result will be proved further on in the book (cf. Proposition 1.20).
Finally, the general procedure to obtain the inverse of a square matrix A,
should it exist, is summarised in the box below.
$$[A|I] \xrightarrow{\;\text{GJE}\;} [I|A^{-1}]$$
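The boxed procedure translates into a few lines of code. The sketch below is our bare-bones illustration; it omits the safeguards against tiny pivots that serious implementations need, and numpy.linalg.inv remains the practical tool.

```python
import numpy as np

def inverse_by_gauss_jordan(A):
    """Reduce [A|I] to [I|A^(-1)] by Gauss-Jordan elimination."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for c in range(n):
        pivot = next(i for i in range(c, n) if M[i, c] != 0)  # raises if singular
        M[[c, pivot]] = M[[pivot, c]]
        M[c] /= M[c, c]                  # make the pivot equal to 1
        for i in range(n):
            if i != c:
                M[i] -= M[i, c] * M[c]   # clear the rest of column c
    return M[:, n:]

A = np.array([[1, -2],
              [-1, 1]])
print(inverse_by_gauss_jordan(A))        # [[-1. -2.], [-1. -1.]]
```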
Proof We prove only (ii); the remaining assertions are left as an exercise. Directly evaluating $(B^{-1}A^{-1})(AB)$ and keeping in mind that matrix multiplication is an associative operation, we have
$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I.$$
It now follows from Proposition 1.17 that B −1 A−1 is the inverse matrix of
AB.
We have defined already the non-negative powers of a square matrix (see
Definition 11). We extend now the definition to negative powers of invertible
matrices by means of Proposition 1.18 (iii). In fact, this proposition allows for
unequivocally defining the integer powers of an invertible matrix A.
Definition 17 Let A be an invertible matrix of order k and let n ∈ N be a
positive integer. The power $-n$ of $A$ is defined by
$$A^{-n} = (A^n)^{-1} = (A^{-1})^n.$$
Proposition 1.19 Let A be an invertible matrix and let r, s ∈ Z.
(i) Ar+s = Ar As
(ii) (Ar )s = Ars
Proof Exercise.
$A' = P_{ij}A$: the matrix $A'$ is obtained by exchanging rows $i$ and $j$ of $A$, i.e., in the pre-established notation,
$$A \xrightarrow{\;l_i \leftrightarrow l_j\;} A' = P_{ij}A$$
$A' = E_{ij}(\alpha)A$: the matrix $A'$ is obtained from $A$ by adding to row $i$ row $j$ multiplied by $\alpha$, i.e.,
$$A \xrightarrow{\;l_i + \alpha l_j\;} A' = E_{ij}(\alpha)A$$
It is a simple exercise to see that the results above are true. It is however
desirable that one convinces oneself that the results do hold.
Having reached this point, we see that performing an elementary opera-
tion on a matrix A amounts to multiplying A on the left by the appropriate
elementary matrix. Summing it all up in a sort of
Dictionary
$$A \xrightarrow{\;l_i \leftrightarrow l_j\;} A' \qquad\qquad P_{ij}A = A'$$
$$A \xrightarrow{\;l_i + \alpha l_j\;} A' \qquad\qquad E_{ij}(\alpha)A = A'$$
$$A \xrightarrow{\;\alpha l_i\;} A' \qquad\qquad D_i(\alpha)A = A'$$
Example 1.18 We examine now under this new light of elementary matrices
the calculations we made to find the inverse matrix of
$$A = \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix}.$$
The elementary operations that were performed on the matrix $[A|I]$ correspond to the following sequential multiplications by elementary matrices:

Elementary operations: $l_2 + l_1$; then $l_1 + (-2)l_2$; then $(-1)l_2$.
Multiplication by elementary matrices: $E_{21}(1)A \,|\, E_{21}(1)I$; then $E_{12}(-2)E_{21}(1)A \,|\, E_{12}(-2)E_{21}(1)I$; then $D_2(-1)E_{12}(-2)E_{21}(1)A \,|\, D_2(-1)E_{12}(-2)E_{21}(1)I$.

Hence
$$D_2(-1)E_{12}(-2)E_{21}(1)A = I, \qquad D_2(-1)E_{12}(-2)E_{21}(1)I = A^{-1}$$
(cf. §1.3). It follows that $A^{-1}$ is a product of elementary matrices, i.e.,
$$A^{-1} = D_2(-1)E_{12}(-2)E_{21}(1).$$
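The same calculation can be replayed numerically. In the sketch below, P, E, and D are our 0-based stand-ins for the book's $P_{ij}$, $E_{ij}(\alpha)$, and $D_i(\alpha)$, so E(1, 0, 1) plays the role of $E_{21}(1)$.

```python
import numpy as np

def P(i, j, n=2):
    M = np.eye(n); M[[i, j]] = M[[j, i]]; return M   # exchange rows i, j

def E(i, j, alpha, n=2):
    M = np.eye(n); M[i, j] = alpha; return M         # l_i + alpha * l_j

def D(i, alpha, n=2):
    M = np.eye(n); M[i, i] = alpha; return M         # alpha * l_i

A = np.array([[1., -2.],
              [-1., 1.]])
A_inv = D(1, -1) @ E(0, 1, -2) @ E(1, 0, 1)   # D_2(-1) E_12(-2) E_21(1)
assert np.allclose(A_inv @ A, np.eye(2))
print(A_inv)                                  # [[-1. -2.], [-1. -1.]]
```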
A common mistake. One should keep in mind that, in the Gaussian elimi-
nation or in the Gauss-Jordan elimination, the elementary matrices are always
sequentially multiplied on the left.
All elementary matrices are invertible and their inverses are also elemen-
tary matrices. It is left as an exercise to see that
$$\left(D_i(\alpha)\right)^{-1} = D_i\!\left(\frac{1}{\alpha}\right)$$
Next we prove Proposition 1.17, whose statement we recall. First, however, we make a note of a simple but useful fact to be applied in the proof of this proposition.
The reduced row echelon form of an invertible matrix is the identity matrix.
(Why?)
Proposition 1.20 Let A, B be n×n matrices over K and let I be the identity
matrix of order n. Then AB = I if and only if BA = I.
$$E_1 E_2 \dots E_k AB = E_1 E_2 \dots E_k I.$$
$$(\underbrace{E_1 E_2 \dots E_k A}_{I})\,B = E_1 E_2 \dots E_k.$$
$$(AB)B^{-1} = IB^{-1} \iff A = B^{-1}.$$
and
A is invertible ⇒ rank (A) = n.
We begin by proving the first implication. Suppose that A is a square matrix
of order n having rank n. Then, by Proposition 1.3 (ii), the systems of linear
equations
$$A\begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad A\begin{bmatrix} x_{12} \\ x_{22} \\ \vdots \\ x_{n2} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad A\begin{bmatrix} x_{1n} \\ x_{2n} \\ \vdots \\ x_{nn} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$
are consistent and each one of them has a unique solution. Hence, there exists
an n × n matrix B such that AB = I. By Proposition 1.20, it follows that A
is invertible.
As to the second implication, we shall prove equivalently that
Before the proof of this theorem, observe that (iii) is a very striking asser-
tion. It says, in other words, that
Any invertible matrix, and hence all invertible matrices, can be expressed
as products of elementary matrices. In this sense, elementary matrices are
‘generators’ of the invertible matrices.
The equivalence between (i) and (ii) has been proved already (cf. Proposi-
tion 1.21).
(ii) ⇒ (iii) Let R be the reduced row echelon form of A. Then there exist
elementary matrices E1 , E2 , . . . , Ek such that E1 E2 . . . Ek A = R.
Since by definition $\operatorname{rank}(A) = \operatorname{rank}(R)$, we have $\operatorname{rank}(R) = n$. It follows that $R = I$ and $E_1E_2\dots E_kA = I$. Multiplying on the left both members of this equality sequentially by $E_1^{-1}, E_2^{-1}, \dots, E_k^{-1}$,
$$A = E_k^{-1}\cdots E_2^{-1}E_1^{-1}.$$
$$E_1E_2\dots E_kA = I.$$
$$\underbrace{E_1E_2\dots E_kA}_{I}\,x = 0 \iff x = 0.$$
(vi) ⇒ (vii) We show firstly that, for each column vector b, the system
Ax = b is consistent.
Suppose that, on the contrary, there exists b such that Ax = b is inconsis-
tent. Hence, by Proposition 1.3, we must have rank (A) < n and, consequently,
the homogeneous system Ax = 0 has infinitely many solutions which contra-
dicts the initial assumption.
We see next that the system Ax = b has a unique solution. Suppose that
$x_1, x_2$ are solutions of $Ax = b$. Then
$$Ax_1 = Ax_2 \iff A(x_1 - x_2) = 0.$$
Since we assumed that the homogeneous system only admits the trivial solution, then
$$x_1 - x_2 = 0 \iff x_1 = x_2.$$
(vii) ⇒ (i) We want to show that A is invertible, that is, we want to show
that there exists a matrix B such that AB = I (cf. Proposition 1.20). In other
words, we must show that the $n$ systems below are all consistent¹:
$$Ax = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad Ax = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad Ax = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$
But this is exactly what assertion (vii) guarantees, since whichever vector b
might be considered the system Ax = b is consistent (and has a unique solu-
tion). Hence A is invertible, as required.
¹Notice that, if these $n$ systems are simultaneously consistent, then each one of them has a unique solution, given the uniqueness of the inverse matrix (cf. Lemma 1.1).
where
$$L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}.$$
3. Then A = LU .
yielding
$$A = LDU' = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 3 \end{bmatrix}}_{D} \underbrace{\begin{bmatrix} 1 & 1 & -1 \\ 0 & 1 & -5/2 \\ 0 & 0 & 1 \end{bmatrix}}_{U'}.$$
$$U' = D_1(u_{11}^{-1})\, D_2(u_{22}^{-1}) \dots D_n(u_{nn}^{-1})\, U.$$
Example 1.21 Consider the matrix $A$ of Example 1.19 and its LU factorisation. We shall make use of $A = LU$ to solve the system
$$Ax = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & 3 \\ 0 & 2 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$
in two steps. In the first step, we solve the system
$$Ly = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}.$$
Using forward substitution, it follows that $y_1 = 1$,
$$y_2 = -2y_1 = -2$$
and
$$y_3 = y_2 - 1 = -3.$$
In the second step, we have
$$Ux = \begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 5 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ -3 \end{bmatrix}.$$
Using back substitution, it follows that $x_3 = -1$,
$$x_2 = -\tfrac{1}{2}(-5x_3 - 2) = -\tfrac{3}{2}$$
and
$$x_1 = -x_2 + x_3 + 1 = \tfrac{3}{2} - 1 + 1 = \tfrac{3}{2}.$$
1.5 Exercises
EX 1.5.1. Find which of the equations are linear.
(a) $12x_1 + x_2 - 8^{-\sqrt{7}}x_3 = 10$ \qquad (b) $-\tfrac{1}{2}x_1 + x_1x_2 + 2x_3 = 0$
(c) $v - \pi = e^u + e^{\sqrt{3}}z - 2\pi w$ \qquad (d) $y^{1/7} - 6x + \tfrac{1}{4}z = 5$
(c) $\begin{cases} 2ix + 2iy + 4iz = 0 \\ w - y - 3z = 0 \\ 2w + 3x + y + z = 0 \\ -2iw + ix + 3iy - 2iz = 0 \end{cases}$
(a) $\begin{cases} x + 2y - 3z = -1 \\ -3x - 3y - 6z = -24 \\ \tfrac{3}{2}x - \tfrac{7}{2}y + z = 5 \end{cases}$ \qquad (b) $\begin{cases} 8x_1 + x_2 + 4x_3 = -1 \\ -2x_1 + 5x_2 + 2x_3 = 1 \\ x_1 + x_2 + x_3 = 0 \end{cases}$

(c) $\begin{cases} -\tfrac{3}{2}v + w = \tfrac{1}{2} \\ 2u + 2v + w = -\tfrac{5}{3} \\ 6u + 12v - 6w = -4 \end{cases}$ \qquad (d) $\begin{cases} 2w + 4x - 2y = 8 \\ -3x + 3y = -9 \\ 2u - 4v - w - 7x = -7 \\ 2w + 6x - 4y = 14 \end{cases}$
EX 1.5.5. For each of the sets listed, find a system of linear equations whose
solution set is that set.
(a) {(1, 2, 3)}
(b) {(1, 2, t) : t ∈ R}
(c) {(y, −3y, y) : y ∈ R}
(d) {(x, 2x − z − w, z) : x, z, w ∈ R}
EX 1.5.6. Which of the 3 × 3 matrices are row echelon matrices? Which
matrices are in reduced row echelon form? What is the rank of
each matrix?
(a) $\begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$ (b) $\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ (c) $\begin{bmatrix} 0 & 1+i & 0 \\ 0 & 0 & 12i \\ 0 & 0 & 0 \end{bmatrix}$
(d) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}$ (e) $\begin{bmatrix} 0 & 5 & 0 \\ 3 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ (f) $\begin{bmatrix} 1 & 6 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
(g) $\begin{bmatrix} 1+i & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1-i \end{bmatrix}$ (h) $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ (i) $\begin{bmatrix} 0 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
(j) $\begin{bmatrix} 2 & 1 & 0 \\ 0 & -2+i & 0 \\ 0 & 1+5i & 1+i \end{bmatrix}$ (k) $\begin{bmatrix} 20 & -10 & 0 \\ 0 & 0 & -60 \\ 0 & 0 & 2 \end{bmatrix}$ (l) $\begin{bmatrix} 2 & 10 & 0 \\ 0 & -1 & 20 \\ 0 & 0 & 0 \end{bmatrix}$
EX 1.5.7. Find the reduced row echelon form and the rank of the matrix
$$A = \begin{bmatrix} 1 & 1 & 1 & 2 & 0 \\ 2 & 1 & 2 & 2 & 1 \\ 1 & 1 & 2 & 1 & 0 \end{bmatrix}.$$
EX 1.5.8. Let
$$A_\alpha = \begin{bmatrix} 1 & 1 & 0 \\ 0 & \alpha & \alpha \\ -1 & 0 & \alpha^2 \end{bmatrix} \quad \text{and consider the system} \quad A_\alpha \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \beta \end{bmatrix},$$
$A$ is a $1 \times 3$ matrix, $B$ is a $3 \times 1$ matrix, $C$ is a $1 \times 3$ matrix, $D$ is a $3 \times 3$ matrix
EX 1.5.13. Find the 3 × 3 anti-symmetric matrix A = [aij ] such that, for all
j < i,
aij = i − j.
is a column of A.
EX 1.5.15. Find an expression for $A^n$, where $A = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}$.
EX 1.5.16. Fill in the entries of the matrix
$$A = \begin{bmatrix} -1 & 2 & \dots \\ 0 & 1 & \dots \\ 1 & \dots & \dots \end{bmatrix}$$
EX 1.5.18. For
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix},$$
find $A^3$, $A^{-3}$, $A^2 - 2A + I$, and $(A - I)^2$. Solve the equation
$$A^{-1}X(A + I)^2 = A + A^T.$$
0 0 0 1
is such that $A = E_1E_2$. Write $A^{-1}$ as a matrix product, using $E_1$ and $E_2$.
EX 1.5.21. Let $A$ be a $3 \times 3$ real matrix such that
$$A = E_1 E_2 R,$$
where $R$ is a rank-2 row echelon matrix, and
$$E_1 = D_3(-1), \qquad E_2 = E_{21}(3).$$
I. $A = E_2 R$.
II. $A$ is an invertible matrix.
III. $A$ has a single zero row.
IV. The system of linear equations $Ax = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ might be consistent.
1.6 At a Glance
In a nutshell, this chapter is about the introduction of an object, the
matrix, and of the development of a toolkit to effectively extract knowledge
about the object, which will be put to use in the following chapters.
Matrices are crucial in the book, and several fundamental notions related to matrices were established here and will be relied upon in the remainder of the book, notably, the rank and the inverse of a matrix. Determining either relies on two methods known as Gaussian elimination and Gauss–Jordan elimination. These methods aim at finding, respectively, a row echelon form and the (unique) reduced row echelon form of a matrix.
Gaussian and Gauss–Jordan eliminations will be used extensively throughout
the book and matrices are mostly what this book is about.
We know now how to operate with matrices (addition, multiplication by
a scalar, and multiplication) and to do elementary operations by means of
elementary matrices. In fact, invertible matrices are exactly the products of
elementary matrices.
In this chapter, matrices were applied to the solution of systems of lin-
ear equations via Gaussian elimination or Gauss–Jordan elimination. These
eliminations led to the LU and the LDU factorisations of matrices involv-
ing the products of diagonal, lower triangular, and upper triangular matrices.
Matrices of all these types will play a decisive role in what follows.
Chapter 2
Determinant
A word of advice: if seeing a function defined using axioms rather than formulas causes some anxiety, then the reader is advised to go firstly to §2.2, where this function is defined in a traditional way, and then come back to §2.1. In the end, however, it will be apparent that the easiest way of calculating the determinant is that of the present section.
Definition 19 The determinant is the function
$$\det \colon M_n(\mathbb{K}) \to \mathbb{K}, \qquad A \mapsto \det A,$$
satisfying the following axioms:
(Ax1) $\det I = 1$;
(Ax2) if $A'$ is obtained from $A$ by exchanging two rows, then $\det A' = -\det A$;
(Ax3) for every row $l_i$ and every scalar $\alpha$,
$$\det \begin{bmatrix} \vdots \\ \alpha l_i \\ \vdots \end{bmatrix} = \alpha \det \begin{bmatrix} \vdots \\ l_i \\ \vdots \end{bmatrix} \qquad \text{and} \qquad \det \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i + l'_i \\ l_{i+1} \\ \vdots \\ l_n \end{bmatrix} = \det \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i \\ l_{i+1} \\ \vdots \\ l_n \end{bmatrix} + \det \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l'_i \\ l_{i+1} \\ \vdots \\ l_n \end{bmatrix},$$
where $l_i$, $l'_i$ are matrix rows. The number $\det A$ is called the determinant of the matrix $A$.
whose rows coincide except possibly for row $i$, then one can construct a new matrix
$$C = \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i + l'_i \\ l_{i+1} \\ \vdots \\ l_n \end{bmatrix} \tag{2.2}$$
whose rows coincide with those of $A$ and $B$, except possibly for row $i$, which is the sum of the corresponding rows of $A$ and $B$. One has that the determinant $\det C$ of the matrix $C$ satisfies
$$\det C = \det A + \det B.$$
It is not obvious that these axioms define a function or even that this function is uniquely defined. In fact, this is the case: there exists a unique function satisfying (Ax1)–(Ax3) in Definition 19. The existence and uniqueness of the determinant function will be shown in due course (see Sections 2.2 and 8.2), but first we want to make clear that we can calculate the determinant of any given matrix just by abiding by the rules (Ax1)–(Ax3) above.
$$\det[\alpha] = \alpha \det[1] = \alpha \cdot 1 = \alpha.$$
Example 2.2 Now let A be a diagonal matrix. Repeatedly using (Ax3),
$$\det A = \det \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} = a_{11} \det \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} = a_{11}a_{22} \det \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} = \cdots = a_{11}a_{22}a_{33}\dots a_{nn} \det I.$$
Hence, by (Ax1),
$$\det \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} = a_{11}a_{22}a_{33}\dots a_{nn}.$$
On the other hand, by Axiom (Ax2), $\det(P_{ij}A) = -\det A$. It then follows that $\det A = \det(P_{ij}A) = -\det A$. Hence
$$\det A = -\det A \iff 2\det A = 0 \iff \det A = 0.$$
(ii) Let $l_i$ be a zero row of matrix $A$ and let $A'$ be the matrix obtained from $A$ by multiplying row $l_i$ by $\alpha = 0$. Then $A' = A$ and, by Axiom (Ax3), we have
$$\det A = \det A' = 0 \cdot \det A = 0.$$
$$A = \begin{bmatrix} \vdots \\ l_i \\ \vdots \\ l_j \\ \vdots \\ l_n \end{bmatrix}.$$
$$\det \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i + \alpha l_j \\ l_{i+1} \\ \vdots \\ l_j \\ \vdots \\ l_n \end{bmatrix} = \det \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i \\ l_{i+1} \\ \vdots \\ l_j \\ \vdots \\ l_n \end{bmatrix} + \det \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ \alpha l_j \\ l_{i+1} \\ \vdots \\ l_j \\ \vdots \\ l_n \end{bmatrix} = \det \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i \\ l_{i+1} \\ \vdots \\ l_j \\ \vdots \\ l_n \end{bmatrix} + \alpha \det \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_j \\ l_{i+1} \\ \vdots \\ l_j \\ \vdots \\ l_n \end{bmatrix}.$$
Observing that the last matrix has two equal rows, assertion (i) of this proposition yields
$$\det \begin{bmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i + \alpha l_j \\ l_{i+1} \\ \vdots \\ l_j \\ \vdots \\ l_n \end{bmatrix} = \det A + 0 = \det A,$$
In case (2), let $a_{kk}$ be the first zero entry on the diagonal, counting from below. Using only elementary operations of type (iii) in Definition 2 and the (non-zero) entries $a_{nn}, \dots, a_{k+1,k+1}$, we can change row $k$ into a zero row (row $k$ in the matrices below).
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} & a_{1,k+1} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2k} & a_{2,k+1} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & a_{k,k+1} & \cdots & a_{kn} \\ 0 & 0 & \cdots & 0 & a_{k+1,k+1} & \cdots & a_{k+1,n} \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & a_{nn} \end{bmatrix} \xrightarrow{\;\text{GJE}\;} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} & 0 & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2k} & 0 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & a_{k+1,k+1} & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & a_{nn} \end{bmatrix}$$
Observing that, by Proposition 2.1 (iii), the elementary operations used do
not change the determinant,
$$\det A = \det \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} & 0 & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2k} & 0 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & a_{k+1,k+1} & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & a_{nn} \end{bmatrix} = 0.$$
$$\det \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & a_{nn} \end{bmatrix} = a_{11}a_{22}\cdots a_{nn}.$$
The determinant of $A$ is
$$|A| = \begin{vmatrix} 3 & -3 & -3 \\ 0 & 1 & -1 \\ -1 & 0 & 0 \end{vmatrix} = 3\begin{vmatrix} 1 & -1 & -1 \\ 0 & 1 & -1 \\ -1 & 0 & 0 \end{vmatrix} = 3\underbrace{\begin{vmatrix} 1 & -1 & -1 \\ 0 & 1 & -1 \\ 0 & -1 & -1 \end{vmatrix}}_{|B|} = 3\begin{vmatrix} 1 & -1 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & -2 \end{vmatrix} = 3(-2) = -6.$$
Hence
$$|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ 0 & d - \frac{c}{a}b \end{vmatrix} = a\left(d - \frac{c}{a}b\right) = ad - bc.$$
If $a = 0$,
$$A = \begin{bmatrix} 0 & b \\ c & d \end{bmatrix} \xrightarrow{\;l_1 \leftrightarrow l_2\;} \begin{bmatrix} c & d \\ 0 & b \end{bmatrix},$$
and we have
$$|A| = \begin{vmatrix} 0 & b \\ c & d \end{vmatrix} = -\begin{vmatrix} c & d \\ 0 & b \end{vmatrix} = -cb = ad - bc.$$
In summary,
$$|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$
Proof (i) ⇒ (ii) Suppose that $A$ is invertible. Then, by Theorem 1.1, there exist elementary matrices $E_1, \dots, E_k$ such that
$$A = E_1 E_2 \cdots E_k.$$
Suppose then that $A$ is singular. By Theorem 1.1, we know that $\operatorname{rank} A < n$ and that, consequently, the reduced row echelon form $R$ of matrix $A$ has, at least, one zero row. It is also the case that there exist elementary matrices $E_1, \dots, E_m$ such that
$$E_1 E_2 \cdots E_m A = R.$$
By Propositions 2.1 and 2.2, we have
$$0 = |R| = |E_1||E_2|\cdots|E_m||A|.$$
Since the determinants of the elementary matrices are all different from zero, it follows that $|A| = 0$.
$$|AB| = |A||B|.$$
$$|AB| = |E_1 E_2 \cdots E_m B| = |E_1||E_2 \cdots E_m B| = |E_1||E_2| \cdots |E_m||B| = |E_1 E_2 \cdots E_m||B| = |A||B|.$$
If $A$ is singular, then its reduced row echelon form $R$ has a zero row (see Theorem 1.1). That is, there exist elementary matrices $E_1, \dots, E_r$ such that $A = E_1 \cdots E_r R$, whence
$$|AB| = |E_1 \cdots E_r RB| = |E_1| \cdots |E_r||RB| = 0,$$
since $RB$ has a zero row (see Proposition 2.1 (ii)). By Proposition 2.2,
$$0 = |AB| = |A||B|,$$
as required.
An easy consequence of this proposition is the following result.
$$|A^{-1}| = |A|^{-1}.$$
Proof Since $AA^{-1} = I$, we have
$$1 = |I| = |AA^{-1}| = |A||A^{-1}|,$$
and, therefore, $|A^{-1}| = |A|^{-1}$.
$$|E_{ij}(\alpha)^T| = |E_{ji}(\alpha)| = 1 = |E_{ij}(\alpha)|.$$
$$|A^T| = |A|.$$
$$|A^T| = |(E_1E_2\cdots E_m)^T| = |E_m^T \cdots E_2^T E_1^T| = |E_m^T| \cdots |E_2^T||E_1^T| = |E_m| \cdots |E_2||E_1| = |E_1||E_2| \cdots |E_m| = |E_1E_2\cdots E_m| = |A|.$$
$$|A^T| = 0 = |A|.$$
Observe that any lower triangular matrix can be obtained as the trans-
posed matrix of an upper triangular matrix and that transposition does not
change the determinant, as seen in Proposition 2.4. This leads immediately to
the next proposition.
and
$$\underbrace{\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}}_{P_2} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} = p_4.$$
of Example 2.3. We have that the permutations $p_1, p_4, p_5$ are even and correspond to the products
$$a_{11}a_{22}a_{33} = 0, \qquad a_{12}a_{23}a_{31} = -3, \qquad a_{13}a_{21}a_{32} = 0.$$
By Leibniz's formula,
$$\det A = (0 - 3 + 0) - (3 + 0 + 0) = -6.$$
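Leibniz's formula can be coded verbatim, if only to confirm how impractical it is: the sum runs over all $n!$ permutations. A sketch using itertools.permutations, with the sign computed by counting inversions:

```python
import numpy as np
from itertools import permutations

def sign(p):
    """(-1) raised to the number of inversions of the permutation p."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
              if p[i] > p[j])
    return -1 if inv % 2 else 1

def det_leibniz(A):
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[i, p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = np.array([[3, -3, -3],
              [0, 1, -1],
              [-1, 0, 0]])
print(det_leibniz(A), np.linalg.det(A))   # -6 and -6.0 (up to rounding)
```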
$$a_{1p'(1)} a_{2p'(2)} \dots a_{ip'(i)} \dots a_{jp'(j)} \dots a_{np'(n)},$$
with $p'(i) = p(j)$, $p'(j) = p(i)$ and $p'(k) = p(k)$ for $k \neq i, j$. Moreover, all permutations $p'$ are obtained in this way. Then,
$$\det A' = \sum_{p' \in P_S} \operatorname{sign}(p')\, a_{1p'(1)} a_{2p'(2)} \dots a_{ip'(i)} \dots a_{jp'(j)} \dots a_{np'(n)}.$$
But $p$ coincides with $p'$ except for the extra exchange between $p(i)$ and $p(j)$. Consequently, the parity of the permutations changes and we have
$$\operatorname{sign}(p') = -\operatorname{sign}(p).$$
It follows that
$$\det A' = \sum_{p' \in P_S} \operatorname{sign}(p')\, a_{1p'(1)} a_{2p'(2)} \dots a_{np'(n)} = \sum_{p' \in P_S} \operatorname{sign}(p')\, a_{1p(1)} a_{2p(2)} \dots a_{ip(j)} \dots a_{jp(i)} \dots a_{np(n)} = \sum_{p \in P_S} -\operatorname{sign}(p)\, a_{1p(1)} a_{2p(2)} \dots a_{ip(i)} \dots a_{jp(j)} \dots a_{np(n)} = -\det A.$$
Finally, we tackle (Ax3). It is clear that formula (2.4) satisfies the first equality in (Ax3). Now, let $A = [a_{ij}]$, $B = [b_{ij}]$, and $C = [c_{ij}]$ be matrices as in (2.1) and (2.2). Then
$$\begin{aligned} \det C &= \sum_{p \in P_S} \operatorname{sign}(p)\, c_{1p(1)} c_{2p(2)} \dots c_{ip(i)} \dots c_{np(n)} \\ &= \sum_{p \in P_S} \operatorname{sign}(p)\, c_{1p(1)} c_{2p(2)} \dots (a_{ip(i)} + b_{ip(i)}) \dots c_{np(n)} \\ &= \sum_{p \in P_S} \operatorname{sign}(p)\, c_{1p(1)} c_{2p(2)} \dots a_{ip(i)} \dots c_{np(n)} + \sum_{p \in P_S} \operatorname{sign}(p)\, c_{1p(1)} c_{2p(2)} \dots b_{ip(i)} \dots c_{np(n)} \\ &= \sum_{p \in P_S} \operatorname{sign}(p)\, a_{1p(1)} a_{2p(2)} \dots a_{ip(i)} \dots a_{np(n)} + \sum_{p \in P_S} \operatorname{sign}(p)\, b_{1p(1)} b_{2p(2)} \dots b_{ip(i)} \dots b_{np(n)} \\ &= \det A + \det B, \end{aligned}$$
as required.
It is now clear that there exists a determinant function, i.e., a function
that satisfies (Ax1)–(Ax3). This was shown by using Leibniz’s formula for the
determinant. As said before, this way of calculating the determinant is useful,
at least to show that there exists in fact a determinant function, but is far
from being practical when it comes to actual calculations. It will never be
used in the remainder of the book, apart from proving Laplace’s formula in
the next section.
Proof This proof follows closely §11.3 of [13]. Given a matrix A, we know
that, by (2.4),
$$\det A = \sum_{p \in P_S} \operatorname{sign}(p)\, a_{1p(1)} a_{2p(2)} \dots a_{ip(i)} \dots a_{np(n)}. \tag{2.6}$$
Notice that, if li is some fixed row of A, then each summand has a single entry
from li .
Consider firstly the permutations $p \in P_S$ such that $p(1) = 1$ and denote by $P'_S$ this subset of permutations. That is to say, we are speaking of all the summands of the form $a_{11}a_{2p(2)} \dots a_{ip(i)} \dots a_{np(n)}$. In this case, the sum of all these summands gives
$$\sum_{p \in P'_S} \operatorname{sign}(p)\, a_{11} a_{2p(2)} \dots a_{np(n)} = a_{11} \sum_{q \in S_1} \operatorname{sign}(q)\, a_{2q(2)} \dots a_{nq(n)} = a_{11} M_{11} = a_{11} (-1)^{1+1} M_{11} = a_{11} C_{11}.$$
Consider now the general case where $p(i) = j$ and, therefore, we have all summands in (2.6) containing the entry $a_{ij}$ as a factor. Let $B$ be the matrix
$$B = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \\ a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i-1,1} & a_{i-1,2} & \cdots & a_{i-1,n} \\ a_{i+1,1} & a_{i+1,2} & \cdots & a_{i+1,n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.$$
It follows that
Notice that, since transposition does not modify the determinant, when two
columns are exchanged, as with rows, the determinant is multiplied by −1.
We have shown above that
where $\sigma$ is the sum of all the summands in Leibniz's formula for $\det B'$ which do not have $a_{ij}$ as a factor, and $[B'_{11}]$ is the submatrix of $B'$ obtained by deleting row 1 and column 1 of $B'$. Observing that
$$\det[B'_{11}] = M_{ij},$$
it follows that
where µ is the sum of all summands in (2.6) which do not contain aij as a
factor. This ends the proof.
Example 2.7 We apply formula (2.5) of Theorem 2.2 to find the determinant
of
$$A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 0 & -1 \\ 1 & 3 & 1 \end{bmatrix},$$
fixing row 3, i.e., $i = 3$. It follows that
$$\begin{aligned} |A| = \begin{vmatrix} 1 & 0 & 2 \\ 3 & 0 & -1 \\ 1 & 3 & 1 \end{vmatrix} &= a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} \\ &= 1(-1)^{3+1}\begin{vmatrix} 0 & 2 \\ 0 & -1 \end{vmatrix} + 3(-1)^{3+2}\begin{vmatrix} 1 & 2 \\ 3 & -1 \end{vmatrix} + 1(-1)^{3+3}\begin{vmatrix} 1 & 0 \\ 3 & 0 \end{vmatrix} \\ &= 0 - 3(1 \times (-1) - 2 \times 3) + 0 = 21. \end{aligned}$$
Proof Exercise. (Hint: use the fact that transposition does not change the
determinant.)
$$\begin{aligned} |A| = \begin{vmatrix} 1 & 0 & 2 \\ 3 & 0 & -1 \\ 1 & 3 & 1 \end{vmatrix} &= a_{12}C_{12} + a_{22}C_{22} + a_{32}C_{32} \\ &= 0 \times (-1)^{1+2}\begin{vmatrix} 3 & -1 \\ 1 & 1 \end{vmatrix} + 0 \times (-1)^{2+2}\begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} + 3 \times (-1)^{3+2}\begin{vmatrix} 1 & 2 \\ 3 & -1 \end{vmatrix} \\ &= 0 + 0 - 3 \times (1 \times (-1) - 2 \times 3) = 21. \end{aligned}$$
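Laplace's formula is naturally recursive. The sketch below (our illustration) expands along a chosen 0-based row and skips zero entries, which is exactly why the choices made in Examples 2.7 and 2.8 were convenient.

```python
import numpy as np

def det_laplace(A, row=0):
    """det A = sum_j a_ij * C_ij, with C_ij = (-1)**(i+j) * M_ij."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0
    for j in range(n):
        if A[row, j] == 0:
            continue                   # zero entries cost nothing
        minor = np.delete(np.delete(A, row, axis=0), j, axis=1)
        total += (-1) ** (row + j) * A[row, j] * det_laplace(minor)
    return total

A = np.array([[1, 0, 2],
              [3, 0, -1],
              [1, 3, 1]])
print(det_laplace(A, row=2))           # -> 21, as in Examples 2.7 and 2.8
```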
It is clear from Examples 2.7 and 2.8 that a right choice of the row or the
column may simplify considerably the calculations.
Proof (i) We calculate firstly the diagonal entries of $A \operatorname{adj} A$. The entry-$ii$ is
$$(A \operatorname{adj} A)_{ii} = \sum_{j=1}^{n} a_{ij} C_{ij}.$$
That is,
$$x_1\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{bmatrix} + x_2\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{n2} \end{bmatrix} + \cdots + x_i\begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{ni} \end{bmatrix} + \cdots + x_n\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{nn} \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.$$
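The display above is the column form of $Ax = b$ on which Cramer's rule rests: replacing column $i$ of $A$ by $b$ and taking determinants isolates $x_i$. As a hedged computational sketch of the rule (leaning on numpy.linalg.det; elimination is far cheaper in practice):

```python
import numpy as np

def cramer(A, b):
    """x_i = det(A_i) / det(A), where A_i is A with column i replaced by b.
    Requires a square A with det(A) != 0."""
    dA = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.astype(float).copy()
        Ai[:, i] = b                   # replace column i by b
        x[i] = np.linalg.det(Ai) / dA
    return x

A = np.array([[1., 1., 1.],
              [1., -1., 2.],
              [2., 1., -1.]])
b = np.array([3., 2., 2.])
print(cramer(A, b))                    # -> [1. 1. 1.], system (1.6) again
```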
2.4 Exercises
EX 2.4.1. Use Gaussian elimination to calculate the determinant of
$$A = \begin{bmatrix} 6 & -12 & 18 \\ -3 & 1 & 4 \\ 6 & 7 & -1 \end{bmatrix}.$$
Complete:
$$\det(-2A^{-3}) = \underline{\hspace{1.5cm}}, \qquad \det\big((AB^{T})^{2}\big) = \underline{\hspace{1.5cm}},$$
EX 2.4.3. Let A be a 4×4 matrix such that |A| = −2. Consider the following
assertions.
I) The diagonal entries of A might be all equal to zero.
II) |(2A)−1 | = −1/32.
III) |(2A)−1 | = −1/4.
IV) |(−AT )2 | = 4.
The complete list of correct assertions is
A) I, III, IV B) I, II C) I, II, IV D) III, IV
30 60 1 29
and the matrix $B = E_{23}(-1)\,A\,E_{34}(-5)$. Consider also the following assertions.
ing assertions.
I) B is invertible.
II) det(B) = 52 .
III) det( 21 B)−1 = − 23 .
IV) B −1 = E23 (1) A−1 E34 (5) .
The complete list of correct assertions is
A) I, II B) I, II, IV C) III, IV D) II, IV
EX 2.4.7. Determine
$$\operatorname{adj} \begin{bmatrix} 1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
EX 2.4.8. Find the cofactor C14 and the entry (A−1 )41 of the inverse matrix
of
$$A = \begin{bmatrix} 15 & 3 & 6 & -4i \\ -15 & 1 & 1 & 30i \\ 15 & 3 & 7 & 20i \\ 15 & 3 & 8 & 12i \end{bmatrix}.$$
where α, β ∈ C.
(a) Find all α ∈ C for which Aα is not invertible.
(b) With α = −i and β = 1, use Cramer’s rule to find the solution of
the system Aα x = bα,β .
EX 2.4.11. Let
$$A = \begin{bmatrix} a^2 & 3 & 0 \\ 5 & 0 & a \\ a^2 & a & 0 \end{bmatrix},$$
where a is a real number. Answer the following questions without
calculating A−1 , should it exist.
2.5 At a Glance
The determinant is a K-valued function defined on the square matrices in
Mn (K). It can be calculated either using Gaussian elimination or a formula,
Leibniz’s or Laplace’s. Easiest to calculate is the determinant of a triangular
matrix for it is the product of its diagonal entries.
Matrix multiplication is not commutative but |AB| = |BA|. Moreover, the
determinant is invariant under transposition.
The determinant can be used as a test for invertibility since invertible
matrices are those having a non-zero determinant, and the inverse can be
calculated by means of determinants.
Cramer’s rule gives an explicit formula to obtain the solution of a system
of linear equations as quotients of determinants, under the constraint that the
system has a square coefficient matrix and a unique solution.
Chapter 3
Vector Spaces
If one were to be asked what matrices, polynomials, and vectors have in com-
mon, the first answer to spring to mind would be ‘nothing’. This chapter is
about proving this answer wrong.
What links matrices, polynomials, and vectors is the concept of vector
space. We shall see how far reaching it is to look at these apparently far re-
moved entities through the lens of this abstract concept.
the previous ones, we will see that vector spaces and matrices are definitely entangled. This entanglement is so strong that we will introduce in §3.4 four fundamental vector spaces associated with any given matrix. Their relevance is such that these vector spaces will be present in all of the remainder of the book.
To remain true to our purpose, the outstanding number in this chapter is
the dimension of a vector space, of which much of the theory revolves around.
addition
$$+ \colon V \times V \to V, \qquad (u, v) \mapsto u + v$$
$$u + 0 = u = 0 + u$$
$$u + (-u) = 0 = (-u) + u$$
(v) $\alpha(u + v) = \alpha u + \alpha v$
(vi) $(\alpha\beta)u = \alpha(\beta u)$
(vii) $(\alpha + \beta)u = \alpha u + \beta u$
(viii) $1u = u$
When K = R (respectively, K = C), V is also called a real vector space
(respectively, a complex vector space).
An element of a vector space is said to be a vector or point. Axioms
(i),(ii) say, respectively, that the addition of vectors is commutative and asso-
ciative. We can also see in (v) and (vii) that the multiplication by a scalar is
distributive relative to the addition of vectors and that the multiplication by
a vector is distributive relative to the addition of scalars.
The additive identity is unique: if $0$ and $\tilde{0}$ were additive identities, then, by (iii) above,
$$0 = 0 + \tilde{0} = \tilde{0}.$$
[Figure 3.1: vector addition, $u + v = (a_1 + b_1, a_2 + b_2)$, and scalar multiplication, $\alpha u = (\alpha a_1, \alpha a_2)$, in the plane.]
vector space over $\mathbb{K}$. That is, these operations satisfy Axioms (i)–(viii). We shall see in §3.3 that $\mathbb{K}^n$, although being a concrete example of a vector space over $\mathbb{K}$, captures the essence of these spaces and will act as a model and a source of insight for the general spaces.
At this point, it is worthwhile noting that the vectors of Rn are a natural
generalisation of the R2 -plane vectors and the R3 -space vectors. The same
applying to the operations of addition and scalar multiplication. Figure 3.1
illustrates these operations on the R2 -plane.
Example 3.1 Let $A$ be a $k \times n$ matrix over $\mathbb{K}$ and let $Ax = 0$ be the associated homogeneous system of linear equations. The solution set $S \neq \emptyset$ of this system is contained in $\mathbb{K}^n$ and, together with the restriction of the addition and scalar multiplication of $\mathbb{K}^n$, is itself a vector space over $\mathbb{K}$.
To see this, we begin by showing that $S$ is closed for the addition of vectors and scalar multiplication, that is, given $x, y, z \in S$ and $\alpha \in \mathbb{K}$, the vectors $x + y$, $\alpha z$ lie in $S$. In fact,
$$A(x + y) = Ax + Ay = 0, \qquad A(\alpha z) = \alpha Az = 0,$$
which shows that $x + y, \alpha z \in S$, i.e., the vectors $x + y$, $\alpha z$ are solutions of $Ax = 0$.
The above ensures that addition of vectors and scalar multiplication are
well-defined in S. Notice also that 0 ∈ Kn lies in S (Ax. (iii)) and that, for
each x ∈ S, its additive inverse −x = (−1)x lies in S (Ax. (iv)).
It would remain to show that Axioms (i),(ii), and (v)–(viii) are satisfied.
But that we know for granted, since the operations on S are the restrictions
of those in Kn .
Exercise 3.1 Show that the following are real vector spaces.
a) The set Mk,n (R) of real k×n matrices with the usual addition of matrices
and multiplication of a matrix by a scalar.
b) The set $P_n$ of real polynomials
$$p(t) = a_0 + a_1 t + \cdots + a_n t^n, \qquad a_0, a_1, \dots, a_n \in \mathbb{R},$$
of degree less than or equal to $n$, with the usual addition of real functions and multiplication of a function by a scalar.
c) The set P of real polynomials (of any degree) with the usual addition of
real functions and multiplication of a function by a scalar.
d) The set C([a, b]) of continuous real functions on the real interval [a, b],
with a < b, endowed with the usual addition of real functions and mul-
tiplication of a function by a scalar.
Exercise 3.2 Show that the following are complex vector spaces.
a) The set $M_{k,n}(\mathbb{C})$ of complex $k \times n$ matrices with the usual addition of matrices and multiplication of a matrix by a scalar.
b) The set $P_n$ of complex polynomials
$$p(z) = a_0 + a_1 z + \cdots + a_n z^n, \qquad a_0, a_1, \dots, a_n \in \mathbb{C},$$
of degree less than or equal to $n$, with the usual addition of complex functions and multiplication of a function by a scalar.
c) The set P of complex polynomials (of any degree) with the usual addition
of complex functions and multiplication of a function by a scalar.
We shall see further on that R2 does not possess any subspaces apart from
those in Example 3.3 (cf. Example 3.13). But that requires developing further
the theory of vector spaces which, at this point in the book, is not mature
enough to answer with confidence the (apparently) simple question
FIGURE 3.2: The straight line y = 2 is neither closed for the multiplication
by scalars nor for the addition of vectors.
$$u = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_k u_k$$
and
$$v = \beta_1 u_1 + \beta_2 u_2 + \cdots + \beta_k u_k.$$
Hence, using the properties of the operations + and µ,
u + v = (α1 u1 + α2 u2 + · · · + αk uk ) + (β1 u1 + β2 u2 + · · · + βk uk )
= (α1 + β1 ) u1 + (α2 + β2 ) u2 + · · · + (αk + βk ) uk .
| {z } | {z } | {z }
γ1 γ2 γk
u + v = γ1 u1 + γ2 u2 + · · · + γk uk ,
αu = α(α1 u1 + α2 u2 + · · · + αk uk )
= α(α1 u1 ) + α(α2 u2 ) + · · · + α(αk uk )
= (αα1 )u1 + (αα2 )u2 + · · · + (ααk )uk .
This is clear for the set in a). We show now that the span of the set in
b) is R2 .
We must prove that any vector (a1 , a2 ) ∈ R2 is a linear combination of
$(1, 1)$, $(-1, 0)$. That is, there must exist $\alpha, \beta \in \mathbb{R}$ such that
$$\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \alpha\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \beta\begin{bmatrix} -1 \\ 0 \end{bmatrix}.$$
(ii) How does one select the minimal spanning sets (avoiding redundancy)?
(iii) If, say, two vectors span V , do any two vectors span V ?
Question (iii) has a clear No for an answer. Already we can see this in the
discussion above. For example, to span a plane in R3 (containing (0, 0, 0)) we
need two vectors but they must not be colinear. To span R3 we need three
vectors but none of them can be in the plane (or the line) spanned by the
other two, i.e., the three vectors must not be coplanar.
The point seeming to be that, when selecting a spanning set, we must be
sure not to include a vector which is already in the space spanned by the
remaining vectors (as this vector does not bring anything new to the set of
linear combinations of the other vectors). This still vague idea is conveyed
precisely by the notion of linear independence.
$$\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_k u_k = 0 \;\Rightarrow\; \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0. \tag{3.1}$$
$$\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_k u_k = 0.$$
Example 3.6 The vectors $(1,0,1)$, $(0,-1,1)$, $(1,1,1) \in \mathbb{R}^3$ are linearly independent. In other words, the equation
$$\alpha_1 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + \alpha_2 \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} + \alpha_3 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
must admit the trivial solution $(\alpha_1, \alpha_2, \alpha_3) = (0, 0, 0)$ only. For this to hold,
$$\operatorname{rank} \begin{bmatrix} 1 & 0 & 1 \\ 0 & -1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$
must be equal to 3, which is the case. Hence the vectors $(1,0,1)$, $(0,-1,1)$, $(1,1,1)$ are linearly independent.
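This check turns into a one-liner once the vectors are placed as the columns of a matrix: they are linearly independent exactly when the rank equals the number of vectors (cf. Proposition 3.4 below). A small sketch:

```python
import numpy as np

vectors = [(1, 0, 1), (0, -1, 1), (1, 1, 1)]
A = np.column_stack(vectors)                      # vectors as columns
print(np.linalg.matrix_rank(A) == len(vectors))   # -> True: independent
```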
Exercise 3.4 Show that the sets in Example 3.4 a), b), and Exercise 3.3 a),
b) are linearly independent and that the set in c) of the same examples is not
linearly independent. Check that removing any single vector from the sets in
Example 3.4 c) and Exercise 3.3 c) makes the new set a linearly independent
set which still spans R2 or C2 , respectively.
Proof Let
$$A = \begin{bmatrix} u_1 \,|\, u_2 \,|\, \dots \,|\, u_k \end{bmatrix}, \qquad x = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{bmatrix}.$$
The vectors $u_1, u_2, \dots, u_k$ are linearly independent if, and only if, the equation $Ax = 0$ admits the trivial solution only. That is, if and only if the corresponding homogeneous system has a unique (trivial) solution. Consequently, $u_1, u_2, \dots, u_k$ are linearly independent if and only if
$$\operatorname{rank} \begin{bmatrix} u_1 \,|\, u_2 \,|\, \dots \,|\, u_k \end{bmatrix} = k.$$
$$u_1 = \alpha_2 u_2 + \cdots + \alpha_k u_k.$$
Then
$$\underbrace{(-1)}_{\neq 0} u_1 + \alpha_2 u_2 + \cdots + \alpha_k u_k = 0,$$
$$\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_k u_k = 0. \tag{3.2}$$
Hence, $u_1$ is a linear combination of the other vectors, which ends the proof.
Suppose that $u_1$ is a linear combination of $u_2, \dots, u_k$ and let $v$ be a vector in the space $\operatorname{span}\{u_1, u_2, \dots, u_k\}$, i.e., for some scalars $\beta_1, \beta_2, \dots, \beta_k$,
$$v = \beta_1 u_1 + \beta_2 u_2 + \cdots + \beta_k u_k.$$
Hence the span remains unchanged if one eliminates from the spanning set
a vector which is already a linear combination of the others. In other words,
if one is to choose a minimal set to span a vector space, one needs to focus
on linearly independent sets. The next section is devoted to this kind of set: the linearly independent sets that span a vector space $V$.
[Figure 3.3: the vector $u = (2, 3)$ decomposed along the basis vectors $b_1$ and $b_2$, with components $3b_1$ and $b_2$.]
Exercise 3.5 Express the vector $(2, 3)$ as a linear combination of the vectors of the basis $B$ in b) above.
Example 3.8
by Proposition 3.4, the vectors $(1, -i)$, $(1, 2i)$ are linearly independent. Moreover, given $(a, b) \in \mathbb{C}^2$, the system
$$A\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix}$$
More generally, the set En = {(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)}
is the standard basis of Cn .
u = α1 b1 + α2 b2 + · · · + αk bk ,
and
u = β1 b1 + β2 b2 + · · · + βk bk .
Subtracting the corresponding members of the equalities above, we have
0 = u − u = α1b1 + α2b2 + · · · + αkbk − (β1b1 + β2b2 + · · · + βkbk)
= (α1 − β1)b1 + (α2 − β2)b2 + · · · + (αk − βk)bk.

Since the vectors b1, b2, . . . , bk are linearly independent,

α1 − β1 = α2 − β2 = · · · = αk − βk = 0,
i.e.,
α1 = β1 , α2 = β2 , ..., αk = βk ,
as required.
Writing u ∈ V uniquely as

u = α1b1 + α2b2 + · · · + αkbk,

the coordinate vector of u relative to the ordered basis B is

uB = (α1, α2, . . . , αk).
To find the coordinate vector xB of x = α1b1 + α2b2 + · · · + αnbn relative to the basis B = (b1, b2, . . . , bn):

1) Solve the system whose augmented matrix is [b1 b2 . . . bn | x];
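Step 1) amounts to solving one linear system. A minimal numpy sketch (an illustration, not from the text), using the basis B = ((1, 1), (−1, 0)) and x = (2, 3):

```python
# Coordinates of x relative to B solve [b1 | b2] a = x.
import numpy as np

B = np.column_stack([(1.0, 1.0), (-1.0, 0.0)])  # columns b1 = (1,1), b2 = (-1,0)
x = np.array([2.0, 3.0])
print(np.linalg.solve(B, x))  # [3. 1.], i.e. x = 3*b1 + 1*b2
```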
Find also a vector equation, parametric equations, and cartesian equations for S.
By Proposition 3.4, we know that the vectors in S are linearly independent
if and only if
rank [1 0; 1 0; 0 1] = 2,

where the columns of the matrix are the vectors (1, 1, 0) and (0, 0, 1).
Since the rank of the matrix is indeed 2, we can conclude that the vectors are
linearly independent. That is, the set {(1, 1, 0), (0, 0, 1)} is a basis of S.
Now, (x, y, z) ∈ S if and only if the set {(1, 1, 0), (0, 0, 1), (x, y, z)} is linearly dependent (cf. Theorem 3.1). That is, (x, y, z) ∈ S if and only if the matrix

A = [1 0 x; 1 0 y; 0 1 z]

has rank less than 3. (Hence rank A must be equal to 2.) Keeping this in mind, we reduce A to a row echelon form using Gaussian elimination:

[1 0 x; 1 0 y; 0 1 z] →(L2 − L1) [1 0 x; 0 0 −x + y; 0 1 z] →(L2 ↔ L3) [1 0 x; 0 1 z; 0 0 −x + y].

Hence rank A = 2 if and only if −x + y = 0, and a cartesian equation for S is y = x.
T (u) = uB = (α1 , α2 , . . . , αk ).
Suppose now that u, v ∈ V are such that T(u) = T(v). Then, by (3.6), we have

T(u − v) = (u − v)B = (0, 0, . . . , 0)  (k zeros).

Hence u − v = 0, that is,

T(u) = T(v) ⇒ u = v,

and T is injective.

Now let u1, . . . , up be vectors in V. Then

α1u1 + · · · + αpup = 0V (3.7)
if and only if
(α1 u1 + · · · + αp up )B = 0Kk .
By Proposition 3.5, we have
α1 (u1 )B + · · · + αp (up )B = 0Kk . (3.8)
It follows that (3.7) has a unique solution (i.e., α1 = · · · = αp = 0) if and only
if the same applies to (3.8). Hence, u1 , . . . , up ∈ V are linearly independent
if and only if (u1)B, . . . , (up)B ∈ Kk are linearly independent.
Example 3.11 Find a basis B for the subspace S of M2 (R) spanned by the
matrices
A = [1 1; 1 1],  B = [1 1; 1 0],  C = [0 0; 0 −5].
Find also the coordinate vector AB of A relative to the basis B.
Notice that no calculations are required to obtain this coordinate vector since A = 1A + 0B, so that AB = (1, 0).
A basis thus provides the 'coordinates' with respect to which the space is described. This even allows for treating any space having a basis with n vectors like Kn (cf. §3.3.1).
One might ask however whether this is always possible. Given a space,
does it always have a basis? And if it has two bases, say, is there a relation
between their cardinality?
The next two theorems answer these questions for spaces having a spanning
set. But before going into that, it should be pointed out that not all spaces
have a spanning set, that is, a finite set whose span coincides with the space.
For example, if one considers the set P of real polynomials, it is impossible to
find such a set for P. (Why?)
Theorem 3.3 Every vector space over K with a spanning set has a basis.
Here we adopt the convention that the empty set ∅ is a basis of V = {0}.
Proof The case V = {0} holds trivially. Let V ̸= {0} and let X be a span-
ning set of V. We show next that X contains a maximal linearly independent
set Y , that is, any other subset of X which contains Y properly is linearly
dependent.
Let y1 be a non-zero vector in X, and observe that {y1 } is linearly inde-
pendent. Now two situations can occur: either (a) every other vector of X lies
in the subspace spanned by y1 , or (b) we can find y2 ∈ X such that {y1 , y2 }
is linearly independent.
In case (a), Y = {y1}. In case (b), one keeps adjoining vectors of X
to {y1 , y2 }, one at a time, until obtaining a linearly independent set Y which
cannot be enlarged, either because there are no more vectors in X, or because
all the remaining vectors in X\Y lie in span Y (see Theorem 3.1).
Let now Y ⊆ X be a maximal linearly independent set. It follows by the
above reasoning that every x ∈ X is a linear combination of the elements of Y .
Hence, we have that each element of V is a linear combination of the elements
of X, which are in turn all linear combinations of the vectors in Y . It follows
that Y spans V (see (3.4)) and, being linearly independent, is a basis of V.
Since the standard basis En of Rn has n vectors, we see that dim Rn = n. Obviously, we have also dim Cn = n.
Going back to the examples in §3.3.1, we have that dim M2(R) = 4,
dim Pn = n + 1, and dim S = 2, where S is the subspace in Example 3.11.
Now we know that any vector space with a spanning set always has a basis, but also that not every vector space has a spanning set. We make here a
distinction between these spaces: a vector space with a spanning set is called
a finite dimensional vector space, whilst those vector spaces without such
a set are called infinite dimensional. In the sequel, all vector spaces are
supposed to be finite dimensional unless stated otherwise.
Recapping: now that we know that a finite dimensional vector space always possesses a basis (and, therefore, infinitely many bases, if V ≠ {0}), it would be desirable to have a way to find them. This will be accomplished in
Theorem 3.5 below which gives a way of obtaining bases from subsets of
vectors in V.
We begin with the following lemma.

Lemma 3.1 Let X = {u1, u2, . . . , uk} be a linearly independent subset of Kk. Then X is a basis of Kk.

Proof It is enough to show that X spans Kk, i.e., to show that, for all v = (c1, c2, . . . , ck) ∈ Kk, there exist α1, α2, . . . , αk ∈ K such that

v = α1u1 + α2u2 + · · · + αkuk.
Informally, we can say that (iii) asserts that any linearly independent sub-
set of V can be ‘augmented’ to yield a basis of V. Similarly, it is stated in (iv)
that any subset that spans V can be cut down in order to obtain a basis of V.
Proof Let B = (b1 , b2 , · · · , bk ) be a basis of V.
(i) Let X = {u1 , u2 , · · · , uk } be a linearly independent set of vectors in
V. By Proposition 3.6, the set {(u1 )B , (u2 )B , · · · , (uk )B } is a linearly inde-
pendent subset of Kk. Hence, by Lemma 3.1, the set {(u1)B, (u2)B, . . . , (uk)B} is a basis of Kk.
Let v = α1 b1 + α2 b2 + · · · + αk bk be a vector of V. We want to show that
v lies in span X. Then, since ((u1 )B , (u2 )B , · · · , (uk )B ) is an ordered basis
of Kk , the vector (α1 , α2 , . . . , αk ) ∈ Kk is a linear combination of the vectors
(u1)B, (u2)B, · · · , (uk)B. That is, (α1, α2, . . . , αk) = β1(u1)B + · · · + βk(uk)B for some scalars β1, . . . , βk, and hence v = β1u1 + · · · + βkuk lies in span X.

1 Observe that any subset of a linearly independent set is necessarily also linearly independent.
The vectors (1, 2, 6), (1, 1, 1), (2, 3, 7), (0, 1, 5) are linearly dependent be-
cause, since the dimension of R3 is equal to 3, any set with four vectors cannot
be linearly independent (cf. Theorem 3.5 (ii)). Observe that, if these vectors
were linearly independent, the dimension of R3 would have to be greater than
or equal to 4, which is impossible.
Since X spans S, Theorem 3.5 (iv) guarantees that a basis of S can be extracted from the set X.
Observing that the pivots (in grey) are located in the first and second columns,
Proposition 3.4 yields that (1, 2, 6), (1, 1, 1) are linearly independent.
On the other hand, using the row echelon matrix above, we know that the
homogeneous system associated with the matrix (3.9) has two free variables.
Denoting by (α, β, γ, δ) the elements of the solution set of this system, we have
that the free variables are γ and δ.
If, for example, we let γ = 1 and δ = 0, then there exist α1 , β1 ∈ R such
that
α1 (1, 2, 6) + β1 (1, 1, 1) + (2, 3, 7) + 0(0, 1, 5) = 0.
Hence
α1 (1, 2, 6) + β1 (1, 1, 1) + (2, 3, 7) = 0,
which shows that (2, 3, 7) is a linear combination of (1, 2, 6), (1, 1, 1).
Analogously, it could be shown that (0, 1, 5) is a linear combination of (1, 2, 6), (1, 1, 1); it is enough to set γ = 0 and δ = 1.
We conclude thus that the set {(1, 2, 6), (1, 1, 1)} is a basis of S.
It follows from the solution of this problem that a vector equation for S is
(x, y, z) = t(1, 2, 6) + s(1, 1, 1),
with t, s ∈ R, and that parametric equations of S are
x = t + s
y = 2t + s
z = 6t + s,
it follows that N (A) = span{(3, 1, 1, 0), (−3, 1, 0, 1)}. Hence, a basis of N (A)
is {(3, 1, 1, 0), (−3, 1, 0, 1))}, since this set is linearly independent. We have
now that dim N (A) = 2.
Notice that the way we constructed the spanning vectors of N (A) makes
them automatically linearly independent due to the ‘strategic’ placement of
zero in each vector. In fact, the only way to obtain the zero vector as such a combination is by making z = 0 = w. Hence, when finding a basis for N(A), if the method above
is used, then one does not have to verify whether the spanning vectors are
linearly independent: they always are.
We can extrapolate from this example that, given a matrix A, the dimen-
sion of N (A) coincides with the number of independent variables of the system
Ax = 0.
The nullity nul(A) of a matrix A is the dimension of its null space.
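Computer algebra systems automate this computation. In the sketch below (sympy, not from the book), the matrix A is an assumption chosen so that its null space is the one found above:

```python
# nullspace() returns a basis of N(A); its length is nul(A).
import sympy as sp

A = sp.Matrix([[1, 0, -3, 3],
               [0, 1, -1, -1]])  # hypothetical matrix with the null space above
basis = A.nullspace()
print(basis)       # basis vectors (3, 1, 1, 0) and (-3, 1, 0, 1)
print(len(basis))  # nul(A) = 2 = number of free variables
```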
If

A = [c1 | c2 | · · · | cn]   (c1, c2, . . . , cn ∈ Kk),

then

C(A) = {β1c1 + β2c2 + · · · + βncn : β1, β2, . . . , βn ∈ K}.
Suppose that B is obtained from A by a single elementary row operation, say

A →(lj + αli) B.

Then every row of B is a linear combination of the rows of A. It is now clear that any linear combination (3.10) of the rows of B is also a linear combination of the rows of A. Hence we showed that L(B) ⊆ L(A). Conversely, A is obtained from B by the inverse elementary row operation and, using a reasoning similar to that above, it is easy to see that L(A) = L(B).
Proposition 3.10 The non-zero rows of a row echelon matrix are linearly
independent.
Proof Consider a vanishing linear combination of the non-zero rows. Looking at the pivot column of the first row (no other row has a non-zero entry in that column), the first coefficient must be 0. Proceeding down the rows in the same way yields

α1 = α2 = · · · = αk = 0.
Consider again the matrix A and its reduction to a row echelon matrix done in Part I (see Example 3.16).
By Proposition 3.9, the row space of A and the row space of any matrix
obtained from A using elementary operations coincide. Hence the row space of
1 1 2 2
0 1 −1 −1
0 0 0 0
is the row space L(A) of A. Observing that, on the other hand, the non-zero
rows of a row echelon matrix are linearly independent (cf. Proposition 3.10),
we have that the set {(1, 1, 2, 2), (0, 1, −1, −1)} is a basis of L(A).
that, if one considers the matrix A′ obtained by removing the grey columns of
A, this matrix is a row echelon matrix. Hence, since the number of columns is
equal to the number of pivots, the homogeneous system A′ x = 0 admits only
the trivial solution. Since A′ x is a linear combination of the columns of A′ , it
is now clear that the columns of A′ are linearly independent.
On the other hand, if one adds any of the grey columns to the columns of
A′ , i.e., any columns corresponding to an independent variable, this new set
is linearly dependent. In fact, these columns form an augmented matrix of a
system which is consistent and has a unique solution, showing that this grey
column is a linear combination of the columns of A′ . We can now conclude
that {(1, 1, 2), (1, 2, 3)} is a basis of C(A).
A common mistake. When choosing the columns in the basis of C(A), one
must go back to the original matrix. One has to choose the columns in matrix
A corresponding to those having pivots in the row echelon matrix. It is a
common mistake to select those of the row echelon matrix. This is wrong.
In 2., above, alternatively, one could go back to A and choose the corre-
sponding rows. However, this might be tricky if row exchange was involved
in the Gaussian elimination. Moreover, the rows of A are more ‘complicated’
than those of R, since the latter have more zero entries, in general. Hence,
there is nothing to be gained from going back to A. Why do it then?
Proof (i) The dimension of the null space is the number of independent
variables in Ax = 0 and, consequently, coincides with n − rank (A).
(ii) This is a consequence of Propositions 3.9 and 3.10.
(iii) Removing from A the columns corresponding to those without pivots
in the row echelon matrix, we obtain a matrix A′ whose columns are linearly
independent, since the system A′x = 0 has only the trivial solution. It is also
the case that these columns correspond exactly to the maximum number of
linearly independent columns in A, yielding dim C(A) = rank (A).
Example 3.18 Find the spaces N (A), L(A), C(A), and check that Theorem
3.6 holds for
A = [1 i 0; −i 1 2i].
We begin by finding N (A). Solving the homogeneous system Ax = 0, we
have
[1 i 0; −i 1 2i] →(l2 + il1) [1 i 0; 0 0 2i]. (3.15)

From the row echelon matrix, z = 0 and x = −iy, so that N(A) = span{(−i, 1, 0)} and nul(A) = 1 = 3 − rank A. Moreover,

L(A) = L({(1, i, 0), (0, 0, 2i)}), C(A) = L({(1, −i), (0, 2i)}),

and bases for these spaces are BL(A) = {(1, i, 0), (0, 0, 2i)} and BC(A) = {(1, −i), (0, 2i)}.
from which it follows that v1, v2, v3 are linearly independent and, therefore, the set

{(1, 2, −1, −2, 1), (0, 0, 2, −1, −5), (0, 0, 0, 2, 3)}

of the non-zero rows of R is a basis of span{u1, u2, . . . , uk}.
Observe that the vectors are column vectors and that to obtain row
vectors one has, therefore, to use transposition.
2. Use Gaussian elimination to reduce A to a row echelon matrix R.
3. A basis of span{u1 , u2 , . . . , uk } = L(A) = L(R) is formed by the
rows of R with pivots.
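This procedure can be reproduced with sympy. A sketch (not from the text) using the spanning vectors of the subspace S from the earlier example; the basis obtained need not consist of the same vectors the text selects:

```python
# Rows are the spanning vectors; the non-zero rows of the reduced matrix
# form a basis of the row space, i.e. of the spanned subspace.
import sympy as sp

A = sp.Matrix([[1, 2, 6],
               [1, 1, 1],
               [2, 3, 7],
               [0, 1, 5]])
R, _ = A.rref()
basis = [list(R.row(i)) for i in range(R.rows) if any(R.row(i))]
print(basis)  # [[1, 0, -4], [0, 1, 5]]: a basis of S, so dim S = 2
```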
3.4.1 Ax = b
Let A be a matrix over K of size k × n. Let x0 be a solution of the ho-
mogeneous system Ax = 0 and let xp be a particular solution of the system
Ax = b. In other words, we are supposing that x0 , xp are vectors in Kn such
that
Ax0 = 0 and Axp = b.
FIGURE 3.4: The solution set S is obtained by adding the particular solution xp = (1, 0, 3) to the solution set of the homogeneous system.
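This structure of the solution set is easy to illustrate numerically. In the sketch below, both A and b are hypothetical, chosen only so that xp = (1, 0, 3) is a particular solution:

```python
# Every vector xp + t*x0 with x0 in N(A) solves Ax = b.
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # hypothetical matrix
b = np.array([1.0, 0.0])
xp = np.array([1.0, 0.0, 3.0])    # particular solution: A @ xp == b
x0 = np.array([0.0, 0.0, 1.0])    # basis of N(A)
for t in (0.0, 1.0, -2.5):
    print(A @ (xp + t * x0))      # always [1. 0.] = b
```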
U + W = {x + y : x ∈ U ∧ y ∈ W }.
z1 + z2 = x1 + y1 + x2 + y2
= (x1 + x2 ) + (y1 + y2 ),
Hence U + W = span{(1, −1, 1), (0, 0, 1), (−2, 1, 0), (−1, 0, 1)}. To obtain a ba-
sis of U +W , we only have to find a maximal set of linearly independent vectors
contained in span{(1, −1, 1), (0, 0, 1), (−2, 1, 0), (−1, 0, 1)}. It is easy to see
that we can find three linearly independent vectors and, therefore, U +W = R3 .
A cartesian equation for W is x + y = 0 and, consequently, U ∩ W consists
of the vectors in R3 such that
x + 2y + z = 0
x + y = 0.
BU ∪ BW = {u1 , . . . , ur , v1 , . . . , vm , w1 , . . . wk },
Suppose that

α1u1 + · · · + αrur + β1v1 + · · · + βmvm + γ1w1 + · · · + γkwk = 0. (3.16)

Then

α1u1 + · · · + αrur + β1v1 + · · · + βmvm = −(γ1w1 + · · · + γkwk),
from which it follows that −(γ1w1 + · · · + γkwk) ∈ U. Consequently, γ1w1 + · · · + γkwk ∈ U ∩ W and is a linear combination of {v1, . . . , vm}, yielding that γ1w1 + · · · + γkwk = 0, since
the set {v1 , . . . , vm , w1 , . . . wk } is the linearly independent set BW . It now
follows that all scalars in (3.16) are 0 and, therefore, BU ∪ BW is linearly
independent.
Notice that
#BU +W = #BU + #BW − #BU ∩W ,
since the vectors in BU ∩W appear twice in #BU +#BW . Now, it is immediate
that
dim U + dim W = dim(U + W ) + dim(U ∩ W ).
u − u′ = w − w ′
and, consequently, u − u′ = 0 = w − w′ , since U ∩ W = {0}. That is, u = u′
and w = w′ .
u = α1 b1 + α2 b2 + · · · + αk bk .
That is,
uB2 = [(b1)B2 | (b2)B2 | · · · | (bk)B2] uB1,

where the matrix on the right-hand side is MB2←B1.
The matrix whose columns are the coordinate vectors of the vectors of
basis B1 relative to the basis B2 is called the change of basis matrix from
basis B1 to basis B2, and is denoted by MB2←B1. Hence, uB2 = MB2←B1 uB1.
Example 3.23 Consider the basis B = ((1, 1), (−1, 0)) and the standard ba-
sis E2 = (e1 , e2 ) of R2 . Use the change of basis matrix MB←E2 to find the
coordinate vector (2, 3)B (cf. Figure 3.3).
Hence we have to find the coordinate vectors (e1 )B , (e2 )B , that is, solve the
systems whose augmented matrices are
[1 −1 | 1; 1 0 | 0]  and  [1 −1 | 0; 1 0 | 1].
Since the coefficient matrices are the same, to save time we shall solve them
simultaneously. Then, using Gauss–Jordan elimination, we have
[1 −1 | 1 0; 1 0 | 0 1] →(l2 − l1) [1 −1 | 1 0; 0 1 | −1 1] →(l1 + l2) [1 0 | 0 1; 0 1 | −1 1].

Hence MB←E2 = [0 1; −1 1] and

(2, 3)B = MB←E2 (2, 3)ᵀ = (3, 1),

that is, (2, 3) = 3(1, 1) + 1(−1, 0) (cf. Figure 3.3).
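A quick numerical version of this example (a sketch, not part of the text):

```python
# M_{E2<-B} has the basis vectors as columns; inverting it gives M_{B<-E2}.
import numpy as np

P = np.column_stack([(1.0, 1.0), (-1.0, 0.0)])  # M_{E2<-B}
M = np.linalg.inv(P)                            # M_{B<-E2} = [[0, 1], [-1, 1]]
print(M @ np.array([2.0, 3.0]))                 # [3. 1.] = (2, 3)_B
```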
The next proposition asserts that the change of basis matrix is unique.
More precisely,
Proposition 3.14 Let V be a vector space over K of dimension n and let
B1 , B2 be bases of V. Then, there exists uniquely a matrix M such that, for all
x ∈ V , xB2 = M xB1 .
Proof We prove the uniqueness part, since the existence has been taken
care of in the discussion above.
Suppose that, for all x ∈ V, xB2 = M xB1 = M′ xB1 for matrices M and M′. Taking x = bi, the i-th vector of the basis B1, we have xB1 = ei, and hence the i-th columns of M and M′ coincide, for all i = 1, . . . , n. That is, M = M′.
2. If x is a vector in V , then
xB2 = [(b1)B2 | · · · | (bk)B2] xB1 = MB2←B1 xB1.
Since rank(MB2←B1) = k (cf. Proposition 3.4 and Proposition 3.6), this matrix is invertible. Hence, using the equality above, we have

uB1 = (MB2←B1)−1 uB2.
By Proposition 3.14, we obtain MB1←B2 = (MB2←B1)−1.
Observe that we can now easily obtain MB←E3. Indeed, we need only to calculate the inverse of ME3←B, that is, MB←E3 = (ME3←B)−1 (compare with Example 3.24).
When calculating a change of basis matrix between two bases, one should
ponder which change of basis matrix is the easiest to obtain. If there is one,
then find that change of basis matrix first and calculate its inverse, if neces-
sary.
3.7 Exercises
EX 3.7.1. Which vectors are a linear combination of u = (0, −1, 1) and v = (−1, −3, 1)?
(a) (1, 1, 1)
(b) (−6, −2, −10)
(c) (0, 2, 2/5)
(d) (0, 0, 0)
EX 3.7.6. Show that Mk,n (R), Pn , P, and C[a, b] (with a < b) are real vector
spaces and Mk,n (C) is a complex vector space.
EX 3.7.7. Consider the following subsets of the space P2 of the real polyno-
mials of degree less than or equal to 2:
(a) {3at2 − at + 3a : a ∈ R}
(b) {−5at2 − 3t2 + 3a − 4 : a ∈ R}
(c) {at2 − 2at2 + 3a : a ∈ R}
(d) {−5at + 3a − 1 : a ∈ R}
Which of the subsets above are vector subspaces of P2 ?
EX 3.7.9. Which of the sets below are linearly independent? Find a basis for
the subspace spanned by each of the sets.
(a) {(1, −1, 0), (0, 0, 2)}
(b) {(2, 4, 12), (−1, −1, −1), (2, 4, 12), (0, 1, 5)}
(c) {(1, 2, 3, 4), (0, 0, 0, 0), (0, 1, 1, 0)}
(d) {(1 + i, 2i, 4 − i), (2 + 2i, 4i, 8 − 2i)}
(e) {(1, 2, 6, 0), (3, 4, 1, 0), (4, 3, 1, 0), (3, 3, 1, 0)}
EX 3.7.10. Consider the subspace
W = {(x, y, z) ∈ R3 : x − 2y + 3z = 0}.
W = {(x, y, z, w) ∈ C4 : −4y + z = 0 ∧ x − y + z = w ∧ x = w}
(a) Determine bases for the null space, the row space, and column space
of A.
(b) Verify the solution of (a) using the Rank-nullity Theorem.
(c) If C is an invertible 4 × 4 matrix, what is the dimension of the
null space of CAT ? What is the dimension of the column space of
CAT ?
EX 3.7.17. Let A be a real square matrix such that its column space is
V = {(x, y, z, w) ∈ R4 : − x = 2y}.
(a) Find a basis BU ∩V for the subspace U ∩ V .
(b) What is the dimension of R4 + (U ∩ V )?
EX 3.7.19. Let U and W be the subspaces of C4
W = {(x, y, z, w) ∈ C4 : ix + 2y − z − iw = w − x ∧ x − w = 0}.
Find a basis and the dimension of each of the subspaces U ∩ W
and U + W. Verify that the formula

dim U + dim W = dim(U + W) + dim(U ∩ W)

holds.
EX 3.7.20. Write the polynomial 5 + 9t + 3t2 + 5t3 as a linear combination of
p1 = 2 + t + t2 + 4t3, p2 = 1 − t + 3t3, and p3 = 3 + 2t + 5t3.
EX 3.7.21. Determine whether the following subsets of polynomials are linearly independent or linearly dependent.
(a) {1 + 2t, t2 − 1 + t, t}.
(b) {1 + t − t2 + t3 , 2t + 2t3 , 1 + 3t − t2 + 3t3 }.
(c) {t5 − t4 , t2 , t3 − 2}.
EX 3.7.22. Let B = (1/3 − (1/3)t, 1/3 + (2/3)t) be an ordered basis of the vector space
P1 of the real polynomials of degree less than or equal to 1. Find
the coordinate vectors (3 − 2t)B and (3 − 2t)P1 .
EX 3.7.24. Write
M = [−9 −7; 4 0]
as a linear combination of
A = [2 1; 4 1],  B = [1 −1; 3 2],  C = [3 2; 5 0].
A1 = [−i i; 0 0],  A2 = [1 1; 0 0],  A3 = [0 0; 1+i 0],  A4 = [0 0; 0 1−i].
EX 3.7.27. Let
B = ( [2 −2; 0 0], [0 1; 1 0], [1 1; 0 0], [0 0; 0 1] )
be a basis of M2 (R). Let S be the subspace of M2 (R) such that
(d) Use the two change of basis matrices to determine MB′ ←E3 .
EX 3.7.30. Let B be an ordered basis of the space P1 consisting of the real
polynomials of degree less than or equal to 1. Let
MB←P1 = [3 1; −1 5]
3.8 At a Glance
A vector space over the field of scalars K is a non-empty set endowed
with two operations, addition and scalar multiplication, satisfying some fixed
axioms. Examples of vector spaces are Rn , Cn , Mk,n (K), and Pn . Elements of
a vector space are called vectors.
In EX 3.7.8 we give an example of a set which satisfies all but one of these
axioms imposed on the operations. Although the property that is not verified
seems innocent enough, the end result is that we do not have a vector space
in this case. In other words, all axioms matter.
Important subsets of a vector space are its subspaces, that is, non-empty
subsets that are closed under addition and scalar multiplication.
Two crucial concepts pervade the theory of vector spaces: linear combina-
tion and linear independence. Some vector spaces have a spanning set, that is,
a finite subset of vectors such that every vector in the space is a linear combi-
nation of those vectors. Some vector spaces do not have a spanning set, e.g.,
the space P of real polynomials. These are called infinite dimensional vector
spaces, as opposed to the former which are called finite dimensional
vector spaces. In the book, we deal almost exclusively with finite dimensional
vector spaces.
Minimal spanning sets must be linearly independent and they are called
bases of the vector space. All bases of a vector space have the same number
of vectors, called the dimension of the space.
The dimension classifies finite dimensional vector spaces in the sense that
an n dimensional vector space over K can be essentially identified with Kn ,
from a purely algebraic point of view. More precisely, vectors can be given by
their coordinates relative to a basis, thereby allowing for an ‘identification’ of
an n dimensional vector space over K with Kn .
Each matrix has four vector subspaces associated: its null space, row space,
and column space, and the null space of its transpose. These spaces are fun-
damental in the analysis of the properties of the matrix. The Rank-nullity
Theorem gives a formula relating the dimensions of these spaces. The general
solution of a system of linear equations is obtained in terms of a particular
solution and the null space of the coefficient matrix of the system.
In a vector space, coordinates relative to two different bases can be related
through the so-called change of basis matrix.
Chapter 4
Eigenvalues and Eigenvectors
Given a square matrix A over K, a non-zero vector x is called an eigenvector of A if there exists a scalar λ such that

Ax = λx. (4.1)

Under these conditions, λ is called an eigenvalue of A associated with x.
The spectrum of A, denoted by σ(A), is the set of eigenvalues of matrix A.
p(λ) = det(A − λI) = det [−λ −1 0; −1 −λ 0; 0 0 −1−λ] = (−1 − λ) det [−λ −1; −1 −λ] = (−1 − λ)(λ2 − 1).
The roots of p(λ) = (−1 − λ)2(1 − λ), that is, the solutions of the characteristic equation (4.3), are λ1 = 1 and λ2 = −1, the latter being a double root. Hence,
the eigenvalues of A are λ1 = 1 and λ2 = −1, and the spectrum of A is
σ(A) = {−1, 1}.
The eigenspace E(1) consists of 0 and the eigenvectors corresponding to the eigenvalue λ1 = 1 and, therefore, E(1) is the null space N(A − I) of A − I.
Using Gaussian elimination to solve the corresponding homogeneous system,
we have
A − I = [−1 −1 0; −1 −1 0; 0 0 −2] → [−1 −1 0; 0 0 0; 0 0 −2] → [−1 −1 0; 0 0 −2; 0 0 0].
FIGURE 4.1: Vectors u and v in the eigenspaces E(−1) and E(1), respectively, and the vectors Au and Av.
The eigenspace E(1) is the solution set associated with the matrix A − I and,
therefore,
E(1) = {(x, y, 0) ∈ R3 : x = −y}.
The eigenspace E(1) is, consequently, the straight line satisfying the equations
z = 0, x = −y. Observe that E(1) is the solution set of a consistent system
having a single independent variable.
The eigenspace E(−1) consists of (0 and) the eigenvectors associated with
the eigenvalue λ2 = −1, i.e., E(−1) = N (A + I). Similarly to what has been
done above, we have
1 −1 0 1 −1 0
A + I = −1 1 0 → 0 0 0 ,
0 0 0 0 0 0
yielding
E(−1) = {(x, y, z) ∈ R3 : x = y}.
The eigenspace E(−1) is the plane x = y corresponding to the solution of a
consistent system having two independent variables (see Figure 4.1).
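These computations can be checked numerically (a sketch, not from the book):

```python
# Eigenvalues of the matrix of this example.
import numpy as np

A = np.array([[0.0, -1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0]])
vals, vecs = np.linalg.eig(A)
print(np.sort(vals))  # [-1. -1.  1.]: sigma(A) = {-1, 1}, with -1 a double root
```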
We will discuss the proof of this in Corollary 4.6 (see §4.4).

Let

p(λ) = (λ1 − λ)(λ2 − λ) · · · (λn − λ) (4.4)

be the characteristic polynomial of A (where the roots may not be all distinct).
Observing that
p(0) = |A − 0I| = |A|
and letting λ = 0 in (4.4), we have
|A| = λ1 λ2 · · · λn .
If λ1, . . . , λk are the distinct eigenvalues of A and r1, . . . , rk are their algebraic multiplicities, then r1 + r2 + · · · + rk = n.
The matrix Ā is

Ā = [1−i −2 −5−3i; 7 6 1+2i].
(iii) If A ∈ Mn×n (C) is a matrix with real entries, then λ ∈ σ(A) if and only
if λ̄ ∈ σ(A). Moreover,
Proof (i) Let pAT(λ) and pA(λ) be the characteristic polynomials of AT and A, respectively. Since the determinant remains unchanged by transposition (cf. Proposition 2.4), we have

pAT(λ) = |AT − λI| = |(A − λI)T| = |A − λI| = pA(λ),

as required.
(ii) Let λ be an eigenvalue of A and let x be a vector in the eigenspace E(λ).
The result is trivially true for p = 1. We shall prove the remaining cases by
induction.
Suppose then that (ii) holds for p − 1, with p ≥ 2, that is, Ap−1 x = λp−1 x.
Then,
Ap x = A(Ap−1 x) = A(λp−1 x) = λp−1 Ax = λp x.
(iii) Given λ ∈ C and x ∈ Cn, we have Ax = λx if and only if Āx̄ = λ̄x̄ (conjugating both sides). Since A has real entries, Ā = A, and hence Ax = λx if and only if

Ax̄ = λ̄x̄.
(ii) 0 ∉ σ(A).
This simple (albeit important) fact, already implicit in the proof above, is
strangely often overlooked. The reader ought to keep it in mind.
B = S −1 AS
or, equivalently, if
SB = AS.
It is easy to see that B is similar to A if and only if A is similar to B. Con-
sequently, to simplify one says simply that A and B are similar matrices.
Proof (i) Since A and B are similar, we have |B| = |S−1AS|. Hence, by Proposition 2.3 and Corollary 2.1,

|B| = |S−1||A||S| = |S|−1|A||S| = |A|.

Similarly, B − λI = S−1(A − λI)S and, therefore,

pB(λ) = |S−1(A − λI)S| = |A − λI| = pA(λ).
(iv) This is a direct consequence of (iii) and Proposition 4.2 (ii).
(v) By (iii), it is clear that σ(A) = σ(B). It is also clear that the algebraic
multiplicities coincide.
As to the geometric multiplicities of λ ∈ σ(A) = σ(B), notice that, if x is
a vector such that Bx = λx, i.e., x is an eigenvector of B associated with the
eigenvalue λ, then
λx = Bx = S −1 ASx.
Hence,
λSx = ASx,
which shows that Sx is an eigenvector of A associated with λ. In fact, x is an eigenvector of B associated with λ if and only if Sx is an eigenvector of A
associated with the eigenvalue λ.
It follows that EA(λ) = S EB(λ) = {Sx : x ∈ EB(λ)}.
Here EA (λ) and EB (λ) are the eigenspaces of A and B, respectively, corre-
sponding to the eigenvalue λ.
It is easy to see that a subset B = {x1 , x2 , . . . , xk } of EB (λ) is linearly
independent if and only if the subset B ′ = {Sx1 , Sx2 , . . . , Sxk } of EA (λ) is
linearly independent. It now follows that B is a basis of EB (λ) if and only if B ′
is a basis of EA (λ), which proves the equality of the geometric multiplicities,
as required.
(vi) By (v), 0 ∈ σ(A) if and only if 0 ∈ σ(B). Hence, if both matrices are
invertible, then N (A) = {0} = N (B).
The only other possibility is that 0 is an eigenvalue of both A and B. In
this case, letting EA (0) and EB (0) denote, respectively, the eigenspace of A
and B corresponding to 0, by (v), we have dim EA(0) = dim EB(0), that is, dim N(A) = dim N(B).
(vii) This follows immediately from (vi) observing that, by Theorem 3.6 and Proposition 3.11,

rank A = n − dim N(A) = n − dim N(B) = rank B.
Find the eigenvalues and eigenspaces of D. (Take into account that the diag-
onal entries in the matrix might not be all distinct.)
A square matrix A of order n over K is said to be diagonalisable if there exist an invertible matrix S and a diagonal matrix D such that

D = S−1AS

or, equivalently, if

SD = AS.
Under these conditions, S is said to be a diagonalising matrix for A.
As one can see from the definition, A shares many features with D, for
example, the characteristic polynomial and, hence, the spectrum (see Theorem
4.2).
Theorem 4.3 Let A be a square matrix of order n over K. The following are
equivalent.
(i) A is diagonalisable.
(ii) A has n linearly independent eigenvectors.
(iii) There exists a basis of Kn consisting entirely of eigenvectors of A.
Before embarking on the proof of the theorem, it is worth taking some time to ponder its statement. Assertions (i) and (iii) are clear enough; however, assertion (ii) might cause some misunderstanding.
A common mistake. Assertion (ii) does not mean that A has exactly n
eigenvectors which are linearly independent. It says that, amongst the eigen-
vectors of A, one can extract a linearly independent subset having n eigen-
vectors.
A [v1 | v2 | · · · | vn] = [λ1v1 | λ2v2 | · · · | λnvn] = [v1 | v2 | · · · | vn] diag(λ1, λ2, . . . , λn) = SD,

as required.
We show now that (i) implies (ii). Suppose then that A is diagonalisable.
It follows that AS = SD, where
S = [v1 | v2 | · · · | vn].
Observe that, since S is invertible, its columns are linearly independent vectors
in Kn . We have
AS = SD, that is,

A [v1 | v2 | · · · | vn] = [v1 | v2 | · · · | vn] diag(λ1, λ2, . . . , λn)

⟺ A [v1 | v2 | · · · | vn] = [λ1v1 | λ2v2 | · · · | λnvn]

⟺ [Av1 | Av2 | · · · | Avn] = [λ1v1 | λ2v2 | · · · | λnvn].
Hence,
Av1 = λ1 v1 , Av2 = λ2 v2 , ..., Avn = λn vn ,
from which follows that A has n linearly independent eigenvectors.
The equivalence between (ii) and (iii) is obvious (see Theorem 3.5 (i)).
We are now able to identify which matrices are diagonalisable and which
are not. For example, Theorem 4.3 gives us a clear answer: the matrix in Ex-
ample 4.4 is not diagonalisable whilst that in Example 4.2 is a diagonalisable
matrix.
We show that the matrix A = [0 −1; −1 0] is diagonalisable and shall see how this helps calculate the power A2020.
A simple calculation yields that the eigenvalues of A are ±1:
eigenvalue ma (λ) mg (λ)
λ1 = 1 1 1
λ2 = −1 1 1
The eigenspace E(1) is the straight line x = −y and the eigenspace E(−1)
is the straight line x = y. Choosing two linearly independent vectors, say,
v1 = (−1, 1) ∈ E(1) and v2 = (1, 1) ∈ E(−1), Theorem 4.3 guarantees that A is
diagonalisable and that
D = S−1AS, that is,

[1 0; 0 −1] = [−1 1; 1 1]−1 [0 −1; −1 0] [−1 1; 1 1],

with D = [1 0; 0 −1], S = [−1 1; 1 1], and A = [0 −1; −1 0].
The proof of Theorem 4.3 is constructive and, for that reason, provides a
way of constructing S. One has just to find the required number of linearly
independent eigenvectors and build a matrix whose columns are those vectors,
in no particular order. However, having done that, one has to be sure that, in
each column of D, the diagonal entry is the eigenvalue corresponding to the
eigenvector in the same column of S.
The diagonalising matrix S is a possible solution amongst many others,
since S depends on the chosen eigenvectors.
To find A2020, we begin by calculating A2:

A2 = (SDS−1)2 = SD(S−1S)DS−1 = SD2S−1.

Iterating, A2020 = SD2020S−1. Since D2 = I, we have D2020 = I and, therefore, A2020 = SIS−1 = I.
This is a very simple case for which it would be very easy to find A2020 without using diagonalisation, since A = −P12. However, this serves the purpose
of illustrating a general process that is very useful when calculating powers of
(more complicated) matrices.
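A numerical check of this computation, using the matrix A = [0 −1; −1 0] from the example above (a sketch, not from the book):

```python
# A^2020 = S D^2020 S^-1 = I, since D^2 = I.
import numpy as np

D = np.diag([1.0, -1.0])
S = np.array([[-1.0, 1.0],
              [1.0, 1.0]])
A2020 = S @ np.linalg.matrix_power(D, 2020) @ np.linalg.inv(S)
print(A2020)  # the 2x2 identity matrix
```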
If A and B are similar matrices, say A = SBS−1, then, for every positive integer n,

An = SBnS−1.

Proof Exercise.
A difficulty that may occur when diagonalising a matrix is how to choose the eigenvectors in order for them to be linearly independent, as required in Theorem 4.3. The next proposition helps to overcome this difficulty.
Proposition 4.6 Eigenvectors corresponding to distinct eigenvalues are linearly independent.

Proof The assertion is trivially true for n = 1. Let 1 < n and consider the
eigenvectors v1 , v2 , . . . , vn and the corresponding eigenvalues λ1 , λ2 , . . . , λn ,
all distinct.
Suppose that the assertion holds for v1 , v2 , . . . , vn−1 , i.e., the eigen-
vectors v1 , v2 , . . . , vn−1 are linearly independent. We want to show that
v1 , v2 , . . . , vn−1 , vn are also linearly independent.
Suppose that, on the contrary, the vector vn is a linear combination of the
remaining eigenvectors. Observe that there is no loss of generality with this
assumption. We have then that there exist scalars α1 , α2 , . . . , αn−1 such that
vn = α1 v1 + α2 v2 + · · · + αn−1 vn−1 .
It follows that
Avn = λn vn
⇐⇒
A(α1 v1 + α2 v2 + · · · + αn−1 vn−1 ) = λn (α1 v1 + α2 v2 + · · · + αn−1 vn−1 )
⇐⇒
α1 λ1 v1 + α2 λ2 v2 + · · · + αn−1 λn−1 vn−1 = λn (α1 v1 + α2 v2 + · · · + αn−1 vn−1 )
Hence,
α1 (λn − λ1 )v1 + α2 (λn − λ2 )v2 + · · · + αn−1 (λn − λn−1 )vn−1 = 0
Since the eigenvectors v1 , v2 , . . . , vn−1 are linearly independent, we have
α1 (λn − λ1 ) = 0
α2 (λn − λ2 ) = 0
..
.
αn−1(λn − λn−1) = 0.

Since the eigenvalues are all distinct, λn − λi ≠ 0 for i = 1, . . . , n − 1, and hence α1 = α2 = · · · = αn−1 = 0. But then vn = 0, which is impossible since vn is an eigenvector. Hence v1, v2, . . . , vn are linearly independent.
λ1 (u1 + u2 + · · · + up ) = λ1 u1 + λ2 u2 + · · · + λp up .
Hence,
(λ1 − λ2 )u2 + · · · + (λ1 − λp )up = 0. (4.6)
The non-zero vectors in {u1 , u2 , . . . , up } are eigenvectors corresponding to
distinct eigenvalues. It follows from Proposition 4.6 that they are linearly in-
dependent.
Since there exists at least one i ∈ {2, . . . , p} in (4.6) such that ui ≠ 0, the eigenvalue λ1 coincides with the eigenvalue λi, which contradicts the initial assumption that all eigenvalues are distinct.
mg (λi ) = dim N (A − λi I)
= dim N (D − λi I) (by Theorem 4.2)
= n − rank (D − λi I) (by Theorem 3.6)
= n − (n − ma (λi ))
= ma (λi ),
1. Find the eigenvalues and bases for the eigenspaces of A. If the sum of
the dimensions of the eigenspaces is n (meaning that A has n linearly
independent eigenvectors), then A is diagonalisable. Otherwise, it is
not.
2. If A is diagonalisable, then let λ1 , λ2 , . . . , λp be the distinct eigenval-
ues of A.
For all i = 1, 2, . . . , p, let {v1(i), v2(i), . . . , vri(i)} be a basis of the eigenspace E(λi). Build the n × n matrix S whose columns consist of the vectors in these bases arranged by juxtaposition, in no particular order.
3. Build the diagonal matrix D whose diagonal entry in any column j coincides with the eigenvalue corresponding to the eigenvector in the column j of S.
1 Schur’s Triangularisation Theorem states that any n×n complex matrix is similar to an
upper triangular matrix and the similarity matrix might be chosen to be a unitary matrix.
The definition of unitary matrix is given in Chapter 6.
Proof We will prove the result by induction on the size of the matrix. If
n = 1, then the result holds trivially. Suppose now that the result holds for n×n
matrices. Let A be an n + 1 × n + 1 matrix with eigenvalues λ1 , . . . , λn , λn+1 .
Let u be an eigenvector in Cn+1 corresponding to the eigenvalue λn+1 and
let B = {u, b1 , . . . , bn } be a basis of Cn+1 including u. Let N be the matrix
whose columns are the vectors of B, i.e.,
N = [u | b1 | . . . | bn].
It follows that
AN = [Au | Ab1 | . . . | Abn] = [λn+1u | Ab1 | . . . | Abn],
that is
N−1AN = [λn+1 ∗; 0 B],

a block matrix whose first column is (λn+1, 0, . . . , 0)ᵀ, where the lower right-hand corner B of N−1AN is an n × n matrix such that σ(B) = {λ1, . . . , λn}.
By the induction hypothesis, there exists an n × n invertible matrix M̃ such that M̃−1BM̃ is an upper triangular matrix with diagonal entries λ1, λ2, . . . , λn. Let M be the (n + 1) × (n + 1) invertible matrix

M = [1 0; 0 M̃].
Hence,

M−1N−1ANM = U,

where U is an upper triangular matrix with diagonal entries λn+1, λ1, . . . , λn. Finally, A = (NM)U(NM)−1, where U is an upper triangular matrix having the eigenvalues of A on the diagonal.
Summing up: Theorem 4.4 states that, given a complex square matrix A,
there exist an upper triangular matrix U and an invertible matrix S such that
S −1 AS = U.
It is then obvious that both A and U must have the same eigenvalues which
appear in the diagonal of U . Notice also that the proof above can be done in
a way that equal eigenvalues in the diagonal of U are grouped together.
Taking, for example, A to be the 3 × 3 matrix with entries 1 immediately above the diagonal and 0 elsewhere, you can see for yourself that A2 ≠ 0 but A3 = 0. Obviously, for any
p ≥ 3, we have Ap = 0. This matrix A has many zero entries but this is not a
requirement to display this behaviour. In fact,
5 15 10
A = −3 −9 −6
2 6 4
is such that A2 = 0.
As we see, this is all about the null space of some power of a matrix being
the whole space. It is worth making a note of a simple fact about the null
spaces of matrix powers:
Given an n × n matrix B, there exists an integer p, with 0 ≤ p ≤ n, such that, for every positive integer j,

N(Bp) = N(Bp+j) = N(Bn).

Indeed, notice that N(Bq) ⊆ N(Bq+1) for every q. If these inclusions were all strict for q = 0, 1, . . . , n, the dimensions would grow at each step, forcing dim N(Bn+1) > n, which is impossible.

Hence, at this point, we have an integer p, with 0 ≤ p ≤ n, for which N(Bp) = N(Bp+1). We will show now that

N(Bp) = N(Bp+r) for every positive integer r,
that is, after the power Bp the null spaces of the larger powers stabilise. Suppose, on the contrary, that there exists a positive integer r such that
N (B p+r ) ⊊ N (B p+r+1 ).
Then there exists a vector x such that Bp+r+1x = 0 and Bp+rx ≠ 0. But then

Bp(Brx) ≠ 0 and Bp+1(Brx) = 0,
which shows that B r x lies in the null space of B p+1 but not in the null space
of B p . This however yields a contradiction, since N (B p ) = N (B p+1 ).
The proposition we just proved has the following immediate consequence.
Let λ ∈ σ(A) and let x be a corresponding eigenvector. Since An = 0, by Proposition 4.2 (ii),

0 = Anx = λnx,

yielding that λ = 0.
It remains to show that σ(A) ≠ ∅, which could fail only for real matrices (see Example 4.3). Since A ≠ 0 and An = 0, by Proposition 4.9, there exist a non-zero vector x ∉ N(A) and a positive integer 1 < j ≤ n such that

0 = Anx = Ajx and Aj−1x ≠ 0.
But, in this case,
A(Aj−1 x) = 0,
showing that Aj−1 x is an eigenvector of A with the associated eigenvalue 0.
Show that ABC = 0. Here we suppose that the sizes of the blocks are compati-
ble for the multiplication purpose. (Solving this exercise will help you to better
understand the proof of the next theorem.)
Given a polynomial q(λ) = a0 + a1λ + a2λ2 + · · · + anλn over K and a square matrix A over K, one defines

q(A) = a0I + a1A + a2A2 + · · · + anAn.
Proof In this proof, we write I for the identity matrices of any order.
Let p(λ) = (λ − λ1 )n1 (λ − λ2 )n2 . . . (λ − λp )np , where λ1 , λ2 , . . . , λp are all
distinct, be the characteristic polynomial of A. By Theorem 4.4, there exists an
upper triangular matrix U and an invertible matrix S such that A = S −1 U S.
Moreover, U can be chosen to be
U = [U1 ∗ · · · ∗; 0 U2 · · · ∗; . . .; 0 0 · · · Up], (4.7)

a block upper triangular matrix whose diagonal blocks are U1, U2, . . . , Up.
we have that
p(A) = S −1 p(U )S = 0.
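The result just proved (the Cayley–Hamilton Theorem, p(A) = 0) is easy to test numerically for any particular matrix. A sketch, not from the book; note that numpy's poly uses the convention det(λI − A), which differs from |A − λI| only by the sign (−1)ⁿ:

```python
# Cayley-Hamilton check: substituting A into its characteristic polynomial gives 0.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])               # any square matrix will do
coeffs = np.poly(A)                      # [1, -5, -2]: lambda^2 - 5*lambda - 2
n = A.shape[0]
p_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
print(np.round(p_of_A, 10))              # the zero matrix
```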
Proposition 4.10 A complex square matrix is nilpotent if and only if its only
eigenvalue is 0.
Proof If the only eigenvalue of A is 0, then the characteristic polynomial of A is p(λ) = (−1)nλn. Hence, by the Cayley–Hamilton Theorem,

0 = p(A) = (−1)nAn,

and A is nilpotent. The converse was shown above.
Proposition 4.11 Let A be a square matrix over K and let p(λ) be a poly-
nomial over K. Then, the null space and the column space of p(A) are A-
invariant.
Proof Let x ∈ N(p(A)). Since A commutes with p(A), we have p(A)Ax = A(p(A)x) = A0 = 0, showing that Ax ∈ N(p(A)). The proof for the column space is similar.
Given an n × n matrix A, there exists an integer p, with 0 ≤ p ≤ n, such that, for every positive integer j, C(Ap) = C(Ap+j) = C(An).
Observe that, fixing some non-negative integer p and any given vector y ∈ C(Ap+1), there exists x ∈ Kn such that

y = Ap+1x = Ap(Ax) = Apz,

where z = Ax. Hence C(Ap+1) ⊆ C(Ap).
Notice that the smallest numbers p for which Proposition 4.9 and
Proposition 4.12 hold are the same.
Suppose that A is such that N(A) ∩ C(A) = {0}. Then, by Theorem 3.8,
dim(N (A) + C(A)) = dim N (A) + dim C(A) − dim(N (A) ∩ C(A)) = n − 0 = n.
Hence Kn = N(A) ⊕ C(A).
However, it is not always the case that N (A) ∩ C(A) = {0}. A simple counter-
example is
A = [1 1; −1 −1],
for which
N (A) = C(A) = {(a, −a) : a ∈ R}.
There is however a way of bypassing this.
Proof (i) Suppose, on the contrary, that there exists y ≠ 0 such that
An y = 0 and y = An x,
0 = An y = A2n x
from which follows that x lies in the null space of A2n but not in N (An ). But
this is impossible, since it contradicts Proposition 4.9.
(ii) This is a consequence of (i), as discussed above.
N (An ) ⊕ C(An ) = Kn .
B r x = 0 and B r−1 x ̸= 0.
(A − λI)k x = 0. (4.8)
Notice that (4.8) is the same as requiring that x lie in the null space
N (A − λI)k of the matrix (A − λI)k and, consequently, in all the null spaces
of higher powers of A − λI. Hence the definition of the order of the generalised
eigenvector as the smallest k for which (4.8) holds. In other words, x is a
generalised eigenvector of order k of A, associated with λ, if

(A − λI)kx = 0 and (A − λI)k−1x ≠ 0.
Consequently, the spectrum of A is σ(A) = {1, 2}. Both eigenvalues have geo-
metric multiplicity equal to 1, and it is easy to see that bases for the eigenspaces
are BE(2) = {(1, 0, 0)} and BE(1) = {(0, −1, 1)}.
We want to find a generalised eigenvector x of order 2, i.e., to find an
eigenvalue λ (which will be 1 or 2) such that (A − λI)2x = 0 and (A − λI)x ≠ 0. The corresponding Jordan chain is

u1 = (A − λI)x, u2 = x.
Given a matrix A ∈ Mn (K), let G(λ) be the set consisting of 0 and all
the generalised eigenvectors (of any order) of A associated with λ. This set is
in fact a subspace of Kn called the generalised eigenspace of A associated
with the eigenvalue λ. (It is an easy exercise to show that G(λ) is closed under
vector addition and scalar multiplication. See EX 4.5.9.)
A clarifying reminder: an eigenvector is exactly a generalised eigenvector of order 1, so that E(λ) ⊆ G(λ).
Hence, we have here a slight abuse of notation: when the generalised eigen-
vectors consist only of eigenvectors proper (as was the case for λ = 1 in Ex-
ample 4.6), then G(λ) is what we called E(λ) in Section 4.1. But once we keep
this in mind, there will be no source of confusion.
Another question: how do we find a method to determine the generalised eigenspaces? Or Jordan chains? In Example 4.6, we were advised from the start about the order of the generalised eigenvector. What if we were not?
Do two distinct generalised eigenvectors in the same generalised eigenspace
have necessarily the same order?
These are important questions that need to be answered. Proposition 4.14
below is a first but crucial step towards the answers.
Proof Let x ̸= 0 lie in N ((A − λI)n ). Then, there exists a positive integer
k for which (A − λI)k x = 0. Indeed, it suffices to take k = n. Hence, x is a
generalised eigenvector in G(λ).
Conversely, suppose that x ∈ G(λ). Then, x lies in the null space of some
power of A − λI. Hence, by Proposition 4.9, x ∈ N (A − λI)n .
By the definition of order of a generalised eigenvector, we have finally that
this order must be at most equal to n.
Observe that, by Propositions 4.11 and 4.14, the generalised eigenspace G(λ) = N((A − λI)n) is an A-invariant subspace.
Example 4.7 Let us revisit Example 4.6 under the new light of this proposi-
tion. We have that
(A − 2I)3 = [0 1 1; 0 0 1; 0 0 −1]3 = [0 0 0; 0 0 1; 0 0 −1]
and

(A − I)3 = [1 1 1; 0 1 1; 0 0 0]3 = [1 3 3; 0 1 1; 0 0 0].
Hence,
G(2) = N (A − 2I)3 = {(x, y, 0) : x, y ∈ R}
and
G(1) = N (A − I)3 = {(0, −z, z) : z ∈ R}.
Observe that G(1) must consist of eigenvectors only because it has dimension
1 and that the generalised eigenvectors (of order higher than 1), therefore, lie
all of them in G(2). Compare with what we found in Example 4.6.
Notice also that
G(1) ∩ G(2) = {0}
and
G(1) + G(2) = R3 .
In other words,
G(1) ⊕ G(2) = R3 .
all vectors in a Jordan chain are generalised eigenvectors, each one having
a different order.
Exercise 4.4 Find the Jordan chains of the generalised eigenvectors above.
Verify that the set consisting of these Jordan chains is a basis of R5 .
If {u1 , . . . , up , up+1 } were linearly dependent, then there would exist scalars
α1 , . . . , αp , αp+1 , not all equal to zero, such that
α1 u1 + · · · + αp up + αp+1 up+1 = 0.
We had already a similar result for eigenvectors. Compare this lemma with
Proposition 4.6.
Proof Let α1 , α2 , . . . , αp be scalars such that
α1 x1 + α2 x2 + · · · + αp xp = 0.
Observing that all the matrix powers commute, it follows from Proposition
4.14 that
Notice that
(A − λ1 I)(A − λ1 I)k1 −1 x1 = 0,
from which follows that
Notice that (U − λj I)n has all diagonal entries different from zero, except for
the nj diagonal entries of (Uj − λj I)n , which are equal to 0. Since the strictly
upper triangular matrix Uj − λjI satisfies (Uj − λjI)nj = 0 (and hence (Uj − λjI)n = 0, as nj ≤ n),
it follows that the null space of (U − λj I)n has dimension nj . Hence, by The-
orem 4.2 (vi),
dim G(λj ) = dim N (A − λj I)n = nj .
Since dim G(λ1) + · · · + dim G(λp) = n, by Proposition 8.3 (ii), it suffices to show that, with n > 1, the only vector lying in any given generalised eigenspace G(λj) and also in the sum of the other generalised eigenspaces is 0. That is, we must show that, given any j = 1, . . . , p,

G(λj) ∩ (G(λ1) + · · · + G(λj−1) + G(λj+1) + · · · + G(λp)) = {0}.
To simplify the notation, we prove only for j = 1, but the proof is easily
generalised for any j. Let x1 ∈ G(λ1 ) be such that
x1 = x2 + · · · + xp,

where xl ∈ G(λl) for l = 2, . . . , p. Re-writing the above equality, we have

x1 − x2 − · · · − xp = 0.
It now follows from Lemma 4.2 that all these vectors must coincide with 0.
We can now make the following note:
Now we are finally able to prove Proposition 4.1 for complex matrices.
Observing that

uj−1 = (A − λI)uj = Auj − λuj,
we have
Auj = uj−1 + λuj . (4.15)
If, for example, x were a generalised eigenvector of order n, then

A [u1 | u2 | . . . | un] = [Au1 | Au2 | . . . | Aun] = [λu1 | u1 + λu2 | . . . | un−1 + λun], (4.16)

that is, by (4.15),

A [u1 | u2 | . . . | un] = [u1 | u2 | . . . | un] Jn(λ), (4.17)

where Jn(λ) is the n × n upper triangular matrix with all diagonal entries equal to λ, all entries immediately above the diagonal equal to 1, and all other entries equal to 0.
A Jordan canonical form is a block diagonal matrix with Jordan blocks Jn1(λ1), Jn2(λ2), . . . , Jnp(λp) on the diagonal, where the positive integers n1, n2, . . . , np may not be all distinct and the scalars λ1, λ2, . . . , λp may also be repeated.
and

J3(1) = [1 1 0; 0 1 1; 0 0 1]

of degree 2 and degree 3, respectively.
Not every matrix is similar to a single Jordan block. However, we will see that any complex matrix A is similar to a Jordan canonical form.
is a basis of Cm .
where, for all i = 1, 2, . . . , p, the number ki is the order of the chain C(wi, 0). However, these coefficients correspond to the set consisting of
Azj = Avj ,
Aup+j = 0.
is a linearly independent set which spans Cn+1 , since its cardinality is that of
(4.23). This ends the proof.
AS = SM,

where M is a block upper triangular matrix in which the size of each block equals the dimension of the respective generalised eigenspace. It follows that
If x ∈ G(λ1), then

S−1(A − λ1I)S (xB1, 0)ᵀ = (M − λ1I) (xB1, 0)ᵀ = ((M1 − λ1I)xB1, 0)ᵀ,

where M1 denotes the first diagonal block of M.
Exercise 4.6 Find a Jordan canonical form of the matrix in Example 4.6
and the corresponding similarity matrix.
Solution. We already know that (0, 1, 0) is a generalised eigenvector of order
2 whose Jordan chain is u1 = (1, 0, 0), u2 = (0, 1, 0). Hence, A = SJS −1 with
J = [2 1 0; 0 2 0; 0 0 1],  S = [1 0 0; 0 1 −1; 0 0 1].
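These computations can be reproduced with sympy, using the matrix A = [2 1 1; 0 2 1; 0 0 1] of Example 4.6 as reconstructed from A − 2I above (a sketch; the ordering of the blocks returned may differ):

```python
# jordan_form() returns S and J with A = S * J * S**-1.
import sympy as sp

A = sp.Matrix([[2, 1, 1],
               [0, 2, 1],
               [0, 0, 1]])
S, J = A.jordan_form()
print(J)  # the blocks J_2(2) and J_1(1), up to ordering
```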
Since the geometric multiplicity is 2 and the dimension of the eigenspace of any Jordan block is 1, we have that the Jordan canonical form must have two Jordan blocks associated with this eigenvalue.
Finally,
A = SJS−1, with

S =
 0  1  0  0  0
 1  0  0  0  0
 1  0  0 −1  0
−1  0  0  1  1
 0  0  1  1  0

and

J =
1 1 0 0 0
0 1 0 0 0
0 0 1 1 0
0 0 0 1 1
0 0 0 0 1
4.5 Exercises
EX 4.5.1. Consider the matrix
1 2 −3
A= 0 −1 1
−1 −1 2
Which of the following vectors are eigenvectors of A?

a) (5, −1, −4) b) (1, −1, −1) c) (0, 0, 0) d) (1, −1, 0) e) (−1, −1, −1)
EX 4.5.4. Which matrices are hermitian? For those which are, determine
their spectrum (use technology).
a) [i −8; −8 i]
b) [10 0 1+i; 0 2 3; 1−i 3 7]
c) [2 −3+2i 1; −3−2i −1 1; 1 1 −5]
EX 4.5.5. Diagonalise
1 2 2
A = −1 −2 −1
1 1 0
and calculate A21 .
EX 4.5.6. Let v be an eigenvector of an invertible matrix A corresponding to an eigenvalue λ. Show that the same vector v is an eigenvector of A−1 corresponding to the eigenvalue λ−1.
EX 4.5.7. Let A be a matrix with an eigenvector v corresponding to an eigenvalue λ. Show that λ3 is an eigenvalue of A3 and that v is an eigenvector of A3 corresponding to λ3.
EX 4.5.8. Consider the matrix
1 b
A= ,
−b −1
where b is a real number. Suppose that 0 ∈ σ(A) and that B is a
2 × 2 matrix. Consider the following assertions:
I) dim N (A) = 1;
II) 0 ∈ σ(BA);
III) A is diagonalisable;
IV) A is not invertible.
Select all the correct assertions.
(i)

5 1 0 0 0 0
0 5 1 0 0 0
0 0 5 0 0 0
0 0 0 −2 1 0
0 0 0 0 4 0
0 0 0 0 0 2
(ii)

5 1 0 0 0 0
0 5 1 0 0 0
0 0 5 0 0 0
0 0 0 −2 0 0
0 0 0 0 −2 0
0 0 0 0 0 4
(iii)

5 1 0 0 0 0
0 5 1 0 0 0
0 0 5 0 0 0
0 0 0 −2 0 0
0 0 0 0 4 0
0 0 0 0 0 −2
(iv)
3 1 0 0
0 3 1 0
0 0 3 0
0 0 0 1
4.6 At a Glance
The spectrum σ(A) of a square matrix A consists of the roots of its char-
acteristic polynomial p(λ) = |A − λI|. These roots are called the eigenvalues
of A.
The eigenspace E(λ) corresponding to the eigenvalue λ is the solution set
of the homogeneous system of linear equations (A − λI)x = 0. The non-zero
vectors in E(λ) are eigenvectors of A.
Chapter 5

Linear Transformations
The most significant functions between vector spaces are those which respect
the linear structure inasmuch as they map sums of vectors to sums of their
images and scalar multiplication of a vector to scalar multiplication of its
image. These are the so-called linear transformations.
Linear transformations are intrinsically bound together with matrices. As
will be shown in the sequel, once we fix bases both in the domain and the
codomain of a linear transformation, there exists a one-to-one correspondence
between linear transformations and a space of matrices. This runs so deep
that one can think of linear transformations as matrices and vice-versa. This
interchanging might come in very handy: linear transformations will benefit
from our accumulated knowledge of matrices and, conversely, the theory of
matrices might also gain from perceiving them as linear transformations.
The most relevant numbers here are the dimensions of the null space and
the image of a linear transformation. The formula linking these dimensions is
a very important result called the Rank-nullity Theorem (Theorem 5.2).
In particular, every linear transformation maps zero to zero: T(0U) = 0V.
Example 5.1 Find which of the following functions are linear transforma-
tions.
a) T : R2 → R2 is a reflection relative to the x-axis.
b) T : R3 → R3 is an orthogonal projection on the xy-plane.
c) T : R2 → R2 is a translation by the vector u = (1, 0).
Hence, every point in this line segment is mapped by T onto the line segment
connecting the images T (b) and T (c). Similarly, it is easily seen that the same
holds for the two remaining line segments connecting a and b and a and c,
respectively. It now follows that the image of the triangle is a triangle whose
vertices are (0, 0), (2, 1), (1, 0). Observe that, by Proposition 3.2, T (a) = (0, 0).
A comment is in order: at this point it is not clear why, fixing the images of the two given points b = (1, 1) and c = (2, 0), we have a linear transformation satisfying these data, or indeed whether it is unique. However, there exists a unique linear transformation which satisfies these requirements, as we will see later (see §5.2).
FIGURE: The triangle with vertices a, b, c and its image, the triangle with vertices T(a) = a, T(b), T(c).
Example 5.3 Show that the function T : M2 (C) → C defined, for all A ∈
M2 (C), by T (A) = tr(A), is a linear transformation.
As before, we must verify that (5.1) and (5.2) hold. Let A, B be matrices
in M2(C). Then, by Proposition 1.13 (i),

T(A + B) = tr(A + B) = tr(A) + tr(B) = T(A) + T(B),

and, for all α ∈ C, T(αA) = tr(αA) = α tr(A) = αT(A).
2. Verify, as in Examples 5.1, 5.3, if (5.1), (5.2) hold. If both hold, then
T is a linear transformation. If at least one of them fails, then T is
not a linear transformation.
[T(x)]Ek = [[T(e1)]Ek | [T(e2)]Ek | . . . | [T(en)]Ek] [x]En, (5.5)

where the matrix on the right-hand side is [T]Ek,En.
That is,
[T (x)]Ek = [T ]Ek ,En [x]En , (5.6)
where [T ]Ek ,En is a k × n matrix called the matrix of T relative to the
standard bases of the domain Kn and codomain Kk . In what follows
this matrix might be denoted simply by [T ].
Example 5.4 Find the matrix which represents each of the linear transfor-
mations below relative to the relevant standard basis.
a) The reflection relative to the x-axis in R2 (see Figure 5.2).
b) The orthogonal projection on the xy-plane in R3 (see Figure 5.3).
c) The counter-clockwise rotation in R2 around (0, 0) by an angle θ (see
Figure 5.4). Find also an analytic expression for this linear transforma-
tion.
FIGURES 5.2–5.4: A vector u and its image T(u) under the reflection, the orthogonal projection, and the rotation by θ, respectively.
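For c), the rotation matrix is the standard one; a small sketch (not from the text) makes the analytic expression concrete:

```python
# Counter-clockwise rotation by theta around the origin.
import numpy as np

def rotation(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s, c]])  # T(x, y) = (x cos t - y sin t, x sin t + y cos t)

print(np.round(rotation(np.pi / 2) @ [1.0, 0.0], 10))  # [0. 1.]
```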
We are now ready to make a general statement about the matrix repre-
sentation of a linear transformation relative to fixed bases in its domain and
codomain.
Theorem 5.1 Let U and V be vector spaces over K with dim U = n, dim V =
k, let B1 = (b1 , b2 , . . . , bn ) be a basis of U and B2 be a basis of V and let
T : U → V be a linear transformation. Then, there exists uniquely a k × n
matrix [T ]B2 ,B1 such that, for all x ∈ U ,
[T (x)]B2 = [T ]B2 ,B1 xB1 ,
where xB1 is the coordinate vector of x relative to the basis B1 and [T (x)]B2
is the coordinate vector of T (x) relative to the basis B2 . Moreover,
[T ]B2 ,B1 = [T (b1 )]B2 | [T (b2 )]B2 | . . . | [T (bn )]B2 . (5.8)
The matrix [T ]B2 ,B1 is called the matrix of T relative to the basis B1 of
the domain and the basis B2 of the codomain.
Proof Let x be a vector in U such that its coordinate vector (x)B1 =
(α1 , α2 , . . . , αn ). Then
T (x) = T (α1 b1 + α2 b2 + · · · + αn bn )
and, by Proposition 3.5,
(T x)B2 = T ((α1 b1 + α2 b2 + · · · + αn bn ))B2
= α1 (T (b1 ))B2 + α2 (T (b2 ))B2 + · · · + αn (T (bn ))B2 .
It follows that
[T x]B2 = [[T(b1)]B2 | [T(b2)]B2 | . . . | [T(bn)]B2] (α1, α2, . . . , αn)ᵀ,
which shows that
[T (x)]B2 = [T ]B2 ,B1 xB1 .
Suppose now that A is a k × n matrix such that [T(x)]B2 = AxB1. Then, for all i = 1, . . . , n,

0 = [T(bi)]B2 − [T(bi)]B2 = [T]B2,B1(bi)B1 − A(bi)B1 = ([T]B2,B1 − A)(0, . . . , 0, 1, 0, . . . , 0)ᵀ,
where the entry equal to 1 in the column vector lies in row i. Hence, for all
i = 1, . . . , n, the columns i of both matrices [T ]B2 ,B1 and A coincide, yielding
that [T ]B2 ,B1 = A.
T : U → V, x ↦ T(x)

S : Kn → Kk, y ↦ Ay,
where A = [T ]B2 ,B1 (cf. Exercise 5.1). This linear transformation maps the
coordinate vectors of the vectors in U relative to the basis B1 to the coordinate
vectors of their images relative to the basis B2 .
A comment: Now it is clear how we can guarantee the existence and unique-
ness of a linear transformation in Example 5.2 just by giving the image of two
vectors. Indeed we can because these two vectors form a basis of R2 .
N (T ) = {x ∈ U : T (x) = 0V }.
I(T ) = {T (x) ∈ V : x ∈ U }.
In other words, the null space is the subset of U consisting of the vectors
in the domain mapped by T to the zero vector of V , and the image of T is
the subset of V consisting of the images of all the vectors in U .
N (T ) = {x ∈ Kn : T (x) = 0Kk }.
Let A = [T ]Ek ,En be the matrix of T relative to the standard bases of Kn and
Kk . Then,
T (x) = 0 if and only if Ax = 0.
Hence,
N (T ) = N (A).
That is, the null space of T is the null space of the matrix A which represents
T when one considers the standard bases in the domain and codomain.
As to the image I(T ), we have by definition that
I(T ) = {T (x) ∈ Kk : x ∈ Kn }.
Observing that
[T (x)] = Ax,
the image I(T ) is found by obtaining all the linear combinations Ax of the
columns of A, i.e.,
I(T ) = C(A).
Hence,
I(T ) = span({T (e1 ), T (e2 ), . . . , T (en )}),
which shows that {T (e1 ), T (e2 ), . . . , T (en )} is a spanning set for I(T ), al-
though not necessarily a basis of I(T ).
Example 5.6 Find the null spaces and the images of the linear transformations of Example 5.4.
In a), the null space of the matrix [T ] is {(0, 0)} and, hence N (T ) =
{(0, 0)}.
The image I(T ) is the subspace generated by the columns of [T ]. Since this
corresponds to the vectors (1, 0), (0, −1), it follows that I(T ) = R2 . The image
of this linear transformation coincides with the codomain R2 , that is, T is a
surjective function. Indeed, it is also an injective function.
Similarly, in b) we have that N (T ) coincides with the z-axis and the image
I(T ) is the xy-plane, which shows that T is neither surjective nor injective.
As to c), observe that the rotation is a bijective function, and hence N(T) = {(0, 0)} and I(T) = R2.

In general, the null space and the image of a linear transformation can be determined through its representing matrix relative to the bases of the domain and the codomain, as we did in §5.3.1, for the particular kind of linear transformations under scrutiny in that part of the book.
Tackling firstly the null space of T : U → V , we are then interested in
determining the vectors x ∈ U such that T (x) = 0. If A = [T ]B2 ,B1 is the
matrix of T relative to the bases of the domain and codomain, we have
T (x) = 0 if and only if [T (x)]B2 = 0.
It follows that T (x) = 0 if and only if
A[x]B1 = 0,
where this equality corresponds to determining the null space of A. Hence,
once N (A) ⊆ Kn is determined, we have the coordinate vectors relative to the
basis B1 of the vectors in the null space N (T ) of the linear transformation T ,
i.e.,
N (T ) = {α1 u1 + α2 u2 + · · · + αn un : (α1 , α2 , . . . , αn ) ∈ N (A)} ⊆ U. (5.9)
Example 5.7 Find the null space of the linear transformation of Example
5.5.
[T (x)]B2 = A[x]B1
Now we have
C([T]) = C([0 1 0; 0 0 2]) = R2.
It follows that
3. Find all the vectors in V whose coordinate vectors lie in C(A): this
is the image of T .
x ̸= y ⇒ T (x) ̸= T (y)
or, equivalently, if
T (x) = T (y) ⇒ x = y.
Notice that
T (x) = T (y)
if and only if
T (x − y) = 0 ⇔ x − y ∈ N (T ).
Hence, we see that
T (x + N (T )) = {T (x)},
where we define x + N (T ) by
x + N (T ) = {x + z : z ∈ N (T )}.
x ↦ (x)B

is an isomorphism.
n = nul(T ) + rank (T ).
n = dim N (T ) + n,
Exercise 5.3 Give examples of linear transformations which are (i) injective
but not surjective, (ii) surjective but not injective.
ST : U → W, x ↦ S(T(x)).

That is, ST is the composition of T : U → V followed by S : V → W.
Proof Exercise.
Suppose that U, V, and W are vector spaces over K whose dimensions are n, p, and k, respectively, and fix bases BU, BV, and BW. Let A = [T]BV,BU and B = [S]BW,BV. Hence, the matrix [ST]BW,BU of the linear transformation ST relative to the basis BU in the domain and the basis BW in the codomain is

[ST]BW,BU = BA,

corresponding to the composition Kn →A Kp →B Kk of the coordinate maps.
Hence,

[ST] = [0 −1; 1 0] [1 0; 0 −1] = [0 1; 1 0].
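A one-line numerical check of this product (sketch, not from the text):

```python
# The matrix of the composition is the product of the matrices.
import numpy as np

T = np.array([[1, 0], [0, -1]])
S = np.array([[0, -1], [1, 0]])
print(S @ T)  # [[0 1], [1 0]] = [ST]
```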
U = {a1 t + a2 t2 : a1 , a2 ∈ R},
Notice that we were informed that T was invertible to start with and, conse-
quently, did not have to check this. However, we can see it by ourselves now,
since [T ]P1 ,BU is an invertible matrix.
It now follows that
[T−1]BU,P1 = [1 0; 0 1/2],
yielding

[T−1(b0 + b1t)]BU = [T−1]BU,P1 (b0, b1)ᵀ = (b0, b1/2)ᵀ.

Hence, T−1(b0 + b1t) = b0t + (1/2)b1t2.
T−1 : V → U, y ↦ x, where y = Tx.
a) Denoting by A the matrix [T]BV,BU and supposing that dim U = n, show that A is an n × n invertible matrix. (Hint: use Theorem 3.6 and Theorem
5.3.)
b) Use a) and the equality y = T x to write x as a function of y and
conclude that [T −1 ]BU ,BV = A−1 .
Writing A = [T]En,En and B = [T]B,B for a basis B of Kn, we have

[T(x)]En = MEn←B [T(x)]B = MEn←B B [x]B = MEn←B B MB←En [x]En,

where MEn←B = (MB←En)−1. Hence

A = (MB←En)−1 B MB←En.
If you try to determine the matrix [T ]E2 ,E2 of the linear transformation T
relative to the standard basis, you will be faced with difficulties. Indeed, you
will need to find the images T (1, 0), T (0, 1) and this is by no means immediate.
However, there are vectors whose images are particularly easy to find.
For example, all vectors lying in the line y = 2x are unchanged by the
linear transformation. Hence, e.g., T (1, 2) = (1, 2).
If one looks at the straight line that goes through the origin and is per-
pendicular to the one given before, then again we have an immediate way of
finding the images of the vectors lying in that straight line. That is the case
of the vector (−2, 1), and we have T (−2, 1) = (2, −1).
If we choose the basis B = ((1, 2), (−2, 1)) of R2 , then
\[ [T]_{B,B} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \]
since T(1, 2) = 1·(1, 2) + 0·(−2, 1) and T(−2, 1) = 0·(1, 2) − 1·(−2, 1).
It follows that
\[ [T(x, y)]_{E_2} = M_{E_2\leftarrow B}[T]_{B,B}M_{E_2\leftarrow B}^{-1}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}^{-1}\begin{pmatrix} x \\ y \end{pmatrix} \]
\[ = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} \frac15 & \frac25 \\ -\frac25 & \frac15 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \frac15\begin{pmatrix} -3x + 4y \\ 4x + 3y \end{pmatrix}. \]
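As a quick sanity check (a verification added here, not in the original), the final matrix reproduces the two images we started from:
\[ \frac15\begin{pmatrix} -3 + 8 \\ 4 + 6 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad \frac15\begin{pmatrix} 6 + 4 \\ -8 + 3 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}, \]
that is, T(1, 2) = (1, 2) and T(−2, 1) = (2, −1), as expected.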
Consider now the general case of an arbitrary vector space U endowed with
two bases B1 = (b1 , b2 , . . . , bn ) and B2 = (v1 , v2 , . . . , vn ) and let A = [T ]B1 ,B1
and B = [T ]B2 ,B2 . A reasoning similar to that above yields
the commutative diagram in which M_{B_1\leftarrow B_2} = M_{B_2\leftarrow B_1}^{-1} relates [x]_{B_1}, [T(x)]_{B_1} (via A) and [x]_{B_2}, [T(x)]_{B_2} (via B), so that
\[ [T]_{B_1,B_1} = M_{B_1\leftarrow B_2}[T]_{B_2,B_2}M_{B_2\leftarrow B_1}. \]
Hence
\[ [T(x)]_{B_1} = M_{B_2\leftarrow B_1}^{-1}[T(x)]_{B_2} = M_{B_2\leftarrow B_1}^{-1}B[x]_{B_2} = M_{B_2\leftarrow B_1}^{-1}BM_{B_2\leftarrow B_1}[x]_{B_1}, \]
from which follows that
\[ A = M_{B_2\leftarrow B_1}^{-1}BM_{B_2\leftarrow B_1}. \tag{5.11} \]
Example 5.13 Let U be the subspace of the 2 × 2 complex matrices having
null trace, and let T : U → U be the transposition. Consider the bases of U
\[ B_1 = \left(\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\right), \qquad B_2 = \left(\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}\right). \]
Suppose that A = [T]_{B_1,B_1} and B = [T]_{B_2,B_2}. Then,
\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Check that
\[ A = M_{B_2\leftarrow B_1}^{-1}BM_{B_2\leftarrow B_1} = \begin{pmatrix} i & 0 & 0 \\ 0 & \frac{i}{2} & -\frac{i}{2} \\ 0 & -\frac{i}{2} & -\frac{i}{2} \end{pmatrix}^{-1} B \begin{pmatrix} i & 0 & 0 \\ 0 & \frac{i}{2} & -\frac{i}{2} \\ 0 & -\frac{i}{2} & -\frac{i}{2} \end{pmatrix}. \]
It follows from the commutative diagram relating [x]_{B_1}, [T(x)]_{B_2} (via A = [T]_{B_2,B_1}) and [x]_{B_1'}, [T(x)]_{B_2'} (via B = [T]_{B_2',B_1'}), through the change of basis matrices M_{B_1'\leftarrow B_1} and M_{B_2\leftarrow B_2'}, that
\[ A = M_{B_2\leftarrow B_2'}BM_{B_1'\leftarrow B_1}. \]
1. Determine the change of basis matrices M_{B_1'\leftarrow B_1} and M_{B_2\leftarrow B_2'}.
2. We have
\[ A = M_{B_2\leftarrow B_2'}BM_{B_1'\leftarrow B_1}. \]
Given one of the matrices A or B, one can use this equality to obtain
the other.
3. Determine one (the easiest) of these matrices, should you not have
one of them to start with. One can now obtain the other matrix using
the above equality.
Hence,
σ(T ) = σ(A),
and
Example 5.14 Find the eigenvalues and the eigenvectors of the reflection
relative to the straight line in R2 whose cartesian equation is y = x.
Solution: The matrix of this reflection relative to the basis B =
((1, 1), (1, −1)) of R2 is
\[ A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]
Hence, σ(T ) = σ(A) = {−1, 1}. The eigenspaces are E(−1) = {(x, −x) : x ∈
R} and E(1) = {(x, y) ∈ R2 : y = x}.
The matrix A above is the matrix of T relative to the basis B = ((1, 1), (1, −1)) of eigenvectors.
According to what we saw when diagonalising matrices in Chapter 4,
\[ [T]_{E_2,E_2} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} \frac12 & \frac12 \\ \frac12 & -\frac12 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]
Hence,
\[ T(x, y) = [T]_{E_2,E_2}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}, \]
and we have T (x, y) = (y, x).
5.8 Exercises
EX 5.8.1. Are the following functions linear transformations?
a) T : R2 → R3 , T (x, y) = (2x − y, x, y − x)
b) T : C3 → C3 , T (z1 , z2 , z3 ) = (−iz2 , (5 − 3i)z3 − z2 , 3z1 )
c) T : R3 → R3 , T (x, y, z) = (x, x + y + z, 2x − 1)
d) T : C2 → C2 , T (z1 , z2 ) = (z2 , (2 − i)z1 )
e) T : M2 (K) → K, T (A) = tr(AB 2 ), with B = E21 (−5).
f) T : P3 → P2 , T is the derivative
g) T : M_{2,3}(R) → P_2,
\[ T(A) = \sum_{i=1,2,\;j=1,2,3} a_{ij} + \left(\begin{pmatrix} 1 & -1 \end{pmatrix} A \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\right)t + a_{11}t^2 \]
\[ T_1(A) = \frac{A + A^T}{2}, \qquad T_2(A) = \frac{A - A^T}{2}. \]
(a) Find the matrices representing T1 and T2 relative to the standard
basis Bs .
(b) Find the spectra and the eigenspaces of these linear transforma-
tions.
EX 5.8.15. Show that the only linear transformation T : Mn (C) → C satisfy-
ing (i) T (AB) = T (BA), for all A, B ∈ Mn (C), and (ii) T (I) = n
is the trace.
5.9 At a Glance
A linear transformation is an additive and homogeneous function between
vector spaces. In other words, it transforms sums of vectors into sums of their
images, and scalar multiples of vectors into the same scalar multiples of their
images.
Fixing bases in the domain U and codomain V, a linear transformation T
is represented relative to these bases by a k × n matrix A, where n is the
dimension of the domain and k that of the codomain. The image of x ∈ U is
then calculated as [T(x)]_{B_V} = A[x]_{B_U}.
We have thus an induced linear transformation from K^n to K^k given by z ↦ Az.
If we understand this transformation, then we understand T. Moreover,
if k = n, then the eigenvalues of A and T coincide, and A is diagonalisable
if and only if T is.
N(A) consists of the coordinate vectors of the vectors in the null space
N(T), and C(A) consists of the coordinate vectors of the vectors in the image
I(T). The null space N(T) is isomorphic as a vector space to N(A), and the
image I(T) is isomorphic to C(A).
Up to here we have been dealing with purely algebraic structures. For example,
we do not yet have a notion of distance between elements of a vector
space, and implicitly we have been overlooking the geometric aspects of spaces. We
will fix that in this chapter by introducing inner product spaces. Unlike the
previous chapters, where we treated the real and complex vector spaces
simultaneously, here we treat the real and complex inner products separately,
because their definitions differ in an essential way.
⟨·, ·⟩ :V × V → R
(x, y) 7→ ⟨x, y⟩
⟨x, y⟩ = x1 y1 + x2 y2 + · · · + xn yn (6.2)
or, alternatively,
⟨x, y⟩ = yT x = xT y.
We shall refer to Rn together with this inner product as the real Euclidean
space Rn .
Analogously to the plane and space cases, we define the norm of a vector
x by
\[ \|x\| = \sqrt{\langle x, x\rangle} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}. \]
Also generalising what is usual in the plane and space, the distance
d(x, y) between the points x, y ∈ Rn is defined by
d(x, y) = ∥x − y∥.
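For instance (a small illustration added here, not part of the original text), in the Euclidean space R²,
\[ d((1, 2), (4, 6)) = \|(1 - 4,\; 2 - 6)\| = \|(-3, -4)\| = \sqrt{9 + 16} = 5. \]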
\[ \langle A, B\rangle = \operatorname{tr}(B^TA) = \sum_{i,j=1}^{2} a_{ij}b_{ij}. \]
[Figure: ellipse \frac19 x_1^2 + \frac14 x_2^2 = 1 through (3, 0) and (0, 2)]
FIGURE 6.2: Ellipse depicting the points at distance 1 from (0, 0) with
respect to the inner product of Exercise 6.2.
\[ \langle (x_1, x_2), (y_1, y_2)\rangle = \tfrac19 x_1y_1 + \tfrac14 x_2y_2 \]
defines an inner product in R² and, for this inner product, find the circle C
centered at (0, 0) and with radius 1, i.e., C = {x ∈ R² : d(x, (0, 0)) = 1}.
Solution. It is easily seen that this function satisfies all the conditions in
Definition 48. To find the circle C, we have to determine which points (x_1, x_2)
lie at a distance 1 from (0, 0), that is, for which
\[ d((x_1, x_2), (0, 0))^2 = \tfrac19 x_1^2 + \tfrac14 x_2^2 = 1. \]
Hence, the points at distance 1 from (0, 0) lie in an ellipse (see Figure 6.2).
This exercise illustrates how the geometry of a space can be changed by
endowing it with different inner products. Here, the ‘usual’ circle has been
changed into an ellipse.
Building on what is known for the Euclidean space, we define the norm of
a vector in an (any) inner product space.
Definition 50 Let V be a real inner product space with inner product ⟨·, ·⟩
and let x ∈ V . The norm of x is defined by
\[ \|x\| = \sqrt{\langle x, x\rangle}. \tag{6.3} \]
d(x, y) = ∥x − y∥.
Proposition 6.1 Let V be a real inner product space and let the norm of a
vector be defined as in (6.3). Then the function defined by
∥·∥ : V → R, x ↦ ∥x∥,
satisfies, for all x, y ∈ V and α ∈ R,
(i) ∥x∥ ≥ 0, and ∥x∥ = 0 if and only if x = 0;
(ii) ∥αx∥ = |α|∥x∥;
(iii) ∥x + y∥ ≤ ∥x∥ + ∥y∥.
We prove (i) and (ii) here and leave the proof of (iii) for later, as we shall use
in its proof another inequality, the Cauchy–Schwarz inequality, which will be
proved in the sequel.
Proof (i) It is clear that ∥x∥ ≥ 0, from the very definition of norm (cf.
Definition 50). Moreover, by (6.1), we have that ⟨x, x⟩ = 0 if and only if
x = 0.
(ii) By Definition 48 (ii), for x ∈ V, α ∈ R,
\[ \|\alpha x\| = \sqrt{\langle \alpha x, \alpha x\rangle} = \sqrt{\alpha^2\langle x, x\rangle} = |\alpha|\sqrt{\langle x, x\rangle} = |\alpha|\|x\|, \]
as required.
Exercise 6.3 Let V be an inner product space. Show that for all x, y, z ∈ V ,
In R² and R³ we have, for any vectors x, y and the usual scalar product,
\[ \langle x, y\rangle = \|x\|\|y\|\cos\theta, \]
yielding
|⟨x, y⟩| ≤ ∥x∥∥y∥,
since | cos θ| ≤ 1. It turns out that this inequality holds in every inner product
space.
and
|⟨x, y⟩| = ∥x∥∥y∥ (6.6)
if and only if {x, y} is a linearly dependent set.
If we set
⟨x, y⟩
α= ,
∥y∥2
then
\[ 0 \le \langle x, x\rangle - 2\frac{\langle x, y\rangle}{\|y\|^2}\langle x, y\rangle + \left(\frac{\langle x, y\rangle}{\|y\|^2}\right)^2\|y\|^2, \]
that is,
\[ 0 \le \langle x, x\rangle - \frac{\langle x, y\rangle}{\|y\|^2}\langle x, y\rangle. \]
Hence,
0 ≤ ∥x∥2 ∥y∥2 − ⟨x, y⟩2
from which follows that
⟨x, y⟩2 ≤ ∥x∥2 ∥y∥2 .
Taking square roots,
|⟨x, y⟩| ≤ ∥x∥∥y∥
which proves (6.5).
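For a concrete check (an illustration added here, not in the original), take x = (1, 2) and y = (3, 1) in the Euclidean space R²:
\[ |\langle x, y\rangle| = |3 + 2| = 5 \le \|x\|\|y\| = \sqrt5\,\sqrt{10} = 5\sqrt2 \approx 7.07, \]
with strict inequality, as expected, since {x, y} is linearly independent.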
We show now that
|⟨x, y⟩| = ∥x∥∥y∥ (6.9)
Having proved Theorem 6.1, we are now ready to prove the triangle in-
equality.
∥x + y∥2 + ∥x − y∥2 = ⟨x + y, x + y⟩ + ⟨x − y, x − y⟩
= 2⟨x, x⟩ + 2⟨y, y⟩ + 2⟨x, y⟩ − 2⟨x, y⟩
= 2∥x∥2 + 2∥y∥2 ,
as required.
Up to now, our approach to inner product spaces has been coordinate free.
It is now time to make use of the fact that vector spaces do have bases
and to see how bases interact with the inner product.
Let V be a real inner product space and let B = (b1 , b2 , . . . , bn ) be a basis
of V . For x, y ∈ V such that the coordinate vectors of x and y relative to B
are, respectively, xB = (α1 , α2 , . . . , αn ) and yB = (β1 , β2 , . . . , βn ), we have
\[ \langle x, y\rangle = \langle \alpha_1 b_1 + \alpha_2 b_2 + \cdots + \alpha_n b_n,\; \beta_1 b_1 + \beta_2 b_2 + \cdots + \beta_n b_n\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n}\beta_i\langle b_j, b_i\rangle\alpha_j \]
\[ = \begin{pmatrix} \beta_1 & \beta_2 & \cdots & \beta_n \end{pmatrix} \underbrace{\begin{pmatrix} \langle b_1, b_1\rangle & \langle b_2, b_1\rangle & \cdots & \langle b_n, b_1\rangle \\ \langle b_1, b_2\rangle & \langle b_2, b_2\rangle & \cdots & \langle b_n, b_2\rangle \\ \vdots & \vdots & & \vdots \\ \langle b_1, b_n\rangle & \langle b_2, b_n\rangle & \cdots & \langle b_n, b_n\rangle \end{pmatrix}}_{G} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}. \]
\[ G = \begin{pmatrix} \langle b_1, b_1\rangle & \langle b_2, b_1\rangle & \cdots & \langle b_n, b_1\rangle \\ \langle b_1, b_2\rangle & \langle b_2, b_2\rangle & \cdots & \langle b_n, b_2\rangle \\ \vdots & \vdots & & \vdots \\ \langle b_1, b_n\rangle & \langle b_2, b_n\rangle & \cdots & \langle b_n, b_n\rangle \end{pmatrix} \tag{6.10} \]
such that
\[ \langle x, y\rangle = y_B^TGx_B. \]
This matrix G = [g_{ij}], where for all i, j = 1, …, n we have g_{ij} = ⟨b_j, b_i⟩, is
said to be the Gram matrix of the set of vectors {b_1, b_2, …, b_n}.
A real symmetric matrix A of order n is said to be positive definite if, for all non-zero x ∈ Rⁿ, x^TAx > 0.
Proposition 6.3 Let V be a real inner product space of dimension n and let
B = (b_1, b_2, …, b_n) be a basis of V. Let x, y be vectors in V. Then, there
exists a unique n × n real matrix G such that
\[ \langle x, y\rangle = y_B^TGx_B, \tag{6.11} \]
Exercise 6.4 Consider the Euclidean space R2 and its standard basis E2 .
Find the Gram matrix of E2 . What is the Gram matrix if now one considers
an inner product ⟨·, ·⟩1 which is that of Exercise 6.2? Calculate ⟨(1, 2), (3, 1)⟩1
using this latter matrix.
Solution. An easy application of (6.10) leads to the Gram matrix
\[ G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
in the first case. For the second inner product, we have
\[ G = \begin{pmatrix} \frac19 & 0 \\ 0 & \frac14 \end{pmatrix}. \]
It follows that
\[ \langle (1, 2), (3, 1)\rangle_1 = \begin{pmatrix} 3 & 1 \end{pmatrix}\begin{pmatrix} \frac19 & 0 \\ 0 & \frac14 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac56. \]
Proposition 6.4 Let A be a real matrix of order n. The following are equiv-
alent.
(i) The expression
⟨x, y⟩ = yT Ax
defines an inner product on Rn .
(ii) A is a real positive definite matrix.
Proof We prove only that (i) implies (ii) and leave the other implication
as an easy exercise.
We show first that A is symmetric. For i, j = 1, …, n,
⟨ei , ej ⟩ = eTj Aei = aji , ⟨ej , ei ⟩ = eTi Aej = aij .
Hence, aij = aji and, therefore, A is symmetric. On the other hand, given
x ∈ Rn ,
xT Ax = ⟨x, x⟩ ≥ 0
and equals 0 only when x = 0.
We saw in Chapter 4 that real symmetric matrices have real eigenvalues
(see Corollary 4.1). We can say more.
Proposition 6.5 A real symmetric matrix is positive definite if and only if
its eigenvalues are positive numbers.
For another result on real positive definite matrices, see Corollary 6.3.
Proof Suppose that A is a positive definite matrix. Let λ be an eigenvalue
of A and let x be an associated eigenvector. Then
0 < xT Ax = λxT x = λ∥x∥2 ,
from which follows that λ > 0. The proof of the converse is postponed until
§6.4.
⟨·, ·⟩ :V × V → C
(x, y) 7→ ⟨x, y⟩
We see that, apart from the inner product of two vectors being now a
complex number, the remaining difference between real and complex inner
products is condition (i) above.
By conditions (i) and (ii),
\[ \langle x, \alpha y\rangle = \overline{\langle \alpha y, x\rangle} = \bar{\alpha}\,\overline{\langle y, x\rangle} = \bar{\alpha}\langle x, y\rangle, \]
yielding that the inner product is conjugate linear in the second variable.
A function f : V × V → C satisfying, for all x, y, z ∈ V and α ∈ C,
(a) ⟨αx, y⟩ = α⟨x, y⟩;
(b) ⟨x, αy⟩ = ᾱ⟨x, y⟩;
(c) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
is said to be a sesquilinear function, meaning that it is linear in the first
variable and linear by ‘half’ in the second variable.
Hence, a complex inner product is a sesquilinear function which, because
it also satisfies condition (iv), is a positive definite sesquilinear function.
Similarly to §6.1, we define the norm of a vector x ∈ Cⁿ by
\[ \|x\| = \sqrt{\langle x, x\rangle}, \tag{6.14} \]
where
\[ \langle x, y\rangle = x_1\bar{y}_1 + x_2\bar{y}_2 + \cdots + x_n\bar{y}_n. \]
Consequently, we have
\[ \langle x, y\rangle = \bar{y}^Tx. \]
It follows from (6.14) that
\[ \|x\| = \sqrt{\langle x, x\rangle} = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}. \]
The space Cn together with this inner product is said to be the complex
Euclidean space.
(iii)
∥x + y∥ ≤ ∥x∥ + ∥y∥. Triangle inequality (6.16)
Showing that (i)–(iii) are properties of the norm can be done similarly to the
real case. In fact, the fundamental Cauchy–Schwarz inequality and parallelo-
gram law do hold also in this setting and we make a note of this in the next
theorem, whose proof can be easily adapted from the corresponding real case.
Theorem 6.2 Let V be a complex inner product space. The following hold.
(i) For x, y ∈ V,
\[ |\langle x, y\rangle| \le \|x\|\|y\|, \]
and
\[ |\langle x, y\rangle| = \|x\|\|y\| \]
if and only if {x, y} is a linearly dependent set.
(ii) For x, y ∈ V,
\[ \|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2. \]
In a complex inner product space V one can also make a coordinate ap-
proach to the inner product. We shall also get a Gram matrix corresponding
to a fixed basis of V .
If B = (b1 , b2 , . . . , bn ) is a basis of V , then, given x, y ∈ V such that
xB = (α1 , α2 , . . . , αn ) and yB = (β1 , β2 , . . . , βn ), we have
\[ \langle x, y\rangle = \langle \alpha_1 b_1 + \alpha_2 b_2 + \cdots + \alpha_n b_n,\; \beta_1 b_1 + \beta_2 b_2 + \cdots + \beta_n b_n\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n}\bar{\beta}_i\langle b_j, b_i\rangle\alpha_j \]
\[ = \begin{pmatrix} \bar{\beta}_1 & \bar{\beta}_2 & \cdots & \bar{\beta}_n \end{pmatrix} \underbrace{\begin{pmatrix} \langle b_1, b_1\rangle & \langle b_2, b_1\rangle & \cdots & \langle b_n, b_1\rangle \\ \langle b_1, b_2\rangle & \langle b_2, b_2\rangle & \cdots & \langle b_n, b_2\rangle \\ \vdots & \vdots & & \vdots \\ \langle b_1, b_n\rangle & \langle b_2, b_n\rangle & \cdots & \langle b_n, b_n\rangle \end{pmatrix}}_{G} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}, \]
such that
\[ \langle x, y\rangle = \bar{y}_B^TGx_B. \]
The matrix G = [g_{ij}], where, for all i, j = 1, …, n, we have g_{ij} = ⟨b_j, b_i⟩, is
said to be the Gram matrix of the set of vectors {b_1, b_2, …, b_n}.
A hermitian matrix A of order n is said to be positive definite if, for all non-zero x ∈ Cⁿ, \bar{x}^TAx > 0.
Similarly to the real case, the next proposition, whose proof is left as an
exercise, collects some properties of the Gram matrix.
\[ \bar{x}_B^TGx_B > 0. \tag{6.18} \]
The next two results are a counterpart of Proposition 6.4 and Proposition
6.5 for the complex inner product. Their proofs are an easy adaptation to
the complex setting of those propositions and, for this reason, are left as an
exercise.
This proposition shows that, when one considers a complex inner product
space V , the Gram matrix corresponding to some basis of V is an invertible
matrix.
\[ \cos\theta = \frac{\langle x, y\rangle}{\|x\|\|y\|}. \]
Exercise 6.5 Find the angle θ between the vectors (1, 1, −1, 0), (0, 0, 0, 1) in
the Euclidean space R4 .
Solution. Using Definition 54,
\[ \cos\theta = \frac{\langle (1, 1, -1, 0), (0, 0, 0, 1)\rangle}{\|(1, 1, -1, 0)\|\,\|(0, 0, 0, 1)\|} = \frac{0}{\sqrt3} = 0, \]
and, therefore, θ = π/2.
Exercise 6.6 Which vectors are orthogonal to (1, 1, 0), in the Euclidean space
R3 ?
Solution. We want to find the vectors (x, y, z) ∈ R3 such that
\[ 0 = \langle (x, y, z), (1, 1, 0)\rangle = \begin{pmatrix} 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + y. \]
Hence the vectors orthogonal to (1, 1, 0) are those in the plane whose cartesian
equation is x + y = 0.
∥x + y∥2 = ⟨x + y, x + y⟩
= ⟨x, x⟩ + ⟨y, y⟩ + ⟨x, y⟩ + ⟨y, x⟩
= ⟨x, x⟩ + ⟨y, y⟩ = ∥x∥2 + ∥y∥2 .
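As a quick numerical illustration (added here, not in the original), take the orthogonal vectors x = (3, 0) and y = (0, 4) in the Euclidean space R²:
\[ \|x + y\|^2 = \|(3, 4)\|^2 = 25 = 9 + 16 = \|x\|^2 + \|y\|^2. \]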
W ⊥ = {x ∈ V : x ⊥ W }.
Exercise 6.7 Find the orthogonal complement of the straight line U spanned
by (1, 1, 0).
Solution. We want to find the vectors (x, y, z) ∈ R³ such that, for all α ∈ R,
(x, y, z) ⊥ α(1, 1, 0). Hence,
\[ 0 = \langle (x, y, z), \alpha(1, 1, 0)\rangle = \alpha(x + y), \]
from which follows that the orthogonal complement U^⊥ of U is the plane with
equation x + y = 0.
Proof We need to show that W ⊥ is closed under vector addition and scalar
multiplication. Let x, y be vectors in W ⊥ and let w ∈ W . Then
⟨x + y, w⟩ = ⟨x, w⟩ + ⟨y, w⟩ = 0 + 0 = 0,
⟨αx, w⟩ = α⟨x, w⟩ = α0 = 0.
We have shown that W ⊥ ̸= ∅ is closed under vector addition and scalar mul-
tiplication thus concluding the proof that W ⊥ is a subspace of V .
Using Corollary 6.1, it suffices to find a basis of W and all the vectors in
R3 orthogonal to that basis. The set {(1, 1, 0), (0, 0, 1)} is a basis of W and
(x, y, z) is orthogonal to this basis if and only if
\[ \langle (x, y, z), (1, 1, 0)\rangle = \begin{pmatrix} 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0 \quad\text{and}\quad \langle (x, y, z), (0, 0, 1)\rangle = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0. \]
It follows that W^⊥ is the straight line of which we give three possible sets of
equations:
cartesian equations: x = −y, z = 0;
vector equation: (x, y, z) = t(−1, 1, 0) (t ∈ R);
parametric equations: x = −t, y = t, z = 0 (t ∈ R).
Notice that, when dealing with Rn , the bar over the vectors is irrele-
vant.
Since both sets BW , BW ⊥ are linearly independent, we have finally that, for
i = 1, . . . , k and j = 1, . . . , r, all the scalars αi = 0 and βj = 0.
To finish the proof, it suffices to show that r = n − k. Since B is a linearly
independent set, by Theorem 3.5 (ii), we already know that r + k ≤ n.
Suppose that r < n − k. We show next that, in this case, it is possible to
find a non-zero vector v ∈ W ∩ W ⊥ . Observe that, by Proposition 6.11, this
is a contradiction.
We want to find all the vectors v ∈ V such that v ⊥ W and v ⊥ W ⊥ .
Hence, by Corollary 6.1, we want to find the solutions of the problem
(
⟨v, bi ⟩ = 0, i = 1, . . . , k
⟨v, uj ⟩ = 0, j = 1, . . . , r.
If B′ is a basis of V, this can be written in terms of the coordinate vectors
relative to B′ as the homogeneous system of linear equations
\[ \underbrace{\begin{pmatrix} \overline{[b_1]_{B'}}^T \\ \vdots \\ \overline{[b_k]_{B'}}^T \\ \overline{[u_1]_{B'}}^T \\ \vdots \\ \overline{[u_r]_{B'}}^T \end{pmatrix}}_{A} G\,v_{B'} = 0, \]
where G is the Gram matrix relative to the basis B ′ . Observe that, when V
is a real inner product space, the bar over the first matrix is not relevant: it
does not change anything, since the conjugate of a real number is that same
number.
One can re-write the equation above as Ax = 0, where x := GvB′ lies in
Kn .
Since B is linearly independent, that is, the rows of the (k + r) × n matrix
A are linearly independent, rank (A) = k + r (see Exercise 6.8 below). In
other words, the null space N (A) has dimension n − k − r ≥ 1. Consequently,
N (A) ̸= {0}. Hence, we have a non-zero solution v of our initial problem.
That is,
v ∈ W ∩ W ⊥ ̸= {0},
contradicting Proposition 6.11. It follows that r = n − k, as required.
Exercise 6.8 Let B be a complex matrix. Show that rank (B) = rank (B).
k = dim W ≤ dim W ⊥⊥ = n − (n − k) = k.
x = xW + xW ⊥ . (6.22)
Notice that this theorem implies that every subspace W induces a splitting
of V into a sum of two subspaces, that is, V = W + W ⊥ .
Proof By Lemma 6.1, we know that any x ∈ V is a linear combination
of the vectors of the basis B = BW ∪ BW ⊥ . It follows that
\[ x = \sum_{i=1}^{k}\alpha_ib_i + \sum_{j=1}^{n-k}\beta_ju_j = x_W + x_{W^\perp}, \tag{6.23} \]
where we use the same notation as in the proof of Lemma 6.1. It only remains
to prove that this decomposition of x relative to W and W ⊥ is unique. Suppose
that x = xW +xW ⊥ and x = x′W +x′W ⊥ , for some x′W ∈ W and x′W ⊥ ∈ W ⊥ .
Then,
0 = (xW − x′W ) + (xW ⊥ − x′W ⊥ ).
Since x_W − x′_W ∈ W, x_{W^⊥} − x′_{W^⊥} ∈ W^⊥, and W ∩ W^⊥ = {0}, it follows that
x_W − x′_W = 0 = x_{W^⊥} − x′_{W^⊥}.
V = W ⊕ W ⊥. (6.24)
Exercise 6.9 Can you propose an answer (even if not a formal one) to the
next questions?
a) Let X ⊆ R2 be an orthogonal set not containing (0, 0). How many vec-
tors, at most, lie in X?
b) Let X ⊆ R3 be an orthogonal set not containing (0, 0, 0). How many
vectors, at most, lie in X?
\[ \alpha_1v_1 + \cdots + \alpha_kv_k = 0 \;\Rightarrow\; \alpha_1 = \cdots = \alpha_k = 0. \]
Now we are ready to answer the questions in Exercise 6.9. How many vectors
can such a set contain, at most?
We can say confidently that: a) 2; b) 3.
The proof of the following corollary is left as an exercise.
Corollary 6.2 Let V be an inner product space of dimension n and let X =
{v1 , . . . , vk } be an orthogonal subset of V not containing 0. Then k ≤ n.
Moreover, if k = n, then X is a basis of V .
The form of the orthogonal complements of the subspaces associated with
a real matrix are given in the next result.
Proposition 6.14 Let A be an n × k real matrix and let Rⁿ and Rᵏ be endowed
with the usual inner products (6.2). Then the following hold.
(i) L(A)⊥ = N (A).
Proof (i) Since the rows of A are a spanning set for L(A), by Proposition
6.10, the orthogonal complement of L(A) consists of the solution set of the
system Ax = 0. In other words, L(A)⊥ = N (A).
(ii) By Proposition 6.12 and (i) of this proposition, we have
xB = (α1 , . . . , αn ),
that is,
x = α1 b1 + · · · + αn bn .
Since B is an orthogonal set, we have, for i = 1, . . . , n,
\[ \langle x, b_i\rangle = \Big\langle \sum_{j=1}^{n}\alpha_jb_j,\; b_i\Big\rangle = \alpha_i\|b_i\|^2. \]
Hence, for i = 1, …, n,
\[ \alpha_i = \frac{\langle x, b_i\rangle}{\|b_i\|^2}. \tag{6.25} \]
Moreover, if B is an orthonormal basis, then we have an even simpler formula
for calculating the coordinate vector of x: for all i = 1, …, n,
\[ \alpha_i = \langle x, b_i\rangle. \tag{6.26} \]
Example 6.5 Find the coordinate vector of (1, 2, 3) relative to the orthogonal
basis B = ((1, 1, 0), (0, 0, 1), (1, −1, 0)) of R3 .
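A worked solution sketch (the computation is spelled out here): by (6.25),
\[ \alpha_1 = \frac{\langle (1, 2, 3), (1, 1, 0)\rangle}{\|(1, 1, 0)\|^2} = \frac32, \quad \alpha_2 = \frac{\langle (1, 2, 3), (0, 0, 1)\rangle}{\|(0, 0, 1)\|^2} = 3, \quad \alpha_3 = \frac{\langle (1, 2, 3), (1, -1, 0)\rangle}{\|(1, -1, 0)\|^2} = -\frac12, \]
so (1, 2, 3)_B = (3/2, 3, −1/2). Indeed, \frac32(1, 1, 0) + 3(0, 0, 1) − \frac12(1, −1, 0) = (1, 2, 3).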
When dealing with an orthogonal basis, as we have seen, it is very easy to obtain
the coordinates of any given vector. However, one might ask ‘Does an inner
product space always possess an orthogonal basis?’ The answer is ‘Yes’. We
shall see how to construct such a basis in §6.3.3. For the moment, we shall
keep on developing the properties of an inner product space assuming that it
has an orthogonal basis.
At this point, an observation is in order. We can get an orthonormal basis
out of an orthogonal one. Indeed, let B = {b1 , . . . , bn } be an orthogonal basis
of V. Then, for i = 1, …, n, by a property of the norm,
\[ \left\|\frac{1}{\|b_i\|}\,b_i\right\| = \frac{1}{\|b_i\|}\|b_i\| = 1. \]
If we apply this to the orthogonal basis of Example 6.5, we have that the
basis
B ′ = (( √12 , √12 , 0), (0, 0, 1), ( √12 , − √12 , 0))
is an orthonormal basis of R3 .
Summing up:
Now we know that orthonormal bases are easy to come by, provided we
have orthogonal bases to start with.
(Compare with (6.25), and check that this is exactly what we have in R2 and
R3 for the usual inner product.)
Let W be a k-dimensional subspace of V and let (b_1, b_2, …, b_k) be an
orthogonal basis of W. In other words, the orthogonal projection of x on W
is obtained by projecting x onto each of the basis vectors. Hence, we have
\[ \operatorname{proj}_Wx = \sum_{i=1}^{k}\operatorname{proj}_{b_i}x = \sum_{i=1}^{k}\frac{\langle x, b_i\rangle}{\|b_i\|^2}\,b_i. \tag{6.28} \]
It is worth pointing out three facts which are consequences of the definitions
of orthogonal projections:
(i) In the extreme case W = {0}, the orthogonal projection proj_W x of any
given vector x is 0;
(ii) If W is spanned by a single non-zero vector y, then
proj_W x = proj_y x,
Example 6.6 Let W be the plane in the Euclidean space R3 whose cartesian
equation is x = y. Find projW (1, 2, 3) and projW ⊥ (1, 2, 3).
Hence, a basis of W is B_W = {(1, 1, 0), (0, 0, 1)}. Since
\[ W^\perp = N\begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
(cf. (6.19)), it follows that one basis of W ⊥ is BW ⊥ = {(1, −1, 0)}. This is of
course an orthogonal basis and, therefore,
\[ \operatorname{proj}_{W^\perp}(1, 2, 3) = \frac{\langle (1, 2, 3), (1, -1, 0)\rangle}{2}\,(1, -1, 0) = \left(-\frac12, \frac12, 0\right). \]
Finally, we have
\[ \operatorname{proj}_W(1, 2, 3) = (1, 2, 3) - \left(-\frac12, \frac12, 0\right) = \left(\frac32, \frac32, 3\right). \]
2. projW ⊥ x = x − projW x;
3. Finally x = projW x + projW ⊥ x.
whose coefficients are, respectively, bT1 x, . . . , bTk x. It follows that we can write
the equality as
projW x = BB T x.
In other words, one can construct a linear transformation, the projection PW
onto the subspace W, P_W : Rⁿ → Rⁿ, defined by
\[ P_W(x) = BB^Tx = (b_1b_1^T + b_2b_2^T + \cdots + b_kb_k^T)x, \tag{6.30} \]
where, for each i ∈ {1, 2, . . . , k}, the n × n matrix bi bTi is a matrix corre-
sponding to the projection onto the subspace spanned by bi . Each of these
matrices has rank 1, since its columns are multiples of the first column.
Notice that the second equality in (6.30) reiterates that the orthogonal
projection onto W is the sum of the orthogonal projections on the basis vectors
(see (6.28)).
and
\[ d((1, 2, 3), W) = \left\|\left(-\frac12, \frac12, 0\right)\right\| = \sqrt{\frac14 + \frac14} = \frac{\sqrt2}{2}. \]
Similarly, the best approximation of (1, 2, 3) in W ⊥ is
\[ \operatorname{proj}_{W^\perp}(1, 2, 3) = \left(-\frac12, \frac12, 0\right) \]
and
\[ d((1, 2, 3), W^\perp) = \left\|\left(\frac32, \frac32, 3\right)\right\| = \sqrt{\frac94 + \frac94 + 9} = \frac{\sqrt{54}}{2}. \]
Alternatively, using Theorem 6.3,
\[ d((1, 2, 3), W^\perp)^2 = \|x\|^2 - \left(\frac{\sqrt2}{2}\right)^2 = 14 - \frac24, \]
Step 1. Choose a (any) vector from X, say, u1 = (−1, 0, 1, 0), and set it
as the first vector of the new basis: b1 = u1 = (−1, 0, 1, 0).
Step 2. Let b_2 = u_2 − proj_{u_1} u_2. Notice that
\[ \operatorname{span}\{b_1, b_2\} = \operatorname{span}\{b_1, u_2\} = \operatorname{span}\{u_1, u_2\}. \]
It is easily seen that the first equality holds (hint: proj_{u_1} u_2 is spanned by
u_1). The second equality is obvious.
Observe that, by Theorem 6.4, b2 lies in (span{b1 })⊥ . Hence, {b1 , b2 } is
an orthogonal basis of span{u1 , u2 }.
Since
\[ \operatorname{proj}_{u_1}u_2 = \frac{\langle u_2, u_1\rangle}{\|u_1\|^2}\,u_1 = -\frac12u_1 = -\frac12(-1, 0, 1, 0), \]
we have that
(check that b1 ⊥ b2 ).
By now, the way things are going is probably already clear from Figure 6.10.
Step 3. Let
Notice that
span{b1 , b2 , b3 } = span{b1 , b2 , u3 } = S
and that, by Theorem 6.4, b3 lies in (span{b1 , b2 })⊥ . We have
\[ b_3 = u_3 - (\operatorname{proj}_{b_1}u_3 + \operatorname{proj}_{b_2}u_3) = u_3 - \left(\frac{\langle u_3, b_1\rangle}{\|b_1\|^2}\,b_1 + \frac{\langle u_3, b_2\rangle}{\|b_2\|^2}\,b_2\right) = \left(\frac{6}{11}, -\frac{2}{11}, \frac{6}{11}, \frac{10}{11}\right). \]
Gram–Schmidt process
Let V be an inner product space and let S be a subspace of V spanned
by a linearly independent subset X = {u_1, u_2, u_3, …, u_{k−1}, u_k} of V. To
obtain an orthogonal basis {b_1, b_2, b_3, …, b_{k−1}, b_k} of S, proceed as follows.
1. Fix a(ny) vector, say, b_1 = u_1.
2. Let b_2 = u_2 − proj_{b_1} u_2.
3. Let b_3 = u_3 − (proj_{b_1} u_3 + proj_{b_2} u_3), and so on, until
b_k = u_k − (proj_{b_1} u_k + proj_{b_2} u_k + ⋯ + proj_{b_{k−1}} u_k).
An orthonormal basis of S is
\[ \left(\frac{1}{\|b_1\|}b_1,\; \frac{1}{\|b_2\|}b_2,\; \frac{1}{\|b_3\|}b_3,\; \ldots,\; \frac{1}{\|b_{k-1}\|}b_{k-1},\; \frac{1}{\|b_k\|}b_k\right). \]
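A one-line check of why the process works (a verification added here, not in the source): each new vector is orthogonal to the previous ones; for instance,
\[ \langle b_2, b_1\rangle = \Big\langle u_2 - \frac{\langle u_2, b_1\rangle}{\|b_1\|^2}b_1,\; b_1\Big\rangle = \langle u_2, b_1\rangle - \frac{\langle u_2, b_1\rangle}{\|b_1\|^2}\langle b_1, b_1\rangle = 0. \]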
SS T = I = S T S.
Proposition 6.15 Let S be a matrix in Mn (R). Then the following are equiv-
alent.
(i) S is an orthogonal matrix.
(ii) S T S = I.
(iii) SS T = I.
(iv) The columns of S are an orthonormal basis of Rn .
(v) The rows of S are an orthonormal basis of Rn .
Proof The equivalence between (i), (ii), and (iii) has been proved above.
Notice that
(S T S)ij = cTi cj = ⟨ci , cj ⟩,
where ci , cj are, respectively, the columns i and j of S. Hence, (ii) yields that
⟨ci , cj ⟩ = 0, for i ̸= j, and
⟨ci , ci ⟩ = ∥ci ∥2 = 1.
AT = (SDS T )T = SDS T = A,
that is, A is a real symmetric matrix. In fact, the converse is also true. Before
proving this, we need an auxiliary result.
We already know that the spectrum of a real symmetric matrix is a non-
empty set of real numbers (see Corollary 4.1). Now we add something about
its eigenspaces.
(λ1 − λ2 )⟨x1 , x2 ⟩ = 0.
Hence, ⟨x1 , x2 ⟩ = 0.
Corollary 6.3 Let A be a real positive definite matrix. Then, there exists a
non-singular matrix B such that A = BB T .
xT Ax = xT SDS T x.
Let D′ be the diagonal matrix whose entries are the square roots of the corresponding
entries of D. Then, since D = D′^TD′,
\[ x^TAx = x^TSD'^TD'S^Tx = (D'S^Tx)^T(D'S^Tx), \]
that is,
\[ x^TAx = \|D'S^Tx\|^2 > 0, \]
as required.
where the diagonal entries of D are the n eigenvalues of A, and if
S = (u_1 u_2 … u_n) is the diagonalising orthogonal matrix, then
\[ A = \lambda_1u_1u_1^T + \lambda_2u_2u_2^T + \cdots + \lambda_nu_nu_n^T. \]
Notice that the matrices u_1u_1^T, u_2u_2^T, …, u_nu_n^T are n × n projection matrices
onto the subspaces spanned, respectively, by u_1, u_2, …, u_n (cf. (6.30)).
Hence,
\[ A = 7\begin{pmatrix} \frac13 & \frac13 & \frac13 \\ \frac13 & \frac13 & \frac13 \\ \frac13 & \frac13 & \frac13 \end{pmatrix} + 4\begin{pmatrix} \frac23 & -\frac13 & -\frac13 \\ -\frac13 & \frac23 & -\frac13 \\ -\frac13 & -\frac13 & \frac23 \end{pmatrix}. \]
Observe that these matrices are symmetric and idempotent as they should be,
since they correspond to orthogonal projections (onto the eigenspaces).
This exercise served two purposes: one was to show how to obtain an orthogonal
diagonalisation/spectral decomposition of a real symmetric matrix;
the other was to illustrate an alternative way of calculating the eigenvalues,
bypassing the (often not so easy) problem of determining the roots of the
characteristic polynomial.
Proposition 6.16 Let S be a matrix in Mn (C). Then the following are equiv-
alent.
(i) S is a unitary matrix.
(ii) \bar{S}^TS = I.
(iii) S\bar{S}^T = I.
The proof of this proposition is left as an easy exercise (see the proof of
Proposition 6.15).
Definition 66 A complex square matrix A is said to be unitarily diagonalisable
if there exist a diagonal matrix D and a unitary matrix S such that
\[ D = \bar{S}^TAS. \]
It is easy to see that, if D is real, then A is hermitian.
The next result refers to the orthogonality of the eigenspaces of a hermitian
matrix. The proof is an easy adaptation of that of Lemma 6.2.
Corollary 6.4 Let A be a complex positive definite matrix. Then, there exists
a non-singular matrix B such that A = B\bar{B}^T.
Proof Exercise.
Finally, we can present the spectral decomposition for hermitian matrices:
Notice once again that the matrices u_1\bar{u}_1^T, u_2\bar{u}_2^T, …, u_n\bar{u}_n^T are n × n
projection matrices onto the subspaces spanned, respectively, by u_1, u_2, …, u_n
(cf. (6.31)). The spectral decomposition reads
\[ A = \lambda_1u_1u_1^T + \cdots + \lambda_nu_nu_n^T \]
if A is real, or
\[ A = \lambda_1u_1\bar{u}_1^T + \cdots + \lambda_nu_n\bar{u}_n^T \]
if A is complex.
\[ C(A) = \operatorname{span}\big(\{Av_i : i = 1, \ldots, r\} \cup \{Av_i : i = r+1, \ldots, n\}\big) = \operatorname{span}\big(\{Av_i : i = 1, \ldots, r\} \cup \{0\}\big) = \operatorname{span}\{Av_i : i = 1, \ldots, r\}. \]
Let {u1 , u2 , . . . , ur , ur+1 , . . . , uk } be an orthonormal basis of Rk contain-
ing the orthonormal basis of C(A) defined above. Then,
\[ A\underbrace{\begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}}_{V} = \underbrace{\begin{pmatrix} u_1 & u_2 & \cdots & u_k \end{pmatrix}}_{U}\underbrace{\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}}_{\Sigma}, \tag{6.34} \]
A = U ΣV T ,
The eigenvalues of
\[ A^TA = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} \]
are λ_1 = 3 and λ_2 = 2. Hence, the singular values of A are σ_1 = √3 > σ_2 = √2.
Norm one eigenvectors of A^TA are, for example, v_1 = (0, 1), v_2 = (1, 0).
We have that
\[ u_1 = \frac{1}{\|Av_1\|}Av_1 = \frac{1}{\sqrt3}(-1, 1, 1), \qquad u_2 = \frac{1}{\|Av_2\|}Av_2 = \frac{1}{\sqrt2}(1, 0, 1). \]
A norm one vector orthogonal to u_1, u_2 is, for example, u_3 = \frac{1}{\sqrt6}(-1, -2, 1).
Finally, the singular value decomposition sought is
\[ A = \begin{pmatrix} -\frac{1}{\sqrt3} & \frac{1}{\sqrt2} & -\frac{1}{\sqrt6} \\ \frac{1}{\sqrt3} & 0 & -\frac{2}{\sqrt6} \\ \frac{1}{\sqrt3} & \frac{1}{\sqrt2} & \frac{1}{\sqrt6} \end{pmatrix}\begin{pmatrix} \sqrt3 & 0 \\ 0 & \sqrt2 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]
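As a check (a computation added here), multiplying out the factorisation recovers the matrix we started from:
\[ A = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}, \]
and indeed this A satisfies A^TA = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, as above.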
λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0
be all the eigenvalues of the hermitian matrix \bar{A}^TA (possibly repeated). Notice
that, similarly to what we did for real matrices above, it is easily seen that
these eigenvalues are all non-negative.
Let {v1 , v2 , . . . , vn } be an orthonormal basis of Cn consisting of eigenvec-
tors such that, for all i = 1, 2, …, n,
\[ \bar{A}^TAv_i = \lambda_iv_i. \]
The singular values of A are defined by
\[ \sigma_i = \sqrt{\lambda_i}, \qquad i = 1, 2, \ldots, n. \]
Let σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0 be all the non-zero singular values (possibly
repeated), and define, for all i = 1, 2, …, r,
\[ u_i = \frac{1}{\|Av_i\|}Av_i = \frac{1}{\sigma_i}Av_i. \]
Define the unitary matrices
\[ U = \begin{pmatrix} u_1 & u_2 & \cdots & u_k \end{pmatrix}, \qquad V = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}, \tag{6.35} \]
where {u_1, u_2, …, u_r, u_{r+1}, …, u_k} is an orthonormal basis of Cᵏ.
Theorem 6.10 (Singular value decomposition – complex matrix) Let
A be a k × n complex matrix. Then there exist a k × k unitary matrix U and
an n × n unitary matrix V such that
\[ A = U\Sigma\bar{V}^T, \]
where U, V are as in (6.35) and the k × n matrix Σ is as in (6.34).
Proof Exercise.
Example 6.9 We are going to obtain a singular value decomposition of
\[ A = \begin{pmatrix} 0 & 1 & i \\ -i & 0 & 0 \end{pmatrix}. \]
The eigenvalues of
\[ \bar{A}^TA = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & i \\ 0 & -i & 1 \end{pmatrix} \]
are λ_1 = 2, λ_2 = 1, and λ_3 = 0. An eigenvector corresponding to λ_1 = 2 is
v_1 = (0, i, 1), an eigenvector corresponding to λ_2 = 1 is v_2 = (1, 0, 0), and an
eigenvector corresponding to λ_3 = 0 is v_3 = (0, -i, 1).
We obtain u_1 = (i, 0), u_2 = (0, -i). The singular value decomposition of
A is then
\[ A = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}\begin{pmatrix} \sqrt2 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & -\frac{i}{\sqrt2} & \frac{1}{\sqrt2} \\ 1 & 0 & 0 \\ 0 & \frac{i}{\sqrt2} & \frac{1}{\sqrt2} \end{pmatrix}. \]
λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0
A = UΣV^T, if A is a real matrix,
and
A = UΣ\bar{V}^T, if A is a complex matrix.
S = p + W. (6.37)
x=y+p
or, equivalently,
y = x − p. (6.38)
The equality (6.38) shows that we can find a vector equation, cartesian equa-
tions or parametric equations for S using the corresponding equations of W .
Indeed, it suffices to replace y by x − p in those equations.
for some t ∈ R, that is,
\[ (x_1, x_2, x_3) = (1, 1, 1) + t(0, 0, 1) \qquad (t \in \mathbb{R}). \]
This is a vector equation for S which yields the parametric equations for S:
x_1 = 1, x_2 = 1, x_3 = 1 + t (t ∈ R).
A(x − p) = 0,
or, equivalently,
Ax = Ap.
Hence,
5(x1 − 1) + (x2 − 2) − 2x3 = 0
from which follows that
5x1 + x2 − 2x3 = 7
is a cartesian equation for S.
From this equation we have that x_2 = -5x_1 + 2x_3 + 7. Hence,
\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = t\begin{pmatrix} 1 \\ -5 \\ 0 \end{pmatrix} + s\begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 7 \\ 0 \end{pmatrix} \qquad (s, t \in \mathbb{R}) \]
\[ d(q, x) = \|q - x\| = \|(q - p) + \underbrace{(p - x)}_{-y\,\in\,W}\| = \|(q - p) - y\| = d(q - p, y). \]
The minimum value of this distance d(q − p, y) is attained when y =
proj_W(q − p), as we have seen in §6.3.3; y is the best approximation of q
in S. Naturally, this suggests the next definition.
Example 6.12 Calculate the distance of (3, 2, −1) to the plane S of Example
6.11.
d((3, 2, −1), S) = ∥proj_{W^⊥}((3, 2, −1) − (1, 2, 0))∥ = ∥proj_{W^⊥}(2, 0, −1)∥.
We have
\[ \operatorname{proj}_{W^\perp}(2, 0, -1) = \frac{\langle (2, 0, -1), (5, 1, 2)\rangle}{\|(2, 0, -1)\|^2}\,(2, 0, -1) = \frac{8}{\|(2, 0, -1)\|^2}\,(2, 0, -1). \]
Hence
\[ d((3, 2, -1), S) = \frac{8}{\|(2, 0, -1)\|^2}\,\|(2, 0, -1)\| = \frac{8}{\|(2, 0, -1)\|} = \frac{8}{\sqrt5}. \]
6.7 Exercises
EX 6.7.1. For the vectors u = (4, 1, 0, −2) and v = (2, −1, 0, 3) in R4 , calcu-
late:
(a) ∥u + v∥
(b) ∥u∥ + ∥v∥
(c) ∥−3u∥
(d) \frac{1}{\|v\|}v
(e) \left\|\frac{1}{\|v\|}v\right\|
(f) ∡(u, v)
(g) d(u, v)
EX 6.7.2. Find two norm one vectors orthogonal to (1, 1, −2) and
(−2, 3, −1).
EX 6.7.3. Verify that the Cauchy–Schwarz inequality holds for (1, 1, 2) and
(2, 1, 3). Verify also that the parallelogram law holds for the same
vectors.
EX 6.7.4. Consider the usual inner product in C2 . Find proj(1,2) (−i, −i).
EX 6.7.5. Use the Gram–Schmidt process to find an orthonormal basis for
the subspace spanned by {(−2, 2, −2, 2), (1, 1, 3, −1), (0, 0, 0, 1)}.
EX 6.7.6. Find a basis and cartesian equations for the orthogonal comple-
ment of the subspace W = {(x, y, z) : y + 2z = 0, x − y = z} of
R3 .
EX 6.7.7. Let S be the subspace of R4 defined by S = L{(2, 1, 1, 0)}. Find
the distance of u = (1, 1, 1, 1) to S ⊥ . Determine the matrix corre-
sponding to the orthogonal projection onto S, and u1 ∈ S, u2 ∈
S ⊥ such that u = u1 + u2 .
EX 6.7.8. Consider the subspace S = L\{(\frac43, 1, -1), (0, -1, 1)\} of R³. Suppose that R³ is endowed
with the inner product ⟨·, ·⟩ whose Gram matrix of the vectors of
the standard basis E₃ is
\[ G = \begin{pmatrix} 3 & -2 & 1 \\ -2 & 2 & 0 \\ 1 & 0 & 2 \end{pmatrix}. \]
(a) Show that ((\frac43, 1, -1), (0, -1, 1)) is an orthogonal basis of S.
(a) dim(W ⊥ )
(b) d(p, W ⊥ )
(c) an orthonormal basis of W ⊥
EX 6.7.11. Let A be a real square matrix such that its column space is
6.8 At a Glance
A real or complex vector space V can be endowed with an inner product,
with respect to which each vector has a norm.
The inner product is given by a positive definite matrix, the Gram matrix
with respect to a basis of V. The Gram matrix depends on the basis.
By means of the inner product, one defines the distance between two points
in the space, the angle, and the notion of orthogonal vectors.
Any subspace W of V leads to a direct sum V = W ⊕ W ⊥ , yielding a
unique splitting of each vector in V into two summands, one from W and the
other from the orthogonal complement W ⊥ of W .
The orthogonal projection of x ∈ V on W is given, with respect to an
orthonormal basis of V , by a projection matrix A: projW x = Ax. The ma-
trix A is idempotent and symmetric (respectively, hermitian) if V is a real
(respectively, complex) vector space.
It is always possible to have an orthonormal basis, since any basis can be
transformed into one using the Gram–Schmidt process. Nevertheless, there is
a formula for the projection matrix considering any given basis (see the How
to determine the matrix of the orthogonal projection onto a subspace in §7.1).
Real symmetric matrices and hermitian matrices are diagonalisable: the
diagonalising matrix can be chosen to be orthogonal in the former case and
unitary in the latter.
Real symmetric matrices and hermitian matrices have a spectral decom-
position.
A real (respectively, complex) positive definite matrix A is the product of
two non-singular matrices: A = BB^T (respectively, A = B\bar{B}^T).
The singular values of a real (respectively, complex) k × n matrix A are
the square roots of the eigenvalues of A^TA (respectively, \bar{A}^TA).
The singular values allow for a factorisation A = UΣV^T (respectively,
A = UΣ\bar{V}^T), where U, V are orthogonal (respectively, unitary) and Σ is a
k × n matrix having the non-zero singular values of A in its diagonal.
Chapter 7
Special Matrices by Example
In this chapter, we introduce some types of matrices via examples, the latter
chosen from applications of Linear Algebra. Each section is a brief introduction
to a special type of matrix always motivated by the analysis of a concrete
problem. The main purpose is to highlight some particular matrices arising in
applications rather than to explore each application itself in depth; to do so
would be beyond the scope of this book.
b̂ = Ax̂
It follows that
AT (b − Ax̂) = 0,
i.e.,
AT Ax̂ = AT b. (7.1)
Notice that the columns of A span the plane x + y + z = 0 and that (1, 2, 3)
does not lie in this plane.
By (7.1), we have to solve the system
\[ \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ -1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix} = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}. \]
Hence,
\[ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix} = \begin{pmatrix} -1 \\ -2 \end{pmatrix}, \]
yielding the least squares solution (0, −1). The error vector is
\[ e = b - \hat{b} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} - \left(0\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} + (-1)\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}\right) = \begin{pmatrix} 2 \\ 2 \\ 2 \end{pmatrix}. \]
The error is \|e\| = \sqrt{2^2 + 2^2 + 2^2} = 2\sqrt3.
\[ \hat{x} = (A^TA)^{-1}A^Tb \]
and
\[ \hat{b} = A\hat{x} = \underbrace{A(A^TA)^{-1}A^T}_{P}\,b. \]
\[ x^T\underbrace{A^TAx}_{0} = \langle Ax, Ax\rangle = \|Ax\|^2 = 0 \]
Example 7.2 We want to find the straight line that, in some sense, best fits
the points (1, 2), (2, 5), (3, 3), (4, 8). More precisely, we want to determine a
straight line y = ĉ + m̂x which gives a least squares solution of the problem
\[ \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} c \\ m \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \\ 3 \\ 8 \end{pmatrix}. \]
We have
\[ \begin{pmatrix} \hat{c} \\ \hat{m} \end{pmatrix} = \left(\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}\right)^{-1}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{pmatrix}\begin{pmatrix} 2 \\ 5 \\ 3 \\ 8 \end{pmatrix}. \]
The solution is (ĉ, m̂) = (\frac12, \frac85), from which follows that the best fitting straight
line is y = \frac12 + \frac85 x.
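Spelling out the normal equations for these data (a verification added here):
\[ A^TA = \begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix}, \qquad A^Tb = \begin{pmatrix} 18 \\ 53 \end{pmatrix}, \]
and solving 4ĉ + 10m̂ = 18, 10ĉ + 30m̂ = 53 indeed gives ĉ = 1/2, m̂ = 8/5.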
We are describing here a process where the probability of the system being
in some state at a given observation time t_m only depends on the state it
was in at the immediately preceding observation time t_{m−1}.
Suppose that the initial state of our system is 1 and consider the corresponding
initial state vector
\[ x_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \]
i.e., the particle is in state 1 with probability 1 and has zero probability of
being in state 2. It follows from probability theory that the state vector of the
next observation is
\[ x_1 = Px_0 = \begin{pmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.6 \\ 0.4 \end{pmatrix}. \]
Hence, the particle will be in state 1 with 0.6 probability and in state 2 with
0.4 probability.
Similarly, the following observation x_2 satisfies
\[ x_2 = Px_1 = P^2x_0 = \begin{pmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{pmatrix}^2\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.48 \\ 0.52 \end{pmatrix} \]
and, in general,
\[ x_k = P^kx_0. \]
the sum of all its entries equals 1, since x1j is the probability of the system
being in state 1 at observation j, x2j is the probability of the system being in
state 2 at observation j, etc. An n × 1 vector whose entries are non-negative
and add up to 1 is said to be a probability vector.
In the example that we have been analysing, one can verify that, for k ≥ 6,
\[ x_k \approx \begin{pmatrix} 0.428 \\ 0.571 \end{pmatrix}, \tag{7.2} \]
if only three decimal places are considered. Hence this system seems to be
approaching a steady state. In other words, what seems to be happening is
that
\[ \lim_{n\to\infty} x_n \approx \begin{pmatrix} 0.428 \\ 0.571 \end{pmatrix}. \]
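One can confirm this limit directly (a check added here): a steady-state vector q satisfies Pq = q with q₁ + q₂ = 1, and
\[ \begin{pmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{pmatrix}\begin{pmatrix} 3/7 \\ 4/7 \end{pmatrix} = \begin{pmatrix} 3/7 \\ 4/7 \end{pmatrix}, \]
with (3/7, 4/7) ≈ (0.4286, 0.5714), in agreement with (7.2).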
The behaviour displayed in this example does not always occur, for a
Markov chain may not approach a steady state.
Example 7.3 If we have the 2 × 2 transition matrix
\[ P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]
and an initial state vector x_0 = \begin{pmatrix} 0.2 \\ 0.8 \end{pmatrix}, then the system oscillates between the
state vectors
\[ x_1 = \begin{pmatrix} 0.8 \\ 0.2 \end{pmatrix}, \qquad x_2 = \begin{pmatrix} 0.2 \\ 0.8 \end{pmatrix}. \]
It follows that 1 ∈ σ(P T ). Since σ(P T ) = σ(P ) (see Proposition 4.3 (i)), the
Markov matrix itself has an eigenvalue equal to 1.
Given a system, does it follow that, whatever the initial state, the system
will approach a steady-state vector? No. As is evident from Example 7.3,
whatever the initial state vector x_0 we start with, the system will always
keep oscillating and never settle on a steady state.
[Figure: link graph on the four webpages, with arrows indicating the links between pages 1, 2, 3, and 4.]
Hence, we have a webpage ranking: the most important page is 1 followed by
page 2, and next we have pages 3 and 4 of equal importance.
The Leslie matrix model requires that the interval between consecutive
observations have the same length as each age group.
Let
\[ x(0) = \begin{pmatrix} x_1(0) \\ x_2(0) \\ \vdots \\ x_n(0) \end{pmatrix} \]
be the initial age distribution vector displaying the number of females in
each age group at time t0 = 0 and let x(tk ) be the number of females in each
group at time tk , i.e., the age distribution vector at time tk .
During a 5-year time span, it is expected to have deaths, births and aging
in each age group. Hence, for i = 1, . . . , n, let bi denote the expected number
of daughters born to a female in the age group i between the times tk and
tk+1 , and let si be the proportion of females in the group gi at time tk that
are expected to be in the group gi+1 at time tk+1 .
It follows that
\[ x_1(t_{k+1}) = b_1x_1(t_k) + b_2x_2(t_k) + \cdots + b_nx_n(t_k) \tag{7.5} \]
and, for i = 2, …, n,
\[ x_i(t_{k+1}) = s_{i-1}x_{i-1}(t_k). \tag{7.6} \]
In other words, if we consider the matrix
\[ L = \begin{pmatrix} b_1 & b_2 & \cdots & \cdots & b_n \\ s_1 & 0 & 0 & \cdots & 0 \\ 0 & s_2 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & s_{n-1} & 0 \end{pmatrix}, \tag{7.7} \]
we have
\[ x(t_{k+1}) = Lx(t_k) \]
and, in general, for k = 0, 1, 2, …,
\[ x(t_k) = L^kx(0). \]
Suppose one starts with 600 females in each age group. Then, we have the
following population distribution in years 1, 2, and 3:
\[ x(1) = \begin{pmatrix} 0 & 0 & 6 \\ \frac12 & 0 & 0 \\ 0 & \frac13 & 0 \end{pmatrix}\begin{pmatrix} 600 \\ 600 \\ 600 \end{pmatrix} = \begin{pmatrix} 3600 \\ 300 \\ 200 \end{pmatrix}, \]
\[ x(2) = L^2\begin{pmatrix} 600 \\ 600 \\ 600 \end{pmatrix} = \begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & 3 \\ \frac16 & 0 & 0 \end{pmatrix}\begin{pmatrix} 600 \\ 600 \\ 600 \end{pmatrix} = \begin{pmatrix} 1200 \\ 1800 \\ 100 \end{pmatrix}, \]
\[ x(3) = L^3\begin{pmatrix} 600 \\ 600 \\ 600 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 600 \\ 600 \\ 600 \end{pmatrix} = \begin{pmatrix} 600 \\ 600 \\ 600 \end{pmatrix}. \]
Similarly, we will have in years 4, 5, and 6
\[ x(4) = \begin{pmatrix} 3600 \\ 300 \\ 200 \end{pmatrix}, \qquad x(5) = \begin{pmatrix} 1200 \\ 1800 \\ 100 \end{pmatrix}, \qquad x(6) = \begin{pmatrix} 600 \\ 600 \\ 600 \end{pmatrix}. \]
We have here the so-called population waves: every 3-year period one ob-
serves the same population distribution. Notice that this behaviour does not
depend on the particular initial age distribution vector x(0). In fact, since
L³ = I, we will always have a 3-year cycle for this population.
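The identity L³ = I can be checked directly (a verification added here):
\[ L^2 = \begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & 3 \\ \frac16 & 0 & 0 \end{pmatrix}, \qquad L^3 = L^2L = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]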
Example 7.6 Consider now a female animal population whose life ex-
pectancy is 4 years. Let this population be divided into two age groups g1 , g2
of 0 up to 2 years and from 2 to 4 years, respectively. Suppose that 50% of
the females in group g₁ die within each 2-year time span, that each female in
g₁ is expected to give birth to 2 daughters whereas each female in g₂ gives
birth to 4 daughters, and that this animal population starts off with 30
individuals in g₁ and 10 in the age group g₂.
We have now the Leslie matrix
\[ L = \begin{pmatrix} 2 & 4 \\ \frac12 & 0 \end{pmatrix} \]
and the age distribution vectors
\[ x(2) = Lx_0 = L\begin{pmatrix} 30 \\ 10 \end{pmatrix} = \begin{pmatrix} 100 \\ 15 \end{pmatrix}, \]
\[ x(4) = L^2x_0 = \begin{pmatrix} 260 \\ 50 \end{pmatrix}, \qquad x(6) = L^3x_0 = \begin{pmatrix} 720 \\ 130 \end{pmatrix}, \qquad x(8) = L^4x_0 = \begin{pmatrix} 1960 \\ 360 \end{pmatrix}. \]
In this case we see a steady population growth.
\[ p(\lambda) = (-1)^n\left(\lambda^n - b_1\lambda^{n-1} - b_2s_1\lambda^{n-2} - b_3s_1s_2\lambda^{n-3} - \cdots - b_ns_1s_2\ldots s_{n-1}\right) \tag{7.9} \]
(see EX 7.6.12). Hence, for λ ≠ 0,
\[ p(\lambda) = (-1)^n\lambda^n\left(1 - \frac{b_1}{\lambda} - \frac{b_2s_1}{\lambda^2} - \frac{b_3s_1s_2}{\lambda^3} - \cdots - \frac{b_ns_1s_2\ldots s_{n-1}}{\lambda^n}\right). \]
It follows that λ ≠ 0 is a root of p(λ) if and only if
\[ \frac{b_1}{\lambda} + \frac{b_2s_1}{\lambda^2} + \frac{b_3s_1s_2}{\lambda^3} + \cdots + \frac{b_ns_1s_2\ldots s_{n-1}}{\lambda^n} = 1. \tag{7.10} \]
It is a routine exercise in real function calculus to see that there exists a unique
positive λ₁ satisfying (7.10). Hence, we have that λ₁ > 0 is a root of p(λ). In
fact, λ₁ is a simple root (see EX 7.6.12).
One can check directly that the vector
\[ x_1 = \begin{pmatrix} 1 & \frac{s_1}{\lambda_1} & \frac{s_1s_2}{\lambda_1^2} & \frac{s_1s_2s_3}{\lambda_1^3} & \cdots & \frac{s_1s_2s_3\ldots s_{n-1}}{\lambda_1^{n-1}} \end{pmatrix}^T \tag{7.11} \]
Theorem 7.2 The Leslie matrix L in (7.7), has a unique positive eigenvalue
λ1 with algebraic multiplicity 1 and such that there exists an eigenvector x1
in E(λ1 ) whose entries are all positive.
Our aim now is to show how the existence of a dominant eigenvalue helps
to understand the long-term behaviour of the population. We assume in what
follows that the Leslie matrix is diagonalisable, as is the case in the two pre-
vious examples.
\[ x(t_k) = L^kx_0 = S\begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & 0 \\ & & \ddots & \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix}S^{-1}x(0) = \lambda_1^kS\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & (\frac{\lambda_2}{\lambda_1})^k & & 0 \\ & & \ddots & \\ 0 & 0 & \cdots & (\frac{\lambda_n}{\lambda_1})^k \end{pmatrix}S^{-1}x(0). \]
We see that
(i) if λ1 < 1, the population will decrease,
(ii) if λ1 = 1, the population will stabilise,
(iii) if λ1 > 1, the population will increase.
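To make the role of the dominant eigenvalue explicit (a step spelled out here): since |λ_i/λ_1| < 1 for i ≥ 2, the second diagonal matrix above tends to diag(1, 0, …, 0) and, for large k,
\[ x(t_k) \approx \lambda_1^k\alpha_1x_1, \]
where x₁ is the eigenvector (7.11) and α₁ is the first entry of S^{-1}x(0); each period the population is roughly multiplied by λ₁.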
Example 7.6 (continued). As seen before, the spectrum of this Leslie
matrix is {1 − √3, 1 + √3} and λ₁ = 1 + √3 is dominant.
Since the matrix is diagonalisable, we can apply the results above and
can confidently say that the population is going to steadily increase (see
EX 7.6.11).
7.4 Graphs
In this section, we give a short introduction to simple graphs, the main
goal being to present, albeit briefly, a particular type of a symmetric matrix,
the adjacency matrix, whose entries are either 0 or 1. We begin again with an
example.
Suppose you have an archipelago of five islands some of which, possibly
not all, are connected by bridges. Name the islands from 1 to 5, and consider
Figure 7.2 where each line segment represents a bridge linking two islands.
[Figure 7.2: five islands, numbered 1 to 5, with line segments representing the bridges between them.]
V = {v1 , v2 , . . . , vn }
Notice that A is a symmetric matrix and that the diagonal entries of A are
all zero, since no loop is permitted from a vertex to itself. We are considering
simple graphs: no loops and no more than one edge connecting two vertices.
The sum of the entries in a row i equals the number of edges connecting
vertex i to other vertices. The same can be said for column i. This number
is called the degree deg(vi ) of the vertex vi . In our case, island 1 has degree
deg(1) = 1 whilst island 2 has degree deg(2) = 4, for example.
Since we have shown above that l^{(1)}_{rj} = a_{rj}, it follows that
\[ l^{(k+1)}_{ij} = \sum_{r=1}^{n}a^{(k)}_{ir}a^{(1)}_{rj} = a^{(k+1)}_{ij}, \]
as required.
Now that we can count the walks connecting two vertices, we shall see next
that every walk contains a path.
Proof Consider the set consisting of all the walks from v to v ′ (of any
given length) and observe that there must be a minimum length walk in this
set.
If the walk has length 1, then the walk is itself a path. Suppose now that this
minimum length walk has length k ≥ 2. We want to show that this walk is a
path. Suppose that, on the contrary, this walk is not a path. Consequently, there
exist vertices vr and vr+m , with m ̸= 0 and vr = vr+m . Then, if we remove
from the walk all the vertices vr+1 , . . . , vr+m and the edges linking them, we
obtain a strictly shorter length walk from v to v ′ . But this is a contradiction,
since we assumed that our initial walk was the minimum length walk. The
remaining assertion of the theorem is obvious.
For example, there is exactly one path between islands 1 and 2: this path
has length 1. There are two paths between islands 1 and 3 of length 3 (see
EX 7.6.15).
\[ x(t) = \alpha e^{ct}, \]
where α is a(ny) real constant. Suppose that we increase the ‘complexity’ of the
problem: we have now the system of (first-order) linear differential equations
\[ \begin{cases} x_1'(t) = -x_1(t) + x_2(t) \\ x_2'(t) = 5x_1(t) + 3x_2(t). \end{cases} \tag{7.17} \]
Can we still mimic in some way the solution of (7.16) to solve the system
(7.17)? In other words, is it possible to give a meaning to the exponential eA
of a matrix A such that a solution of (7.19) can be constructed somehow using
the exponential?
If one were to define the exponential of a matrix A by generalising what
is known for the exponential function over R, one would naturally be led to
write
\[ e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots + \frac{1}{n!}A^n + \cdots = \sum_{n=0}^{\infty}\frac{1}{n!}A^n. \tag{7.20} \]
Although each term of the formal power series is meaningful, does the series
converge? And what does convergence even mean?
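For a diagonal matrix the series is easy to sum, which is what makes diagonalisation useful here (an illustration added for clarity): if D = diag(d₁, …, d_n), then D^k = diag(d₁^k, …, d_n^k), so
\[ e^D = \sum_{k=0}^{\infty}\frac{1}{k!}D^k = \operatorname{diag}(e^{d_1}, \ldots, e^{d_n}). \]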
Let us begin by calculating the matrix powers for our example. The matrix
A is diagonalisable and
\[ A = \begin{pmatrix} -1 & 1 \\ 1 & 5 \end{pmatrix}\begin{pmatrix} -2 & 0 \\ 0 & 4 \end{pmatrix}\begin{pmatrix} -1 & 1 \\ 1 & 5 \end{pmatrix}^{-1} \]
and
\[ \sum_{n=0}^{k}\frac{1}{n!}A^n = \begin{pmatrix} -1 & 1 \\ 1 & 5 \end{pmatrix}\begin{pmatrix} \sum_{n=0}^{k}\frac{1}{n!}(-2)^n & 0 \\ 0 & \sum_{n=0}^{k}\frac{1}{n!}4^n \end{pmatrix}\begin{pmatrix} -1 & 1 \\ 1 & 5 \end{pmatrix}^{-1}. \]
A = S diag(λ1 , λ2 , . . . , λn )S −1 ,
Hence
etA = S diag(eλ1 t , eλ2 t , . . . , eλn t )S −1 , (7.21)
i.e.,
etA = SetD S −1 . (7.22)
Notice that
(etA )′ = AetA
(see EX 7.6.19). Hence, given a(ny) n × 1 vector c, we have
(etA c)′ = (etA )′ c = AetA c
from which follows that,
x(t) = etA c (7.23)
is a solution of the system of linear differential equations
x′ = Ax.
It is possible to show that, in fact, any solution of this system is of the form
(7.23). Moreover, since
x(0) = e0 c = c,
we have that fixing the initial conditions at t = 0 determines a unique solution.
(Observe that e0 = I.) It follows that this unique solution is
x(t) = etA x(0). (7.24)
Example 7.7 Consider the system of linear differential equations
\[ \begin{cases} x' = 2x + z \\ y' = -y \\ z' = x + 2z \end{cases} \]
which we want to solve with the initial conditions x(0) = −1, y(0) = 1, z(0) = 2.
We begin by writing the system in matrix form
\[ \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \underbrace{\begin{pmatrix} 2 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & 0 & 2 \end{pmatrix}}_{A}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]
In (7.25), the right-hand side of the equality consists of a product of two
matrices. Notice that the columns of the first matrix are formed by eigenvectors
of A multiplied by certain ‘weights’. Each column shows a pairing between
the eigenvector and the corresponding eigenvalue, inasmuch as the weight is
an exponential in whose exponent this eigenvalue appears. This behaviour can
be seen in general.
Let A be an n × n diagonalisable matrix such that A = SDS^{-1} and
consider the system (7.19). By (7.22) and (7.24), we have
\[ x(t) = e^{tA}x(0) = Se^{tD}S^{-1}x(0) \]
and, therefore,
\[ x(t) = \begin{pmatrix} e^{\lambda_1t}x_1 & e^{\lambda_2t}x_2 & \cdots & e^{\lambda_nt}x_n \end{pmatrix}\underbrace{\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}}_{S^{-1}x(0)}, \]
we have α₁ = 1, α₂ = −3/2, α₃ = 1/2. Hence,
7.6 Exercises
EX 7.6.1. Show that the spectrum of a projection matrix is a non-empty
subset of {0, 1}. Find the corresponding eigenspaces.
EX 7.6.2. Find the projection matrix onto the subspace W of R⁴ such
that
W = {(x, y, z, w) ∈ R⁴ : x + y − z = 0, z = w}.
EX 7.6.3. Find the straight line that best fits the points (2, 4), (3, 5), (7, 9).
EX 7.6.4. Show that the product of two Markov matrices is a Markov matrix.
and that the equality holds if and only if the entries in x have the
same sign.
EX 7.6.6. For the Markov matrix of Example 7.4, calculate P k with k =
0, 1, 2, . . . , 10.
EX 7.6.7. Calculate the steady-state vectors of the Markov matrices P, Q in
Example 7.4. Does this contradict Proposition 7.3? Why?
EX 7.6.8. Calculate the steady-state vector of
\[ P = \begin{pmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{pmatrix}. \]
EX 7.6.12. Show that the equality (7.9) holds and that there exists only one
λ1 > 0 satisfying (7.10). Hint: show that the real function f (λ) on
the left hand side of (7.10) is decreasing, that limλ→0+ f (λ) = +∞
and that limλ→+∞ f (λ) = 0.
EX 7.6.13. Show that λ1 in the previous exercise is a simple root of p(λ).
Hint: recall that a root a of a polynomial q(t) is simple if and only
if q ′ (a) ̸= 0.
EX 7.6.14. Suppose that
\[ L = \begin{pmatrix} 0 & 1 & 1 \\ \frac12 & 0 & 0 \\ 0 & \frac15 & 0 \end{pmatrix} \]
is the Leslie matrix of some female population. How many age
groups are there? What is the approximate proportion of the num-
ber of females in two consecutive age groups for large enough time?
Is the population going to increase eventually? Why?
EX 7.6.15. For the island-bridge example of §7.4, find the walks of length 2
and 3 connecting islands 1 and 3. Which of them are paths?
EX 7.6.16. Draw the simple graph whose adjacency matrix is
\[ \begin{pmatrix} 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 \end{pmatrix}. \]
[Figure: a simple graph on vertices a, b, c, d, …]
Find the adjacency matrix, the vertices lying in cliques and those
cliques having b. Find the paths whose start point is b and endpoint
is d.
(etA )′ = AetA .
7.7 At a Glance
Special matrices were introduced in this chapter in connection with appli-
cations of Linear Algebra:
The author wishes to thank J. Teixeira and M.J. Borges for Example 7.7,
EX 7.6.20, and EX 7.6.21.
Chapter 8
Appendix
Proof This result will be proved by induction on the number of columns
n of the matrix. The result is clear enough for n = 1. Let A be a k × n matrix
with n > 1, and suppose now that the result holds for any number of columns
less than or equal to n − 1.
If R, R′ are reduced row echelon forms of A, then, by the induction hypothesis,
their first n − 1 columns must be equal. Indeed, if we remove the last column
of all three matrices, thus obtaining k × (n − 1) matrices, say, A_{n−1}, R_{n−1}, R′_{n−1},
respectively, then R_{n−1}, R′_{n−1} are reduced row echelon forms of A_{n−1}. Hence,
R_{n−1} = R′_{n−1}.
It follows that R, R′ may differ only in the nth column. Suppose, then, that
they differ in row i, that is, r_{in} ≠ r′_{in}.
Let x be a vector such that Ax = 0. Then, we have also that Rx = 0 = R′x.
Consequently, (R − R′)x = 0, from which follows that x_n = 0, since r_{in} ≠ r′_{in},
as assumed above. This forces both columns n of R and R′ to have a pivot
(equal to 1), since otherwise x_n would be an independent variable. However,
since R_{n−1} − R′_{n−1} = 0, we have finally that these pivots must be located in
the same row. This yields a contradiction, since it forces R = R′, which we
assumed to be different.
\[ f\begin{pmatrix} \vdots \\ \alpha l_i \\ \vdots \end{pmatrix} = \alpha f\begin{pmatrix} \vdots \\ l_i \\ \vdots \end{pmatrix} \]
\[ f\begin{pmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i + l_i' \\ l_{i+1} \\ \vdots \\ l_n \end{pmatrix} = f\begin{pmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i \\ l_{i+1} \\ \vdots \\ l_n \end{pmatrix} + f\begin{pmatrix} l_1 \\ \vdots \\ l_{i-1} \\ l_i' \\ l_{i+1} \\ \vdots \\ l_n \end{pmatrix}, \]
where det A is the function given by Leibniz's formula (see §2.2, (2.4)).
Suppose firstly that A is invertible and that we reduce A to the identity
I, the reduced row echelon form of A, using elementary operations. Then, by
(Ax2), (Ax3), and Proposition 2.1 (iii), since
f(I) = 1 = det I,
it follows that g(A) = 0, i.e., the functions f and det coincide on the invertible
matrices.
If A is not invertible, then its reduced row echelon form has a zero row.
Hence, by Proposition 2.1 (i),
f(A) = 0 = det A.
Notice that Proposition 2.1 must hold also for f , since we used only the Axioms
(Ax1) − (Ax3) in its proof.
The proof is complete.
Proposition 8.3 Let V be a vector space over K with dim V = n, and let
S₁, S₂, …, S_k be subspaces of V. Then the following are equivalent.
(i) V = S₁ ⊕ S₂ ⊕ ⋯ ⊕ S_k.
(ii) \sum_{i=1}^{k}\dim S_i = n and, for all i = 1, 2, …, k,
\[ S_i \cap \sum_{l\in\{1,2,\ldots,k\}\setminus\{i\}} S_l = \{0\}. \tag{8.2} \]
(iii) V = \sum_{i=1}^{k}S_i and, for all x ∈ V, the decomposition
x = x₁ + x₂ + ⋯ + x_k, with x_i ∈ S_i, is unique.
Proof We begin by showing that (i) implies (ii). Observe that, by (8.1),
the union of the bases of all subspaces is a linearly independent set. Indeed,
if it were not, then some non-zero vector spanned by one of the bases would
be a linear combination of the vectors in the bases of the remaining spaces,
contradicting (8.1). Since V = \sum_{i=1}^{k}S_i, it follows that the union of these bases
spans a subspace of dimension n and, hence, \sum_{i=1}^{k}\dim S_i = n.
To see that (ii) ⇒ (iii), consider that, given some x in V, we have
x₁ + x₂ + ⋯ + x_k = x = z₁ + z₂ + ⋯ + z_k.
But, by (8.2), each of the summands must coincide with 0, yielding the
uniqueness of the decomposition. Consequently, the union of the bases of all
subspaces is a linearly independent set, from which follows the remaining
assertion.
Suppose now (iii) holds. To show that this implies (i), it suffices to show
that (8.1) holds. Suppose, on the contrary, that, for some i, there existed a
non-zero x ∈ S_i ∩ \sum_{l\in\{1,2,\ldots,k\}\setminus\{i\}}S_l. But then
\[ 0 = x - \sum_{l\in\{1,2,\ldots,k\}\setminus\{i\}}x_l, \]
(d) 2x − y − z − w = 0
EX 1.5.4 The solution sets are:
(a) S = {(3, 1, 2)}.
(b) S = {(−\frac17 − \frac37 x₃, \frac17 − \frac47 x₃, x₃) : x₃ ∈ R}.
(c) S = ∅.
(d) S = {(−6 − 2v − 3y, v, −2 − y, 3 + y, y) : v, y ∈ R}.
EX 1.5.6
(a) Yes (it is a row echelon matrix); Yes (it is in reduced row echelon form);
rank 3.
(b) Yes; Yes; 2.
(c) Yes; Yes; 2.
(d) Yes; Yes; 2.
(e) No; No; 2.
(f) Yes; No; 2.
(g) No; No; 2.
(h) Yes; Yes; 0.
(j) No; No; 1.
(k) No; No; 3.
(l) No; No; 2.
(m) Yes; No; 2.
EX 1.5.7
\[ \begin{pmatrix} 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 2 & -1 \\ 0 & 0 & 1 & -1 & 0 \end{pmatrix}, \]
rank(A) = 3.
EX 1.5.8 rank (Aα ) = 2 when α = −1, 0, 1 and rank (Aα ) = 3 otherwise.
If α = 0, then the systems are consistent and have one independent vari-
able.
If α = ±1 and β = 0, then the systems are consistent and have one
independent variable.
If α = ±1 and β ̸= 0, then the systems are inconsistent.
If α ∈ R\{−1, 0, 1}, then the systems are consistent and have no indepen-
dent variable.
S = {(2z, −z − w, z, w) : z, w ∈ R} .
If α = −2, then the system has one free variable and the solution set is
S = {(2z, −z, z, 0) : z ∈ R}.
EX 1.5.10
\[ B + C = \begin{pmatrix} 7 & 3 & 0 \\ 2 & -5 & 1 \\ \pi & \sqrt2 & 9 \end{pmatrix}, \qquad 2A = \begin{pmatrix} 2 & -4 \\ 8 & \sqrt2 \\ 2\sqrt2 & 6 \end{pmatrix}, \]
\[ AB = \begin{pmatrix} 1 + 4\sqrt3 & -2 + 3\sqrt6 \\ -2 + \sqrt2 & -2 \\ \pi + 8 - \sqrt2 & -2\pi - 1 \end{pmatrix}, \qquad CB = \begin{pmatrix} 6 & 3 & 0 \\ -8 & 4 & -4 \\ 10\pi & 20 & -10 \end{pmatrix}, \]
\[ A^n = A^{2k+1} = (-1)^kA. \]
EX 1.5.16
\[ A = \begin{pmatrix} -1 & 2 & 1 \\ 0 & 1 & 1 \\ 1 & -2 & -1 \end{pmatrix}. \]
EX 1.5.17
(a) \begin{pmatrix} -7 & 2 \\ 4 & -1 \end{pmatrix}
(b) -\frac{1}{39}\begin{pmatrix} 5 & -4 \\ -6 & -3 \end{pmatrix}
(g) \begin{pmatrix} -\frac13 & 1 & 0 & 0 \\ 0 & \frac13 & -\frac15 & 0 \\ 0 & 0 & -\frac17 & \frac15 \\ 0 & 0 & 0 & \frac17 \end{pmatrix}
EX 1.5.18
\[ A^3 = \begin{pmatrix} 1 & 6 \\ 0 & 1 \end{pmatrix}, \quad A^{-3} = \begin{pmatrix} 1 & -6 \\ 0 & 1 \end{pmatrix}, \quad A^2 - 2A + I = (A - I)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \]
\[ X = \begin{pmatrix} 3/2 & -3/2 \\ 1/2 & -1/2 \end{pmatrix}. \]
EX 1.5.19
(a) Elementary operation: L₂ − 5L₁. Elementary matrix: E_{21}(-5) = \begin{pmatrix} 1 & 0 \\ -5 & 1 \end{pmatrix}
(b) Elementary operation: -\frac13 L₃. Elementary matrix: D_3(-\frac13) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -\frac13 \end{pmatrix}
(c) Elementary operation: L₂ ↔ L₄. Elementary matrix: P_{24} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}
(d) Elementary operation: L₃ + \frac12 L₂. Elementary matrix: E_{32}(\frac12) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & \frac12 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
EX 1.5.20
\[ E_1 = E_{24}(-1) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad E_2 = D_2(-5) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -5 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]
\[ A^{-1} = D_2(-5)^{-1}E_{24}(-1)^{-1} = \underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -\frac15 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}}_{D_2(-1/5)}\underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}}_{E_{24}(1)} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -\frac15 & 0 & -\frac15 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
EX 1.5.21 D).
EX 1.5.22 Let A = [aij ] be a matrix such that, for all n × n matrices B we
have AB = BA. It follows immediately that A is an n × n matrix.
Let B = E_{ii}, where E_{ii} is the matrix having all entries equal to zero except
for the entry ii, which is equal to 1. Then,
\[ L_i^A = E_{ii}A = AE_{ii} = C_i^A, \]
where L_i^A is a matrix whose row i is the row i of A and whose remaining rows
are zero rows, and C_i^A is a matrix with zero columns except for the ith column,
which is that of A. Hence, since the only possibly non-zero entry in common
in both matrices is a_{ii}, all the remaining entries in both matrices are equal to
zero. Letting i vary, one gets that A is a diagonal matrix.
Suppose now that B = Pij . Then
Pij A = APij ,
from which follows that, for all i, j = 1, . . . , n, aii = ajj . Recall that the
operation Pij A interchanges the rows i and j of A whilst the operation APij
swaps the columns i and j of A. Hence, A = αI.
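A quick numerical sanity check of both directions of this argument (an illustration only, not part of the solution):

```python
import numpy as np

n = 4
A = 2.5 * np.eye(n)                 # a scalar matrix commutes with everything
for _ in range(10):
    B = np.random.randn(n, n)
    assert np.allclose(A @ B, B @ A)

# Conversely, a diagonal but non-scalar matrix already fails to commute
# with a single swap matrix P12, exactly as the proof predicts:
C = np.diag([1.0, 2.0, 3.0, 4.0])
P12 = np.eye(n); P12[[0, 1]] = P12[[1, 0]]
assert not np.allclose(C @ P12, P12 @ C)
```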
EX 1.5.23
$$A = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ -3 & 4 & 1 \end{bmatrix}\begin{bmatrix} 4 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 4 & 1 \\ 0 & 1 & -5 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ -3 & 4 & 1 \end{bmatrix}\begin{bmatrix} 4 & 16 & 4 \\ 0 & 5 & -25 \\ 0 & 0 & 1 \end{bmatrix},$$
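The factorisation can be checked by multiplying the factors back together; a minimal NumPy sketch:

```python
import numpy as np

L = np.array([[1, 0, 0], [-5, 1, 0], [-3, 4, 1]])
D = np.diag([4, 5, 1])
U = np.array([[1, 4, 1], [0, 1, -5], [0, 0, 1]])

# The two factorisations displayed agree: A = L D U = L (D U).
assert np.allclose(D @ U, [[4, 16, 4], [0, 5, -25], [0, 0, 1]])
print(L @ D @ U)   # recovers the matrix A of the exercise
```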
(b) $(2, \frac{11}{3}, 2) = 4u - 5v + 1w$
(c) $(0, 0, 0) = 0u + 0v + 0w$
(d) $(\frac{3}{7}, \frac{8}{3}, 3) = 0u - 2v + 3w$
Axioms (vi) and (vii) are shown to hold similarly, and it is obvious that (viii) also holds since 1p(t) = p(t). Hence Pn is a real vector space.
We can show similarly that P is a real vector space. Propositions 1.4 and 1.5 show that Mn,k(K) is a vector space over K.
The addition of continuous real functions on [a, b] is commutative, associative and has the zero function on [a, b] as the additive identity. The additive inverse of a function f(t) is −f(t), and 1f(t) = f(t).
If f, g are continuous functions on [a, b] and α, β ∈ R, then αf + βg is again a continuous function on [a, b].
(a) It is linearly independent and a basis is {(1, −1, 0), (0, 0, 2)}
(b) It is linearly dependent and a basis is {(2, 4, 12), (−1, −1, −1)}
(c) It is linearly dependent and a basis is {(1, 2, 3, 4), (0, 1, 1, 0)}
(d) It is linearly dependent and a basis is {(1 + i, 2i, 4 − i)}
(e) It is linearly dependent and a basis is {(1, 2, 6, 0), (3, 4, 1, 0), (4, 3, 1, 0)}
EX 3.7.10
(b) $v_B = (-i)$.
EX 3.7.12
(a) These vectors do not form a basis for W because they are linearly de-
pendent. For example, w = 2u + v.
Parametric equations:
$$\begin{cases} x = t + 2s \\ y = 2s \\ z = 0 \\ w = t + s \end{cases} \qquad t, s \in \mathbb{R}.$$
Cartesian equations:
$$z + 2w = 0 \ \wedge\ y = 0,$$
(c)
$$x \in N(MA^T) \iff MA^Tx = 0.$$
Hence
$$x \in N(MA^T) \iff M^{-1}MA^Tx = 0 \iff A^Tx = 0 \iff x \in N(A^T).$$
We have
$$4 = \dim N(A^T) + \dim L(A^T) = \dim N(A^T) + \dim C(A) = \dim N(A^T) + 3,$$
whence $\dim N(A^T) = 1$.
(b) $\dim N(B^TB) \ge 2$.
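The dimension count above is an instance of rank–nullity applied to $A^T$; the sketch below illustrates it with a random full-rank $4 \times 3$ stand-in, since the exercise's matrix is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))          # stand-in: 4 rows, rank 3 generically
rank = np.linalg.matrix_rank(A)

# dim N(A^T) = (number of columns of A^T) - rank(A^T) = 4 - 3 = 1
dim_null_AT = A.T.shape[1] - np.linalg.matrix_rank(A.T)
assert rank == 3 and dim_null_AT == 1
```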
EX 3.7.18
(a) $B_{U\cap V} = \{(0, 0, 1, 1)\}$.
(b) $\dim(\mathbb{R}^4 + (U \cap V)) = 4$.
EX 3.7.19
$B_{U+W} = \{(0, 0, 3 - i, 0),\ (1 - 2i, 0, 0, 1 - 2i),\ (0, 1, 2, 0)\}$, $\dim(U + W) = 3$.
$B_{U\cap W} = \{(1 - 2i, 0, 0, 1 - 2i)\}$, $\dim(U \cap W) = 1$.
$B_S = \{1 + t - t^3,\ t + t^2 - t^3,\ 2 - 2t\}$
EX 3.7.28
(a) $M_{B\leftarrow E_2} = \begin{bmatrix} -1 & -1 \\ 0 & 1 \end{bmatrix}$, $(2, 2)_B = (-4, 2)$.
(b) $M_{E_2\leftarrow B'} = \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$.
(c) $M_{B\leftarrow B'} = \begin{bmatrix} -3 & 1 \\ 2 & 1 \end{bmatrix}$.
EX 3.7.29
1. $M_{B\leftarrow E_3} = \begin{bmatrix} -1 & 2 & 1 \\ 0 & 0 & 2 \\ 1 & 0 & -1 \end{bmatrix}$
EX 3.7.30 a) $(2, -6)$; b) $B = (\frac{5}{16} + \frac{1}{16}t,\ -\frac{1}{16} + \frac{3}{16}t)$.
EX 3.7.31 A possible solution is
$$B_S = \left\{ \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right\}, \qquad B = \left\{ \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \right\},$$
$$M_{B_1\leftarrow B} = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
and
$$(J_n(\lambda) - \lambda I)^2 = \sum_{i=1}^{n-2} E_{i,i+1}E_{i+1,i+2} = \sum_{i=1}^{n-2} E_{i,i+2}.$$
Generalise for $(J_n(\lambda) - \lambda I)^p$, with $1 < p < n$, and show that
$$(J_n(\lambda) - \lambda I)^{n-1} = E_{1,n}.$$
Then obtain the result.
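The behaviour of these powers is easy to observe numerically; the sketch below checks that the $p$-th power of $J_n(\lambda) - \lambda I$ has ones exactly on the $p$-th superdiagonal, and that the $n$-th power vanishes.

```python
import numpy as np

n, lam = 5, 2.0
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)   # the Jordan block J_n(lambda)
N = J - lam * np.eye(n)

for p in range(1, n):
    # (J - lam I)^p has ones exactly on the p-th superdiagonal;
    # for p = n - 1 this is the matrix E_{1,n}.
    assert np.allclose(np.linalg.matrix_power(N, p), np.diag(np.ones(n - p), k=p))
assert np.allclose(np.linalg.matrix_power(N, n), np.zeros((n, n)))  # nilpotency
```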
EX 4.5.11 (i) No, no; (ii) no, yes; (iii) no, yes; (iv) no, yes.
EX 4.5.12 Up to a permutation of blocks:
$$\begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} -2 & 1 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
EX 4.5.13 a) $J = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $S = \begin{bmatrix} -1 & -\frac{1}{2} \\ 1 & 0 \end{bmatrix}$;
b) $J = \begin{bmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}$;
c) $J = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 3 \end{bmatrix}$, $S = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ -1 & -2 & -1 & -1 \\ 0 & 0 & 0 & -2 \end{bmatrix}$.
EX 5.8.9
$$[T]_{B,B} = M_{B\leftarrow P_2}\,[T]_{P_2,P_2}\,M_{P_2\leftarrow B} = \begin{bmatrix} 0 & -2 & 0 \\ 0 & 0 & 0 \\ 1 & 3 & 0 \end{bmatrix}.$$
plane z = 0. The invariant subspaces are {(0, 0, 0)}, the eigenspaces and any
straight line contained in the plane z = 0.
EX 5.8.13 a) $\sigma(T) = \emptyset$; $\sigma(S) = \{-1\}$, $E(-1) = \mathbb{R}^2$. b) $\sigma(A) = \{\pm i\}$, $B_{E(i)} = \{(i, 1)\}$, $B_{E(-i)} = \{(-i, 1)\}$.
EX 5.8.14 a)
$$[T_1]_{B_s,B_s} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad [T_2]_{B_s,B_s} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{2} & -\frac{1}{2} & 0 \\ 0 & -\frac{1}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix};$$
(e) 1
(f) $\arccos\dfrac{1}{\sqrt{14}\sqrt{21}}$
(g) $\dfrac{\sqrt{3}}{3}$
$\|(0, -1, 1)\| = \sqrt{2}$; (c) $u_1 = (7/3, 1, -1)$ and $u_2 = (-4/3, -2, 0)$.
EX 6.7.9 (1, 0, 1), (1, 0, 1).
EX 6.7.10 (a) $\dim(W^\perp) = 1$; (b) $d(p, W^\perp) = \sqrt{2}$; (c) $(1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3}, 0)$.
EX 6.7.11
(a) No, because $(2, 1, -1) \notin C(A)$.
(b) The solution set is
$$\{(x, y, z) \in \mathbb{R}^3 : (x, y, z) = (1, 2, 3) + t(1, 1, 1) + s(1, -1, -1),\ t, s \in \mathbb{R}\}.$$
EX 6.7.19
$$\begin{bmatrix} 3 & 2 & 2 \\ 2 & 3 & -2 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} 5 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{3\sqrt{2}} & -\frac{1}{3\sqrt{2}} & \frac{4}{3\sqrt{2}} \\ \frac{2}{3} & -\frac{2}{3} & -\frac{1}{3} \end{bmatrix}.$$
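The factorisation can be compared against a numerically computed SVD, up to the usual sign and ordering ambiguity of the factors; a minimal check:

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0], [2.0, 3.0, -2.0]])
U, s, Vt = np.linalg.svd(A)
assert np.allclose(s, [5.0, 3.0])                # singular values as displayed
assert np.allclose(U @ np.diag(s) @ Vt[:2], A)   # U Sigma V^T reproduces A
```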
$$G = 0.85\begin{bmatrix} 0 & \frac{1}{5} & \frac{1}{5} & 0 & 0 \\ \frac{1}{2} & \frac{1}{5} & \frac{1}{5} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{5} & \frac{1}{5} & 0 & 0 \\ 0 & \frac{1}{5} & \frac{1}{5} & 0 & \frac{1}{2} \\ 0 & \frac{1}{5} & \frac{1}{5} & \frac{1}{2} & 0 \end{bmatrix} + 0.15\begin{bmatrix} \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\ \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\ \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\ \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\ \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \end{bmatrix};$$
steady-state vector
$$q \approx \begin{bmatrix} 0.11 \\ 0.33 \\ 0.16 \\ 0.20 \\ 0.20 \end{bmatrix};$$
ranking of pages: 2,4–5,3,1.
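The steady-state vector can be reproduced by power iteration on G; a minimal sketch:

```python
import numpy as np

A = np.array([                  # column-stochastic link matrix of the five pages
    [0,   1/5, 1/5, 0,   0  ],
    [1/2, 1/5, 1/5, 1/2, 1/2],
    [1/2, 1/5, 1/5, 0,   0  ],
    [0,   1/5, 1/5, 0,   1/2],
    [0,   1/5, 1/5, 1/2, 0  ],
])
G = 0.85 * A + 0.15 * np.full((5, 5), 1/5)

q = np.full(5, 1/5)             # power iteration converges to the steady state
for _ in range(200):
    q = G @ q
print(np.round(q, 2))           # approximately [0.11, 0.33, 0.16, 0.20, 0.20]
```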
EX 7.6.10 The Leslie matrix for this population is
$$L = \begin{bmatrix} 2 & 4 \\ 0.8 & 0 \end{bmatrix}.$$
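The long-run growth rate and the stable age distribution are read off the dominant eigenvalue and eigenvector of L; a minimal NumPy sketch (the printed values are approximate):

```python
import numpy as np

L = np.array([[2.0, 4.0], [0.8, 0.0]])
eigvals, eigvecs = np.linalg.eig(L)
k = np.argmax(eigvals.real)
growth = eigvals.real[k]                       # dominant eigenvalue: ~3.05 per period
stable = eigvecs[:, k] / eigvecs[:, k].sum()   # stable age distribution
print(growth, stable)
```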
All vertices lie in cliques. Cliques containing b: {a, b, d} and {b, c, d}. Paths: ({b, a}, {a, d}); ({b, d}); ({b, c}, {c, d}).
EX 7.6.18
$$e^A = \begin{bmatrix} -1 & 1 \\ 1 & 5 \end{bmatrix}\begin{bmatrix} e^{-2} & 0 \\ 0 & e^{4} \end{bmatrix}\begin{bmatrix} -\frac{5}{6} & \frac{1}{6} \\ \frac{1}{6} & \frac{1}{6} \end{bmatrix} = \frac{1}{6}\begin{bmatrix} 5e^{-2} + e^{4} & e^{4} - e^{-2} \\ 5e^{4} - 5e^{-2} & e^{-2} + 5e^{4} \end{bmatrix}.$$
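The result can be checked against scipy.linalg.expm, reconstructing A from the eigendecomposition used above (the exercise's A is not reproduced directly here):

```python
import numpy as np
from scipy.linalg import expm

S = np.array([[-1.0, 1.0], [1.0, 5.0]])
D = np.diag([-2.0, 4.0])
A = S @ D @ np.linalg.inv(S)      # A recovered from its eigendecomposition

# e^A = S diag(e^-2, e^4) S^-1, matching the diagonalisation above
assert np.allclose(expm(A), S @ np.diag(np.exp([-2.0, 4.0])) @ np.linalg.inv(S))
```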
EX 7.6.19
$$\begin{aligned} (e^{tA})' &= \left(S\,\mathrm{diag}(e^{\lambda_1 t}, e^{\lambda_2 t}, \dots, e^{\lambda_n t})\,S^{-1}\right)' \\ &= S\,\mathrm{diag}(\lambda_1 e^{\lambda_1 t}, \lambda_2 e^{\lambda_2 t}, \dots, \lambda_n e^{\lambda_n t})\,S^{-1} \\ &= S\,\mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)\,\mathrm{diag}(e^{\lambda_1 t}, e^{\lambda_2 t}, \dots, e^{\lambda_n t})\,S^{-1} \\ &= S\,\mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)\,S^{-1}\,S\,\mathrm{diag}(e^{\lambda_1 t}, e^{\lambda_2 t}, \dots, e^{\lambda_n t})\,S^{-1} \\ &= Ae^{tA}. \end{aligned}$$
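The identity $(e^{tA})' = Ae^{tA}$ can also be confirmed numerically by a finite-difference approximation of the derivative; an illustration with a random matrix:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
t, h = 0.7, 1e-6

# central finite difference for (e^{tA})' compared against A e^{tA}
lhs = (expm((t + h) * A) - expm((t - h) * A)) / (2 * h)
assert np.allclose(lhs, A @ expm(t * A), atol=1e-4)
```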