
Solving Rank-Deficient Linear Least-Squares Problems*

Thomas F. Coleman1 and Chunguang Sun2

Abstract

Numerical solution of linear least-squares problems is a key computational task in science and
engineering. Effective algorithms have been developed for the linear least-squares problems in
which the underlying matrices have full rank and are well-conditioned. However, there are few
efficient and robust approaches to solving the linear least-squares problems in which the
underlying matrices are rank-deficient and sparse. In this paper, we propose a new method for
solving rank-deficient linear least-squares problems. Our proposed method is mathematically
equivalent to an existing method but has several practical advantages over the existing method.
Furthermore, our proposed method is applicable to solving both dense and sparse rank-
deficient linear least-squares problems. Our experimental results demonstrate the practical
potential of our proposed method.

Section 1: Introduction

Let A ∈ R^{m×n}, b ∈ R^m. The numerical solution of the linear least-squares problem

    min_{x∈R^n} ||Ax − b||_2        (1.1)

lies at the heart of many computational problems frequently arising in scientific, engineering,
and economic disciplines. Efficient algorithms are available when the matrix A has full rank and
is well-conditioned. However, when the matrix is ill-conditioned or rank-deficient, numerical
solution of (1.1) often requires some variant of rank-revealing QR factorization (RRQR) or
singular value decomposition (SVD). The resulting solution process is relatively expensive.
Furthermore, the solution to (1.1) is not unique. Usually, the minimum-norm solution is sought.

In practical applications, the matrix A is often large and sparse. If A is also rank-deficient, there
are few effective algorithms available for solving (1.1). Because of the paramount need for
sparsity preservation, the algorithms involving RRQR or SVD developed for dense linear least-
squares problems are not suitable for sparse linear least-squares problems.

________
*This work was supported by the Natural Sciences and Engineering Research Council of Canada. The views expressed herein
are solely those of the authors.
1. Department of Combinatorics and Optimization, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L
3G1, Canada ([email protected])
2. Faculty of Mathematics, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada
([email protected])

Effective approaches (George, et al., 1980), (Heath, 1982), (Heath, 1984), (Liu, 1986), (Sun, 1996),
(Sun, 1997) have been developed to solve sparse linear least-squares problems when the
underlying matrix has full rank and is well-conditioned. Issues in handling rank deficiency in
solving sparse linear least-squares problems are considered in (Ng, 1991) and (Avron, et al.,
2009).

In this paper, we propose a new method for solving rank-deficient linear least-squares
problems. We show that our proposed method is mathematically equivalent to an existing
method. If we view both our method and the existing method as generating a sequence of
points (i.e. approximate solutions) approaching the true solution, the two methods are
mathematically equivalent in the sense that they generate the same sequence of points in exact
arithmetic. However, there are two major differences between the two methods. First, the
theoretical underpinnings of the two methods are very different. In particular, the
mathematical derivations of the two methods are drastically different. Second, the
computations at each iterative step are organized differently. Because of these differences, our
proposed method offers practical advantages over the existing method in terms of algorithmic
efficiency and applicability to handling both dense and sparse rank-deficient linear least-
squares problems.

In this paper, we focus on the design and analysis of our new method for solving dense,
rank-deficient linear least-squares problems. We also outline our approach to the solution of
sparse, rank-deficient linear least-squares problems; detailed results on the sparse problems
will be presented in (Coleman, et al., 2010).

In Section 2 we propose a new algorithm for solving rank-deficient linear least-squares
problems. We prove that our proposed algorithm is mathematically equivalent to an existing
algorithm in Section 3. In Section 4, we discuss the selection of a crucial parameter used in our
algorithm. In Section 5, we compare the practical performance of our algorithm with that of the
existing algorithm. Finally, we discuss future work and summarize our results in Section 6.

Notations. Assume k = rank(A). Let A = UΣV^T be the SVD of A. Assuming Σ_1 is the leading
submatrix of Σ corresponding to the k positive singular values, the compact SVD of A can be
written as A = U_1 Σ_1 V_1^T, where

    U_1 = [u_1, u_2, ..., u_k],   V_1 = [v_1, v_2, ..., v_k],

and

    Σ_1 = diag(σ_1, σ_2, ..., σ_k),

where σ_1 ≥ σ_2 ≥ ... ≥ σ_k > 0.

The minimum-norm solution to the above linear least-squares problem (1.1) is given by

    x̂ = A^+ b = V_1 Σ_1^{-1} U_1^T b,

where A^+ is the pseudo-inverse of A. Let r̂ = b − A x̂.
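For concreteness, the following is a minimal NumPy sketch (ours, not from the paper) of computing x̂ and r̂ through the compact SVD; the cutoff used to decide which singular values count as nonzero is an assumption.

```python
import numpy as np

def min_norm_solution(A, b, tol=1e-12):
    """Minimum-norm least-squares solution x_hat = V1 Sigma1^{-1} U1^T b via the compact SVD.
    'tol' is an assumed relative cutoff for deciding which singular values are nonzero."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))        # numerical rank
    U1, s1, V1 = U[:, :k], s[:k], Vt[:k, :].T
    x_hat = V1 @ ((U1.T @ b) / s1)         # V1 * Sigma1^{-1} * (U1^T b)
    r_hat = b - A @ x_hat                  # residual at the minimum-norm solution
    return x_hat, r_hat
```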

Section 2: A New Method for Handling Rank Deficiency

For λ > 0 define

    x(λ) = (A^T A + λI)^{-1} A^T b        (2.1)

Using the compact SVD, (2.1) can be written as

    x(λ) = V_1 (Σ_1^2 + λI)^{-1} (Σ_1 U_1^T b)        (2.2)

From (2.2) it is clear that x(λ) → x̂ = V_1 Σ_1^{-1} U_1^T b as λ → 0.

An approximate minimum-norm least-squares solution to (1.1) is therefore given by (2.2) or,
equivalently, by the least-squares solution to the following full-rank problem:

    min_{x∈R^n} || [A; λ^{1/2} I] x − [b; 0] ||_2        (2.3)

where [A; λ^{1/2} I] denotes the (m + n)-by-n matrix obtained by stacking A on top of λ^{1/2} I.

While there are several numerical approaches to (2.3), the stable QR factorization method can
be used, perhaps with a permutation matrix Π to limit fill in the upper triangular factor R_Π
(in the case where the matrix A is sparse):

    Q^T [A; λ^{1/2} I] Π = [R_Π; 0]        (2.4)

One general approach to solving the rank-deficient linear least-squares problem is to choose a
positive value for λ, solve (2.3) by using the factorization in (2.4) to obtain x(λ), and then refine
x(λ) to get (or approximate) x(0) = x̂. The question we address next is how to effectively do
this refinement. We discuss our approach to choosing the parameter λ in Section 4.

Let M = A^T A and d = A^T b. The solution to the full-rank problem (2.3) is the same as the solution
to the following system of normal equations:

    (A^T A + λI) x = A^T b

or

    (M + λI) x = d        (2.5)

which obviously yields the solution

    x = (M + λI)^{-1} d        (2.6)

Differentiating (2.5) with respect to λ yields

    x + (M + λI) x' = 0        (2.7)

or x' = −(M + λI)^{-1} x = −(M + λI)^{-2} d.

Differentiating (2.7) with respect to λ yields

    2x' + (M + λI) x'' = 0        (2.8)

or x'' = −2(M + λI)^{-1} x' = 2(M + λI)^{-3} d.

Generally,

    x^{(k)} = −k (M + λI)^{-1} x^{(k−1)}.        (2.9)

By Taylor's theorem,

    x(0) = x + Σ_{k≥1} (−1)^k s_k λ^k        (2.10)

where x = x(λ) and s_k = (1/k!) x^{(k)}(λ), (k = 1, 2, ...).

Note that by (2.9)

    s_k = −(1/(k−1)!) (M + λI)^{-1} x^{(k−1)} = −(M + λI)^{-1} s_{k−1},   (k = 2, 3, ...)        (2.11)

where s_1 = x' = −(M + λI)^{-1} x.

An important computational observation is that, given s_{k−1} and Π^T (M + λI) Π = R_Π^T R_Π, the
vector s_k can be computed with two triangular solves, and thus in time O(nnz(R_Π)), where
nnz(R_Π) is the number of nonzeros in the matrix R_Π.

Computationally, our algorithm can be described as follows:

    1) x_0 = x(λ)
    2) s_0 = x_0
    3) for i = 1, 2, 3, ...
           s_i = −(M + λI)^{-1} s_{i−1}
           x_i = x_{i−1} + (−1)^i s_i λ^i

At the i-th refinement step, s_i = −(M + λI)^{-1} s_{i−1} is accomplished by solving R^T R s_i = −s_{i−1},
where M + λI = R^T R. In practice, this version of our algorithm is numerically unstable since
the product s_i λ^i is formed at each refinement step. Typically, ||s_i||_2 is very large and λ^i very
small.

To overcome the numerical instability, we revise our algorithm as follows:

    1) x_0 = x(λ)
    2) t_0 = x_0
    3) for i = 1, 2, 3, ...
           t = λ t_{i−1}
           t_i = (M + λI)^{-1} t
           x_i = x_{i−1} + t_i

It is straightforward to show that

    t_i = (−1)^i s_i λ^i,   i = 1, 2, 3, ...

via mathematical induction. Therefore, our revised algorithm is equivalent to our original
algorithm. However, our revised algorithm avoids forming the product s_i λ^i at each refinement
step, so the issue of numerical instability in the original algorithm is resolved.

Now we discuss the implementation issues of our algorithm. For a given λ > 0, step 1) of our
algorithm is accomplished by solving the full-rank linear least-squares problem (2.3). Let

    C = [A; λ^{1/2} I]        (2.12)

and

    f = [b; 0].        (2.13)

Then problem (2.3) can be written as

    min_{x∈R^n} ||Cx − f||_2        (2.14)

where C is an (m + n)-by-n matrix and f is an (m + n)-vector. First, we compute the
"economy size" QR factorization:

    C = QR,        (2.15)

where Q is an (m + n)-by-n matrix with orthonormal columns and R is an n-by-n upper
triangular matrix. Let

    Q = [Q_I; Q_II]        (2.16)

where Q_I is an m-by-n matrix and Q_II is an n-by-n matrix. Second, the solution to (2.3) (i.e.
(2.14)), denoted by x_0 or x(λ), is obtained by performing a matrix-vector multiply followed by a
triangular solve as follows:

    x_0 = x(λ) = R^{-1} (Q_I^T b).        (2.17)

Mathematically, we have

    M + λI = C^T C = R^T R.        (2.18)

Hence, at the i-th refinement step, t_i = (M + λI)^{-1} t is accomplished by two triangular solves:

    R^T R t_i = t.        (2.19)

Clearly, each iterative refinement step of our algorithm requires precisely two triangular solves,
one scalar-vector multiply, and the addition of two vectors.

It is important to note that the QR factorization (2.15) is computed only once. The overall
complexity of our algorithm consists of the arithmetic work for solving (2.3) (which consists of
the arithmetic work for the QR factorization (2.15) and the arithmetic work for (2.17)) and the
arithmetic work for N iterative refinement steps, where N is the total number of iterative
refinement steps performed. As shown in Section 5, N is typically less than or around 10.
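The following is a minimal dense NumPy sketch of the method described above (an illustration, not the authors' code): the extended matrix C is formed, its economy QR factorization (2.15) is computed once, x_0 = x(λ) is obtained via (2.17), and each refinement step performs the two triangular solves of (2.19). Running a fixed number of steps is an assumption; the stopping rule of Section 5 can be substituted.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def rank_deficient_lstsq(A, b, lam, num_steps=10):
    """Regularize-and-refine sketch: solve the extended problem (2.3) by QR once,
    then apply the refinement t = lam * t_{i-1}, (R^T R) t_i = t, x_i = x_{i-1} + t_i."""
    m, n = A.shape
    C = np.vstack([A, np.sqrt(lam) * np.eye(n)])   # extended matrix (2.12)
    Q, R = qr(C, mode='economic')                  # (2.15), computed only once
    QI = Q[:m, :]                                  # top m-by-n block of Q, as in (2.16)
    x = solve_triangular(R, QI.T @ b)              # (2.17): x_0 = x(lambda)
    t = x.copy()                                   # t_0 = x_0
    for _ in range(num_steps):
        t = lam * t                                # t = lambda * t_{i-1}
        y = solve_triangular(R, t, trans='T')      # first triangular solve: R^T y = t
        t = solve_triangular(R, y)                 # second triangular solve: R t_i = y
        x = x + t                                  # x_i = x_{i-1} + t_i
    return x
```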

Section 3: Mathematical Equivalence of Our Algorithm with an Existing
Algorithm

In this section we show that our algorithm is mathematically equivalent to an existing algorithm
in the sense that both algorithms generate the same sequence of points (i.e. approximate
solutions) approaching the true solution.

Riley (Riley, 1956) proposed the following iterative scheme (referred to as the Iterated
Regularization scheme, or IR scheme for short) for solving linear least-squares problems with
full column rank. Let x^{(0)} be an arbitrary vector, and solve

    (A^T A + λI) x^{(q+1)} = A^T b + λ x^{(q)}        (3.1)

The sequence x^{(q)} converges to x̂ (the true solution) if λ > 0, since the spectral radius of
λ(A^T A + λI)^{-1} is less than 1.
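For completeness (this one-line justification is ours, not stated in the original):

    λ(A^T A + λI)^{-1} has eigenvalues λ/(λ + σ_i^2),   i = 1, 2, …, n,

so when A has full column rank (all σ_i > 0), the spectral radius equals λ/(λ + σ_n^2) < 1.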

Golub (Golub, 1965) observed that Riley's iterative scheme is equivalent to the following:

    r^{(q)} = b − A x^{(q)}
    (A^T A + λI) e^{(q)} = A^T r^{(q)}        (3.2)
    x^{(q+1)} = x^{(q)} + e^{(q)}

The vector e^{(q)} is the solution of the following linear least-squares problem:

    min_{e^{(q)}∈R^n} ||C e^{(q)} − f^{(q)}||_2        (3.3)

where

    C = [A; λ^{1/2} I]        (3.4)

and

    f^{(q)} = [r^{(q)}; 0]        (3.5)

Therefore, the IR scheme can be described as follows (a NumPy sketch of this loop is given after the list):

    x^{(0)} = 0
    for q = 0, 1, 2, 3, ...
        a) r^{(q)} = b − A x^{(q)}
        b) Solve the linear least-squares problem (3.3) for e^{(q)}.
        c) x^{(q+1)} = x^{(q)} + e^{(q)}
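A minimal NumPy sketch of this IR loop (our illustration based on the description above, not Golub's code), reusing the economy QR factorization C = QR of (2.15) and keeping Q_I, which corresponds to the Option 1 variant discussed in Section 5:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def iterated_regularization(A, b, lam, num_steps=10):
    """IR scheme: x^{(q+1)} = x^{(q)} + e^{(q)}, where e^{(q)} solves min ||C e - [r^{(q)}; 0]||_2."""
    m, n = A.shape
    C = np.vstack([A, np.sqrt(lam) * np.eye(n)])   # same extended matrix as (2.12)/(3.4)
    Q, R = qr(C, mode='economic')                  # factorized only once
    QI = Q[:m, :]                                  # only Q_I is needed for the right-hand side
    x = np.zeros(n)                                # x^{(0)} = 0
    for _ in range(num_steps):
        r = b - A @ x                              # a) residual r^{(q)}
        e = solve_triangular(R, QI.T @ r)          # b) solve (3.3): R e^{(q)} = Q_I^T r^{(q)}
        x = x + e                                  # c) x^{(q+1)} = x^{(q)} + e^{(q)}
    return x
```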
Golub (Golub, 1965) has shown that

    x^{(q)} = μ_1^{(q)} v_1 + μ_2^{(q)} v_2 + ⋯ + μ_k^{(q)} v_k

where

    μ_j^{(q)} = [1 − (λ/(λ + σ_j^2))^q] (u_j^T b / σ_j)

for j = 1, 2, …, k.

Therefore, as q → ∞,

    x^{(q)} → (u_1^T b/σ_1) v_1 + (u_2^T b/σ_2) v_2 + ⋯ + (u_k^T b/σ_k) v_k = x̂

As indicated in (Golub, 1965), it is easy to show that

    e^{(q+1)} = λ(A^T A + λI)^{-1} e^{(q)}

and

    ||e^{(q+1)}||_2 ≤ [λ/(λ + σ_k^2)] ||e^{(q)}||_2 < ||e^{(q)}||_2

Therefore, a good termination procedure is to stop the iterative process as soon as ||e^{(q)}||_2
increases or doesn't change.

Now we prove that the IR scheme as formulated in (Golub, 1965) is equivalent to our algorithm.
The IR scheme shows that

    x^{(q)} = e^{(0)} + e^{(1)} + e^{(2)} + ⋯ + e^{(q)},

where x^{(q)} is the solution vector computed by the IR scheme after q iterative steps. Clearly,
e^{(0)} = x_0 = x(λ). Therefore

    x^{(q)} = x_0 + e^{(1)} + e^{(2)} + ⋯ + e^{(q)}.

Our algorithm shows that

    x_q = x_0 + t_1 + t_2 + ⋯ + t_q

where x_q is the solution vector computed by our algorithm after q iterative steps. We prove

    x^{(q)} = x_q    (q = 1, 2, 3, …)        (3.6)

by showing

    e^{(q)} = t_q    (q = 1, 2, 3, …).        (3.7)

We can easily derive the following from our algorithm:

    t_q = λ^q (M + λI)^{−(q+1)} d        (3.8)

Let

    G = λ(A^T A + λI)^{-1}

and

    h = (A^T A + λI)^{-1} A^T b.

It has been shown in (Golub, 1965) that

    x^{(q+1)} = (G^q + G^{q−1} + ⋯ + G + I) h        (3.9)
Therefore,

    e^{(q)} = x^{(q+1)} − x^{(q)}
            = (G^q + G^{q−1} + ⋯ + G + I) h − (G^{q−1} + G^{q−2} + ⋯ + G + I) h
            = G^q h

Thus,

    e^{(q)} = G^q h = [λ(A^T A + λI)^{-1}]^q (A^T A + λI)^{-1} A^T b = λ^q (M + λI)^{−(q+1)} d        (3.10)

Combining (3.8) and (3.10), we obtain (3.7). Therefore, we have proved (3.6). In other words, the
IR scheme is equivalent to our algorithm in the sense that both algorithms generate the same
sequence of approximate solutions approaching the true solution in exact arithmetic. Clearly,
the mathematical derivations of the two algorithms are very different. Our algorithm relies on
the application of the Taylor series expansion while the existing algorithm is an iterated
regularization method. Furthermore, the computations at each iterative step are different. In
Section 5, we will demonstrate that our algorithm offers several practical advantages over the
IR scheme.

Section 4: Selection of Lambda

Suppose the singular values of A are σ_1, σ_2, …, σ_k, 0, …, 0. Since C^T C = A^T A + λI, it is easy to
show that the singular values of the extended matrix C as defined in (2.12) are

    (σ_1^2 + λ)^{1/2}, (σ_2^2 + λ)^{1/2}, …, (σ_k^2 + λ)^{1/2}, λ^{1/2}, …, λ^{1/2}.

Therefore, the condition number of the extended matrix C is

    cond(C) = cond(R) = ((σ_1^2 + λ)/λ)^{1/2},

where C = QR is the QR factorization of C as defined in (2.15). Clearly, the condition
number of R will be large if λ is small. This will have a negative impact on the numerical accuracy
of solving triangular systems involving R, as required by both our algorithm and the existing
algorithm. Hence, λ cannot be too small. On the other hand, the following discussion shows
that λ cannot be too large either.

Let δ be a lower bound on the smallest non-zero singular value σ_k. Golub (Golub, 1965)
suggested that λ be chosen so that

    λ/(λ + δ^2) < 0.1        (4.1)

Clearly, a wide range of choices for λ and δ would satisfy (4.1). Our approach is to choose the
greatest lower bound for σ_k, which is σ_k itself. Hence, we have

    λ/(λ + σ_k^2) < 0.1        (4.2)

Obviously, an infinite number of choices for λ would satisfy (4.2). In particular, we choose
λ = βσ_k^2. Then

    βσ_k^2/(βσ_k^2 + σ_k^2) = β/(β + 1) < 0.1        (4.3)

Thus,

    β < 1/9.

Our experimental results have shown that β = 0.01 produces satisfactory results. So our choice
for λ is 0.01σ_k^2. To distinguish different choices for λ in the following discussion, let

    λ_svd = 0.01σ_k^2.

It is well known that computing the SVD of a matrix is an expensive process; at least, it is
considerably more expensive than computing the QR factorization of the same matrix. So our
approach avoids computing the SVD of the given matrix. Hence, we have no direct knowledge
of σ_k and cannot compute λ_svd exactly. Instead, we obtain an approximate value for λ_svd
as follows:

• Compute the QR factorization of A: A = Q_1 R_1, where Q_1 is m-by-m, and R_1 is m-by-n.

• Let W denote the set of absolute values of the nonzero diagonal elements of R_1. Let
ω_min and ω_max denote the smallest and largest elements of W, respectively.

Both λ_1 = β̂ ω_min^2 and λ_2 = β̂ ω_min^2/ω_max^2, where β̂ = 0.00025, produce satisfactory results.
Furthermore, we observe that

    λ_QR = (λ_1 + λ_2)/2 = β̂ ω_min^2 (ω_max^2 + 1)/(2 ω_max^2)

produces improved results over λ_1 or λ_2.
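A short NumPy sketch of this selection procedure (an illustration; the exact placement of ω_min and ω_max in λ_1 and λ_2 is our reading of the formulas above and should be treated as an assumption):

```python
import numpy as np

def lambda_qr(A, beta_hat=0.00025):
    """Approximate lambda_svd = 0.01*sigma_k^2 using the diagonal of R_1 in A = Q_1 R_1."""
    R1 = np.linalg.qr(A, mode='r')          # only the triangular factor is needed
    w = np.abs(np.diag(R1))
    w = w[w > 0]                            # absolute values of the nonzero diagonal entries
    w_min, w_max = w.min(), w.max()
    lam1 = beta_hat * w_min**2
    lam2 = beta_hat * w_min**2 / w_max**2
    return 0.5 * (lam1 + lam2)              # lambda_QR = (lambda_1 + lambda_2)/2
```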

The following table shows λ_svd and λ_QR for six dense problems. Except for the problem
600x300, λ_QR is a good approximation for λ_svd.

    Dense Problems    rank    λ_svd         λ_QR
    600x300            238    2.2441e-11    6.0530e-14
    1500x1000          812    8.2283e-16    5.1143e-16
    2000x1500         1212    3.7025e-17    2.4798e-17
    3000x2000         1632    1.6702e-21    4.0261e-20
    3500x500           383    1.8902e-19    5.2419e-19
    3500x1000          812    9.4053e-18    8.5434e-18

Section 5: Performance Comparison

First, we consider the implementation details of the IR scheme. Each iterative step of the IR
scheme solves a linear least-squares problem with the same (m + n)-by-n matrix C as defined
in (2.12). As in our algorithm, the (economy) QR factorization C = QR defined in (2.15) is
computed only once. Assume that the matrix Q is partitioned as in (2.16).

There are two possible implementations of the IR scheme:

• Option 1: The matrix Q is saved (actually, it is only necessary to save the submatrix Q_I).
Then at the q-th iterative step, we solve the following triangular system:

    R e^{(q)} = Q^T f^{(q)} = Q_I^T r^{(q)}

Compared with our algorithm, this implementation of the IR scheme performs two extra
matrix-vector multiplications, A x^{(q)} and Q_I^T r^{(q)}, per iterative step. But it solves only one
n-by-n triangular system instead of two as in our algorithm.

If A is sparse, Q (or Q_I) is often dense. Therefore, saving Q (or Q_I) would not be practical
for sparse linear least-squares problems. Evidently, this implementation of the IR
scheme is not suitable for solving sparse linear least-squares problems.

• Option 2: The matrix Q (or Q_I) is not saved. Then at the q-th iterative step, we solve the
following system of semi-normal equations (SNE):

    R^T R e^{(q)} = A^T r^{(q)}

Compared with our algorithm, this option performs two extra matrix-vector
multiplications, A x^{(q)} and A^T r^{(q)}, per iterative step. In other words, our algorithm is
more efficient than this implementation of the IR scheme.

In the following discussion, Option 1 and Option 2 will be referred to as Algorithm IR-Q and
Algorithm IR-SNE, respectively.

All three algorithms solve exactly the same extended linear least-squares problem (2.3). The
running time for solving (2.3) includes the running times for forming the extended matrix C,
computing the (economy) QR factorization of C, and solving the resulting triangular
system R x_0 = Q_I^T b. The following table shows the running times for solving (2.3) on six dense
problems.

    Dense Problems    Running Times for Solving the Extended LS Problem (Seconds)
    600x300           0.0550
    1500x1000         0.7882
    2000x1500         2.1226
    3000x2000         4.9765
    3500x500          0.5450
    3500x1000         1.5034

Now we compare running times of the three algorithms for the iterative refinement process.
We measure the running time for ten iterative steps for each of the three algorithms. The
running times are summarized in the following table.

    Dense Problems    Running Times for 10 Iterative Steps (Seconds)
                      IR-Q      IR-SNE    Our Algorithm
    600x300           0.0031    0.0103    0.0084
    1500x1000         0.0593    0.1685    0.1210
    2000x1500         0.1137    0.3640    0.2813
    3000x2000         0.2111    0.6392    0.4836
    3500x500          0.0506    0.0799    0.0229
    3500x1000         0.1077    0.2149    0.1166

Overall, IR-Q is the most efficient approach since it performs only one triangular solve per
iterative step. However, when the matrix A is very skinny (that is, m ≫ n, e.g. 3500x500), it is
not as efficient as our algorithm. The reason is that forming the matrix-vector products
A x^{(q)} and Q_I^T r^{(q)} is more expensive than a single triangular solve. The most severe
drawback of Algorithm IR-Q lies in the fact that it is not suitable for solving sparse linear least-
squares problems.

Clearly, our algorithm is more efficient than IR-SNE since IR-SNE performs two extra matrix-
vector multiplications, A x^{(q)} and A^T r^{(q)}, per iterative step. Furthermore, our algorithm becomes
increasingly more efficient as the underlying matrix gets skinnier.

We used λ = λ_QR, discussed in the previous section, in our experiments. All three algorithms
converge in about the same number of iterative steps. They converge in under ten iterations for
five of the six test problems. They converge in 12 iterations for the remaining test problem,
3000-by-2000.

As suggested in (Golub, 1965), a good termination criterion is to measure the size of the update
vector t_q at each step. The iterative process should stop as soon as ||t_q||_2 doesn't change or
increases. For both Algorithm IR-Q and our algorithm, once they converge, the relative change
to the computed x_q,

    ||t_q||_2 / ||x_q||_2        (5.1)

remains unchanged (at least the first five significant digits of the ratio (5.1) remain unchanged)
for a considerable number of subsequent iterations. This is a clear signal for terminating the
iterative process. In practice, the iterative process terminates as soon as the first five significant
digits of the ratio (5.1) agree in two consecutive iterations. This stopping criterion can be easily
implemented. It is worth noting that Algorithm IR-SNE doesn't have this convergence
property.
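A small sketch of this stopping rule (our interpretation of "the first five significant digits agree"; the rounding-based comparison is an assumption), intended to be called once per refinement step with the ratio from the previous step:

```python
import numpy as np

def should_stop(t, x, prev_ratio, sig_digits=5):
    """Return (stop, ratio): stop when the ratio ||t_q||_2/||x_q||_2 of (5.1) agrees with
    the previous iteration's ratio to 'sig_digits' significant digits."""
    ratio = np.linalg.norm(t) / np.linalg.norm(x)
    def sig_round(v):
        return float(('%.' + str(sig_digits - 1) + 'e') % v)   # round to significant figures
    stop = prev_ratio is not None and sig_round(ratio) == sig_round(prev_ratio)
    return stop, ratio
```

In the refinement loop of Section 2, prev_ratio is carried between iterations and the loop exits as soon as should_stop returns True.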

IR-Q and our algorithm are virtually identical in terms of numerical properties. Using the
stopping criterion outlined above, the number of steps required for convergence for both IR-Q
and our algorithm is given in the following table:

    Dense Problems    Number of Steps for Convergence (IR-Q and Our Algorithm)
    600x300           3
    1500x1000         6
    2000x1500         5
    3000x2000         12
    3500x500          6
    3500x1000         5

Algorithm IR-SNE also converges in around the same number of steps as shown in the above
table. However, the termination criterion for IR-SNE is not as easy to implement as for the other
two algorithms because of the lack of the convergence property discussed above.

Let

    r^{(q)} = r_q = b − A x^{(q)} = b − A x_q    (q = 1, 2, 3, …)

Define

    RE(x_q) = ||x̂ − x_q||_2 / ||x̂||_2

and

    RE(r_q) = ||r̂ − r_q||_2 / ||r̂||_2

for q ≥ 0. RE stands for Relative Error.

The following three tables show experimental results on the test problem 2000-by-1500. In the
tables, column 1 represents the iterative steps (from 0 to 10), column 2 shows RE(x_q), and column
3 shows RE(r_q). The last column shows ||t_q||_2/||x_q||_2.

Table 1: Dense Problem 2000-by-1500 (Our Algorithm)

IT    RE(x_q)         RE(r_q)         ||t_q||_2/||x_q||_2
0 6.3065e-003 2.4782e-004 0
1 4.2058e-005 1.6469e-006 6.3023e-003
2 4.3457e-006 1.1063e-008 4.1703e-005
3 5.7820e-006 2.8667e-010 1.4719e-006
4 7.2276e-006 2.4866e-010 1.4455e-006
5 8.6731e-006 2.5100e-010 1.4455e-006
6 1.0119e-005 2.5207e-010 1.4455e-006
7 1.1564e-005 2.5067e-010 1.4455e-006
8 1.3010e-005 2.5192e-010 1.4455e-006
9 1.4455e-005 2.5199e-010 1.4455e-006
10 1.5901e-005 2.5323e-010 1.4455e-006

Table 2: Dense Problem 2000-by-1500 (Algorithm IR-Q)

IT    RE(x_q)         RE(r_q)         ||t_q||_2/||x_q||_2
0 6.3065e-003 2.4782e-004 0
1 4.2058e-005 1.6469e-006 6.2648e-003
2 4.3457e-006 1.1058e-008 4.1702e-005
3 5.7820e-006 2.9249e-010 1.4719e-006
4 7.2276e-006 2.4562e-010 1.4455e-006
5 8.6731e-006 2.4256e-010 1.4455e-006
6 1.0119e-005 2.4222e-010 1.4455e-006
7 1.1564e-005 2.4858e-010 1.4455e-006
8 1.3010e-005 2.4483e-010 1.4455e-006
9 1.4455e-005 2.4470e-010 1.4455e-006
10 1.5901e-005 2.4480e-010 1.4455e-006

Table 3: Dense Problem 2000-by-1500 (Algorithm IR-SNE)

IT    RE(x_q)         RE(r_q)         ||t_q||_2/||x_q||_2
0 6.3065e-003 2.4782e-004 0
1 4.2098e-005 1.6469e-006 6.2648e-003
2 4.9380e-006 1.1022e-008 4.1746e-005
3 6.2523e-006 3.0371e-010 2.3706e-006
4 7.3532e-006 2.4815e-010 2.1699e-006
5 8.4908e-006 2.5428e-010 2.4117e-006
6 9.7705e-006 2.9390e-010 2.3501e-006
7 1.0947e-005 2.6434e-010 2.3575e-006
8 1.2334e-005 2.2960e-010 2.4873e-006
9 1.3579e-005 2.6337e-010 2.3336e-006
10 1.4774e-005 2.4588e-010 2.3539e-006

Section 6: Future Work and Concluding Remarks

Efficiency Enhancement

In our implementations, we have computed two separate QR decompositions -- the QR
decomposition of the original matrix A to obtain λ_QR and the QR decomposition of the
extended matrix C to obtain x(λ). We plan to explore an approach in which the QR
decomposition of C is not computed from scratch but rather built upon the QR decomposition
of A. The problem then becomes how to compute the following QR decomposition efficiently:

    [R_1; λ^{1/2} I] = Q̂ R        (6.1)

where A = Q_1 R_1 is the QR decomposition of the matrix A. Mathematically,

    C = [A; λ^{1/2} I] = [Q_1 R_1; λ^{1/2} I] = diag(Q_1, I) [R_1; λ^{1/2} I] = diag(Q_1, I) Q̂ R = QR        (6.2)

where diag(Q_1, I) denotes the block-diagonal matrix with diagonal blocks Q_1 and I.

In this context, the matrix λ^{1/2} I is considered an n-by-n upper triangular matrix. Depending on
whether R_1 is stored row-by-row or column-by-column, the QR decomposition defined in (6.1)
can be computed by a sequence of Givens rotations or a sequence of Householder reflections.

If R_1 is dense, we need n(n + 1)/2 Givens rotations or n Householder reflections to perform
the QR decomposition (6.1). If R_1 is sparse, choosing an appropriate storage scheme for R_1 is
critical. Assuming that R_1 is stored row-by-row, we need nnz(R_1) = nnz(R) Givens rotations to
compute the QR factorization. In this situation, Householder reflections would be very
inefficient.
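For the dense case, a minimal NumPy sketch of computing (6.1) by Givens rotations (an illustration of the update described above, not the authors' implementation): the n diagonal entries of λ^{1/2} I are annihilated against the rows of R_1, using the n(n + 1)/2 rotations mentioned above.

```python
import numpy as np

def qr_update_regularization(R1, lam):
    """Given the n-by-n upper triangular R1 from A = Q1 R1, return the upper triangular
    factor R of the stacked matrix [R1; sqrt(lam) I] computed by Givens rotations."""
    n = R1.shape[0]
    R = R1.astype(float).copy()
    B = np.sqrt(lam) * np.eye(n)               # bottom block to be annihilated
    for i in range(n):                          # eliminate row i of B
        for j in range(i, n):                   # entries j >= i of that row can be nonzero
            a, bij = R[j, j], B[i, j]
            if bij == 0.0:
                continue
            rho = np.hypot(a, bij)
            c, s = a / rho, bij / rho
            Rj, Bi = R[j, j:].copy(), B[i, j:].copy()
            R[j, j:] = c * Rj + s * Bi          # rotated row of R
            B[i, j:] = -s * Rj + c * Bi         # B[i, j] becomes 0; fill moves to the right
    return R
```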

Note that the sparsity structure of R_1 is the same as that of R. We can determine the sparsity
structure of R in a preliminary step. Therefore, we don't need to compute the sparsity structure
of R again in carrying out (6.2). Furthermore, the sparsity structure of R can fully accommodate
the process of merging λ^{1/2} I into R_1 without the need to dynamically allocate additional storage
for R.

We plan to investigate whether the proposed method of combining the two QR decompositions
is more efficient than computing the two QR decompositions separately for both dense and
sparse problems.

The following describes our overall approach to solving rank-deficient linear least-squares
problems.

1. For a given m-by-n rank-deficient matrix A, compute the QR decomposition A = Q_1 R_1.
   The matrix Q_1 is not saved if A is sparse.

2. Calculate λ_QR as described in Section 4.

3. Compute the QR decomposition of the extended matrix C as defined in (2.12), either from
   scratch or using the method of combining the two QR decompositions described above.

4. Compute x_0 = x(λ) using the results of the QR decomposition of C.

5. Perform the iterative refinement process.

The application of our algorithm to the solution of sparse and rank-deficient linear least-
squares problems is discussed in (Coleman, et al., 2010).

Sparse Linear Least-Squares Problems with Dense Rows

In practice, a sparse linear least-squares problem frequently contains a number of dense or
nearly dense rows. The corresponding upper triangular factor R becomes nearly or completely
full. Therefore, the straightforward application of the QR decomposition method is not a viable
approach to the solution of sparse least-squares problems with dense rows. Heath (Heath,
1982) proposed a method for handling dense rows. Through row permutation, a sparse linear
least-squares problem with dense rows can be easily transformed into the following equivalent
problem

    min_{x∈R^n} || [A; E] x − [b; f] ||_2        (6.3)

where A consists of the sparse rows and E of the dense rows of the original matrix, respectively. The
solution to (6.3) is obtained by first computing the solution to the sparse linear least-squares
problem min_x ||Ax − b||_2 as usual and then updating the solution by the rows in E. We refer the
reader to (Heath, 1982) for details. The crucial assumption made in the method described in
(Heath, 1982) is that A has full column rank. However, practical problems might not satisfy this
assumption. If the matrix A is rank-deficient, our proposed method described in this paper can
be applied to obtain a solution to the problem min_x ||Ax − b||_2. Handling dense rows in sparse
linear least-squares problems is also considered in (Sun, 1995).

Concluding Remarks

In this paper, we have proposed a new method for handling rank deficiency in solving sparse
linear least-squares problems. Our proposed method is mathematically equivalent to an
existing method. We have shown that our method has several practical advantages over the
existing method in terms of efficiency and applicability to rank-deficient sparse problems. Our
experimental results show the practical promise of our approach. We have outlined several
future directions of research to further enhance the efficiency of our algorithm and to apply our
algorithm to the solution of sparse linear least-squares problems with dense rows.

Works Cited

Avron, H., Ng, E., and Toledo, S. Using Perturbed QR Factorizations to Solve Linear Least-Squares
Problems. SIAM J. Matrix Anal. Appl., 31(2):674-693, 2009.
Coleman, T. F. and Sun, C. Solving Rank-Deficient Sparse Linear Least Squares Problems. Technical
report, University of Waterloo, Waterloo, Ontario, Canada, 2010. Work in progress.
George, J. A. and Heath, M. T. Solution of Sparse Linear Least Squares Problems using Givens
Rotations. Linear Algebra Appl., 34:69-83, 1980.
Golub, G. Numerical Methods for Solving Linear Least Squares Problems. Numerische Mathematik,
7:206-216, 1965.
Heath, M. T. Numerical Methods for Large Sparse Linear Least Squares Problems. SIAM J. Sci. Stat.
Comput., 26:497-513, 1984.
Heath, M. T. Some Extensions of an Algorithm for Sparse Linear Least Squares Problems. SIAM J. Sci.
Stat. Comput., 3:223-237, 1982.
Liu, J. W. H. On General Row Merging Schemes for Sparse Givens Transformations. SIAM J. Sci.
Statist. Comput., 7:127-148, 1986.
Ng, E. A Scheme for Handling Rank Deficiency in the Solution of Sparse Linear Least Squares
Problems. SIAM J. Sci. Statist. Comput., 12:1173-1183, 1991.
Riley, J. D. Solving Systems of Linear Equations with a Positive Definite, Symmetric, but Possibly
Ill-Conditioned Matrix. Math. Tables Aids Comput., 9:96-101, 1956.
Sun, C. Dealing with Dense Rows in the Solution of Sparse Linear Least Squares Problems. Technical
Report CTC95TR227, Advanced Computing Research Institute, Cornell Theory Center, Cornell
University, Ithaca, NY, 1995.
Sun, C. Parallel Solution of Sparse Linear Least Squares Problems on Distributed-Memory
Multiprocessors. Parallel Computing, pages 2075-2093, 1997.
Sun, C. Parallel Sparse Orthogonal Factorization on Distributed-Memory Multiprocessors. SIAM J.
Sci. Comput., 17:666-685, 1996.

