Article

Solving Least-Squares Problems via a Double-Optimal Algorithm and a Variant of the Karush–Kuhn–Tucker Equation for Over-Determined Systems

1
Center of Excellence for Ocean Engineering, National Taiwan Ocean University, Keelung 20224, Taiwan
2
Department of Mechanical Engineering, National United University, Miaoli 36063, Taiwan
*
Author to whom correspondence should be addressed.
Algorithms 2024, 17(5), 211; https://doi.org/10.3390/a17050211
Submission received: 1 April 2024 / Revised: 2 May 2024 / Accepted: 7 May 2024 / Published: 14 May 2024
(This article belongs to the Special Issue Numerical Optimization and Algorithms: 2nd Edition)

Abstract

A double optimal solution (DOS) of a least-squares problem Ax = b, A ∈ ℝ^{q×n} with q ≠ n, is derived in an (m+1)-dimensional varying affine Krylov subspace (VAKS); two minimization techniques exactly determine the m+1 expansion coefficients of the solution x in the VAKS. The minimal-norm solution can be obtained automatically regardless of whether the linear system is consistent or inconsistent. A new double optimal algorithm (DOA) is created; it is sufficiently time saving, inverting only an m × m positive definite matrix at each iteration step, where m ≪ min(n, q). The properties of the DOA are investigated and an estimation of the residual error is provided. The residual norms are proven to be strictly decreasing in the iterations; hence, the DOA is absolutely convergent. Numerical tests reveal the efficiency of the DOA for solving least-squares problems. The DOA is applicable to least-squares problems regardless of whether q < n or q > n. The Moore–Penrose inverse matrix is also addressed by adopting the DOA; the accuracy and efficiency of the proposed method are proven. The (m+1)-dimensional VAKS is different from the traditional m-dimensional affine Krylov subspace used in the conjugate gradient (CG)-type iterative algorithms CGNR (or CGLS) and CGNE (or the Craig method) for solving least-squares problems with q > n. We propose a variant of the Karush–Kuhn–Tucker equation, and then we apply the partial pivoting Gaussian elimination method to solve the variant, which is better than the original Karush–Kuhn–Tucker equation, the CGNR and the CGNE for solving over-determined linear systems. Our main contribution is developing a double-optimization-based iterative algorithm in a varying affine Krylov subspace for effectively and accurately solving least-squares problems, even for a dense and ill-conditioned matrix A with q ≫ n or q ≪ n.

1. Introduction

One of the most important numerical methods to solve linear systems is the Krylov subspace method [1,2,3,4,5,6,7]; since the appearance of pioneering works in [8,9], it has been studied extensively. A lot of Krylov subspace methods were reported in [10,11,12,13,14,15,16].
Many algorithms have been used to solve the least-squares problem with a minimum norm [17,18]:
min ‖x‖   s.t.   x ∈ arg min_x ‖b − A x‖,
where ‖·‖ denotes the Euclidean norm, x ∈ ℝ^n is unknown, and A ∈ ℝ^{q×n} and b ∈ ℝ^q are given.
We are concerned with a varying affine Krylov subspace (VAKS) solution and its corresponding iterative algorithm for solving
A x = b ,
which is an over-determined system when q > n and an under-determined system when q < n .
Most results for Equation (2) are directly extendable to Equation (1), whose solution is called the minimum-norm or pseudo-inverse solution and is given by x = A†b, where A† denotes the pseudo-inverse of A satisfying the following Penrose equations:
A A† A = A,
A† A A† = A†,
(A A†)^T = A A†,
(A† A)^T = A† A.
The pseudo-inverse has been investigated and applied to solve linear least-squares problems, and some computational methods have been developed to compute the Moore–Penrose inverse matrix [19,20,21,22,23,24]. In terms of the Moore–Penrose inverse matrix A†, the general solution of the least-squares problem is given by
x = A† b + (I − R_A) z,
R_A = A† A,
where z is arbitrary. Because A† b and (I − R_A) z are orthogonal, it follows that x = A† b is the unique solution of Equation (1).
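As a quick numerical illustration of Equations (7) and (8) (not part of the original paper; the snippet and its variable names are ours), the following Python sketch uses NumPy's pinv to confirm that A†b is the minimal-norm member of the solution family A†b + (I − A†A)z:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))        # under-determined: q = 3 < n = 5
b = rng.standard_normal(3)

A_pinv = np.linalg.pinv(A)             # Moore-Penrose inverse A^+
x_min = A_pinv @ b                     # minimal-norm least-squares solution

# General solution x = A^+ b + (I - A^+ A) z for arbitrary z, cf. Equations (7)-(8)
R_A = A_pinv @ A
z = rng.standard_normal(5)
x_gen = x_min + (np.eye(5) - R_A) @ z

print(np.allclose(A @ x_min, A @ x_gen))                # both solve A x = b here
print(np.linalg.norm(x_min) <= np.linalg.norm(x_gen))   # A^+ b has the smallest norm
```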
QR factorization is a good method to solve Equation (2). The normalized Gram–Schmidt process is a modification of the original Gram–Schmidt orthogonalization by normalizing each vector after orthogonalization. Let Q denote a q × n matrix obtained by applying the normalized Gram–Schmidt process on the column vectors of A :
Q := [q_1, …, q_n],
which possesses the following orthonormal property:
Q^T Q = I_n.
Hence, by Q R x = b, it is easy to solve x by applying the backward substitution method to
R x = Q^T b.
However, the QR method is only applicable to an over-determined system with q > n . For an under-determined system with q < n , the QR method is not applicable because R is no longer an upper triangular matrix.
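A concrete rendering of Equations (9)–(11) for an over-determined system is sketched below (our own code, not the paper's; it uses NumPy's reduced QR and SciPy's triangular solver for the back substitution):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
q, n = 8, 4
A = rng.standard_normal((q, n))
b = rng.standard_normal(q)

Q, R = np.linalg.qr(A, mode="reduced")   # Q is q x n with Q^T Q = I_n, R upper triangular
x_qr = solve_triangular(R, Q.T @ b)      # backward substitution for R x = Q^T b

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]   # reference least-squares solution
print(np.allclose(x_qr, x_ref))
```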
Recently, many scholars have proposed many algorithms to tackle least-squares problems, like relaxed greedy randomized coordinate descent methods [25], the parallel approximate pseudo-inverse matrix technique in conjunction with the explicit preconditioned conjugate gradient least squares scheme [26], the new randomized Gauss–Seidel method [27], the QR–Cholesky method, the implicit QR method, the bidiagonal implicit QR method [28], the randomized block coordinate descent algorithm [29], an iterative pre-conditioning technique [30], splitting-based randomized iterative methods [31], the projection method [32], the index-search-method-based inner–outer iterative algorithm [33], the greedy double subspace coordinate descent method [34], a distributed algorithm [35], and multivalued projections [36].
As just mentioned, there are many algorithms for solving linear least-squares problems. For small- and medium-sized problems, one can use QR-based methods, and, if the problem is rank-deficient or ill conditioned, singular value decomposition (SVD) can be adopted [37,38]. For large-sized least-squares problems, Krylov subspace methods are more efficient than direct methods if the matrix is sparse or otherwise structured. In the present paper, we plan to develop an efficient and accurate Krylov subspace method for large-sized least-squares problems with dense and maybe ill-conditioned matrices, not limited to sparse matrices. It is also interesting that the proposed iterative algorithm can solve least-squares problems with extremal cases of q n or q n . In fact, for under-determined systems, many methods are not applicable, for instance, QR and conjugate gradient (CG)-type methods, for which the basis vectors of the Krylov subspace are orthogonalized implicitly.
In addition to Equation (1), Liu [39] considered the following minimization
min_x f = (x^T A^T A x)/(b^T A x)^2
for solving ill-posed linear systems of Equation (2) with q = n. Liu [40,41] employed the double optimization technique to iteratively solve inverse problems. Liu [42] replaced the Krylov subspace by the column space of A and derived an effective iterative algorithm based on the maximal projection in Equation (12). These results are quite promising. However, for a non-square system (2), there exists no similar method of double optimization in the literature.
It is well known that the least-squares solution of the minimization in Equation (1) is obtained by
A^T A x = A^T b.
When the derivative of f in Equation (12) with respect to x is set to zero, we obtain
A^T A x = [(x^T A^T A x)/(b^T A x)] A^T b.
Taking the transpose of Equation (2), x^T A^T = b^T, and inserting it into Equation (14), we immediately obtain
A^T A x = [(b^T A x)/(b^T A x)] A^T b,
which implies Equation (13) by b^T A x/(b^T A x) = 1. Many numerical methods for least-squares problems are based on Equation (1); no numerical methods for least-squares problems are based on Equation (12).
To solve the least-squares problem (2), the usual approach is the Karush–Kuhn–Tucker equation method, which recasts Equation (2) as a square linear system of the larger dimension q + n. Let
r = b − A x
be the residual vector. By enforcing the constraint
A^T r = 0,
which is equivalent to the normal form of Equation (2), Equations (16) and (17) constitute a (q + n)-dimensional linear system:
$\begin{bmatrix} I_q & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}.$
The advantage of the so-called Karush–Kuhn–Tucker equation [15] is that one can employ a linear solver to solve Equation (18) to obtain the least-squares solution of Equation (2); however, a disadvantage is that the dimension is enlarged to q + n.
A major novelty of the present paper is that we employ both Equations (1) and (12) to develop a novel numerical method in a varying affine Krylov subspace. The innovative contributions are as follows: this is the first time a new concept of varying affine Krylov subspaces has been introduced, and a double-optimal iterative algorithm has been derived based on the double optimization method to solve least-squares problems. A new variant of the Karush–Kuhn–Tucker equation can significantly improve the accuracy of solving least-squares problems by adopting the partial pivoting Gaussian elimination method.
In Section 2, the solution of Equation (2) is expanded in an (m+1)-dimensional varying affine Krylov subspace (VAKS), whose m+1 unknown coefficients are optimized in Section 3 by two merit functions. The double optimal algorithm (DOA) is developed in Section 4; we prove a key property of absolute convergence, and the decreasing quantity of the residual square is derived explicitly. Examples of linear least-squares problems are given and compared in Section 5; in addition, Moore–Penrose inverses of non-square matrices are discussed. In Section 6, we discuss a variant of the Karush–Kuhn–Tucker equation for solving over-determined linear systems, which is compared to conjugate-gradient-type iterative methods. Finally, Section 7 describes the conclusions.

2. A Varying Affine Krylov Subspace Method

We consider an (m+1)-dimensional varying affine Krylov subspace (VAKS) given by
K_m := span{A^T A u_0, …, (A^T A)^m u_0},   𝕂_m := span{u_0, K_m},
where
u_0 := A^T b ∈ ℝ^n,
and 𝕂_m is equivalent to K_{m+1}(A^T A, u_0).
Then, we expand the solution x of Equation (2) in 𝕂_m by
x = α_0 u_0 + Σ_{k=1}^{m} α_k u_k ∈ 𝕂_m,
where u_k := (A^T A)^k u_0, k = 1, …, m, and α_0 and α_k are constants to be determined. We take m ≪ min(n, q).
Let
U := [u_1, …, u_m] = [A^T A u_0, …, (A^T A)^m u_0]
be an n × m matrix. The Arnoldi process [15], as a normalized Gram–Schmidt process, is used to normalize and orthogonalize the Krylov vectors (A^T A)^j u_0, j = 1, …, m, such that the resultant vectors u_i, i = 1, …, m, satisfy u_i^T u_j = δ_{ij}, i, j = 1, …, m, where δ_{ij} is the Kronecker delta symbol. Select m, give an initial v_0 = A^T A u_0, and compute u_1 = v_0/‖v_0‖:
Do j = 1 : m − 1,
  w_j = A^T A u_j,
  Do i = 1 : j,
    h_{ij} = u_i^T w_j,
    w_j = w_j − h_{ij} u_i,
  End do of i,
  h_{j+1,j} = ‖w_j‖; if h_{j+1,j} = 0, stop,
  u_{j+1} = w_j/h_{j+1,j},
End do of j.
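A minimal Python transcription of the above process is sketched below (our own rendering, not the authors' code); it returns the orthonormal basis U = [u_1, …, u_m] used in Equation (24):

```python
import numpy as np

def krylov_basis(A, u0, m):
    """Orthonormal basis of K_m = span{A^T A u0, ..., (A^T A)^m u0}
    built by a modified Gram-Schmidt (Arnoldi-type) process."""
    AtA = A.T @ A
    v = AtA @ u0
    U = [v / np.linalg.norm(v)]          # u_1 = v_0 / ||v_0||
    for _ in range(m - 1):
        w = AtA @ U[-1]
        for u in U:                      # orthogonalize against previous vectors
            w -= (u @ w) * u
        h = np.linalg.norm(w)
        if h == 0.0:                     # breakdown: Krylov subspace exhausted
            break
        U.append(w / h)
    return np.column_stack(U)            # n x m matrix satisfying U^T U = I_m
```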
Hence, Equation (21) can be written as
x = α 0 u 0 + U α ,
where α := (α_1, …, α_m)^T and U satisfies the orthonormal property:
U T U = I m .
For a consistent system, the exact solution x* satisfies A x* = b exactly. For an inconsistent system, the exact solution x* is given by Equation (1) as x* = A†b, because the equality in A x* = b no longer holds.
On this occasion, we briefly recall the m-dimensional conjugate gradient (CG)-type iterative method with an approximation to Equation (2) with q > n . The CGNR (or CGLS) minimizes the energy function of the error [15,43]:
f(x) = (A^T A (x* − x), (x* − x)),
where (a, b) denotes the inner product of the vectors a and b, and x* is the exact solution of Equation (2). The solution x is sought in the affine Krylov subspace (AKS)
x_0 + K_m(A^T A, A^T r_0) := x_0 + span{A^T r_0, A^T A A^T r_0, …, (A^T A)^{m−1} A^T r_0},
where r_0 = b − A x_0. There is some extension of the CGNR (or CGLS) to bounded perturbation resilient iterative methods [44].
Craig’s idea was to seek the solution of Equation (2) with q > n by
x = arg min_{z ∈ ℝ^n} ‖z‖   subject to   A z = b.
By introducing x = A^T y, we seek y in the AKS
y_0 + K_m(A A^T, r_0),
to minimize
f(y) = (A A^T (y* − y), (y* − y)),
where y* is the exact solution of A A^T y = b. The above statement for x* is also valid for y*, with A x* = b replaced by A A^T y* = b. The resulting CG iterative algorithm is known as the CGNE (or the Craig method), which minimizes the error function [15,43].
Notice that the affine Krylov subspaces in Equations (27) and (29) are fixed AKSs upon setting x_0 and y_0, while in Equation (21) the AKS is varying, denoted as a VAKS, owing to the translation vector α_0 u_0 not being fixed to u_0:
𝕂_m = K_{m+1}(A^T A, A^T b) = span{A^T b, A^T A A^T b, …, (A^T A)^m A^T b}.
Upon comparing Equation (21), whose coefficient α_0 before u_0 is an unknown constant, with Equations (27) and (29), whose coefficients before x_0 and y_0 are fixed to one, the affine Krylov subspace in Equation (21) is a varying affine Krylov subspace (VAKS). The dimension of the VAKS is one more than that of the AKS.

3. A Double Optimal Solution

In this section, we adopt the new concept of the VAKS to derive a double optimal solution of Equation (2). The starting point is Equation (21), rather than x = u_0 + Σ_{k=1}^{m} α_k u_k ∈ u_0 + K_m.

3.1. Two Minimized Functions

Let
y : = A x ,
such that the approximation to b in Equation (2) is accounted for by the error vector
e := b − (b^T y/‖y‖) (y/‖y‖);
the squared norm of e reads as
‖e‖^2 = ‖b‖^2 − (b^T y)^2/‖y‖^2 ≥ 0.
We maximize
max_y (b^T y)^2/‖y‖^2
as the maximal projection of b on the direction y/‖y‖, such that ‖e‖^2 is minimized. Equation (35) is equivalent to the minimization of
min_y f := ‖y‖^2/(b^T y)^2 ≥ 1/‖b‖^2.
Let J be
J := A U ∈ ℝ^{q×m}.
Then, with the aid of Equation (24), Equation (32) can be written as
y = y_0 + J α,
where
y_0 := α_0 A u_0 ∈ ℝ^q.
Inserting Equation (38) for y into Equation (36) yields
min_α f = ‖y‖^2/(b^T y)^2 = ‖y_0 + J α‖^2/(b^T y_0 + b^T J α)^2,
which is used to find α. However, α_0 in y_0 = α_0 A u_0 is still an unknown value, to be determined by imposing the minimization of the squared norm of the residual, shortened to the residual square:
min_{α_0} {‖r‖^2 = ‖b − A x‖^2}.

3.2. Mathematical Preliminaries

To determine the unknowns α j , j = 0 , 1 , , m , the following main theorem is an extension of [39] to a non-square system (2).
Theorem 1. 
Assume that m_0 = rank(A) ≤ min(n, q) and that U satisfies Equation (25) with m < m_0. Then x ∈ 𝕂_m, approximately satisfying Equation (2) and derived from the two minimizations (40) and (41), is a double optimal solution (DOS), given by
x = α_0 (u_0 − V A u_0 + λ_0 V b),
 where
u_0 = A^T b,   J = A U,   C = J^T J,   D = C^{-1},   V = U D J^T,   P = A V = J D J^T,
λ_0 = (‖A u_0‖^2 − u_0^T A^T P A u_0)/(b^T A u_0 − b^T P A u_0),
w = λ_0 P b + A u_0 − P A u_0,
α_0 = w^T b/‖w‖^2.
To guarantee the existence of D = C^{-1}, the dimension m of 𝕂_m must satisfy m < m_0.
Proof. 
In view of Equation (38), we can write
b^T y = b^T y_0 + b^T J α,
‖y‖^2 = ‖y_0‖^2 + 2 y_0^T J α + α^T C α
for f in Equation (40), where
C := J^T J
is an m × m positive definite matrix; hence, C^{-1} exists. For a rank-deficient matrix A, we suppose that its rank is m_0 = rank(A), where m_0 < min(n, q); to guarantee that C is an m × m positive definite matrix, m must be smaller than m_0.
The minimality condition of Equation (40) is
(b^T y)^2 ∇_α ‖y‖^2 − 2 b^T y ‖y‖^2 ∇_α (b^T y) = 0;
∇_α is the gradient with respect to α. Cancelling b^T y in Equation (49) yields
b^T y y_2 − 2 ‖y‖^2 y_1 = 0,
where
y_1 := ∇_α (b^T y) = J^T b,
y_2 := ∇_α ‖y‖^2 = 2 J^T y_0 + 2 C α.
From Equation (50), y_2 is observed to be proportional to y_1 in the m-dimensional space ℝ^m, which is supposed to be
y_2 = (2 ‖y‖^2/(b^T y)) y_1 = 2 λ y_1,
where 2λ is a proportionality factor determined below. The second equality leads to
‖y‖^2 = λ b^T y.
Then, by Equations (51)–(53), we have
α = λ D J^T b − D J^T y_0,
where
D := C^{-1} = (J^T J)^{-1} > 0
is an m × m matrix. It follows from Equations (46), (47) and (55) that
b^T y = b^T y_0 + λ b^T P b − b^T P y_0,
‖y‖^2 = λ^2 b^T P b + ‖y_0‖^2 − y_0^T P y_0,
where
P : = J D J T
is a q × q positive semi-definite matrix. P is indeed a projection operator.
From Equations (54), (57) and (58), λ is solved from
‖y_0‖^2 − y_0^T P y_0 = λ (b^T y_0 − b^T P y_0)
as follows:
λ = (‖y_0‖^2 − y_0^T P y_0)/(b^T y_0 − b^T P y_0).
Inserting this into Equation (55) generates
α = [(‖y_0‖^2 − y_0^T P y_0)/(b^T y_0 − b^T P y_0)] D J^T b − D J^T y_0.
By inserting the above α into Equation (24) and using Equation (39) for y_0, we can obtain
x = α_0 (u_0 − V A u_0 + λ_0 V b),
where
V := U D J^T,
λ_0 := (‖A u_0‖^2 − u_0^T A^T P A u_0)/(b^T A u_0 − b^T P A u_0).
Upon letting
v := u_0 − V A u_0 + λ_0 V b,
  x in Equation (63) can be expressed as
x = α 0 v ,
where α 0 is determined by inserting Equation (67) into Equation (41):
‖b − A x‖^2 = ‖α_0 A v − b‖^2 = α_0^2 ‖w‖^2 − 2 α_0 w^T b + ‖b‖^2,
where
w := A v = A u_0 − A V A u_0 + λ_0 A V b.
In view of Equations (37) and (64), P in Equation (59) can be written as
P = A V ,
such that w can be written as
w = λ_0 P b + A u_0 − P A u_0.
This is just w in Equation (44).
Setting the derivative of Equation (68) with respect to α_0 to zero, we obtain
α_0 = w^T b/‖w‖^2;
hence, the proof of Equation (42) in Theorem 1 is complete. □
In the proof of Theorem 1, we have used the minimality condition of Equation (40) to determine α in Equation (62), and then the minimality condition of Equation (41) was used to determine α 0 in Equation (71). Two optimizations were used to derive x in Equation (42), which is thus named a double optimal solution (DOS).
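To make the closed-form DOS of Theorem 1 concrete, the sketch below (our illustrative code, not the authors' implementation; for brevity the Krylov vectors are orthonormalized here with a QR factorization instead of the Arnoldi loop of Section 2) evaluates x = α_0(u_0 − V A u_0 + λ_0 V b) directly from Equations (43)–(45):

```python
import numpy as np

def double_optimal_solution(A, b, m):
    """Closed-form DOS of Theorem 1 (illustrative sketch)."""
    u0 = A.T @ b
    AtA = A.T @ A
    K = np.column_stack([np.linalg.matrix_power(AtA, k) @ u0 for k in range(1, m + 1)])
    U, _ = np.linalg.qr(K)               # n x m with U^T U = I_m (QR replaces Arnoldi)
    J = A @ U
    D = np.linalg.inv(J.T @ J)           # D = C^{-1}
    V = U @ D @ J.T
    P = A @ V                            # projection operator P = J D J^T
    Au0 = A @ u0
    lam0 = (Au0 @ Au0 - Au0 @ (P @ Au0)) / (b @ Au0 - b @ (P @ Au0))
    w = lam0 * (P @ b) + Au0 - P @ Au0
    alpha0 = (w @ b) / (w @ w)
    return alpha0 * (u0 - V @ Au0 + lam0 * (V @ b))

# Small over-determined test: compare the DOS residual with the least-squares residual
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 6))
b = rng.standard_normal(20)
x_dos = double_optimal_solution(A, b, m=5)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(A @ x_dos - b), np.linalg.norm(A @ x_ls - b))
```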
Theorem 2. 
The two factors λ 0 in Equation (43) and α 0 in Equation (45) satisfy
α 0 λ 0 = 1 .
Proof. 
It follows from Equations (59) and (56) that
P^2 = P;
hence, P is a projection operator.
In Equation (44), w can be written as
w = λ_0 P b + (I_q − P) A u_0.
Utilizing Equation (73), P^T = P, and (I_q − P)^2 = I_q − P, it follows that
‖w‖^2 = λ_0^2 b^T P b + u_0^T A^T A u_0 − u_0^T A^T P A u_0,
b^T w = λ_0 b^T P b + b^T A u_0 − b^T P A u_0.
With the aid of Equation (65), Equation (76) is further reduced to
b^T w = λ_0 b^T P b + (1/λ_0)(u_0^T A^T A u_0 − u_0^T A^T P A u_0).
Now, after inserting Equation (77) into Equation (71), we can obtain
α_0 λ_0 ‖w‖^2 = λ_0^2 b^T P b + u_0^T A^T A u_0 − u_0^T A^T P A u_0,
which is just equal to ‖w‖^2 by Equation (75). Hence, we obtain Equation (72) readily. The proof of α_0 λ_0 = 1 is complete. □
Remark 1. 
The DOS is derived in the VAKS 𝕂_m in a way that sticks to Equation (21). If one takes α_0 = 1 in Equation (21), 𝕂_m reduces to the usual affine Krylov subspace u_0 + K_m. However, in this situation, Theorem 1 no longer holds. If one takes u_0 = 0 and works directly in the (m+1)-dimensional Krylov subspace K_{m+1} by expanding α in an (m+1)-dimensional subspace, this results in failure, because after inserting y_0 = A u_0 = 0 into Equation (62), α cannot be defined. We can conclude that the DOS directly benefits from the VAKS.
The following corollaries, which rest on Theorems 1 and 2 and on equations appearing in the proof of Theorem 1, are proven to help us understand the properties of the DOS.
Corollary 1. 
The multiplier λ defined by Equation (61) satisfies
λ = 1 ,
 and the double optimal solution (42) can be written as
x = α_0 u_0 − α_0 V A u_0 + V b.
Proof. 
Inserting Equation (39) for y 0 into Equation (61), we have
λ = α_0 (‖A u_0‖^2 − u_0^T A^T P A u_0)/(b^T A u_0 − b^T P A u_0),
in view of Equation (65), which can be written as
λ = α_0 λ_0.
Hence, λ = 1 follows immediately from Equation (72). Then, using Equations (72) and (42) proves Equation (80). □
Remark 2. 
Inserting Equation (20) for u 0 into Equation (80) yields
x = [V + α_0 A^T − α_0 V A A^T] b,
 which is a pseudo-inverse solution obtained from the DOS.
Corollary 2. 
If m = n < q and A T A is non-singular, then the double optimal solution x of Equation (2) is the Penrose solution, given by
x = A† b,
 where A† is the Moore–Penrose pseudo-inverse of A.
Proof. 
If m = n , then J = A U defined by Equation (37) is a q × n matrix. Meanwhile, U defined by Equation (22) is a rank n orthogonal matrix due to Equation (25):
U T U = U U T = I n .
From Equations (56) and (84), it follows that
D = U^T (A^T A)^{-1} U.
Inserting J = A U, V = U D J^T and Equation (85) into Equation (80), we have
x = α_0 u_0 + U U^T (A^T A)^{-1} U U^T A^T b − α_0 U U^T (A^T A)^{-1} U U^T A^T A u_0.
By using Equation (84), we can derive
x = (A^T A)^{-1} A^T b,
which is a Penrose solution of Equation (2). □
Corollary 2 reveals that the DOS can give the pseudo-inverse solution. SVD can offer a pseudo-inverse solution as follows [37]. In SVD, the matrix A is decomposed as
A = U D V^T,
where U is a q × q orthonormal matrix, V is an n × n orthonormal matrix, and D is a q × n pseudo-diagonal matrix with the non-zero singular values on the diagonal and all other elements zero. Suppose that A^T A is non-singular. Then, Equation (87) is a pseudo-inverse solution, which in terms of the SVD can be derived as follows:
x = V (D^T D)^{-1} D^T U^T b = V D† U^T b;
letting
A† = V D† U^T,
we can derive Equation (83) in Corollary 2. Mathematically, the DOS is equivalent to the SVD solution in Equation (89), even though they are different approaches to the least-squares problem.

3.3. Estimation of Residual Error

Equation (80) is written as
x = α_0 (u_0 − V A u_0) + V b,
where
α_0 = 1/λ_0 = (b^T A u_0 − b^T P A u_0)/(‖A u_0‖^2 − u_0^T A^T P A u_0),
by using Equations (65) and (72).
Let us investigate the residual square:
‖b − A x‖^2 = ‖y − b‖^2 = ‖y‖^2 − 2 b^T y + ‖b‖^2,
where
y = A x = α_0 A u_0 − α_0 P A u_0 + P b
is obtained from Equation (91) by using Equation (70).
From Equation (94), it follows that
‖y‖^2 = α_0^2 (u_0^T A^T A u_0 − u_0^T A^T P A u_0) + b^T P b,
b^T y = α_0 b^T A u_0 − α_0 b^T P A u_0 + b^T P b.
In the first equation, we take Equation (73) into account. Inserting them into Equation (93) leads to
‖b − A x‖^2 = α_0^2 (u_0^T A^T A u_0 − u_0^T A^T P A u_0) + 2 α_0 (b^T P A u_0 − b^T A u_0) + ‖b‖^2 − b^T P b.
Finally, inserting Equation (92) for α_0 into the above equation yields
‖b − A x‖^2 = ‖b‖^2 − b^T P b − (b^T A u_0 − b^T P A u_0)^2/(u_0^T A^T A u_0 − u_0^T A^T P A u_0).
If P → I_q, the value of ‖b − A x‖^2 in Equation (98) is made as small as possible:
‖b − A x‖^2 → 0.
Theorem 3. 
For the double optimal solution x ∈ 𝕂_m derived in Equation (42) of the least-squares problem (2), the squared norm of the error vector e,
‖e‖^2 = ‖b‖^2 − (b^T A x)^2/‖A x‖^2,
 and the squared norm of the residual vector r = b − A x are equal, i.e.,
‖e‖^2 = ‖b − A x‖^2.
 Moreover,
‖r‖ < ‖b‖.
Proof. 
Inserting Equation (92) for α_0 into Equations (95) and (96), we can obtain
‖y‖^2 = (b^T A u_0 − b^T P A u_0)^2/(u_0^T A^T A u_0 − u_0^T A^T P A u_0) + b^T P b,
b^T y = (b^T A u_0 − b^T P A u_0)^2/(u_0^T A^T A u_0 − u_0^T A^T P A u_0) + b^T P b.
Consequently, we have
‖y‖^2 = b^T y.
It follows from Equations (98) and (103) that
‖b − A x‖^2 = ‖b‖^2 − ‖y‖^2.
By Equations (105) and (34),
‖e‖^2 = ‖b‖^2 − ‖y‖^2.
Equation (101) was proven upon comparing Equations (106) and (107).
Using r = b − A x and Equations (101) and (107) leads to
‖r‖^2 = ‖e‖^2 = ‖b‖^2 − ‖y‖^2.
Due to ‖y‖^2 > 0, Equation (102) follows readily. □

4. A Numerical Algorithm

In this section, we develop an iterative algorithm to solve the least-squares problem, starting from an initial guess x_0. We assume that the value of x_k at the kth step is already known, and x_{k+1} is computed at the next step via the iterative algorithm. According to the value x_k, the kth-step residual r_k = b − A x_k can be computed.
When the initial guess x_0 is given, an initial residual is written as follows:
r_0 = b − A x_0.
Upon letting
z = x − x_0,
Equation (2) is equivalent to solving z from
A z = r_0,
which is deduced by
A z = A x − A x_0 = b − A x_0 = r_0.
For system (111), we seek z in the VAKS by
z = α_0 u_0 + Σ_{k=1}^{m} α_k u_k = α_0 u_0 + U α ∈ 𝕂_m,
where
u_0 = A^T r_0,
and u_k := (A^T A)^k u_0, k = 1, …, m. The constants α_k, k = 1, …, m, and α_0 are determined by the following two minimizations:
min_z f = ‖A z‖^2/(r_0^T A z)^2,
min_z ‖r‖^2 = ‖r_0 − A z‖^2.
After inserting Equation (112) for z , the first minimization can derive α , while the second minimization can derive α 0 .
Since two minimizations, in Equations (114) and (115), are adopted to determine the descent vector z in Equation (112), the resulting iterative algorithm to solve the least-squares problem is labeled a double optimal algorithm (DOA).
To treat the rank-deficient least-squares problem, the dimension m of the VAKS must satisfy m < m_0 = rank(A), such that rank(J_k) = m and C_k = J_k^T J_k > 0 is an m × m invertible matrix. Consequently, the DOA proceeds as follows: (i) give A, b, m < m_0, an initial value x_0 and the convergence criterion ε; (ii) for k = 0, 1, …, compute
r_k = b − A x_k,
u_0^k = A^T r_k,   u_j^k = (A^T A)^j u_0^k, j = 1, …, m,
U_k = [u_1^k, …, u_m^k]   (by the Arnoldi procedure),
J_k = A U_k,   C_k = J_k^T J_k,   D_k = C_k^{-1},
V_k = U_k D_k J_k^T,   P_k = A V_k,
λ_k = (‖A u_0^k‖^2 − (u_0^k)^T A^T P_k A u_0^k)/(r_k^T A u_0^k − r_k^T P_k A u_0^k),
z_k = V_k r_k + (1/λ_k)(u_0^k − V_k A u_0^k),
x_{k+1} = x_k + z_k,
until convergence with ‖x_{k+1} − x_k‖ < ε or ‖r_{k+1}‖ < ε. We call ‖x_{k+1} − x_k‖ the relative residual.
In the above, U_k is an n × m matrix, J_k is a q × m matrix, C_k and D_k are m × m matrices, V_k is an n × q matrix and P_k is a q × q matrix. The main computational cost is expended in computing D_k, which is nevertheless quite time saving because m is a small number. A^T A is a fixed n × n matrix, computed once and used in the construction of U_k, which requires m n operations. J_k requires m n q operations; C_k requires m^2 q operations; D_k requires m^2 q operations; V_k requires n m^2 q operations; and P_k requires n q^2 operations. In each iteration, there are m n + m n q + 2 m^2 q + n m^2 q + n q^2 operations. Denote the number of iterations by NI. The computational complexity is in total NI × [m n + m n q + 2 m^2 q + n m^2 q + n q^2].
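The iteration above can be transcribed into a short NumPy sketch as follows (an illustrative rendering with assumed function names, not the authors' code; as in the DOS sketch of Section 3, a QR factorization stands in for the Arnoldi orthonormalization of U_k):

```python
import numpy as np

def doa(A, b, m=5, x0=None, eps=1e-12, max_iter=200):
    """Sketch of the double optimal algorithm (DOA) iteration of Section 4."""
    n = A.shape[1]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    AtA = A.T @ A
    for _ in range(max_iter):
        r = b - A @ x
        u0 = A.T @ r
        K, v = [], u0
        for _ in range(m):               # Krylov vectors (A^T A)^j u0, j = 1..m
            v = AtA @ v
            K.append(v)
        U, _ = np.linalg.qr(np.column_stack(K))
        J = A @ U
        D = np.linalg.inv(J.T @ J)
        V = U @ D @ J.T
        P = A @ V
        Au0 = A @ u0
        lam = (Au0 @ Au0 - Au0 @ (P @ Au0)) / (r @ Au0 - r @ (P @ Au0))
        z = V @ r + (u0 - V @ Au0) / lam   # descent vector z_k
        x = x + z
        if np.linalg.norm(z) < eps or np.linalg.norm(b - A @ x) < eps:
            break
    return x
```

By Corollary 5 below, in exact arithmetic every update satisfies ‖r_{k+1}‖^2 = ‖r_k‖^2 − ‖A z_k‖^2, so the loop can only decrease the residual.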
It is known that m is a key parameter in the Krylov subspace. A proper choice of m can significantly enhance the convergence speed and accuracy. For ill-posed linear least-squares problems, there exists the best value of m, but for well-posed linear least-squares problems, small values of m may slow down convergence. When m is increased, both the convergence speed and accuracy are increased. However, when m is increased, more computational power is required in the construction of the projection operator P and the inversion matrix C 1 .
We make some comments about the initial value x_0. For under-determined systems, there are many solutions; hence, different choices of x_0 would generate different solutions. In general, as required in Equation (1), the minimal-norm solution can be obtained by setting x_0 = 0, whose norm is zero, ‖x_0‖ = 0. For over-determined systems, we take x_0 = 0 such that the initial residual is a non-zero vector r_0 = b and the DOA based on the residual r_k is solvable. Indeed, the DOA is not sensitive to the initial value of x_0; we take x_0 = 0 for most problems unless otherwise specified.
The following corollaries help the clarification of the DOA and are based on the DOS used in the residual Equation (111).
Corollary 3. 
r_{k+1} is orthogonal to A z_k in Equation (111) and to r_{k+1} − r_k, i.e.,
r_{k+1}^T (A z_k) = 0,
r_{k+1}^T (r_{k+1} − r_k) = 0.
Proof. 
By Equation (111), the next residual r_{k+1} is given by
r_{k+1} = r_k − y_k,
where y_k = A z_k. By Equation (116), we can also derive Equation (119) as follows:
b − A x_{k+1} = b − A x_k − A z_k.
For Equation (111), Equation (105) is written as
r_k^T y_k − ‖y_k‖^2 = 0;
taking the inner product of Equation (119) with y_k, we have
r_{k+1}^T y_k = r_{k+1}^T (A z_k) = r_k^T y_k − ‖y_k‖^2 = 0.
Inserting A z_k = r_k − r_{k+1} into Equation (117), we can prove Equation (118). The DOA is a good approximation of Equation (2) with a better descent direction z_k in the VAKS. □
Corollary 4. 
For the DOA, the convergence rate is
‖r_k‖/‖r_{k+1}‖ = 1/sin θ > 1,   θ ≠ π/2.
Proof. 
It follows from Equations (119) and (120) that
‖r_{k+1}‖^2 = ‖r_k‖^2 − ‖r_k‖ ‖y_k‖ cos θ,
where θ is the intersection angle between r_k and y_k = A z_k. With the help of Equation (120), we have
cos θ = ‖y_k‖/‖r_k‖.
Obviously, θ ≠ π/2 because ‖y_k‖ > 0. Then, Equation (123) can be further reduced to
‖r_{k+1}‖^2 = ‖r_k‖^2 [1 − cos^2 θ] = ‖r_k‖^2 sin^2 θ.
Dividing both sides by ‖r_{k+1}‖^2 and taking the square roots of both sides, we can obtain Equation (122). □
Corollary 5. 
The residual is decreased step by step,
‖r_{k+1}‖^2 = ‖r_k‖^2 − ‖A z_k‖^2   ⇒   ‖r_{k+1}‖ < ‖r_k‖.
Proof. 
Taking the squared norm of Equation (119) and using Equation (120) generates
‖r_{k+1}‖^2 = ‖r_k‖^2 − ‖y_k‖^2;
after inserting y_k = A z_k, Equation (126) is proven. Corollary 5 is easily deduced by noting ‖A z_k‖^2 > 0. □
The property in Equation (126) is very important, which guarantees that the DOA is absolutely convergent at each iteration.
Remark 3. 
At the very beginning, the DOS and DOA are developed in the VAKS 𝕂_m through Equation (21). If one takes α_0 = 1 in Equation (21), 𝕂_m reduces to the usual affine Krylov subspace u_0 + K_m. However, in this situation, several good properties, such as those in Theorem 3 and Corollaries 3–5, are lost; convergence cannot be guaranteed in the resulting iterative algorithm. On the other hand, if we take u_0 = 0 and work directly in the (m+1)-dimensional Krylov subspace K_{m+1}, by extending α to an (m+1)-dimensional vector, this results in failure because, after inserting y_0 = A u_0 = 0 into Equation (62), α cannot be defined.
Corollary 6. 
In the iterative algorithm (116), we may encounter a slowly varying point:
x_{k+1} ≈ x_k,
 when x_0 is given such that r_0 = b − A x_0 satisfies A A^T r_0 = c_0 r_0 for some c_0 ≫ 1.
Proof. 
Inserting b = r_0 and
u_0 = A^T r_0
into Equation (92) yields
α_0 = 1/c_0.
When c_0 ≫ 1, α_0 ≈ 0, and x_{k+1} ≈ x_k by Equation (116). □

5. Numerical Examples

To demonstrate the efficiency and accuracy of the present iterative DOA, several examples are presented. All the numerical computations were carried out in Microsoft Developer Studio on a PC with an Intel Core i7-3770 CPU, 2.80 GHz, and 8 GB of memory. The precision is 10^{-16}.

5.1. Example 1

In this example, we find a minimal-norm and least-squares solution of the following consistent equation:
$\begin{bmatrix} 1 & 2 & 3 & -1 \\ 3 & 2 & 1 & -1 \\ 2 & 3 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$
Although Example 1 is simple, we use it to test the DOS and DOA for finding the minimal-norm solution for a consistent system.
First, we apply the DOS with m = 2 to solve this problem, where we can find a quite accurate solution of x_1 = 1.48176 × 10^{-1}, x_2 = 1.92567 × 10^{-1}, x_3 = 1.48141 × 10^{-1}, and x_4 = −2.22689 × 10^{-2}, for which the three residual errors are r_1 = 2.012 × 10^{-6}, r_2 = 7.125 × 10^{-5} and r_3 = 7.51756 × 10^{-5}, respectively.
Although the DOS is an accurate solution with an error on the order of 10^{-5}, the accuracy can be improved by the DOA. In the application of the iterative DOA, initial guesses are given by x_1^0 = x_2^0 = x_3^0 = x_4^0 = c_0 for some constant c_0. Then, the DOA with m = 1 solves this problem with c_0 = 0 and ε = 10^{-12}. As shown in Figure 1a, the DOA converges very fast in only seven steps. Furthermore, we can find very accurate solutions of x_1 = 1.48148 × 10^{-1}, x_2 = 1.925926 × 10^{-1}, x_3 = 1.48148 × 10^{-1}, and x_4 = −2.22222 × 10^{-2}; the norm of x is ‖x‖ = 0.28545. The three residual errors are r_1 = 5.63993 × 10^{-14}, r_2 = 5.61773 × 10^{-14} and r_3 = 5.63993 × 10^{-14}, respectively.
On the other hand, if we take c_0 = 1, then r_1 = 5.6 × 10^{-13}, r_2 = 5.59 × 10^{-13} and r_3 = 8.47 × 10^{-13} are obtained, which shows that the solution (x_1, x_2, x_3, x_4) = (0.4815, −0.2741, 0.4815, 0.3778) is accurate. However, the norm ‖x‖ = 0.82552 is not the minimal one. Therefore, the correct choice for the minimal-norm solution is c_0 = 0. Unless otherwise specified, we take the initial guess x_0 = 0 for most problems computed below.

5.2. Example 2

A minimal-norm and least-squares solution is sought for the following inconsistent equation:
$\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 2 \end{bmatrix}.$
Although Example 2 is simple, we use it to test the DOS and DOA for finding the minimal-norm solution for an inconsistent system.
We first apply the DOS with m = 1, where we obtain a very accurate solution with x_1 = 1.25, x_2 = 1.5, and x_3 = 1.5. The residuals are (r_1, r_2, r_3, r_4) = (0.25, 0.25, 0.25, 0.25), with the residual norm being 0.5.
Then, the DOA with m = 1 and ε = 10^{-12} solves this problem. Starting from the initial guesses x_1^0 = x_2^0 = x_3^0 = 0, as shown in Figure 1b, the DOA converges very fast in only two steps, where we obtain a very accurate solution with x_1 = 1.25, x_2 = 1.5, and x_3 = 1.5. The residuals are (r_1, r_2, r_3, r_4) = (0.25, 0.25, 0.25, 0.25), with the residual norm being 0.5. For the inconsistent system, the equality in Equation (130) no longer holds, such that the residual is not a zero vector.
The Moore–Penrose inverse of the coefficient matrix A in Equation (130) has an exact solution:
$A^{\dagger} = \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{3}{4} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix}.$
Then, we find that the exact solution of Equation (130) is given by x = A†b, which is just (x_1, x_2, x_3) = (1.25, 1.5, 1.5), as mentioned above.
We apply the iterative method developed by Petkovic and Stanimirovic [20] to find the Moore–Penrose inverse, which has a simpler form:
X_{k+1} = (1 + β) X_k − β X_k A X_k,   X_0 = β A^T,
based on the Penrose Equations (4) and (6). Under the same convergence criterion, this iteration process is convergent in 130 steps, as shown in Figure 1b by a dashed line.
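A compact sketch of the iteration (132) is given below (our illustration; the step size β = 1/‖A‖_F^2 and the stopping rule are assumptions made here, not prescriptions from [20]):

```python
import numpy as np

def pinv_iteration(A, beta=None, tol=1e-12, max_iter=5000):
    """Iteration (132): X_{k+1} = (1+beta) X_k - beta X_k A X_k, X_0 = beta A^T.
    The choice beta = 1/||A||_F^2 is an assumption of this sketch."""
    if beta is None:
        beta = 1.0 / np.linalg.norm(A, 'fro') ** 2
    X = beta * A.T
    for _ in range(max_iter):
        X_new = (1.0 + beta) * X - beta * (X @ A @ X)
        if np.linalg.norm(X_new - X, 'fro') < tol:
            return X_new
        X = X_new
    return X

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
print(np.allclose(pinv_iteration(A), np.linalg.pinv(A), atol=1e-8))
```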
Note that the DOA can also be used to find the Moore–Penrose inverse A† by taking the right-hand side b to be a zero vector except for the kth element, which is set to one; the corresponding solution x, denoted by x^k, constitutes the kth column of A†. Then, we run the DOA q times, for k = 1, …, q, to obtain
$A^{\dagger} = \begin{bmatrix} x_1^1 & \cdots & x_1^q \\ x_2^1 & \cdots & x_2^q \\ \vdots & & \vdots \\ x_n^1 & \cdots & x_n^q \end{bmatrix}.$
The dimension of A† is n × q. By applying the DOA with m = 1 under the same convergence criterion ε = 10^{-12}, we can find a very accurate solution of A†, as shown in Equation (131), with the maximum error over all elements being 10^{-14}; this is achieved in a total of 26 steps and converges faster than the method developed by Petkovic and Stanimirovic [20].
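The column-by-column construction of Equation (133) can be sketched as follows (our illustration; a generic minimum-norm least-squares solver, numpy.linalg.lstsq, stands in here for the DOA):

```python
import numpy as np

def pinv_by_columns(A, solver=lambda A, e: np.linalg.lstsq(A, e, rcond=None)[0]):
    """Build A^+ column by column: the k-th column is the minimum-norm
    least-squares solution of A x = e_k (k-th unit vector)."""
    q = A.shape[0]
    cols = [solver(A, np.eye(q)[:, k]) for k in range(q)]
    return np.column_stack(cols)          # n x q matrix

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))
print(np.allclose(pinv_by_columns(A), np.linalg.pinv(A)))
```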

5.3. Example 3

We consider an over-determined Hilbert linear problem:
A_{ij} = 1/(i + j − 1),  1 ≤ i ≤ q,  1 ≤ j ≤ n,   b_i = Σ_{j=1}^{n} x_j/(i + j − 1),  1 ≤ i ≤ q,
where we fix q = 6 and n = 5, and the exact solution x_j = 1/j, 1 ≤ j ≤ n, is given.
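For reference, the over-determined Hilbert test problem (134) can be set up in a few lines (a sketch for reproducing the experiment, not the authors' code; lstsq is used here as a stand-in solver):

```python
import numpy as np

q, n = 6, 5
i = np.arange(1, q + 1)[:, None]
j = np.arange(1, n + 1)[None, :]
A = 1.0 / (i + j - 1.0)                  # Hilbert-type q x n matrix
x_exact = 1.0 / np.arange(1, n + 1)      # x_j = 1/j
b = A @ x_exact

x_num = np.linalg.lstsq(A, b, rcond=None)[0]   # reference solver in place of the DOA
print(np.max(np.abs(x_num - x_exact)))         # maximum error (ME)
```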
The DOA with m = 4 and ε = 10^{-13} can solve this problem very fast, starting from the initial guesses x_1^0 = x_2^0 = x_3^0 = x_4^0 = x_5^0 = 0, in only four steps, as shown in Figure 2a. Furthermore, we can find a very accurate solution, as shown in Figure 2b, with the maximum error (ME) being 8.91 × 10^{-12}. If we consider an under-determined system with q = 5 and n = 6, the DOA with m = 3 is still workable, with ME = 2.22 × 10^{-8}.
For least-squares problems, QR is a famous method to directly solve Equation (2). This process is shown in Equations (9)–(11). However, we find that the ME = 1.05 × 10^{-6} obtained by QR is less accurate than that obtained via the DOA in the Hilbert least-squares problem with q = 6 and n = 5. We have checked the accuracy of the orthogonality in Equation (10) by
‖Q^T Q − I_n‖,
which is on the order of 8.65 × 10^{-12}.
The Hilbert matrix is known as a notorious example of a highly ill-conditioned matrix. For this problem, the condition number is 2.5 × 10^5. Then, in floating point, b has a rounding error on the order of 10^{-16}. The algorithm introduces further floating-point errors of the same order of magnitude. Perturbation theory for the least-squares problem [38] shows that one can only expect to find a solution that is approximately within 2 × 10^{5−16} = 2 × 10^{-11} of the exact solution. The error ‖Q^T Q − I_n‖ = 8.65 × 10^{-12} is within this range; however, the ME = 1.05 × 10^{-6} obtained by QR is not. The reason is that when we use Equation (11) to find the solution in the QR method, very small singular values appearing in the denominator enlarge the rounding error and the error from Q; for example, the last two singular values are 4.803 × 10^{-4} and 1.734 × 10^{-5}, and dividing by these two small values sequentially leads to ME = 1.05 × 10^{-6}, which is not within the range of 2 × 10^{-11}. In contrast, the proposed DOA method, with an error of 8.91 × 10^{-12}, is within this range. For highly ill-conditioned least-squares problems, the QR method loses some accuracy, even when Q satisfies Equation (10) well within the range of 2 × 10^{-11}.
We raise q to 10, and Table 1 lists the computed ME for different values of n. We also list the error ‖Q^T Q − I_n‖ for the QR method, which becomes worse as n is increased. Based on Equations (88)–(90), this problem is also solved by SVD. The DOA is competitive with SVD and, for all cases, is slightly more accurate than SVD.
To investigate the influence of m on the DOA, Table 2 compares the ME and the iteration number (IN). For the ill-posed Hilbert problem, there exists a best value, m = 3.

5.4. Example 4

We find the Moore–Penrose inverse of the following rank deficient matrix [45]:
$A = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 3 \\ 0 & 1 & 1 & 3 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 2 \end{bmatrix}.$
We apply the iterative method of the DOA as demonstrated in Section 5.2 to find the Moore–Penrose inverse, which has an exact solution:
$A^{\dagger} = \frac{1}{102}\begin{bmatrix} 15 & 18 & 3 & 3 & 18 & 15 \\ 8 & 13 & 5 & 5 & 13 & 8 \\ 7 & 5 & 2 & 2 & 5 & 7 \\ 6 & 3 & 9 & 9 & 3 & 6 \end{bmatrix}.$
Under the same convergence criterion ε = 10^{-9} as that used by Xia et al. [45], the iteration process of the DOA with m = 1 converges very fast, in a total of 12 steps, as shown in Figure 3. In Table 3, we compare the DOA with other numerical methods specified by Xia et al. [45] and Petkovic and Stanimirovic [20] to assess the performance of the DOA, measured by the four numerical errors of the Penrose Equations (3)–(6) and the iteration number (IN).
In the above, Alg. is shorthand for algorithm, IP is shorthand for initial point, IN is shorthand for iteration number, and 1 = (1, …, 1)^T. The first two algorithms were reported by Xia et al. [45]. It can be seen that the DOA converges much faster and is more accurate than the other algorithms.
In the computation of the pseudo-inverse, the last two singular values close to zero can be prevented from appearing in the matrix C ∈ ℝ^{m×m} if we take m small enough. For example, we take m = 1 in Table 3, such that D = C^{-1} can be computed accurately without a near-zero singular value appearing in the denominator and increasing the rounding error.

5.5. Example 5

In this example, we find the Moore–Penrose inverse of the Hilbert matrix in Equation (134), which is more difficult than the previous example. Here, we fix q = 3 , n = 50 , and q = 50 , n = 3 . The numerical errors of the Penrose Equations (3)–(6) are compared in Table 4 and Table 5, respectively.

5.6. Example 6

In this example, we consider a non-harmonic boundary value problem on an amoeba-like boundary:
Δu = u_{xx} + u_{yy} = 0,   (x, y) ∈ Ω,
u(x, y) = f(x, y) = x^2 y^3,   (x, y) ∈ Γ,   Γ := {(ρ(θ) cos θ, ρ(θ) sin θ), 0 ≤ θ ≤ 2π},
ρ(θ) = exp(sin θ) sin^2(2θ) + exp(cos θ) cos^2(2θ),
where Δf(x, y) = 2y^3 + 6x^2 y ≠ 0, and the contour is displayed in Figure 4a by a solid black line. Feng et al. [46] have pointed out that "non-harmonic boundary data" means that the solution does not have a harmonic extension to the whole plane, and such a solution is very difficult to achieve.
Let
v(x, y) = u(x, y) − f(x, y)
be a new variable; then we arrive at a Poisson equation under a homogeneous boundary condition:
Poisson equation:   Δv = F(x, y) = −Δf(x, y),   v(x, y)|_{(x,y)∈Γ} = 0,
where F(x, y) ≠ 0, because f(x, y) is a non-harmonic boundary function. When v(x, y) is solved, we can find u(x, y) = v(x, y) + f(x, y).
In order to obtain an accurate solution of v ( x , y ) , we use the multiple-scale Pascal triangle polynomial expansion method developed by Liu and Kuo [47]:
v(x, y) = Σ_{i=1}^{m_0} Σ_{j=1}^{i} c_{ij} s_{ij} (x/R_0)^{i−j} (y/R_0)^{j−1}.
After collocating q points to satisfy the governing equation and boundary condition (141), we have a non-square linear system (2), where the scales s i j are determined such that each column of the coefficient matrix A has the same norm.
By using the DOA, we take m = 4, q = 200, m_0 = 5, R_0 = 1000 and c_{ij}^0 = 0. Hence, the dimension of A is 200 × 15, for which system (2) is an over-determined system. The distribution of collocation points is shown in Figure 4a, while the convergence rate is shown in Figure 4b. It is remarkable that the DOA converges within nine steps even under the stringent convergence criterion ε = 10^{-10}. In Figure 5a, we compare the recovered boundary function with the exact function at 1000 points along the boundary. They are almost coincident, and thus we plot the absolute error in Figure 5b, whose ME = 2.95 × 10^{-12} and root mean square error (RMSE) = 1.2 × 10^{-12}. This result is much better than that computed by Feng et al. [46].

5.7. Example 7

The method of fundamental solutions (MFS) is adopted to solve Equation (137) by using
u(z) = Σ_{j=1}^{n} c_j ln ‖z − s_j‖,   z ∈ Ω,   s_j ∈ Ω̄^c,
where Ω̄^c is the complementary set of Ω̄, and
s_j = (R cos θ_j, R sin θ_j)^T,   θ_j = 2jπ/n
are source points, in which R > 0 is the radius of the circle on which the source points are placed.
We consider an exact solution u(x, y) = x^2 − y^2 and specify the Dirichlet boundary condition on the boundary given by Equation (139). In Equation (144), we fix R = 5. By using the DOA, we take m = 11 and c_j^0 = 0.
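The assembly of the MFS collocation system of Equations (143) and (144) on the boundary (139) may look as follows (an illustrative sketch with our own variable names; lstsq stands in for the DOA used in the paper):

```python
import numpy as np

def mfs_system(n_sources, q_points, R=5.0):
    """Collocation matrix A_ij = ln||z_i - s_j|| and data b_i = u(z_i)
    for u = x^2 - y^2 on the amoeba-like boundary rho(theta)."""
    th_s = 2.0 * np.pi * np.arange(1, n_sources + 1) / n_sources
    s = R * np.column_stack([np.cos(th_s), np.sin(th_s)])      # source points
    th = 2.0 * np.pi * np.arange(q_points) / q_points
    rho = np.exp(np.sin(th)) * np.sin(2 * th) ** 2 + np.exp(np.cos(th)) * np.cos(2 * th) ** 2
    z = np.column_stack([rho * np.cos(th), rho * np.sin(th)])  # boundary collocation points
    A = np.log(np.linalg.norm(z[:, None, :] - s[None, :, :], axis=2))
    b = z[:, 0] ** 2 - z[:, 1] ** 2                            # Dirichlet data of u = x^2 - y^2
    return A, b

A, b = mfs_system(n_sources=100, q_points=150)
c = np.linalg.lstsq(A, b, rcond=None)[0]   # expansion coefficients
print(np.linalg.norm(A @ c - b))
```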
Table 6 lists the computed ME for different values of ( n , q ) obtained by the DOA. It can be seen that the DOA is applicable to least-squares problems regardless of whether q < n or q > n .
When we apply the QR to solve this problem with n = 100 and q = 150 , ME = 42.81 is obtained. When we take n = 150 and q = 400 , ME = 2.11 is obtained. Obviously, the QR is not applicable to this problem.

5.8. Example 8

In this example, we consider Equation (2) with a cyclic matrix, whose first row is given by (1, …, n). The coefficient matrix can be generated by the following procedure (a vectorized sketch is given after this listing):
Do i = 1 : n,
  Do j = 1 : n,
    If i = 1, then B_{i,j} = j; otherwise, B_{i,j} = B_{i−1,j} + 1,
    If B_{i,j} > n, then B_{i,j} = B_{i,j} − n,
Enddo.
The exact solution is x_i = 1, i = 1, …, n. The non-square matrix A is obtained by taking the first q rows from B.
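The same cyclic matrix can be generated vectorially; the sketch below (our illustration, not the authors' code) builds B, takes the first q rows to form A, and forms the consistent right-hand side b = A·1:

```python
import numpy as np

def cyclic_system(n, q):
    """B[i, j] = ((i + j - 2) mod n) + 1 reproduces the loop above:
    row 1 is (1, ..., n) and each later row adds 1 elementwise, wrapping at n."""
    i = np.arange(1, n + 1)[:, None]
    j = np.arange(1, n + 1)[None, :]
    B = (i + j - 2) % n + 1
    A = B[:q, :].astype(float)           # first q rows
    x_exact = np.ones(n)
    return A, A @ x_exact, x_exact

A, b, x_exact = cyclic_system(n=2000, q=100)
print(A[0, :5], A[1, :5])                # (1,2,3,4,5) and (2,3,4,5,6)
```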
Since the right-hand side vector b is obtained from Equation (2) after inserting A and x = 1 = (1, …, 1)^T, the non-square system is consistent. Therefore, we can compute the maximal error (ME) by
ME := max_{i=1,…,n} |x_i^e − x_i^n|,
where x_i^e = 1, i = 1, …, n, are the exact solutions and x_i^n are the numerical solutions.
For a large matrix A with n = 2000 and q = 100, Table 7 lists the computed ME for different values of m and the INs under ε = 10^{-5} with the initial guesses x_i^0 = 1 + 0.1i, i = 1, …, n. If we take the initial guesses to be x_i^0 = 0, i = 1, …, n, the accuracy is poor, with ME = 0.961, and the convergence is very slow. As shown in Corollary 6, for the zero initial value we have r_0 = b, and for the cyclic matrix we have r_0 = b = c_1 1, where c_1 = Σ_{j=1}^{n} j = 2001000. Thus, A A^T r_0 ≈ c_0 r_0, where c_0 is built from c_1 = 2001000 and c_2 = Σ_{j=1}^{q} j = 5050; hence, c_0 is a huge value, which causes the failure of the DOA with a zero initial value for this problem. The same situation happens for the constant initial guesses x_i^0 = c, i = 1, …, n. It is apparent that the DOA is quite accurate and converges quickly. It is interesting that a small m is enough to achieve accurate solutions; when m is smaller, the accuracy is better, but the convergence is slower. In contrast, when m is larger, the accuracy is worse but the convergence is faster.
Next, we construct a large-sized matrix B from Equation (145) with n replaced by q. The non-square matrix A is obtained by taking the first n columns of B. With n = 500 and q = 2000, Table 8 lists the computed ME for different values of m and the IN under ε = 10^{-5} with the initial guesses x_i^0 = 1 + 0.1i, i = 1, …, n. It is apparent that the DOA is quite accurate and converges quickly. When m is increased, the accuracy is better and the convergence is faster.
As mentioned, QR is not applicable to the least-squares problem with q < n; however, for q > n, QR is applicable. Table 9 lists the computed ME for different values of (q, n). In the DOA, we take m = 30, ε = 10^{-12}, and x_i^0 = 1 + 0.1i, i = 1, …, n. The DOA converges within 25 iterations for the first four cases and within 79 iterations for the last case. The DOA improves the accuracy by about three to four orders of magnitude compared to QR. We also list the error ‖Q^T Q − I_n‖ for the QR method, which is very accurate.
If we take α_0 = 1 in Equation (21), 𝕂_m reduces to the usual affine Krylov subspace u_0 + K_m. Equation (80) becomes
x = u_0 + λ_0 V b − V A u_0.
As shown in Table 10, the DOA is more accurate than the DOA with α_0 = 1.

6. A Variant of the Karush–Kuhn–Tucker Equation

In many meshless collocation methods for solving linear partial differential equations, we may collocate more points q than the number of coefficients n in the expansion of the solution, which results in an over-determined system in Equation (2) with q > n. The residual vector r = b − A x together with Equation (2) can be written as
$\begin{bmatrix} I_q & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}.$
This is called the Karush–Kuhn–Tucker equation [15].
Instead of Equation (146), we may consider a permutation of the Karush–Kuhn–Tucker equation:
$\begin{bmatrix} A & I_q \\ 0 & A^T \end{bmatrix} \begin{bmatrix} x \\ r \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix},$
which is a (q + n) × (q + n) system. Equation (147) is a variant of the Karush–Kuhn–Tucker Equation (146), obtained by re-ordering the unknown vector from (r, x) to (x, r). When the dimension q + n is a moderate value, we can apply the partial pivoting Gaussian elimination method (GEM) to solve Equations (146) and (147). We name the partial pivoting Gaussian elimination method applied to Equation (146) GEM1, and that applied to Equation (147) GEM2. The medium-size least-squares problems shown below demonstrate that the partial pivoting Gaussian elimination method can be an effective and accurate method. Moreover, the variant of the Karush–Kuhn–Tucker equation is more accurate than the original Karush–Kuhn–Tucker equation, even though they have the same condition numbers.
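A short sketch comparing the two orderings (146) and (147) is given below (our illustration; numpy.linalg.solve, which performs an LU factorization with partial pivoting, stands in for the GEM of the paper):

```python
import numpy as np

def kkt_solve(A, b, variant=True):
    """Solve the (q+n)-dimensional KKT system; variant=True uses the
    re-ordered unknowns (x, r) of Equation (147), otherwise (r, x) of (146)."""
    q, n = A.shape
    if variant:   # [[A, I_q], [0, A^T]] [x; r] = [b; 0]
        M = np.block([[A, np.eye(q)], [np.zeros((n, n)), A.T]])
    else:         # [[I_q, A], [A^T, 0]] [r; x] = [b; 0]
        M = np.block([[np.eye(q), A], [A.T, np.zeros((n, n))]])
    rhs = np.concatenate([b, np.zeros(n)])
    sol = np.linalg.solve(M, rhs)        # LU with partial pivoting
    return sol[:n] if variant else sol[q:]

rng = np.random.default_rng(5)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(kkt_solve(A, b, True) - x_ref),
      np.linalg.norm(kkt_solve(A, b, False) - x_ref))
```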

6.1. Example 9

We consider Example 7 again, which is now solved by the above four methods. Table 11 lists the computed ME for different values of ( n , q ) obtained by GEM1 and GEM2. Here, CGNR is the conjugate gradient method applied to the normal equation with minimal residuals, while CGNE is the conjugate gradient method applied to the normal equation with minimal errors [15]. CGNR is also known as the CGLS (least-squares), and CGNE is Craig’s method. In this test, we find that when the partial pivoting Gaussian elimination method is applied to Equation (147) as GEM2, the accuracy is the best.

6.2. Example 10

We consider Example 3 again, which is now solved by the above four methods, the QR method and the DOA with ε = 10^{-14}. Based on Equations (88)–(90), this problem is also solved via SVD. As shown in Table 12, GEM2, the DOA and SVD are competitive and are more accurate than the other methods.
We also list the errors ‖Q^T Q − I_n‖ for the QR method in Table 12, which are very accurate. However, since quite small singular values are present for the highly ill-conditioned Hilbert matrix, the QR method gradually deteriorates as the condition number increases from 7.04 × 10^4 to 2.1 × 10^7.

7. Conclusions

Least-squares problems arise in many applications, and many iterative algorithms are already available to seek their solutions. In this paper, a new concept of a varying affine Krylov subspace was introduced, which is different from the fixed-type affine Krylov subspace used in the CG-type numerical solution of over-determined least-squares problems. In the (m+1)-dimensional varying affine Krylov subspace, a closed-form double optimal solution in the simple form of Equation (42) was derived and further reduced to Equation (80); it was obtained from the two minimizations in Equations (40) and (41). We analyzed a key equation, (72), that links these two optimizations together. The double optimal solution is an excellent approximation, as verified for least-squares problems. The iterative DOA was developed, which leads to step-by-step absolute convergence. In each iterative step, by merely inverting an m × m matrix with a small value of m, the computational cost of the DOA is very low, lower than that of other methods. The Moore–Penrose inverses of non-square matrices were also derived by using the varying affine Krylov subspace method. For over-determined least-squares problems, we proposed a variant of the Karush–Kuhn–Tucker equation, which uses the partial pivoting Gaussian elimination method and is better than the original Karush–Kuhn–Tucker equation and the CG-type numerical methods CGNR (CGLS) and CGNE (Craig's method). It is very important that the DOA can be applied to rectangular systems with either q > n or q < n.
The novelties involved in this paper are as follows:
  • A double-optimization-based iterative algorithm for least-squares problems was developed in a varying affine Krylov subspace.
  • For dense large-size least-squares problems, the proposed iterative DOA is efficient and accurate.
  • The DOA can be applied to rectangular systems even in the extremal cases of q ≫ n or q ≪ n.
  • A variant of the Karush–Kuhn–Tucker equation was presented, which is an improved version of the original Karush–Kuhn–Tucker equation.
The idea of the varying affine Krylov subspace is useful for finding a better solution of linear algebraic equations based on the mechanism of double optimization. In the future, we may extend the new methods of the DOS and the DOA to other linear matrix equations. The limitations are that several matrix multiplications are needed to construct the projection operator P in the Krylov subspace, and that there is a computational cost for the inversion C^{-1} of the m × m matrix C.

Author Contributions

Conceptualization, C.-S.L. and C.-W.C.; Methodology, C.-S.L. and C.-W.C.; Software, C.-S.L., C.-W.C. and C.-L.K.; Validation, C.-S.L., C.-W.C. and C.-L.K.; Formal analysis, C.-S.L. and C.-W.C.; Investigation, C.-S.L., C.-W.C. and C.-L.K.; Resources, C.-S.L. and C.-W.C.; Data curation, C.-S.L., C.-W.C. and C.-L.K.; Writing—original draft, C.-S.L.; Writing—review & editing, C.-W.C.; Visualization, C.-S.L., C.-W.C. and C.-L.K.; Supervision, C.-S.L. and C.-W.C.; Project administration, C.-W.C.; Funding acquisition, C.-W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Acknowledgments

The authors would like to express their thanks to the reviewers, who supplied valuable opinions to improve the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, C.S. An optimal multi-vector iterative algorithm in a Krylov subspace for solving the ill-posed linear inverse problems. Comput. Mater. Contin. 2013, 33, 175–198. [Google Scholar]
  2. Dongarra, J.; Sullivan, F. Guest editors’ introduction to the top 10 algorithms. Comput. Sci. Eng. 2000, 2, 22–23. [Google Scholar] [CrossRef]
  3. Simoncini, V.; Szyld, D.B. Recent computational developments in Krylov subspace methods for linear systems. Numer. Linear Alg. Appl. 2007, 14, 1–59. [Google Scholar] [CrossRef]
  4. Saad, Y.; Schultz, M.H. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 1986, 7, 856–869. [Google Scholar] [CrossRef]
  5. Saad, Y. Krylov subspace methods for solving large unsymmetric linear systems. Math. Comput. 1981, 37, 105–126. [Google Scholar] [CrossRef]
  6. Freund, R.W.; Nachtigal, N.M. QMR: A quasi-minimal residual method for non-Hermitian linear systems. Numer. Math. 1991, 60, 315–339. [Google Scholar] [CrossRef]
  7. van Den Eshof, J.; Sleijpen, G.L.G. Inexact Krylov subspace methods for linear systems. SIAM J. Matrix Ana. Appl. 2004, 26, 125–153. [Google Scholar] [CrossRef]
  8. Hestenes, M.R.; Stiefel, E.L. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand. 1952, 49, 409–436. [Google Scholar] [CrossRef]
  9. Lanczos, C. Solution of systems of linear equations by minimized iterations. J. Res. Nat. Bur. Stand. 1952, 49, 33–53. [Google Scholar] [CrossRef]
  10. Paige, C.C.; Saunders, M.A. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal. 1975, 12, 617–629. [Google Scholar] [CrossRef]
  11. Fletcher, R. Conjugate Gradient Methods for Indefinite Systems; Lecture Notes in Math; Springer: Berlin/Heidelberg, Germany, 1976; Volume 506, pp. 73–89. [Google Scholar]
  12. Sonneveld, P. CGS: A fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 1989, 10, 36–52. [Google Scholar] [CrossRef]
  13. van der Vorst, H.A. Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 1992, 13, 631–644. [Google Scholar] [CrossRef]
  14. Saad, Y.; van der Vorst, H.A. Iterative solution of linear systems in the 20th century. J. Comput. Appl. Math. 2000, 123, 1–33. [Google Scholar] [CrossRef]
  15. Saad, Y. Iterative Methods for Sparse Linear Systems, 2nd ed.; SIAM: Philadelphia, PA, USA, 2003. [Google Scholar]
  16. van der Vorst, H.A. Iterative Krylov Methods for Large Linear Systems; Cambridge University Press: New York, NY, USA, 2003. [Google Scholar]
  17. Golub, G. Numerical methods for solving linear least squares problems. Numer. Math. 1965, 7, 206–216. [Google Scholar] [CrossRef]
  18. Choi, S.C.; Paige, C.C.; Saunders, M.A. MINRES-QLP: A Krylov subspace method for indefinite or singular symmetric systems. SIAM J. Sci. Comput. 2011, 33, 1810–1836. [Google Scholar] [CrossRef]
  19. Petkovic, M.D.; Stanimirovic, P.S. Iterative method for computing Moore-Penrose inverse based on Penrose equations. J. Comput. Appl. Math. 2011, 235, 1604–1613. [Google Scholar] [CrossRef]
  20. Petkovic, M.D.; Stanimirovic, P.S. Two improvements of the iterative method for computing Moore-Penrose inverse based on Penrose equations. J. Comput. Appl. Math. 2014, 267, 61–71. [Google Scholar] [CrossRef]
  21. Katsikis, V.N.; Pappas, D.; Petralias, A. An improved method for the computation of the Moore-Penrose inverse matrix. Appl. Math. Comput. 2011, 217, 9828–9834. [Google Scholar] [CrossRef]
  22. Stanimirovic, I.; Tasic, M. Computation of generalized inverse by using the LDL* decomposition. Appl. Math. Lett. 2012, 25, 526–531. [Google Scholar] [CrossRef]
  23. Sheng, X.; Wang, T. An iterative method to compute Moore-Penrose inverse based on gradient maximal convergence rate. Filomat 2013, 27, 1269–1276. [Google Scholar] [CrossRef]
  24. Toutounian, F.; Ataei, A. A new method for computing Moore-Penrose inverse matrices. J. Comput. Appl. Math. 2009, 228, 412–417. [Google Scholar] [CrossRef]
  25. Zhang, J.; Guo, J. On relaxed greedy randomized coordinate descent methods for solving large linear least-squares problems. Appl. Numer. Math. 2020, 157, 372–384. [Google Scholar] [CrossRef]
  26. Lipitakis, A.-D.; Filelis-Papadopoulos, C.K.; Gravvanis, G.A.; Anagnostopoulos, D. A note on parallel approximate pseudoinverse matrix techniques for solving linear least squares problems. J. Comput. Sci. 2020, 41, 101092. [Google Scholar] [CrossRef]
  27. Niu, Y.-Q.; Zhang, B. A new randomized Gauss–Seidel method for solving linear least-squares problems. Appl. Math. Lett. 2021, 116, 107057. [Google Scholar] [CrossRef]
  28. Bojanczyk, A.W. Algorithms for indefinite linear least squares problems. Linear Algebra Appli. 2021, 623, 104–127. [Google Scholar] [CrossRef]
  29. Du, K.; Ruan, C.-C.; Sun, X.-H. On the convergence of a randomized block coordinate descent algorithm for a matrix least squares problem. Appl. Math. Lett. 2022, 124, 107689. [Google Scholar] [CrossRef]
  30. Chakrabarti, K.; Gupta, N.; Chopra, N. Iterative pre-conditioning for expediting the distributed gradient-descent method: The case of linear least-squares problem. Automatica 2022, 137, 110095. [Google Scholar] [CrossRef]
  31. Zhang, Y.; Li, H. Splitting-based randomized iterative methods for solving indefinite least squares problem. Appl. Math. Comput. 2023, 446, 127892. [Google Scholar] [CrossRef]
  32. Pes, F.; Rodriguez, G. A projection method for general form linear least-squares problems. Appl. Math. Lett. 2023, 145, 108780. [Google Scholar] [CrossRef]
  33. Kuo, Y.-C.; Liu, C.-S. An index search method based inner-outer iterative algorithm for solving nonnegative least squares problems. J. Comput. Appl. Math. 2023, 424, 114954. [Google Scholar] [CrossRef]
  34. Jin, L.-L.; Li, H.B. Greedy double subspaces coordinate descent method for solving linear least-squares problems. J. Comput. Sci. 2023, 70, 102029. [Google Scholar] [CrossRef]
  35. Jahvani, M.; Guay, M. Solving least-squares problems in directed networks: A distributed approach. Comput. Chem. Eng. 2024, 185, 108654. [Google Scholar] [CrossRef]
  36. Laura Arias, M.; Contino, M.; Maestripieri, A.; Marcantognini, S. Matrix representations of multivalued projections and least squares problems. J. Math. Anal. Appl. 2024, 530, 127631. [Google Scholar] [CrossRef]
  37. Golub, G.H.; Reinsch, C. Singular value decomposition and least squares solutions. Numer. Math. 1970, 14, 403–420. [Google Scholar] [CrossRef]
  38. Björck, A. Numerical Methods for Least Squares Problems; SIAM Publisher: Philadelphia, PA, USA, 1996. [Google Scholar]
  39. Liu, C.S. A doubly optimized solution of linear equations system expressed in an affine Krylov subspace. J. Comput. Appl. Math. 2014, 260, 375–394. [Google Scholar] [CrossRef]
  40. Liu, C.S. Optimal algorithms in a Krylov subspace for solving linear inverse problems by MFS. Eng. Anal. Bound. Elem. 2014, 44, 64–75. [Google Scholar] [CrossRef]
  41. Liu, C.S. A double optimal descent algorithm for iteratively solving ill-posed linear inverse problems. Inv. Prob. Sci. Eng. 2015, 23, 38–66. [Google Scholar] [CrossRef]
  42. Liu, C.S. A maximal projection solution of ill-posed linear system in a column subspace, better than the least squares solution. Comput. Math. Appl. 2014, 67, 1998–2014. [Google Scholar] [CrossRef]
  43. Papez, J.; Tichy, P. Estimating error norms in CG-like algorithms for least-squares and least-norm problems. Numer. Algorithms 2024, 1–28. [Google Scholar] [CrossRef]
  44. Abbasi, M.; Nikazad, T. Bounded perturbations resilient iterative methods for linear systems and least squares problems: Operator-based approaches, analysis, and performance evaluation. BIT Numer. Math. 2024, 64, 15. [Google Scholar] [CrossRef]
  45. Xia, Y.; Chen, T.; Shan, J. A novel iterative method for computing generalized inverse. Neural Comput. 2014, 26, 449–465. [Google Scholar] [CrossRef] [PubMed]
  46. Feng, G.; Li, M.; Chen, C.S. On the ill-conditioning of the MFS for irregular boundary data with sufficient regularity. Eng. Anal. Bound. Elem. 2014, 41, 98–102. [Google Scholar] [CrossRef]
  47. Liu, C.S.; Kuo, C.L. A multiple-scale Pascal polynomial triangle solving elliptic equations and inverse Cauchy problems. Eng. Anal. Bound. Elem. 2016, 62, 35–43. [Google Scholar] [CrossRef]
Figure 1. Examples 1 and 2, showing the relative residuals with respect to the number of iterations; the latter is compared with [20].
Figure 2. An over-determined Hilbert linear problem solved by the DOA: (a) relative residual, and (b) numerical error.
Figure 3. A computation of the Moore–Penrose inverse by the DOA, showing the relative residuals in the computation of each column of the Moore–Penrose inverse matrix.
Figure 4. An amoeba-like boundary under a non-harmonic boundary function solved by the DOA: (a) the contour and distribution of collocation points, and (b) the convergence rate.
Figure 5. An amoeba-like boundary under a non-harmonic boundary function solved by the DOA: (a) comparison of numerical and exact boundary conditions, and (b) the numerical error.
Table 1. Example 3: ME obtained by the DOA and QR for different n and a fixed q = 10.

n              | 2             | 3             | 4             | 5
‖Q^T Q − I_n‖  | 1.63 × 10^−15 | 2.48 × 10^−14 | 3.11 × 10^−13 | 5.24 × 10^−12
QR             | 8.33 × 10^−15 | 3.55 × 10^−12 | 6.77 × 10^−10 | 2.27 × 10^−7
SVD            | 5.55 × 10^−16 | 1.60 × 10^−14 | 1.89 × 10^−13 | 9.09 × 10^−13
DOA            | 1.11 × 10^−16 | 3.55 × 10^−15 | 9.09 × 10^−14 | 7.81 × 10^−13
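For reference, the ‖Q^T Q − I_n‖ row of Table 1 (and of Tables 9 and 12 below) measures the loss of orthogonality of the computed Q factor, and ME is read here as the maximum absolute error of the computed solution. The following NumPy sketch shows how such entries can be produced; it is illustrative only, assuming a Hilbert-type test matrix and an exact solution of ones, not the actual data of Example 3.

```python
# Illustrative sketch (assumed test data, not the authors' Example 3 code):
# loss of orthogonality ||Q^T Q - I_n|| of the computed Q factor, and the
# maximum absolute error (ME) of the QR and SVD least-squares solutions.
import numpy as np

q, n = 10, 5                                   # over-determined size; q = 10 as in Table 1
A = np.array([[1.0 / (i + j + 1.0) for j in range(n)] for i in range(q)])  # assumed ill-conditioned test matrix
x_exact = np.ones(n)                           # assumed exact solution
b = A @ x_exact

Q, R = np.linalg.qr(A, mode="reduced")         # thin QR factorization, Q is q x n
loss_orth = np.linalg.norm(Q.T @ Q - np.eye(n))        # the ||Q^T Q - I_n|| entry
x_qr = np.linalg.solve(R, Q.T @ b)                     # least-squares solution via QR
x_svd = np.linalg.lstsq(A, b, rcond=None)[0]           # SVD-based least-squares solution
ME_qr = np.max(np.abs(x_qr - x_exact))                 # maximum absolute error of QR
ME_svd = np.max(np.abs(x_svd - x_exact))               # maximum absolute error of SVD
print(loss_orth, ME_qr, ME_svd)
```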
Table 2. Example 3: the ME and IN obtained by the DOA for different m, where (q, n) = (20, 8) and ε = 10^−8.

m   | 2            | 3            | 4            | 5            | 6
ME  | 4.94 × 10^−6 | 5.30 × 10^−8 | 1.34 × 10^−6 | 2.90 × 10^−6 | 6.28 × 10^−6
IN  | 14           | 4            | 4            | 4            | 5
Table 3. Computed results of a rank-deficient matrix in Example 4.

Alg.  | IP            | ‖A A† A − A‖_2 | ‖A† A A† − A†‖_2 | ‖(A A†)^T − A A†‖_2 | ‖(A† A)^T − A† A‖_2 | IN
[45]  | 10^−5 A^T     | 8.92 × 10^−16  | 3.36 × 10^−16    | 7.85 × 10^−17       | 1.97 × 10^−15       | 50
[45]  | 0.1 A^T       | 9.17 × 10^−20  | 6.99 × 10^−16    | 7.85 × 10^−17       | 2.86 × 10^−15       | 60
[20]  | 0.1 A^T       | 1.53 × 10^−14  | 9.21 × 10^−17    | 4.13 × 10^−31       | 1.17 × 10^−30       | 165
DOA   | x_0 = 10^−16 1 | 5.21 × 10^−27 | 2.57 × 10^−29    | 3.82 × 10^−27       | 1.61 × 10^−27       | 12
Table 4. Computed results of the Hilbert matrix with q = 3, n = 50 in Example 5.

Alg.  | IP       | ‖A A† A − A‖_2 | ‖A† A A† − A†‖_2 | ‖(A A†)^T − A A†‖_2 | ‖(A† A)^T − A† A‖_2 | IN
[20]  | 0.8 A^T  | 1.4 × 10^−28   | 3.7 × 10^−21     | 4.1 × 10^−30        | 3.5 × 10^−27        | 34
DOA   | x_0 = 0  | 9.8 × 10^−28   | 2 × 10^−20       | 4.1 × 10^−24        | 9.3 × 10^−28        | 3
Table 5. Computed results of the Hilbert matrix with q = 50, n = 3 in Example 5.

Alg.  | IP       | ‖A A† A − A‖_2 | ‖A† A A† − A†‖_2 | ‖(A A†)^T − A A†‖_2 | ‖(A† A)^T − A† A‖_2 | IN
[20]  | 0.8 A^T  | 1.2 × 10^−28   | 3.7 × 10^−21     | 3.3 × 10^−27        | 6.3 × 10^−28        | 34
DOA   | x_0 = 0  | 9 × 10^−29     | 2.6 × 10^−22     | 3.2 × 10^−23        | 5.5 × 10^−29        | 3
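The error columns of Tables 3–5 are the residuals of the four Penrose conditions that characterize the Moore–Penrose inverse A†: A A† A = A, A† A A† = A†, (A A†)^T = A A†, and (A† A)^T = A† A. As an illustrative sketch (not the authors' implementation), these metrics can be evaluated for any candidate inverse X as follows, assuming a Hilbert test matrix with q = 50, n = 3 and using NumPy's pinv as a stand-in for the computed inverse:

```python
# Illustrative sketch (not the authors' code): spectral-norm residuals of the
# four Penrose conditions reported in Tables 3-5 for a candidate inverse X.
import numpy as np

def penrose_errors(A, X):
    """Return the residuals of the four Penrose conditions for X ~ A^dagger."""
    AX, XA = A @ X, X @ A
    return (np.linalg.norm(A @ X @ A - A, 2),    # ||A X A - A||_2
            np.linalg.norm(X @ A @ X - X, 2),    # ||X A X - X||_2
            np.linalg.norm(AX.T - AX, 2),        # ||(A X)^T - A X||_2
            np.linalg.norm(XA.T - XA, 2))        # ||(X A)^T - X A||_2

# Hypothetical check with a Hilbert matrix of size q = 50, n = 3 (cf. Table 5),
# using NumPy's pinv in place of the pseudo-inverse computed by the DOA:
q, n = 50, 3
A = np.array([[1.0 / (i + j + 1.0) for j in range(n)] for i in range(q)])
print(penrose_errors(A, np.linalg.pinv(A)))
```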
Table 6. For the MFS on the 2D Laplace equation, ME with different (n, q) obtained by the DOA.

(n, q) | (100, 80)    | (100, 90)    | (100, 110)   | (100, 120)   | (100, 150)
ME     | 1.88 × 10^−8 | 7.25 × 10^−9 | 7.67 × 10^−8 | 2.13 × 10^−8 | 4.45 × 10^−8
Table 7. Example 8 with n = 2000 and q = 100: ME and IN obtained by the DOA for different m.

m   | 5            | 8            | 10           | 12           | 15
ME  | 7.24 × 10^−5 | 1.73 × 10^−4 | 2.95 × 10^−4 | 1.02 × 10^−3 | 4.99 × 10^−3
IN  | 174          | 42           | 23           | 16           | 10
Table 8. Example 8 with n = 500 and q = 2000: ME and IN obtained by the DOA for different m.

m   | 10           | 12           | 15           | 18           | 20
ME  | 1.39 × 10^−4 | 6.82 × 10^−5 | 3.20 × 10^−5 | 1.70 × 10^−5 | 1.26 × 10^−5
IN  | 123          | 72           | 41           | 25           | 19
Table 9. Example 8 with different (q, n), q > n: ME obtained by QR and the DOA.

(q, n)         | (1000, 500)   | (1500, 500)   | (1500, 1000)  | (2000, 500)   | (2500, 1000)
‖Q^T Q − I_n‖  | 1.81 × 10^−12 | 2.55 × 10^−12 | 8.99 × 10^−12 | 2.20 × 10^−12 | 1.14 × 10^−11
QR             | 3.18 × 10^−10 | 6.91 × 10^−10 | 1.73 × 10^−9  | 1.61 × 10^−9  | 4.56 × 10^−9
DOA            | 2.49 × 10^−13 | 2.66 × 10^−13 | 2.46 × 10^−13 | 1.77 × 10^−13 | 1.24 × 10^−13
Table 10. Example 8 with n = 500, q = 2000, and ε = 10^−12: ME1 and IN1 obtained by taking α_0 = 1 in the DOA, and ME2 and IN2 obtained by the DOA, for different m.

m    | 20            | 22            | 25            | 28            | 30            | 35
ME1  | 1.13 × 10^−12 | 8.50 × 10^−13 | 3.33 × 10^−13 | 2.0 × 10^−3   | 1.88 × 10^−13 | 2.15 × 10^−13
IN1  | 62            | 48            | 34            | 25            | 21            | 15
ME2  | 9.15 × 10^−13 | 8.59 × 10^−13 | 3.94 × 10^−13 | 2.71 × 10^−13 | 1.77 × 10^−13 | 1.41 × 10^−13
IN2  | 64            | 48            | 33            | 25            | 21            | 16
Table 11. The MFS on a 2D Laplace equation: ME with different (n, q) obtained by different methods.

(n, q)  | (100, 120)    | (100, 150)    | (100, 170)    | (100, 200)    | (150, 180)
GEM1    | 7.76 × 10^−6  | 1.81 × 10^−6  | 6.12 × 10^−7  | 2.99 × 10^−6  | 5.73 × 10^−6
GEM2    | 2.17 × 10^−11 | 3.82 × 10^−14 | 2.24 × 10^−14 | 1.99 × 10^−14 | 3.52 × 10^−14
CGNR    | 8.33 × 10^−11 | 8.23 × 10^−11 | 7.18 × 10^−10 | 6.62 × 10^−9  | 7.98 × 10^−10
CGNE    | 1.84 × 10^−11 | 4.78 × 10^−7  | 5.44 × 10^−6  | 6.46 × 10^−10 | 2.58 × 10^−6
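Tables 11 and 12 include the CGNR and CGNE baselines. CGNR applies the conjugate gradient method to the normal equations A^T A x = A^T b, whereas CGNE (Craig's method) applies it to A A^T y = b with x = A^T y. A minimal, generic CGNR sketch is given below; it is a textbook implementation, not the code used to generate these tables.

```python
# Illustrative sketch (not the authors' code) of the CGNR baseline: conjugate
# gradient applied to A^T A x = A^T b without forming A^T A explicitly.
import numpy as np

def cgnr(A, b, x0=None, tol=1e-14, maxit=10000):
    x = np.zeros(A.shape[1]) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                # residual of the original system A x = b
    z = A.T @ r                  # residual of the normal equations
    p = z.copy()
    for _ in range(maxit):
        w = A @ p
        alpha = (z @ z) / (w @ w)
        x = x + alpha * p
        r = r - alpha * w
        z_new = A.T @ r
        if np.linalg.norm(z_new) <= tol:
            break
        beta = (z_new @ z_new) / (z @ z)
        p = z_new + beta * p
        z = z_new
    return x

# Hypothetical usage on a small over-determined system (line fit through 3 points):
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])
print(cgnr(A, b))
```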
Table 12. Example 10: ME with different (n, q) obtained by GEM1, GEM2, CGNR, CGNE, the DOA, SVD, and QR.

(n, q)         | (5, 6)        | (5, 8)        | (5, 10)       | (7, 25)       | (8, 25)
GEM1           | 1.80 × 10^−6  | 5.18 × 10^−7  | 4.57 × 10^−7  | 2.20 × 10^−3  | 1.55 × 10^−3
GEM2           | 3.15 × 10^−12 | 2.27 × 10^−13 | 1.02 × 10^−12 | 1.29 × 10^−10 | 4.70 × 10^−9
CGNR           | 4.98 × 10^−8  | 1.62 × 10^−7  | 3.05 × 10^−7  | 4.83 × 10^−8  | 7.64 × 10^−8
CGNE           | 3.52 × 10^−12 | 1.64 × 10^−12 | 1.88 × 10^−12 | 4.83 × 10^−8  | 7.64 × 10^−8
DOA            | 1.88 × 10^−13 | 2.55 × 10^−12 | 7.81 × 10^−13 | 2.72 × 10^−9  | 1.48 × 10^−9
SVD            | 4.85 × 10^−12 | 1.52 × 10^−12 | 9.09 × 10^−13 | 3.08 × 10^−10 | 1.52 × 10^−8
‖Q^T Q − I_n‖  | 8.65 × 10^−12 | 4.57 × 10^−12 | 5.24 × 10^−12 | 5.66 × 10^−10 | 1.89 × 10^−8
QR             | 1.05 × 10^−6  | 3.50 × 10^−7  | 2.27 × 10^−7  | 2.46 × 10^−3  | 5.68 × 10^−1