A Fast Projected Gradient Algorithm for Quaternion Hermitian Eigenvalue Problems

Shan-Qi Duan; Qing-Wen Wang

doi:10.20944/preprints202503.0469.v1

Submitted: 06 March 2025

Posted: 07 March 2025


Abstract

In this paper, based on the novel generalized Hamilton-real (GHR) calculus, we propose for the first time a quaternion Nesterov’s accelerated projected gradient algorithm for computing the dominant eigenvalue and eigenvector of quaternion Hermitian matrices. By introducing momentum terms and look-ahead updates, the algorithm achieves a faster convergence rate. We theoretically prove the convergence of the quaternion Nesterov’s accelerated projected gradient algorithm. Numerical experiments show that the proposed method outperforms the quaternion projected gradient ascent method and the traditional algebraic methods in terms of computational accuracy and runtime efficiency.

Keywords: GHR calculus; Principal eigenvalues; Quaternion Hermitian matrix; Nesterov’s accelerated gradient method

Subject: Physical Sciences - Mathematical Physics

MSC: 15A18; 15B57; 65F10; 65F15

1. Introduction

In 1843, Sir William Rowan Hamilton [9] introduced quaternions in an effort to expand the concept of complex numbers into spaces of higher dimensions. Quaternions and quaternion matrices play a critical role in many applications, such as quantum mechanics, computer graphics, quaternion principal component analysis (QPCA) and image processing [2,11,23,24]. Due to the non-commutative property of quaternion multiplication, the eigenvalues of quaternion matrices are distinguished into left and right types, with the right eigenvalue problem having garnered widespread attention[4,6,14,16,25].

In recent years, a series of numerical methods have been developed to compute the eigenvalues of quaternion matrices, particularly focusing on the eigenvalue problems of Hermitian matrices. These numerical methods can be broadly categorized into three classes: The first class involves direct quaternion arithmetic operations. For instance, Bunse-Gerstner proposed a quaternion QR algorithm for solving the right eigenvalue problem of quaternion matrices [1]. However, due to the complexity of quaternion arithmetic, this algorithm requires significant computational effort. The second class is based on the real or complex counterparts of quaternion matrices. By studying the real or complex counterpart structures and properties of quaternion matrices, and leveraging stable orthogonal transformations, real or complex structure-preserving methods have been developed to solve the right eigenvalue problem of quaternion Hermitian matrices [8,10]. The third class is based on the real counterparts of quaternion matrices, leading to the development of numerous structure-preserving iterative algorithms. Examples include the explicitly restarted quaternion Arnoldi Method (ERQAM) [18], designed to compute standard right eigenpairs of general quaternion matrices, and a novel quaternion power method introduced in [13] for computing the dominant standard right eigenvalue and its corresponding eigenvector. Structure-preserving methods exhibit significant advantages in terms of storage space and computational efficiency.

In the field of quaternion optimization, significant progress has been made with the generalized HR calculus (GHR) [20,21,22]. GHR leverages quaternion rotations within a general orthogonal system, offering a way to compute the derivatives and gradients of functions with quaternion variables, thereby providing a solid theoretical foundation for the development of quaternion optimization methods. Subsequently, based on the generalized HR calculus (GHR), Diao et al. [3] proposed a gradient projection algorithm for maximizing the quaternion Rayleigh quotient under unit constraints. This algorithm demonstrated good performance and contributed to the development of quaternion optimization algorithms.

In this paper, we first equivalently transform the principal eigenvalue problem of quaternion Hermitian matrices into a maximization optimization problem over the quaternion skew field. Leveraging generalized HR calculus, we propose a quaternion Nesterov’s accelerated gradient projection algorithm to solve it. Subsequently, we conduct a convergence analysis of the quaternion Nesterov’s accelerated gradient projection algorithm, proving that a real differentiable function with Lipschitz continuous gradient possesses a quadratic upper bound. Furthermore, we demonstrate that the algorithm possesses a convergence rate of $O (1 / t^{2})$. Finally, we compare our algorithm with two other methods, and numerical experiments indicate that our algorithm exhibits superior performance in terms of both accuracy and time efficiency.

The rest of this paper is organized as follows. Section 2 introduces some basic notations and fundamental properties of quaternions, including definitions of quaternion modulus, similarity, and rotation, with a particular emphasis on reviewing the relevant definitions and properties of generalized HR (GHR) derivatives. In Section 3, we design a quaternion Nesterov accelerated gradient projection algorithm to compute the principal eigenvalue and corresponding eigenvector of quaternion Hermitian matrices. Section 4 provides a convergence analysis of the quaternion Nesterov accelerated gradient projection algorithm. In Section 5, we conduct numerical experiments to validate the proposed method. Finally, in Section 6, we summarize this paper.

2. Preliminaries

In this section, some quaternion notations and basic definitions are introduced, which will be used in the rest of the paper.

2.1. Notations

Throughout this paper, to distinguish scalars, vectors, real or complex matrices and quaternion matrices, scalars will be denoted by lowercase Greek letters, e.g., $α$, $β$; quaternions will be denoted by lowercase letters, e.g., $p, q$; quaternion vectors will be denoted by $x, y$; real or complex matrices will be denoted by uppercase letters, e.g., $A, B$; and quaternion matrices will be denoted by bold uppercase letters, e.g., $\mathbf{A}, \mathbf{B}$. $I$ denotes the $n \times n$ identity quaternion matrix. The operators ${(\cdot)}^{T}$ and ${(\cdot)}^{H}$ represent transpose and conjugate transpose, respectively. MATLAB commands will be set in typewriter font, e.g., `[V, D] = eig(A)`.

2.2. Quaternions and Quaternion Matrices

Denote the set of quaternions as

Q = span \{1, i, j, k\} ≜ \{q = q_{0} + q_{1} i + q_{2} j + q_{3} k | q_{0}, q_{1}, q_{2}, q_{3} \in R\},

(1)

where $i, j, k$ are the three imaginary units of quaternions, satisfying

\begin{matrix} i^{2} = j^{2} = k^{2} = ijk = - 1, \\ ij = - ji = k, jk = - kj = i, ki = - ik = j . \end{matrix}

The scalar (real) part of $q$ is denoted by $ℜ (q) = q_{0}$, and the vector (imaginary) part of $q$ is denoted by $ℑ (q) = q_{1} i + q_{2} j + q_{3} k$. A quaternion is called imaginary when its real part is equal to zero. The multiplication of quaternions adheres to the distributive law but is noncommutative.

The zero element in $Q$ is $0 = 0 + 0 i + 0 j + 0 k$ and the unit element is $1 = 1 + 0 i + 0 j + 0 k$. For any $q = q_{0} + q_{1} i + q_{2} j + q_{3} k \in Q$, the conjugate of $q$ is defined as

q^{*} = q_{0} - q_{1} i - q_{2} j - q_{3} k .

The magnitude of $q$ is $| q | = \sqrt{q^{*} q} = \sqrt{q_{0}^{2} + q_{1}^{2} + q_{2}^{2} + q_{3}^{2}}$. It follows that the inverse of a nonzero quaternion $q$ is given by $q^{- 1} = q^{*} / {| q |}^{2}$.
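The multiplication rules, conjugate, magnitude and inverse above can be sketched in a few lines of plain Python. This is only an illustrative sketch: quaternions are stored as 4-tuples `(q0, q1, q2, q3)`, and the helper names `qmul`, `qconj`, `qabs` and `qinv` are hypothetical, not taken from the paper or from QTFM.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (q0, q1, q2, q3)."""
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qconj(q):
    """Conjugate q* = q0 - q1 i - q2 j - q3 k."""
    return (q[0], -q[1], -q[2], -q[3])

def qabs(q):
    """Magnitude |q| = sqrt(q0^2 + q1^2 + q2^2 + q3^2)."""
    return math.sqrt(sum(c * c for c in q))

def qinv(q):
    """Inverse q^{-1} = q* / |q|^2 of a nonzero quaternion."""
    n2 = sum(c * c for c in q)
    if n2 == 0:
        raise ZeroDivisionError("the zero quaternion has no inverse")
    return tuple(c / n2 for c in qconj(q))

# The three imaginary units.
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

q = (1, 2, 3, 4)
print(qmul(I, J), qmul(J, I))   # ij and ji differ by a sign
print(qmul(qconj(q), q))        # q* q is real and equals |q|^2
print(qmul(q, qinv(q)))         # approximately the unit element
```

The two products `qmul(I, J)` and `qmul(J, I)` make the non-commutativity concrete, which is the reason left and right eigenvalues must be distinguished later on.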

Two quaternions $a$ and $b$ are said to be similar if there exists a nonzero quaternion $c$ such that $c^{- 1} a c = b$; this is written as $a \sim b$. Obviously, $a$ and $b$ are similar if and only if there is a unit quaternion $d$ such that $d^{- 1} a d = b$, and two similar quaternions have the same norm. It is routine to check that $\sim$ is an equivalence relation on the quaternions. We denote by $[a]$ the equivalence class containing $a$. If $q = q_{0} + q_{1} i + q_{2} j + q_{3} k \in Q$, then $q$ and $q_{0} + \sqrt{q_{1}^{2} + q_{2}^{2} + q_{3}^{2}} i$ are similar, namely, $q \in [q_{0} + \sqrt{q_{1}^{2} + q_{2}^{2} + q_{3}^{2}} i]$.

Quaternions can also be expressed in polar form as $q = | q | (cos θ + \hat{q} sin θ)$, where $\hat{q} = ℑ (q) / | ℑ (q) |$ is a pure unit quaternion and $θ = arccos (ℜ (q) / | q |) \in R$ denotes the angle (or argument) of the quaternion. Next, we will introduce the quaternion rotation and involution operators.

Definition 1 (Quaternion rotation [19]). For any quaternion $q$, the transformation

q^{μ} ≜ μ q μ^{- 1}

(2)

geometrically describes a three-dimensional rotation of the vector part of $q$ by an angle $2 θ$ about the vector part of $μ$, where $μ = | μ | (cos θ + \hat{μ} sin θ)$ is any nonzero quaternion.

Specifically, if $μ$ in (2) is an imaginary unit, then the quaternion rotation (2) reduces to the quaternion involution [5], defined by

\begin{matrix} q^{i} & = - i q i = q_{0} + i q_{1} - j q_{2} - k q_{3}, \\ q^{j} & = - j q j = q_{0} - i q_{1} + j q_{2} - k q_{3}, \\ q^{k} & = - k q k = q_{0} - i q_{1} - j q_{2} + k q_{3}, \end{matrix}

where $q = q_{0} + i q_{1} + j q_{2} + k q_{3} \in Q$. Quaternion rotation has the following properties:

{(p q)}^{μ} = p^{μ} q^{μ}, \quad p q = q^{p} p = q p^{(q^{*})}, \quad \forall p, q \in Q

and

q^{μ ν} = {(q^{ν})}^{μ}, \quad q^{μ *} ≜ {(q^{*})}^{μ} = {(q^{μ})}^{*} ≜ q^{* μ}, \quad \forall ν, μ \in Q .

Note that the representation in (1) can be extended to a general orthogonal basis $\{1, i^{μ}, j^{μ}, k^{μ}\}$, where the following properties hold [19]:

i^{μ} i^{μ} = j^{μ} j^{μ} = k^{μ} k^{μ} = i^{μ} j^{μ} k^{μ} = - 1 .
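The rotation (2) and the three involutions can be checked numerically. The following plain-Python sketch assumes quaternions are stored as 4-tuples `(q0, q1, q2, q3)`; the function names `qmul`, `qinv` and `rotate` are hypothetical helpers introduced only for this illustration.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (q0, q1, q2, q3)."""
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qinv(q):
    """Inverse q^{-1} = q* / |q|^2 of a nonzero quaternion."""
    n2 = sum(c * c for c in q)
    return (q[0] / n2, -q[1] / n2, -q[2] / n2, -q[3] / n2)

def rotate(q, mu):
    """Quaternion rotation q^mu = mu q mu^{-1} from Definition 1."""
    return qmul(qmul(mu, q), qinv(mu))

def close(p, q, tol=1e-12):
    """Componentwise comparison up to rounding error."""
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(p, q))

I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
q = (1.0, 2.0, 3.0, 4.0)

# Involutions: rotating by an imaginary unit flips the other two components.
q_i, q_j, q_k = rotate(q, I), rotate(q, J), rotate(q, K)
print(q_i, q_j, q_k)

# Product rule (p q)^mu = p^mu q^mu for an arbitrary nonzero mu.
p, mu = (0.5, -1.0, 2.0, 0.25), (1.0, 1.0, 0.0, 0.0)
lhs = rotate(qmul(p, q), mu)
rhs = qmul(rotate(p, mu), rotate(q, mu))
print(close(lhs, rhs))
```

Because the rotation leaves the real part and the magnitude unchanged, it only moves the vector part, which matches the geometric description in Definition 1.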

Denote the set of quaternion matrices as

Q^{m \times n} = \{A = A_{0} + A_{1} i + A_{2} j + A_{3} k | A_{0}, A_{1}, A_{2}, A_{3} \in R^{m \times n}\} .

The conjugate transpose of $A$ is $A^{*} = A_{0}^{⊤} - A_{1}^{⊤} i - A_{2}^{⊤} j - A_{3}^{⊤} k$. We say that a square quaternion matrix $A \in Q^{n \times n}$ is normal if $A^{*} A = A A^{*}$; Hermitian if $A^{*} = A$, i.e., $A_{0} = A_{0}^{⊤}$ and $A_{l} = - A_{l}^{⊤}$, $l = 1, 2, 3$; unitary if $A^{*} A = I$, where $I$ is the identity matrix; and invertible (nonsingular) if there exists a matrix $B \in Q^{n \times n}$ such that $A B = B A = I$, in which case we denote $A^{- 1} = B$. We have ${(A B)}^{- 1} = B^{- 1} A^{- 1}$ if $A$ and $B$ are invertible, and ${(A^{*})}^{- 1} = {(A^{- 1})}^{*}$ if $A$ is invertible.

2.3. GHR calculus

We now introduce the generalized HR derivatives which comprise both the product and chain rules, see [20,22] for more details.

Definition 2 (Real differentiability [20]). Let $q = q_{a} + q_{b} i + q_{c} j + q_{d} k \in Q$. A function

f (q) = f_{a} (q_{a}, q_{b}, q_{c}, q_{d}) + f_{b} (q_{a}, q_{b}, q_{c}, q_{d}) i + f_{c} (q_{a}, q_{b}, q_{c}, q_{d}) j + f_{d} (q_{a}, q_{b}, q_{c}, q_{d}) k

is called real differentiable when $f_{a}$, $f_{b}$, $f_{c}$ and $f_{d}$ are differentiable with respect to the real variables $q_{a}, q_{b}, q_{c}$ and $q_{d}$.

Definition 3 (GHR derivatives [20]). If $f : Q \to Q$ is real differentiable, then the GHR derivatives of $f (q)$ with respect to $q^{μ}$ and $q^{μ *}$ ($0 \neq μ \in Q$) are defined as

\frac{\partial f}{\partial q^{μ}} = \frac{1}{4} (\frac{\partial f}{\partial q_{a}} - \frac{\partial f}{\partial q_{b}} i^{μ} - \frac{\partial f}{\partial q_{c}} j^{μ} - \frac{\partial f}{\partial q_{d}} k^{μ}),

and

\frac{\partial f}{\partial q^{μ *}} = \frac{1}{4} (\frac{\partial f}{\partial q_{a}} + \frac{\partial f}{\partial q_{b}} i^{μ} + \frac{\partial f}{\partial q_{c}} j^{μ} + \frac{\partial f}{\partial q_{d}} k^{μ}),

where $q = q_{a} + q_{b} i + q_{c} j + q_{d} k \in Q$ with $q_{a}, q_{b}, q_{c}, q_{d} \in R$; $\partial f / \partial q_{a}$, $\partial f / \partial q_{b}$, $\partial f / \partial q_{c}$ and $\partial f / \partial q_{d}$ are the partial derivatives of $f$ with respect to $q_{a}, q_{b}, q_{c}$ and $q_{d}$, respectively; and the set $\{1, i^{μ}, j^{μ}, k^{μ}\}$ is an orthogonal basis of $Q$.

Definition 4 (Quaternion gradient [20]). Let $f : Q^{n \times 1} \to Q$ and $\tilde{q} = {(q_{1}, q_{2}, \dots, q_{n})}^{T} \in Q^{n \times 1}$. Then the two quaternion gradients of $f$ are defined as

\nabla_{\tilde{q}} f ≜ {(\frac{\partial f}{\partial \tilde{q}})}^{T} = {(\frac{\partial f}{\partial q_{1}}, \dots, \frac{\partial f}{\partial q_{n}})}^{T} \in Q^{n \times 1}

and

\nabla_{{\tilde{q}}^{*}} f ≜ {(\frac{\partial f}{\partial {\tilde{q}}^{*}})}^{T} = {(\frac{\partial f}{\partial q_{1}^{*}}, \dots, \frac{\partial f}{\partial q_{n}^{*}})}^{T} \in Q^{n \times 1} .

Based on the definitions of GHR calculus provided above, we consider the simple quadratic function $f (x) = x^{H} A x$, where $x \in Q^{n}$ and $A \in Q^{n \times n}$ is a quaternion Hermitian matrix. The gradients of $f$ are given by

\nabla_{x} f (x) = \frac{1}{2} {(A x)}^{*}, \quad \nabla_{x^{*}} f (x) = \frac{1}{2} A x,

in which $\nabla_{x^{*}} f$ is the steepest ascent direction [22].
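As a sanity check on the factor $\frac{1}{2}$, the scalar case $n = 1$ can be worked out directly from Definition 3. The block below is an illustrative derivation, not part of the original text: it assumes $A$ reduces to a real number $a$ (a $1 \times 1$ Hermitian quaternion matrix is real) and takes $μ = 1$.

```latex
% Scalar case n = 1: f(q) = q^{*} a q = a |q|^2, with a real and
% q = q_a + q_b i + q_c j + q_d k.
f(q) = a \left( q_a^{2} + q_b^{2} + q_c^{2} + q_d^{2} \right),
\qquad
\frac{\partial f}{\partial q_a} = 2 a q_a, \;
\frac{\partial f}{\partial q_b} = 2 a q_b, \;
\frac{\partial f}{\partial q_c} = 2 a q_c, \;
\frac{\partial f}{\partial q_d} = 2 a q_d .

% Definition 3 with \mu = 1:
\frac{\partial f}{\partial q^{*}}
  = \frac{1}{4} \left( 2 a q_a + 2 a q_b \, i + 2 a q_c \, j + 2 a q_d \, k \right)
  = \frac{1}{2} a q ,
\qquad
\frac{\partial f}{\partial q}
  = \frac{1}{4} \left( 2 a q_a - 2 a q_b \, i - 2 a q_c \, j - 2 a q_d \, k \right)
  = \frac{1}{2} (a q)^{*} .
```

Both results agree with the general formulas $\nabla_{x^{*}} f = \frac{1}{2} A x$ and $\nabla_{x} f = \frac{1}{2} {(A x)}^{*}$ when $A = a$ and $x = q$.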

3. Quaternion Nesterov’s Accelerated Gradient (QNAG)

In this section, we will introduce the quaternion Nesterov accelerated projected gradient algorithm. To this end, we first review the definition related to the eigenvalue of quaternion Hermitian matrices.

Definition 5. Let $A \in Q^{n \times n}$ be a quaternion Hermitian matrix. Then $λ \in R$ is called an eigenvalue of $A$ if there exists a nonzero vector $x \in Q^{n}$ such that

A x = x λ .

Here $x$ is called the eigenvector corresponding to the eigenvalue $λ$.

Due to the non-commutativity of quaternion multiplication, general quaternion square matrices have distinct left and right eigenvalues. However, for a quaternion Hermitian matrix $A$, if $A$ has a right eigenvalue $λ$ with corresponding eigenvector $x$, it is straightforward to show that

x^{H} A x = x^{H} x λ .

Dividing both sides of the above equation by the positive real number $x^{H} x$, we obtain

λ = \frac{x^{H} A x}{x^{H} x} = \frac{x^{H} A x}{{∥ x ∥}^{2}} \in R,

(3)

where (3) represents the Rayleigh quotient on the quaternion skew field. The quotient is real because $x^{H} A x$ equals its own conjugate when $A$ is Hermitian. Therefore, the eigenvalues of quaternion Hermitian matrices are all real numbers, and thus there is no distinction between left and right eigenvalues.

Our goal is to compute the principal eigenvalue of a given quaternion Hermitian matrix. By defining the objective function as $f (x) = x^{H} A x$ and imposing the normalization constraint ${∥ x ∥}^{2} = 1$, we can equivalently transform the problem of finding the principal eigenvalue of a given quaternion Hermitian matrix into the following maximization problem on the quaternion skew field

\begin{matrix} max_{x \in Q^{n}} & f (x) = x^{H} A x \\ s . t . & {∥ x ∥}^{2} = 1 . \end{matrix}

(4)

The above problem (4) can be addressed using the quaternion gradient projection algorithm. Since the introduction of Nesterov’s accelerated gradient (NAG) method [15], the incorporation of momentum has become a conventional approach to overcoming the short-sightedness of gradient algorithms [7,12]. To tackle problem (4), we propose a quaternion Nesterov’s accelerated gradient projection algorithm (QNAG). Given an initial point $x_{0}$ and setting $x_{1} = x_{0}$, the QNAG method repeats, for $t \geq 1$,

\begin{matrix} y_{t} = x_{t} + β (x_{t} - x_{t - 1}), \\ z_{t + 1} = y_{t} + α \nabla_{y^{*}} f (y_{t}), \\ x_{t + 1} = z_{t + 1} / ∥ z_{t + 1} ∥, \end{matrix}

(5)

where $α$ and $β$ are the step-size and momentum parameters, respectively. When the momentum parameter $β = 0$, QNAG simplifies to standard gradient ascent (GA). When $β > 0$, it is possible to achieve accelerated rates of convergence for certain combinations of $α$ and $β$ in the deterministic setting. The framework of the proposed algorithm is detailed in Algorithm 1.

Remark 1. After obtaining the principal eigenvalue and its corresponding eigenvector of the quaternion Hermitian matrix, we can employ a deflation technique by updating $A = A - λ x x^{H}$ and continue to apply the QNAG algorithm. By repeating this process, all eigenvalues and their corresponding eigenvectors can be obtained.
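The iteration (5) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: quaternions are stored as 4-tuples, the helper names (`qmul`, `axpy`, `matvec`, `rayleigh`, `qnag`) are hypothetical, and the $2 \times 2$ test matrix is chosen here only because its eigenvalues, $4$ and $-1$, follow by hand from $(λ - 3) λ = 4$. The defaults $α = 0.05$ and $β = 0.9$ are the values reported in Section 5.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (q0, q1, q2, q3)."""
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def axpy(s, x, y):
    """Return s*x + y for a real scalar s and quaternion vectors x, y."""
    return [tuple(s * a + b for a, b in zip(xi, yi)) for xi, yi in zip(x, y)]

def matvec(A, x):
    """Quaternion matrix-vector product A x."""
    out = []
    for row in A:
        acc = (0.0, 0.0, 0.0, 0.0)
        for a, xi in zip(row, x):
            acc = tuple(s + t for s, t in zip(acc, qmul(a, xi)))
        out.append(acc)
    return out

def norm(x):
    return math.sqrt(sum(c * c for q in x for c in q))

def normalize(x):
    n = norm(x)
    return [tuple(c / n for c in q) for q in x]

def rayleigh(A, x):
    """Real part of x^H A x; equals the Rayleigh quotient (3) when ||x|| = 1."""
    return sum(qmul(qconj(xi), yi)[0] for xi, yi in zip(x, matvec(A, x)))

def qnag(A, x0, alpha=0.05, beta=0.9, max_iter=5000, tol=1e-10):
    """Sketch of iteration (5): momentum step, gradient ascent, normalization."""
    x_prev = normalize(x0)
    x = x_prev                                   # x_1 = x_0
    lam, res = rayleigh(A, x), float("inf")
    for _ in range(max_iter):
        diff = axpy(-1.0, x_prev, x)             # x_t - x_{t-1}
        y = axpy(beta, diff, x)                  # y_t = x_t + beta (x_t - x_{t-1})
        grad = [tuple(0.5 * c for c in g) for g in matvec(A, y)]  # (1/2) A y_t
        z = axpy(alpha, grad, y)                 # z_{t+1} = y_t + alpha * gradient
        x_prev, x = x, normalize(z)              # x_{t+1} = z_{t+1} / ||z_{t+1}||
        lam = rayleigh(A, x)
        res = norm(axpy(-lam, x, matvec(A, x)))  # ||A x - x lam||
        if res < tol:
            break
    return lam, x, res

# Illustrative 2 x 2 quaternion Hermitian matrix [[3, 2j], [-2j, 0]].
A_example = [[(3.0, 0.0, 0.0, 0.0), (0.0, 0.0, 2.0, 0.0)],
             [(0.0, 0.0, -2.0, 0.0), (0.0, 0.0, 0.0, 0.0)]]
x0_example = [(1.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)]
lam, x, res = qnag(A_example, x0_example)
print(lam, res)
```

The deflation step of Remark 1 is deliberately left out to keep the sketch short; it would subtract $λ x x^{H}$ from the matrix and call `qnag` again.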

Algorithm 1: Quaternion Nesterov’s Accelerated Gradient (QNAG).

Input: A quaternion Hermitian matrix $A \in Q^{n \times n}$, step size $0 < α \in R$, momentum coefficient $0 < β \in R$ and the maximum iteration number $I_{max}$.
Output: The principal eigenvalue $λ$ and its corresponding eigenvector $x_{t + 1}$.
1: Initialize: a unit quaternion vector $x_{0} \in Q^{n}$.
2: $x_{1} = x_{0}$.
3: for $t = 1$ to $I_{max}$ do
4:     Momentum extrapolation: $y_{t} = x_{t} + β (x_{t} - x_{t - 1})$.
5:     Compute the gradient at the extrapolation point: $\nabla_{y^{*}} f (y_{t}) = \frac{1}{2} A y_{t}$.
6:     Gradient ascent: $z_{t + 1} = y_{t} + α \nabla_{y^{*}} f (y_{t})$.
7:     Normalization: $x_{t + 1} = z_{t + 1} / ∥ z_{t + 1} ∥$.
8:     $λ = x_{t + 1}^{H} A x_{t + 1}$.
9:     if $∥ A x_{t + 1} - x_{t + 1} λ ∥ < 10^{- 10}$ then
10:         Break
11:     end if
12: end for

4. Convergence Analysis of QNAG

In this section, we will theoretically prove the convergence properties of the quaternion Nesterov’s accelerated gradient projection algorithm. We begin our analysis with the following Lemma 1.

Lemma 1. If $f : Q^{n} \to R$ is a real differentiable function whose gradient is Lipschitz continuous with constant $L > 0$, i.e.,

∥\nabla_{x^{*}} f (x + Δ x) - \nabla_{x^{*}} f (x)∥ \leq L ∥ Δ x ∥, \forall x, Δ x \in Q^{n},

then $f$ has the following quadratic upper bound

|f (y) - f (x) - 4 ℜ \{\nabla_{x^{*}} f {(x)}^{H} (y - x)\}| \leq 2 L {∥ y - x ∥}^{2}, \forall x, y \in Q^{n} .

(6)

Proof. For any $x, y \in Q^{n}$, let $Δ x = y - x$. Define the parameterized path $g (t) = f (x + t Δ x)$ for $t \in [0, 1]$. Then the difference in the function values can be expressed as

f (y) - f (x) = \int_{0}^{1} \frac{d}{d t} g (t) d t .

Using the chain rule, the derivative $g^{'} (t)$ corresponds to the directional derivative of $f$ along $Δ x$. In terms of the quaternion gradient [22], this is given by

g^{'} (t) = 4 ℜ \{\nabla_{x^{*}} f {(x + t Δ x)}^{H} Δ x\} .

Thus, the function difference becomes

f (y) - f (x) = \int_{0}^{1} 4 ℜ \{\nabla_{x^{*}} f {(x + t Δ x)}^{H} Δ x\} d t .

Subtracting the linear term $4 ℜ \{\nabla_{x^{*}} f {(x)}^{H} Δ x\}$ from both sides gives

f (y) - f (x) - 4 ℜ \{\nabla_{x^{*}} f {(x)}^{H} Δ x\} = \int_{0}^{1} 4 ℜ \{{(\nabla_{x^{*}} f (x + t Δ x) - \nabla_{x^{*}} f (x))}^{H} Δ x\} d t .

Then, using the absolute value inequality and the Cauchy-Schwarz inequality, we have

|f (y) - f (x) - 4 ℜ \{\nabla_{x^{*}} f {(x)}^{H} Δ x\}| \leq 4 \int_{0}^{1} ∥\nabla_{x^{*}} f (x + t Δ x) - \nabla_{x^{*}} f (x)∥ \cdot ∥ Δ x ∥ d t .

(7)

By the gradient Lipschitz condition, we have

∥\nabla_{x^{*}} f (x + t Δ x) - \nabla_{x^{*}} f (x)∥ \leq L ∥ t Δ x ∥ = L t ∥ Δ x ∥ .

Substituting the above inequality into the integral in (7), we get

\text{right-hand side of (7)} \leq 4 L {∥ Δ x ∥}^{2} \int_{0}^{1} t d t = 4 L {∥ Δ x ∥}^{2} \cdot \frac{1}{2} = 2 L {∥ Δ x ∥}^{2} .

Thus, for all $x, y \in Q^{n}$, we have

|f (y) - f (x) - 4 ℜ \{\nabla_{x^{*}} f {(x)}^{H} (y - x)\}| \leq 2 L {∥ y - x ∥}^{2} .

This completes the proof of the quadratic upper bound.

□
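For the quadratic objective $f (x) = x^{H} A x$ of problem (4), the bound (6) can be verified by hand. The block below is an illustrative worked example rather than part of the original proof; it assumes $∥ A ∥_{2}$ denotes the spectral norm of $A$, a symbol not used elsewhere in the paper.

```latex
% Gradient and Lipschitz constant of f(x) = x^H A x, with A Hermitian:
\nabla_{x^{*}} f(x) = \tfrac{1}{2} A x ,
\qquad
\left\| \tfrac{1}{2} A (x + \Delta x) - \tfrac{1}{2} A x \right\|
  = \tfrac{1}{2} \| A \Delta x \|
  \le \tfrac{1}{2} \| A \|_{2} \, \| \Delta x \|
\;\Longrightarrow\; L = \tfrac{1}{2} \| A \|_{2} .

% Exact expansion (uses \Delta x^H A x = (x^H A \Delta x)^{*}):
f(x + \Delta x) = f(x) + 2 \Re\{ x^{H} A \Delta x \} + \Delta x^{H} A \Delta x ,
\qquad
4 \Re\{ (\tfrac{1}{2} A x)^{H} \Delta x \} = 2 \Re\{ x^{H} A \Delta x \} .

% Hence the left-hand side of (6) is exactly the quadratic remainder:
\left| f(y) - f(x) - 4 \Re\{ \nabla_{x^{*}} f(x)^{H} (y - x) \} \right|
  = \left| \Delta x^{H} A \Delta x \right|
  \le \| A \|_{2} \, \| \Delta x \|^{2}
  = 2 L \, \| y - x \|^{2} .
```

In this case the constant $2 L$ in (6) cannot be improved, since equality is attained when $Δ x$ is an eigenvector for the eigenvalue of largest modulus.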

Theorem 1. Let $f : Q^{n} \to R$ be a real differentiable function whose gradient is Lipschitz continuous with constant $L > 0$. If the step size is $α = \frac{1}{2 L}$ and the momentum parameter is $β_{t} = \frac{t - 1}{t + 2}$, then

f (x^{*}) - f (x_{t}) \leq C / {(t + 1)}^{2}

holds for any $t \geq 0$, where $f (x^{*}) = λ_{max}$ and $C$ is a constant related to the initial conditions.

Proof. From Lemma 1, for the update $z_{t + 1} = y_{t} + α \nabla_{y^{*}} f (y_{t})$, we have

f (z_{t + 1}) \geq f (y_{t}) + 4 ℜ \{\nabla_{y^{*}} f {(y_{t})}^{H} (z_{t + 1} - y_{t})\} - 2 L {∥z_{t + 1} - y_{t}∥}^{2} .

Substituting $z_{t + 1} - y_{t} = α \nabla_{y^{*}} f (y_{t})$, we derive

f (z_{t + 1}) \geq f (y_{t}) + 4 α {∥\nabla f (y_{t})∥}^{2} - 2 L α^{2} {∥\nabla f (y_{t})∥}^{2} .

Choosing $α = \frac{1}{2 L}$, this simplifies to

f (z_{t + 1}) \geq f (y_{t}) + \frac{3}{2 L} {∥\nabla f (y_{t})∥}^{2} .

(8)

After projecting $z_{t + 1}$ onto the unit sphere, we have $x_{t + 1} = z_{t + 1} / ∥z_{t + 1}∥$. Applying Lemma 1 again to $x_{t + 1}$ and $z_{t + 1}$, we get

f (x_{t + 1}) \geq f (z_{t + 1}) + 4 ℜ \{\nabla_{z_{t + 1}^{*}} f {(z_{t + 1})}^{H} (x_{t + 1} - z_{t + 1})\} - 2 L {∥x_{t + 1} - z_{t + 1}∥}^{2} .

By the optimality condition of the projection, we have

ℜ \{\nabla_{z_{t + 1}^{*}} f {(z_{t + 1})}^{H} (x_{t + 1} - z_{t + 1})\} \geq 0,

which implies

f (x_{t + 1}) \geq f (z_{t + 1}) - 2 L {∥x_{t + 1} - z_{t + 1}∥}^{2} .

(9)

Next, we consider the bound of the projection error

\begin{matrix} ∥ x_{t + 1} - z_{t + 1} ∥ & = ∥\frac{z_{t + 1}}{∥ z_{t + 1} ∥} - z_{t + 1}∥ \\ & = ∥\frac{z_{t + 1} (1 - ∥ z_{t + 1} ∥)}{∥ z_{t + 1} ∥}∥ \\ & = \frac{|1 - ∥ z_{t + 1} ∥|}{∥ z_{t + 1} ∥} ∥ z_{t + 1} ∥ \\ & = |1 - ∥ z_{t + 1} ∥| . \end{matrix}

Combining $z_{t + 1} = y_{t} + α \nabla_{y^{*}} f (y_{t})$ with the normalization $∥ y_{t} ∥ = 1$, the triangle inequality and its reverse form give

| ∥ z_{t + 1} ∥ - ∥ y_{t} ∥ | \leq α ∥ \nabla_{y^{*}} f (y_{t}) ∥,

which implies that

∥ x_{t + 1} - z_{t + 1} ∥ \leq α ∥ \nabla_{y^{*}} f (y_{t}) ∥ .

Thus, the projection error in (9) is a higher-order term, and we get

f (x_{t + 1}) \geq f (z_{t + 1}) - 2 L α^{2} {∥\nabla f (y_{t})∥}^{2} .

With $α = \frac{1}{2 L}$, the formulas (8) and (9) give the lower bound

f (x_{t + 1}) \geq f (y_{t}) + \frac{3}{2 L} {∥\nabla f (y_{t})∥}^{2} - \frac{1}{2 L} {∥\nabla f (y_{t})∥}^{2} = f (y_{t}) + \frac{1}{L} {∥\nabla f (y_{t})∥}^{2} .

(10)
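The constants in (8) and (10) follow from substituting $α = \frac{1}{2 L}$. The short block below simply spells out that arithmetic as a reading aid; it adds no assumption beyond the step size stated in Theorem 1.

```latex
% Coefficient in (8): gain from the ascent step minus the quadratic penalty.
4 \alpha - 2 L \alpha^{2}
  = \frac{4}{2L} - \frac{2L}{4L^{2}}
  = \frac{2}{L} - \frac{1}{2L}
  = \frac{3}{2L} .

% Penalty from the projection error bound, then the coefficient in (10).
2 L \alpha^{2} = \frac{2L}{4L^{2}} = \frac{1}{2L} ,
\qquad
\frac{3}{2L} - \frac{1}{2L} = \frac{1}{L} .
```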

We then define the Lyapunov function as

Φ_{t} = θ_{t}^{2} (f (x^{*}) - f (x_{t})) + \frac{1}{2} {∥v_{t} - x^{*}∥}^{2},

where the auxiliary sequence $θ_{t} = \frac{t + 1}{2}$ satisfies $β_{t} = \frac{θ_{t} - 1}{θ_{t + 1}}$, and $v_{t}$ follows the update rule

v_{t + 1} = v_{t} + θ_{t} α \nabla f (y_{t}) .
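The two identities on the auxiliary sequence that the proof relies on, namely $β_{t} = (θ_{t} - 1) / θ_{t + 1}$ and $θ_{t + 1}^{2} - θ_{t}^{2} = θ_{t} + \frac{1}{4}$, can be confirmed with exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the function names `theta` and `beta` are illustrative only.

```python
from fractions import Fraction

def theta(t):
    """Auxiliary sequence theta_t = (t + 1) / 2."""
    return Fraction(t + 1, 2)

def beta(t):
    """Momentum parameter beta_t = (t - 1) / (t + 2) from Theorem 1."""
    return Fraction(t - 1, t + 2)

# Exact checks over a range of iteration indices.
momentum_ok = all(beta(t) == (theta(t) - 1) / theta(t + 1) for t in range(200))
increment_ok = all(
    theta(t + 1) ** 2 - theta(t) ** 2 == theta(t) + Fraction(1, 4)
    for t in range(200)
)
print(momentum_ok, increment_ok)
print([str(beta(t)) for t in range(5)])  # first few momentum parameters
```

Note that this schedule gives a nonpositive momentum parameter for $t = 0$ and $t = 1$, so extrapolation only starts to act from $t = 2$ onward.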

Our goal is to prove that $Φ_{t + 1} \leq Φ_{t}$. Computing $Φ_{t + 1} - Φ_{t}$, we obtain

Φ_{t + 1} - Φ_{t} = θ_{t + 1}^{2} (f (x^{*}) - f (x_{t + 1})) - θ_{t}^{2} (f (x^{*}) - f (x_{t})) + \frac{1}{2} ({∥v_{t + 1} - x^{*}∥}^{2} - {∥v_{t} - x^{*}∥}^{2}) .

(11)

For the term ${∥v_{t + 1} - x^{*}∥}^{2}$, we have

\begin{matrix} {∥v_{t + 1} - x^{*}∥}^{2} & = {∥v_{t} - x^{*} + θ_{t} α \nabla f (y_{t})∥}^{2} \\ & = {∥v_{t} - x^{*}∥}^{2} + 2 θ_{t} α ℜ \{\nabla f {(y_{t})}^{H} (v_{t} - x^{*})\} + θ_{t}^{2} α^{2} {∥\nabla f (y_{t})∥}^{2} . \end{matrix}

Substituting this expansion into (11), we have

\begin{matrix} Φ_{t + 1} - Φ_{t} & = (θ_{t + 1}^{2} - θ_{t}^{2}) (f (x^{*}) - f (x_{t + 1})) + θ_{t}^{2} (f (x_{t}) - f (x_{t + 1})) \\ & \quad + θ_{t} α ℜ \{\nabla f {(y_{t})}^{H} (v_{t} - x^{*})\} + \frac{θ_{t}^{2} α^{2}}{2} {∥\nabla f (y_{t})∥}^{2} . \end{matrix}

Suppose that

f (x^{*}) \leq f (y_{t}) + ℜ \{\nabla f {(y_{t})}^{H} (x^{*} - y_{t})\} .

Substituting this into the Lyapunov difference and combining with (10), we arrive at

\begin{matrix} Φ_{t + 1} - Φ_{t} & \leq (θ_{t + 1}^{2} - θ_{t}^{2} - θ_{t}^{2}) (f (x^{*}) - f (y_{t})) \\ & \quad + θ_{t} α ℜ \{\nabla f {(y_{t})}^{H} (v_{t} - x^{*})\} + \frac{θ_{t}^{2} α^{2}}{2} {∥\nabla f (y_{t})∥}^{2} . \end{matrix}

Using $θ_{t + 1} = θ_{t} + \frac{1}{2}$ and $α = \frac{1}{2 L}$, we simplify

θ_{t + 1}^{2} - θ_{t}^{2} = θ_{t} + \frac{1}{4} \quad \text{and} \quad \frac{θ_{t}^{2}}{L} \geq θ_{t}^{2} α^{2} L .

After algebraic manipulation, we show that

Φ_{t + 1} \leq Φ_{t} - \frac{θ_{t}^{2}}{4 L} {∥\nabla f (y_{t})∥}^{2} \leq Φ_{t} .

From $Φ_{t} \leq Φ_{0}$ and $θ_{t} = \frac{t + 1}{2}$, we have

\frac{{(t + 1)}^{2}}{4} (f (x^{*}) - f (x_{t})) \leq Φ_{0} = \frac{1}{4} (f (x^{*}) - f (x_{0})) + \frac{1}{2} {∥v_{0} - x^{*}∥}^{2} .

Letting $v_{0} = x_{0}$, we conclude

f (x^{*}) - f (x_{t}) \leq \frac{C}{{(t + 1)}^{2}},

where $C = f (x^{*}) - f (x_{0}) + 2 {∥v_{0} - x^{*}∥}^{2}$ is a constant related to the initial setup. Hence Algorithm 1 achieves an $O (1 / t^{2})$ convergence rate. This completes the proof.

□

5. Numerical Experiments

In this section, we provide numerical examples to demonstrate the feasibility and effectiveness of the quaternion Nesterov’s accelerated gradient projection algorithm for the eigenvalue problem of quaternion Hermitian matrices. In the specific implementation of Algorithm 1, we set the constant step size and momentum parameter to $α = 0.05$ and $β = 0.9$, respectively.

All the experiments are performed under Windows 11 and MATLAB version 23.2.0.2365128 (R2023b) with an AMD Ryzen 7 5800H with Radeon Graphics CPU at 3.20 GHz and 16 GB of memory.

Example 1.

Given quaternion matrix

A = A_{0} + A_{1} i + A_{2} j + A_{3} k

with

A_{0} = (\begin{matrix} 17.6331 & - 1.6420 & - 1.2730 \\ - 1.6420 & 8.3929 & - 1.7952 \\ - 1.2730 & - 1.7952 & 15.1089 \end{matrix}), A_{1} = (\begin{matrix} 0 & 1.2315 & 1.5751 \\ - 1.2315 & 0 & 2.5700 \\ - 1.5751 & - 2.5700 & 0 \end{matrix})

and

A_{2} = (\begin{matrix} 0 & 0.6530 & 3.2730 \\ - 0.6530 & 0 & 1.2301 \\ - 3.2730 & - 1.2301 & 0 \end{matrix}), A_{3} = (\begin{matrix} 0 & - 4.3909 & - 9.2817 \\ 4.3909 & 0 & 1.9585 \\ 9.2817 & - 1.9585 & 0 \end{matrix}) .

In this experiment, we employ the quaternion Nesterov’s accelerated gradient (QNAG) method (Algorithm 1) to compute all eigenvalues of $A$, which are $λ_{1} = 27.0543$, $λ_{2} = 12.4577$, $λ_{3} = 1.6229$, with their corresponding eigenvectors being

\begin{matrix} x_{1} = (\begin{matrix} 0.4485 + 0.4371 i - 0.1443 j - 0.3829 k \\ - 0.0365 + 0.0228 i + 0.0147 j + 0.1706 k \\ 0.3077 + 0.1006 i + 0.2161 j + 0.5076 k \end{matrix}), \\ x_{2} = (\begin{matrix} 0.1208 - 0.0357 i + 0.3112 j - 0.2267 k \\ 0.0627 - 0.7303 i - 0.0928 j - 0.0756 k \\ - 0.2492 + 0.4640 i + 0.0583 j - 0.0588 k \end{matrix}), \\ x_{3} = (\begin{matrix} - 0.0011 - 0.1557 i + 0.3809 j + 0.3270 k \\ 0.5029 + 0.3354 i + 0.0994 j + 0.2046 k \\ 0.1149 + 0.4714 i + 0.1735 j + 0.2025 k \end{matrix}) . \end{matrix}

Additionally, we obtain the following three residuals:

\begin{matrix} ∥ A x_{1} - x_{1} λ_{1} ∥ \leq 6.7432 \times 10^{- 12}, \\ ∥ A x_{2} - x_{2} λ_{2} ∥ \leq 5.1990 \times 10^{- 12}, \\ ∥ A x_{3} - x_{3} λ_{3} ∥ \leq 4.3378 \times 10^{- 12} . \end{matrix}

It is evident that the residuals are controlled within an ideal range, demonstrating the feasibility and effectiveness of Algorithm 1 in computing the eigenvalues of quaternion Hermitian matrices.
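Two quick consistency checks on the data of Example 1 can be run without any quaternion arithmetic. The sketch below is an illustration added here, not part of the reported experiments: it checks the Hermitian structure ($A_{0}$ symmetric, $A_{1}, A_{2}, A_{3}$ skew-symmetric) and uses the fact that the trace of a quaternion Hermitian matrix, which is real, equals the sum of its eigenvalues. The variable and function names are hypothetical.

```python
A0 = [[17.6331, -1.6420, -1.2730],
      [-1.6420, 8.3929, -1.7952],
      [-1.2730, -1.7952, 15.1089]]
A1 = [[0, 1.2315, 1.5751],
      [-1.2315, 0, 2.5700],
      [-1.5751, -2.5700, 0]]
A2 = [[0, 0.6530, 3.2730],
      [-0.6530, 0, 1.2301],
      [-3.2730, -1.2301, 0]]
A3 = [[0, -4.3909, -9.2817],
      [4.3909, 0, 1.9585],
      [9.2817, -1.9585, 0]]

def has_symmetry(M, sign):
    """True if M[i][j] == sign * M[j][i] for all i, j (sign = +1 or -1)."""
    n = len(M)
    return all(M[i][j] == sign * M[j][i] for i in range(n) for j in range(n))

# Hermitian structure: symmetric real part, skew-symmetric imaginary parts.
is_hermitian = has_symmetry(A0, 1) and all(has_symmetry(M, -1) for M in (A1, A2, A3))

# Trace check against the eigenvalues reported in Example 1.
trace = sum(A0[i][i] for i in range(3))
reported = [27.0543, 12.4577, 1.6229]
gap = abs(trace - sum(reported))
print(is_hermitian, trace, sum(reported), gap)
```

Agreement up to the four-decimal rounding of the printed values is a necessary condition for the reported spectrum, not a proof of it.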

Example 2.

In this experiment, we utilize MATLAB’s built-in functions to randomly generate three types of quaternion Hermitian matrices of different sizes and compare Algorithm 1 with the QPGA algorithm [3] and the `eig` function in the Quaternion Toolbox for MATLAB (QTFM) [17].

We first test the performance of Algorithm 1, QPGA, and the `eig` function in computing the principal eigenvalues of three different types and sizes of quaternion Hermitian matrices. The numerical experimental results are presented in Table 1, which includes three evaluation metrics: the number of iterations, residuals, and runtime. Superior results are highlighted in bold. It can be observed that Algorithm 1 requires the fewest iterations and the shortest runtime in every test case, and attains the smallest residual in all cases except the positive definite matrix with $n = 500$, where the QPGA residual is marginally smaller. The runtime advantage is most pronounced for large-scale quaternion Hermitian matrices.

Subsequently, we plotted the variation curves of the objective function values for the first 50 iterations of Algorithm 1 and the QPGA algorithm, as shown in Figure 1. It is evident that our algorithm achieves a faster increase in the objective function, demonstrating higher efficiency in obtaining the maximum eigenvalue.

Figure 2 illustrates the residual variation curves generated by Algorithm 1 and the QPGA algorithm with respect to the number of iterations. Across the tested matrix dimensions, our algorithm consistently achieves higher accuracy and efficiency.

6. Conclusions

In this paper, leveraging the innovative generalized Hamilton-real (GHR) calculus, we have introduced a novel quaternion Nesterov’s accelerated projected gradient algorithm designed to compute the dominant eigenvalue and corresponding eigenvector of quaternion Hermitian matrices. The incorporation of momentum terms and look-ahead updates has enabled the algorithm to attain an accelerated convergence rate. Theoretical analysis has confirmed the convergence of the quaternion Nesterov’s accelerated projected gradient algorithm. Empirical results from numerical experiments indicate that the proposed method surpasses both the Quaternion Projected Gradient Ascent (QPGA) and conventional algebraic approaches in terms of both computational precision and efficiency in runtime.

Author Contributions

Shan-Qi Duan wrote the main manuscript text and performed the experiment. Qing-Wen Wang contributed to the conception of the study and helped to improve this manuscript with constructive suggestions. Xue-Feng Duan made a lot of useful suggestions. All authors reviewed the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China under Grant 12371023.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bunse-Gerstner, A., Byers, R., Mehrmann, V. A quaternion QR algorithm. Numer. Math. 1989, 55, 83–95.
2. De Leo, S., Scolarici, G. Right eigenvalue equation in quaternionic quantum mechanics. J. Phys. A Math. Gen. 2000, 33, 2971.
3. Diao, Q., Liu, J., Zhang, N., Xu, D. An iterative algorithm for quaternion eigenvalue problems in signal processing. IEEE Signal Process. Lett. 2024, 31, 2505–2509.
4. Duan, S.Q., Wang, Q.W., Duan, X.F. On Rayleigh quotient iteration for the dual quaternion Hermitian eigenvalue problem. Mathematics 2024, 12.
5. Ell, T.A., Sangwine, S.J. Quaternion involutions and anti-involutions. Comput. Math. Appl. 2007, 53, 137–143.
6. Farid, F., Wang, Q.W., Zhang, F. On the eigenvalues of quaternion matrices. Lin. Multilin. Alg. 2011, 59, 451–473.
7. Ghadimi, S., Lan, G. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 2016, 156, 59–99.
8. Guo, Z., Jiang, T., Vasil’ev, V., Wang, G. Complex structure-preserving method for Schrödinger equations in quaternionic quantum mechanics. Numer. Algorithms 2024, 97, 271–287.
9. Hamilton, W.R. Elements of Quaternions. Longmans, Green (1866).
10. Jia, Z., Wei, M., Ling, S. A new structure-preserving method for quaternion Hermitian eigenvalue problems. J. Comput. Appl. Math. 2013, 239, 12–24.
11. Jiang, T., Chen, L. An algebraic method for Schrödinger equations in quaternionic quantum mechanics. Comput. Phys. Commun. 2008, 178, 795–799.
12. Li, H., Peng, Z., Pan, C., Zhao, D. Fast gradient method for low-rank matrix estimation. J. Sci. Comput. 2023, 96, 41.
13. Li, Y., Wei, M., Zhang, F., Zhao, J. On the power method for quaternion right eigenvalue problem. J. Comput. Appl. Math. 2019, 345, 59–69.
14. Macías-Virgós, E., Pereira-Sáez, M., Tarrío-Tobar, A.D. Rayleigh quotient and left eigenvalues of quaternionic matrices. Linear Multilinear Algebra 2023, 71, 2163–2179.
15. Nesterov, Y. A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Dokl. Akad. Nauk. SSSR 1983, 269, 543.
16. Rodman, L. Topics in quaternion linear algebra. Princeton University Press (2014).
17. Sangwine, S.J., Bihan, N.L. Quaternion toolbox for matlab, version 2 with support for octonions. Software library available at: http://qtfm.sourceforge.net/ (2013).
18. Wang, Q.W., Wang, X.X. Arnoldi method for large quaternion right eigenvalue problem. J. Sci. Comput. 2020, 82, 58.
19. Ward, J.P. Quaternions and Cayley numbers: Algebra and applications, vol. 403. Springer Science & Business Media (2012).
20. Xu, D., Jahanchahi, C., Took, C.C., Mandic, D.P. Enabling quaternion derivatives: The generalized HR calculus. R. Soc. Open Sci. 2015, 2, 150255.
21. Xu, D., Mandic, D.P. The theory of quaternion matrix derivatives. IEEE Trans. Signal Process. 2015, 63, 1543–1556.
22. Xu, D., Xia, Y., Mandic, D.P. Optimization in quaternion dynamic systems: Gradient, Hessian, and learning algorithms. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 249–261.
23. Yu, C.E., Liu, X., Zhang, Y. A new complex structure-preserving method for QSVD. J. Sci. Comput. 2024, 99, 37.
24. Zeng, R., Wu, J., Shao, Z., Chen, Y., Chen, B., Senhadji, L., Shu, H. Color image classification via quaternion principal component analysis network. Neurocomputing 2016, 216, 416–428.
25. Zhang, F. Quaternions and matrices of quaternions. Linear Algebra Its Appl. 1997, 251, 21–57.

Figure 1. Variation curve of the objective function value with respect to the number of iterations for Algorithm 1 on a 1000-order quaternion Hermitian matrix.

Figure 2. Comparison of fixed iteration steps and residual variation curves between Algorithm 1 and QPGA for solving Problem (4), with quaternion Hermitian matrix sizes of order 300 and 1000 on the left and right, respectively.

Table 1. Numerical comparison results of the QNAG algorithm, QPGA algorithm, and eig function for computing the principal eigenvalue and its corresponding eigenvector of three types of quaternion Hermitian matrices.

Matrix type	n	QNAG (ours) IT	QNAG (ours) Residual	QNAG (ours) CPU (s)	QPGA [3] IT	QPGA [3] Residual	QPGA [3] CPU (s)	`eig` [17] Residual	`eig` [17] CPU (s)
General Hermitian	300	453	4.5453e-12	0.6667	1764	9.9457e-12	2.4857	9.6329e-12	3.4420
General Hermitian	500	432	9.4172e-12	2.6465	1368	9.8076e-12	8.2786	3.3852e-11	24.6503
General Hermitian	1000	461	8.2506e-12	10.4795	1592	9.8945e-12	36.1470	2.1115e-10	373.4246
Positive semi-definite	300	434	1.6844e-12	0.6420	1141	9.8121e-12	1.7283	1.5353e-11	3.3504
Positive semi-definite	500	454	9.2637e-12	2.6836	2020	9.9177e-12	11.7135	1.9891e-11	24.6779
Positive semi-definite	1000	485	8.2055e-12	10.6467	1943	9.9733e-12	42.3868	1.7807e-10	362.1100
Positive definite	300	419	1.5814e-12	0.6079	1615	9.8821e-12	2.2746	3.9411e-11	3.3670
Positive definite	500	457	9.9405e-12	2.7210	1530	9.5587e-12	9.1385	2.3945e-11	24.3842
Positive definite	1000	503	7.8888e-12	11.2300	2636	9.9721e-12	56.6146	7.2911e-11	379.0733

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
