Efficient Tensor Robust Principal Analysis via Right Invertible Matrix Based Tensor Products

Abstract
In this paper, we extend the definition of tensor products from using invertible transformations to utilising right-invertible matrices, exploring the algebraic properties of these new tensor products. On the basis of this new definition, we define the concepts of tensor rank and tensor nuclear norm and investigate their properties, ensuring consistency with their matrix counterparts. We then derive a singular value thresholding (SVT) formula to approximately solve the subproblems in the alternating direction method of multipliers (ADMM), which is a key component of our proposed tensor robust principal component analysis (TRPCA) algorithm. We conduct a complexity analysis of the proposed algorithm, demonstrating its computational efficiency, and apply it to grayscale video denoising and motion detection problems, where it shows significant improvements in efficiency while maintaining a similar level of quality. This work provides a promising approach for handling large-scale data, offering new insights and solutions for advanced data analysis tasks.

1. Introduction

Principal component analysis (PCA) is a fundamental technique in data analysis that is widely used for dimensionality reduction and feature extraction. It is particularly effective in uncovering low-dimensional structures in high-dimensional data, which are prevalent in fields such as imaging, text, video, and bioinformatics. PCA simplifies the complexity of high-dimensional data while retaining trends and patterns by transforming the data into a small number of dimensions that act as summaries of the features. High-dimensional data present several challenges that PCA mitigates: computational expense and an increased error rate due to multiple-testing corrections when each feature is tested for association with an outcome. Despite its computational efficiency and robustness to minor noise, PCA is highly sensitive to gross corruptions or outliers, which are common in real-world datasets. Recent advancements in robust PCA have addressed these limitations, providing more accurate and stable results in the presence of outliers [1,2,3,4,5,6].
To address this limitation, numerous robust variants of PCA have been developed, but many suffer from high computational costs [7,8,9]. Among these robust variants, robust principal component analysis (RPCA) stands out as a significant advancement [1]. RPCA is the first polynomial-time algorithm with strong recovery guarantees [10]. Given an observed matrix $X \in \mathbb{R}^{n_1 \times n_2}$ that can be decomposed as $X = L_0 + E_0$, where $L_0$ is low-rank and $E_0$ is sparse, RPCA demonstrates that if the singular vectors of $L_0$ meet certain incoherence conditions, $L_0$ and $E_0$ can be accurately recovered with high probability by solving the following convex optimisation problem:
$$\min_{L, E} \ \|L\|_* + \lambda \|E\|_1, \quad \text{s.t.} \quad X = L + E.$$
Here, $\|L\|_*$ denotes the nuclear norm (the sum of the singular values of $L$), and $\|E\|_1$ denotes the $\ell_1$ norm (the sum of the absolute values of all entries in $E$) [1,3]. The parameter $\lambda$ is typically set to $1/\sqrt{\max(n_1, n_2)}$, which performs well in practice [1]. Algorithmically, this problem can be efficiently solved at a cost comparable to that of PCA [7,9]. Recent research has focused on improving the efficiency and robustness of RPCA, with novel approaches such as nonconvex RPCA (N-TRPCA) and learned robust PCA (LRPCA) showing promising results [6,11].
However, a primary limitation of RPCA is its applicability only to two-way (matrix) data. Real-world data, especially in modern applications, are often multidimensional and represented as multiway arrays or tensors [12,13]. For example, a colour image is a three-way object with column, row, and colour modes, whereas a grayscale video is indexed by two spatial variables and one temporal variable [12]. To apply RPCA, one must first reshape the multiway data into a matrix, a process that often leads to information loss and performance degradation [3,14]. This limitation has prompted researchers to extend RPCA to handle tensor data directly, leveraging its multidimensional structure [6,15]. Such extensions aim to preserve the inherent structures of the data and avoid the pitfalls associated with reshaping, thus improving the robustness and accuracy of RPCA in handling high-dimensional data [16,17].
To address this issue, recent research has focused on extending RPCA to handle tensor data directly, leveraging its multidimensional structure. One notable approach is tensor robust principal component analysis (TRPCA), which aims to precisely recover a low-rank tensor corrupted by sparse errors[3,15]. Lu et al.[15] introduced a new tensor nuclear norm based on the tensor-tensor product (t-product)[18], which is a generalisation of the matrix-matrix product. This new tensor nuclear norm is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm, providing a strong theoretical foundation for TRPCA[15].
Building upon this work, we propose a novel TRPCA model that utilises a new concept of tensor products defined via right invertible linear transforms. Specifically, we define new tensor products for third-order tensors via right invertible matrices. These new tensor products generalise the concept of tensor products defined by invertible linear transforms and offer several advantages in terms of computational efficiency and robustness.
Our contributions are summarised as follows:
  • Motivated by the new tensor product in Reference [19], which is a natural generalisation of the tensor product with invertible linear transforms, such as the t-product and c-product, we rigorously deduce a new class of tensor products for third-order tensors via right invertible linear transforms. These new tensor products generalise existing tensor product definitions and preserve many fundamental properties of the tensor product defined by invertible linear transforms. This forms the foundation for extending models, optimisation methods, and theoretical analysis techniques from t-products to the new products with right invertible linear transforms.
  • We propose novel TRPCA models that leverage the new tensor products to increase computational efficiency and robustness. Equipped with the corresponding tensor nuclear norm, the model provides a more efficient and robust solution for tensor data.
  • We provide theoretical guarantees for the exact recovery of the low-rank and sparse components via the new tensor nuclear norm. Under certain assumptions, the solution to the convex TRPCA model perfectly recovers the underlying low-rank component $\mathcal{L}_0$ and sparse component $\mathcal{E}_0$. The recovery guarantees of RPCA [1] are recovered as a special case.
  • Numerical experiments are presented to validate the efficiency and effectiveness of the proposed algorithms in various applications, including grayscale video recovery and motion detection. Our experiments show the superiority of the new TRPCA over the TRPCA based on the c-product.
This paper is organised as follows: Section 2 introduces the necessary background and notation. Section 3 defines the new tensor products and explores their properties. Section 4 presents the TRPCA model and the theoretical guarantees. Section 5 provides numerical experiments and applications. Finally, Section 6 concludes the paper and discusses future directions.

2. Preliminaries

2.1. Notations

Throughout this paper, vectors are denoted by italic lowercase letters such as $u$ and $v$, matrices by uppercase letters such as $A$ and $B$, and tensors by calligraphic capital letters such as $\mathcal{A}$ and $\mathcal{B}$. Scalars are denoted by lowercase letters such as $a$. The notation $\mathbb{F}^{n_1 \times n_2 \times n_3}$ represents the set of $n_1 \times n_2 \times n_3$ tensors over the field $\mathbb{F}$, where $\mathbb{F}$ can be either the field of real numbers $\mathbb{R}$ or the field of complex numbers $\mathbb{C}$. This notation can also be written as $\mathbb{F}_{n_3}^{n_1 \times n_2}$ when emphasising two particular dimensions.
For a third-order tensor $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$, the $(i, j, k)$-th entry is denoted as $\mathcal{A}_{ijk}$ or $a_{ijk}$. Let $a_{ij}$ represent the $(i, j)$-th tube of $\mathcal{A}$, that is, $a_{ij} = \mathcal{A}(i, j, :)$. A slice of a tensor is a two-dimensional section defined by fixing all but two indices. The $i_3$-th frontal slice is denoted as $\mathcal{A}^{(i_3)}$, the $i_1$-th horizontal slice as $\mathcal{A}(i_1, :, :)$, and the $i_2$-th lateral slice as $\mathcal{A}(:, i_2, :)$.
The inner product between matrices $A$ and $B$ in $\mathbb{F}^{n_1 \times n_2}$ is defined as $\langle A, B \rangle = \mathrm{Tr}(A^H B)$, where $A^H$ denotes the conjugate transpose of $A$ and $\mathrm{Tr}(\cdot)$ denotes the matrix trace. The inner product between tensors $\mathcal{A}$ and $\mathcal{B}$ in $\mathbb{F}_{n_3}^{n_1 \times n_2}$ is defined as $\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_3=1}^{n_3} \langle \mathcal{A}^{(i_3)}, \mathcal{B}^{(i_3)} \rangle$. Several norms are used for vectors, matrices, and tensors. The $\ell_1$ norm of a tensor $\mathcal{A}$ is defined as $\|\mathcal{A}\|_1 = \sum_{i_1, i_2, i_3} |a_{i_1 i_2 i_3}|$. The Frobenius norm is defined as $\|\mathcal{A}\|_F = \sqrt{\sum_{i_1, i_2, i_3} |a_{i_1 i_2 i_3}|^2}$. These norms reduce to the corresponding vector or matrix norms when $\mathcal{A}$ is a vector or a matrix. For a vector $v \in \mathbb{F}^n$, the $\ell_2$ norm is defined as $\|v\|_2 = \sqrt{\sum_i |v_i|^2}$. The matrix nuclear norm is defined as $\|A\|_* = \sum_i \sigma_i(A)$.

2.2. Right invertible transform operator

Consider a matrix $M$ of size $p \times n_3$ with $p \le n_3$. If $M$ is a right invertible matrix, then there exists a matrix $N$ of size $n_3 \times p$ such that $M N = I_p$. When $p = n_3$, a right invertible matrix $M$ is an invertible matrix. A right invertible matrix is a matrix with linearly independent rows. One way to construct a right invertible matrix is to remove some rows from an invertible matrix. Consider an $n \times n$ discrete cosine transform (DCT) matrix $C$. Suppose that we form a new matrix $F$ of size $(n - m) \times n$ by removing the last $m$ rows from $C$, where $m \le n$. This operation is equivalent to truncating the matrix. The resulting matrix $F$ is no longer square and thus has no inverse in the traditional sense. However, we can find a right inverse $G$ such that $F G = I_{n-m}$, where $I_{n-m}$ is the $(n - m) \times (n - m)$ identity matrix. In general, the right inverse of a matrix is not unique. To construct $G$, we can use the Moore-Penrose pseudoinverse. For a given matrix $A$ with linearly independent rows, its pseudoinverse $A^+$ is defined as $A^+ = A^H (A A^H)^{-1}$, provided that $A A^H$ is invertible, where $A^H$ denotes the conjugate transpose of $A$. Therefore, a right inverse $G$ of $F$ is given by $G = F^H (F F^H)^{-1}$. Thus, we form a right invertible pair $(F, G)$.
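To make the construction concrete, the following is a minimal numpy sketch (the helper name dct_matrix and the sizes are our own illustrative choices, not from the paper) that builds a truncated orthonormal DCT matrix $F$ and its Moore-Penrose right inverse $G$, and verifies that $F G = I$ while $G F \neq I$ in general:
```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C of size n x n (so that C @ C.T = I)."""
    k = np.arange(n).reshape(-1, 1)          # row (frequency) index
    i = np.arange(n).reshape(1, -1)          # column (sample) index
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

n, m = 8, 3
C = dct_matrix(n)
F = C[: n - m, :]                                  # keep the first n - m rows
G = F.conj().T @ np.linalg.inv(F @ F.conj().T)     # Moore-Penrose right inverse
print(np.allclose(F @ G, np.eye(n - m)))           # True:  F G = I_{n-m}
print(np.allclose(G @ F, np.eye(n)))               # False: G F != I_n in general
```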
Now, we can define two functions to convert between tube tensors in $\mathbb{F}_{n_3}^{1 \times 1}$ and vectors in $\mathbb{F}^{n_3}$. The function vec converts a tube tensor $\mathcal{A} \in \mathbb{F}_{n_3}^{1 \times 1}$ into a vector in $\mathbb{F}^{n_3}$: $\mathrm{vec}: \mathbb{F}_{n_3}^{1 \times 1} \to \mathbb{F}^{n_3}$, $\mathcal{A} \mapsto \mathrm{vec}(\mathcal{A})$, where $\mathrm{vec}(\mathcal{A})$ is the vector obtained by stacking the elements of $\mathcal{A}$ into a column vector. The function unvec converts a vector $v \in \mathbb{F}^{n_3}$ into a tube tensor $\mathcal{A} \in \mathbb{F}_{n_3}^{1 \times 1}$: $\mathrm{unvec}: \mathbb{F}^{n_3} \to \mathbb{F}_{n_3}^{1 \times 1}$, $v \mapsto \mathrm{unvec}(v)$, where $\mathrm{unvec}(v)$ is the tube tensor obtained by reshaping the vector $v$ into an element of $\mathbb{F}_{n_3}^{1 \times 1}$.
Definition 1.
The mode-3 (matrix) product between $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$ and a $p \times n_3$ matrix $U$ is defined as
$$(\mathcal{A} \times_3 U)_{ijk} = \sum_{s=1}^{n_3} \mathcal{A}_{ijs}\, u_{ks}, \quad i = 1, \dots, n_1, \ j = 1, \dots, n_2, \ k = 1, \dots, p.$$
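As a quick illustration of Definition 1, a one-line numpy implementation of the mode-3 product might look as follows (the function name mode3_product is our own choice):
```python
import numpy as np

def mode3_product(A, U):
    """(A x_3 U)_{ijk} = sum_s A_{ijs} U_{ks}; A has shape (n1, n2, n3), U has shape (p, n3)."""
    return np.einsum('ijs,ks->ijk', A, U)

A = np.random.rand(4, 5, 6)
U = np.random.rand(3, 6)
B = mode3_product(A, U)
print(B.shape)                                         # (4, 5, 3)
print(np.isclose(B[1, 2, 0], A[1, 2, :] @ U[0, :]))    # True, matches the definition
```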
Definition 2.
Let $M$ be a right invertible matrix of size $p \times n_3$. We define the right invertible operator $L$ on tube tensors as
$$L: \mathbb{F}_{n_3}^{1 \times 1} \to \mathbb{F}_{p}^{1 \times 1}, \quad \mathcal{A} \mapsto \mathcal{A} \times_3 M$$
and its inverse as
$$R: \mathbb{F}_{p}^{1 \times 1} \to \mathbb{F}_{n_3}^{1 \times 1}, \quad \mathcal{B} \mapsto \mathcal{B} \times_3 N,$$
where $N$ is a right inverse matrix of $M$.
Using the functions vec and unvec, we can also express the operators $L$ and $R$ in terms of vector operations:
$$L: \mathbb{F}_{n_3}^{1 \times 1} \to \mathbb{F}_{p}^{1 \times 1}, \quad \mathcal{A} \mapsto \mathrm{unvec}(M \cdot \mathrm{vec}(\mathcal{A})),$$
$$R: \mathbb{F}_{p}^{1 \times 1} \to \mathbb{F}_{n_3}^{1 \times 1}, \quad \mathcal{B} \mapsto \mathrm{unvec}(N \cdot \mathrm{vec}(\mathcal{B})).$$
It can be seen that $R(L(\mathcal{A}))$ is in general not equal to the original tensor $\mathcal{A}$, because $N M$ is not an identity matrix, although $M N$ is an identity matrix.
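The following small sketch, with our own variable names and a randomly generated right invertible pair $(M, N)$, illustrates this asymmetry on a single tube stored as a vector: $L(R(b)) = b$ always holds, whereas $R(L(a)) = a$ generally fails:
```python
import numpy as np

rng = np.random.default_rng(0)
n3, p = 6, 4
M = rng.standard_normal((p, n3))                   # rows are linearly independent (a.s.)
N = M.conj().T @ np.linalg.inv(M @ M.conj().T)     # a right inverse of M

L = lambda a: M @ a                                # L on a tube stored as a vector
R = lambda b: N @ b                                # R on a transformed tube

a = rng.standard_normal(n3)
b = rng.standard_normal(p)
print(np.allclose(L(R(b)), b))    # True:  M N = I_p
print(np.allclose(R(L(a)), a))    # False: N M != I_{n3} in general
```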

3. Definition and Properties of the New Tensor Product

Definition 3
(Face-wise Product [20]). Let $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times k}$ and $\mathcal{B} \in \mathbb{F}_{n_3}^{k \times n_2}$ be two third-order tensors. The face-wise product between $\mathcal{A}$ and $\mathcal{B}$, denoted $\mathcal{A} \triangle \mathcal{B}$, is defined as:
$$(\mathcal{A} \triangle \mathcal{B})^{(i_3)} = \mathcal{A}^{(i_3)} \mathcal{B}^{(i_3)}, \quad i_3 = 1, \dots, n_3.$$
Here, $\mathcal{A}^{(i_3)}$ and $\mathcal{B}^{(i_3)}$ denote the $i_3$-th frontal slices of $\mathcal{A}$ and $\mathcal{B}$, respectively, which are matrices of sizes $n_1 \times k$ and $k \times n_2$. The product $\mathcal{A}^{(i_3)} \mathcal{B}^{(i_3)}$ is the standard matrix multiplication.
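A minimal numpy sketch of the face-wise product (the function name facewise is our own) is:
```python
import numpy as np

def facewise(A, B):
    """(A triangle B)^(i3) = A^(i3) B^(i3); A: (n1, k, n3), B: (k, n2, n3)."""
    return np.einsum('ikt,kjt->ijt', A, B)

A = np.random.rand(4, 3, 5)
B = np.random.rand(3, 2, 5)
C = facewise(A, B)
print(C.shape)                                              # (4, 2, 5)
print(np.allclose(C[:, :, 0], A[:, :, 0] @ B[:, :, 0]))     # True
```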
Definition 4.
Let $M$ be a right invertible matrix of size $p \times n_3$. We define the right invertible operator $L$ on third-order tensors as
$$L: \mathbb{F}_{n_3}^{n_1 \times n_2} \to \mathbb{F}_{p}^{n_1 \times n_2}, \quad \mathcal{A} \mapsto \mathcal{A} \times_3 M$$
and its inverse as
$$R: \mathbb{F}_{p}^{n_1 \times n_2} \to \mathbb{F}_{n_3}^{n_1 \times n_2}, \quad \mathcal{B} \mapsto \mathcal{B} \times_3 N,$$
where $N$ is a right inverse matrix of $M$.
In the following, if no confusion arises, we do not distinguish the right invertible operator L defined by the same matrix M on different tensor spaces.
Proposition 1.
The operator $L$ is surjective (onto), the operator $R$ is injective (one-to-one), and $L \circ R$ is the identity mapping.
Proof. 
To show that $L$ is surjective, we need to show that for any $\mathcal{B} \in \mathbb{F}_{p}^{n_1 \times n_2}$ there exists an $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$ such that $L(\mathcal{A}) = \mathcal{B}$. Consider $\mathcal{A} = \mathcal{B} \times_3 N$. Then $L(\mathcal{A}) = L(\mathcal{B} \times_3 N) = (\mathcal{B} \times_3 N) \times_3 M = \mathcal{B} \times_3 (M N)$. Using the property $M N = I$, the identity matrix, we obtain $(\mathcal{B} \times_3 N) \times_3 M = \mathcal{B} \times_3 I = \mathcal{B}$. Thus $L(\mathcal{A}) = \mathcal{B}$, showing that $L$ is surjective.
To show that $R$ is injective, we need to show that if $R(\mathcal{B}) = R(\mathcal{C})$, then $\mathcal{B} = \mathcal{C}$. Assume that $R(\mathcal{B}) = R(\mathcal{C})$. Then $\mathcal{B} \times_3 N = \mathcal{C} \times_3 N$. Applying the operator $L$ to both sides, we obtain $L(\mathcal{B} \times_3 N) = L(\mathcal{C} \times_3 N)$, that is, $(\mathcal{B} \times_3 N) \times_3 M = (\mathcal{C} \times_3 N) \times_3 M$. Using the property $M N = I$, we obtain $\mathcal{B} \times_3 (M N) = \mathcal{C} \times_3 (M N)$, which simplifies to $\mathcal{B} \times_3 I = \mathcal{C} \times_3 I$. Thus $\mathcal{B} = \mathcal{C}$. This shows that $R$ is injective.
To show that $L \circ R$ is the identity mapping, we need to show that for any $\mathcal{B} \in \mathbb{F}_{p}^{n_1 \times n_2}$, $L \circ R(\mathcal{B}) = \mathcal{B}$. Consider $L \circ R(\mathcal{B}) = L(\mathcal{B} \times_3 N) = (\mathcal{B} \times_3 N) \times_3 M = \mathcal{B} \times_3 (M N) = \mathcal{B} \times_3 I = \mathcal{B}$. Thus $L \circ R(\mathcal{B}) = \mathcal{B}$, showing that $L \circ R$ is the identity mapping.
   □
Definition 5.
Let $L$ be the right invertible operator between $\mathbb{F}_{n_3}^{n_1 \times n_2}$ and $\mathbb{F}_{p}^{n_1 \times n_2}$, and let $R$ be its inverse between $\mathbb{F}_{p}^{n_1 \times n_2}$ and $\mathbb{F}_{n_3}^{n_1 \times n_2}$. We define the kernel of the right invertible operator $L$ as
$$\ker(L) = \{\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2} \mid L(\mathcal{A}) = 0 \in \mathbb{F}_{p}^{n_1 \times n_2}\} = \{\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2} \mid \mathcal{A} \times_3 M = 0\}.$$
Proposition 2.
Let L and R be defined as above; then, we have
$$\ker(L) = \{\mathcal{B} \mid \exists\, \mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2} \text{ such that } \mathcal{B} = (I - R \circ L)(\mathcal{A})\},$$
where $I$ is the identity map on $\mathbb{F}_{n_3}^{n_1 \times n_2}$.
Proof. 
First, we show that $\ker(L) \subseteq \{\mathcal{B} \mid \exists\, \mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2} \text{ such that } \mathcal{B} = (I - R \circ L)(\mathcal{A})\}$. Assume $\mathcal{B} \in \ker(L)$, so $L(\mathcal{B}) = 0$; we need to show that there exists an $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$ such that $\mathcal{B} = (I - R \circ L)(\mathcal{A})$. Consider $\mathcal{A} = \mathcal{B}$; then $(I - R \circ L)(\mathcal{B}) = \mathcal{B} - R(L(\mathcal{B})) = \mathcal{B} - R(0) = \mathcal{B} - 0 = \mathcal{B}$. Thus $\mathcal{B} = (I - R \circ L)(\mathcal{B})$, which shows that $\mathcal{B}$ belongs to the right-hand set.
On the other hand, we show that $\{\mathcal{B} \mid \exists\, \mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2} \text{ such that } \mathcal{B} = (I - R \circ L)(\mathcal{A})\} \subseteq \ker(L)$. Assume $\mathcal{B}$ belongs to this set. Then there exists an $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$ such that $\mathcal{B} = (I - R \circ L)(\mathcal{A}) = \mathcal{A} - R(L(\mathcal{A}))$. Thus $L(\mathcal{B}) = L(\mathcal{A} - R(L(\mathcal{A}))) = L(\mathcal{A}) - L(R(L(\mathcal{A})))$. By Proposition 1, $L \circ R$ is the identity mapping, so $L(R(L(\mathcal{A}))) = L(\mathcal{A})$ and hence $L(\mathcal{B}) = L(\mathcal{A}) - L(\mathcal{A}) = 0$. This shows that $\mathcal{B} \in \ker(L)$.
Therefore, $\ker(L) = \{\mathcal{B} \mid \exists\, \mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2} \text{ such that } \mathcal{B} = (I - R \circ L)(\mathcal{A})\}$. This completes the proof.    □
Using the functions vec and unvec , we can give an equivalent description of the operators L and R in terms of tube tensors
$$L: \mathbb{F}_{n_3}^{n_1 \times n_2} \to \mathbb{F}_{p}^{n_1 \times n_2}, \quad \mathcal{A}(i, j, :) \mapsto \mathrm{unvec}\big(M\, \mathrm{vec}(\mathcal{A}(i, j, :))\big)$$
for all $i = 1, 2, \dots, n_1$ and $j = 1, 2, \dots, n_2$, and
$$R: \mathbb{F}_{p}^{n_1 \times n_2} \to \mathbb{F}_{n_3}^{n_1 \times n_2}, \quad \mathcal{B}(i, j, :) \mapsto \mathrm{unvec}\big(N\, \mathrm{vec}(\mathcal{B}(i, j, :))\big)$$
for all $i = 1, 2, \dots, n_1$ and $j = 1, 2, \dots, n_2$.
Definition 6.
Let $L$ be the right invertible operator defined by a matrix $M$ of size $p \times n_3$ between $\mathbb{F}_{n_3}^{n_1 \times k}$ and $\mathbb{F}_{p}^{n_1 \times k}$, and let $R$ be its inverse defined by a matrix $N$ of size $n_3 \times p$ between $\mathbb{F}_{p}^{n_1 \times n_2}$ and $\mathbb{F}_{n_3}^{n_1 \times n_2}$. The right invertible transform product of two third-order tensors $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times k}$ and $\mathcal{B} \in \mathbb{F}_{n_3}^{k \times n_2}$ based on $L, R$ is denoted by
$$*_{L,R}: \mathbb{F}_{n_3}^{n_1 \times k} \times \mathbb{F}_{n_3}^{k \times n_2} \to \mathbb{F}_{n_3}^{n_1 \times n_2}$$
and is defined as
$$\mathcal{A} *_{L,R} \mathcal{B} = \big((\mathcal{A} \times_3 M) \triangle (\mathcal{B} \times_3 M)\big) \times_3 N.$$
If there is no confusion, for simplicity we write
$$\mathcal{A} *_{L,R} \mathcal{B} = R\big(L(\mathcal{A}) \triangle L(\mathcal{B})\big).$$
Proposition 3.
Let $L$ be the right invertible operator between $\mathbb{F}_{n_3}^{n_1 \times n_2}$ and $\mathbb{F}_{p}^{n_1 \times n_2}$, and let $R$ be its inverse between $\mathbb{F}_{p}^{n_1 \times n_2}$ and $\mathbb{F}_{n_3}^{n_1 \times n_2}$. For $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times k}$ and $\mathcal{B} \in \mathbb{F}_{n_3}^{k \times n_2}$, the $(i, j)$-th tube fibre of $\mathcal{A} *_{L,R} \mathcal{B}$ can be computed as the appropriate sum of tube-fibre products; i.e.,
$$(\mathcal{A} *_{L,R} \mathcal{B})_{i,j} = \sum_{s=1}^{k} a_{is} *_{L,R} b_{sj},$$
where $a_{is}$ and $b_{sj}$ are the tube fibres of $\mathcal{A}$ and $\mathcal{B}$, respectively.
Proof. 
According to the definition, $L$ maps $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times k}$ to $\mathbb{F}_{p}^{n_1 \times k}$ and $\mathcal{B} \in \mathbb{F}_{n_3}^{k \times n_2}$ to $\mathbb{F}_{p}^{k \times n_2}$; therefore, $L(\mathcal{A}) \in \mathbb{F}_{p}^{n_1 \times k}$ and $L(\mathcal{B}) \in \mathbb{F}_{p}^{k \times n_2}$. By definition, $\triangle$ is a slice-wise matrix multiplication, so $L(\mathcal{A}) \triangle L(\mathcal{B}) \in \mathbb{F}_{p}^{n_1 \times n_2}$. Applying $R$ maps the result back from $\mathbb{F}_{p}^{n_1 \times n_2}$ to $\mathbb{F}_{n_3}^{n_1 \times n_2}$, so $R(L(\mathcal{A}) \triangle L(\mathcal{B})) \in \mathbb{F}_{n_3}^{n_1 \times n_2}$. By definition, $(\mathcal{A} *_{L,R} \mathcal{B})_{i,j}$ is the $(i, j)$-th tube fibre of $\mathcal{A} *_{L,R} \mathcal{B}$, which we can express as $(\mathcal{A} *_{L,R} \mathcal{B})_{i,j} = R(L(\mathcal{A}) \triangle L(\mathcal{B}))_{i,j}$. By the properties of matrix multiplication, the $(i, j)$-th tube of $L(\mathcal{A}) \triangle L(\mathcal{B})$ can be written as $(L(\mathcal{A}) \triangle L(\mathcal{B}))_{i,j} = \sum_{s=1}^{k} (L(\mathcal{A}))_{i,s} \cdot (L(\mathcal{B}))_{s,j}$. Applying $R$ to each term of the sum and using $R(L(a_{is}) \cdot L(b_{sj})) = a_{is} *_{L,R} b_{sj}$, which holds by the definition of the $*_{L,R}$ product, we obtain $(\mathcal{A} *_{L,R} \mathcal{B})_{i,j} = \sum_{s=1}^{k} a_{is} *_{L,R} b_{sj}$. This completes the proof.    □
Now, we provide the algorithm for computing the L , R product of two tensors in Algorithm 1.
Algorithm 1 Compute the L , R product of two tensors
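Since the pseudocode of Algorithm 1 is not reproduced here, the following hedged numpy sketch illustrates the computation prescribed by Definition 6; the helper names and the random test sizes are our own choices:
```python
import numpy as np

def mode3(A, U):
    """Mode-3 product: A has shape (n1, n2, n3), U has shape (q, n3)."""
    return np.einsum('ijs,ks->ijk', A, U)

def lr_product(A, B, M, N):
    """A *_{L,R} B = ((A x_3 M) facewise (B x_3 M)) x_3 N, with M @ N = I_p."""
    A_hat = mode3(A, M)                                # transform to the p-slice domain
    B_hat = mode3(B, M)
    C_hat = np.einsum('ikt,kjt->ijt', A_hat, B_hat)    # face-wise product
    return mode3(C_hat, N)                             # map back with the right inverse

rng = np.random.default_rng(1)
n1, k, n2, n3, p = 4, 3, 5, 8, 6
M = rng.standard_normal((p, n3))
N = M.T @ np.linalg.inv(M @ M.T)                       # right inverse: M @ N = I_p
A = rng.standard_normal((n1, k, n3))
B = rng.standard_normal((k, n2, n3))
print(lr_product(A, B, M, N).shape)                    # (4, 5, 8)
```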
The L,R product retains several properties of the tensor product defined via invertible linear transformations [20]. However, because the right invertible transformation used to define the L,R product is generally not invertible, several properties exhibit notable differences.
Lemma 1.
If A , B , C are third-order tensors of appropriate sizes, then the following statements are true:
  • Associative property: $(\mathcal{A} *_{L,R} \mathcal{B}) *_{L,R} \mathcal{C} = \mathcal{A} *_{L,R} (\mathcal{B} *_{L,R} \mathcal{C})$
  • Distributive property:
    $\mathcal{A} *_{L,R} (\mathcal{B} + \mathcal{C}) = (\mathcal{A} *_{L,R} \mathcal{B}) + (\mathcal{A} *_{L,R} \mathcal{C})$
    $(\mathcal{B} + \mathcal{C}) *_{L,R} \mathcal{A} = (\mathcal{B} *_{L,R} \mathcal{A}) + (\mathcal{C} *_{L,R} \mathcal{A})$
  • Noncommutativity: in general, $\mathcal{A} *_{L,R} \mathcal{B} \neq \mathcal{B} *_{L,R} \mathcal{A}$
Proof. 
Associativity: Let $\mathcal{A}, \mathcal{B}, \mathcal{C}$ be third-order tensors of appropriate sizes; then, by the linearity of the operators $L$ and $R$ and the fact that $L \circ R$ is the identity transformation, we have
$$\mathcal{A} *_{L,R} (\mathcal{B} *_{L,R} \mathcal{C}) = R\big(L(\mathcal{A}) \triangle L(R(L(\mathcal{B}) \triangle L(\mathcal{C})))\big) = R\big(L(\mathcal{A}) \triangle L(\mathcal{B}) \triangle L(\mathcal{C})\big) = R\big(L(R(L(\mathcal{A}) \triangle L(\mathcal{B}))) \triangle L(\mathcal{C})\big) = (\mathcal{A} *_{L,R} \mathcal{B}) *_{L,R} \mathcal{C}.$$
Distributivity:
$$\mathcal{A} *_{L,R} (\mathcal{B} + \mathcal{C}) = R\big(L(\mathcal{A}) \triangle L(\mathcal{B} + \mathcal{C})\big) = R\big(L(\mathcal{A}) \triangle (L(\mathcal{B}) + L(\mathcal{C}))\big) = R\big(L(\mathcal{A}) \triangle L(\mathcal{B}) + L(\mathcal{A}) \triangle L(\mathcal{C})\big) = R\big(L(\mathcal{A}) \triangle L(\mathcal{B})\big) + R\big(L(\mathcal{A}) \triangle L(\mathcal{C})\big) = \mathcal{A} *_{L,R} \mathcal{B} + \mathcal{A} *_{L,R} \mathcal{C}.$$
Similarly,
$$(\mathcal{B} + \mathcal{C}) *_{L,R} \mathcal{A} = R\big((L(\mathcal{B}) + L(\mathcal{C})) \triangle L(\mathcal{A})\big) = R\big(L(\mathcal{B}) \triangle L(\mathcal{A}) + L(\mathcal{C}) \triangle L(\mathcal{A})\big) = R\big(L(\mathcal{B}) \triangle L(\mathcal{A})\big) + R\big(L(\mathcal{C}) \triangle L(\mathcal{A})\big) = \mathcal{B} *_{L,R} \mathcal{A} + \mathcal{C} *_{L,R} \mathcal{A}.$$
Noncommutativity: in general, the $*_{L,R}$ product is not commutative. This is because $L(\mathcal{A}) \triangle L(\mathcal{B}) \neq L(\mathcal{B}) \triangle L(\mathcal{A})$ in most cases, leading to $\mathcal{A} *_{L,R} \mathcal{B} \neq \mathcal{B} *_{L,R} \mathcal{A}$.    □
The above proof of associativity relies on the fact that $L \circ R$ is the identity transformation. This also indicates that, under this definition, if $L$ were only a left invertible transformation (so that $L \circ R$ is not the identity), the associative property would not hold in general.
Theorem 1.
(Commutativity of the $*_{L,R}$ product for tube tensors). If $\mathcal{A}, \mathcal{B} \in \mathbb{F}^{1 \times 1 \times n_3}$, then the $*_{L,R}$ product is commutative. Specifically,
$$\mathcal{A} *_{L,R} \mathcal{B} = \mathcal{B} *_{L,R} \mathcal{A}.$$
Proof. 
Recall the definition of the $*_{L,R}$ product, $\mathcal{A} *_{L,R} \mathcal{B} = R(L(\mathcal{A}) \triangle L(\mathcal{B}))$, where $L$ is the transform defined by $M$ and $R$ is its right inverse transform defined by $N$. For tube tensors $\mathcal{A}, \mathcal{B} \in \mathbb{F}^{1 \times 1 \times n_3}$, we have $\mathcal{A} *_{L,R} \mathcal{B} = R(L(\mathcal{A}) \triangle L(\mathcal{B})) = \mathrm{unvec}\big(N\,(M\,\mathrm{vec}(\mathcal{A}) \odot M\,\mathrm{vec}(\mathcal{B}))\big)$. For tube tensors, the face-wise product $\triangle$ reduces to the component-wise (Hadamard) multiplication $\odot$ of the transformed vectors. Since component-wise multiplication is commutative, we have
$$(M\,\mathrm{vec}(\mathcal{A})) \odot (M\,\mathrm{vec}(\mathcal{B})) = (M\,\mathrm{vec}(\mathcal{B})) \odot (M\,\mathrm{vec}(\mathcal{A})),$$
that is, $L(\mathcal{A}) \triangle L(\mathcal{B}) = L(\mathcal{B}) \triangle L(\mathcal{A})$. Applying $R$ to both sides, we obtain $R(L(\mathcal{A}) \triangle L(\mathcal{B})) = R(L(\mathcal{B}) \triangle L(\mathcal{A}))$. By the definition of the $*_{L,R}$ product, this implies $\mathcal{A} *_{L,R} \mathcal{B} = \mathcal{B} *_{L,R} \mathcal{A}$. Thus, the $*_{L,R}$ product is commutative for tube tensors in $\mathbb{F}^{1 \times 1 \times n_3}$.    □
Definition 7
($*_{L,R}$ conjugate transpose). Let $L$ be the right invertible operator between $\mathbb{C}_{n_3}^{n_1 \times n_2}$ and $\mathbb{C}_{p}^{n_1 \times n_2}$. A conjugate transpose of a third-order tensor $\mathcal{A} \in \mathbb{C}_{n_3}^{n_1 \times n_2}$, denoted by $\mathcal{A}^H \in \mathbb{C}_{n_3}^{n_2 \times n_1}$, is a tensor obtained by conjugate transposing each of the frontal slices in the transform domain. Specifically,
$$L(\mathcal{A}^H)^{(i)} = \big(L(\mathcal{A})^{(i)}\big)^H, \quad i = 1, 2, \dots, p,$$
where $L(\mathcal{A})^{(i)}$ denotes the $i$-th frontal slice of $\mathcal{A}$ in the transform domain and $\big(L(\mathcal{A})^{(i)}\big)^H$ denotes its conjugate transpose. However, since $L$ is not necessarily invertible, different tensors may be mapped to the same tensor in the transform domain. Therefore, the conjugate transpose of $\mathcal{A}$ is not necessarily unique.
If $\mathcal{A}^H$ is a conjugate transpose of $\mathcal{A}$, then $\mathcal{A}^H + \mathcal{N}$ with $\mathcal{N} \in \ker(L)$ is also a conjugate transpose of $\mathcal{A}$. This is because for any $\mathcal{N} \in \ker(L)$ we have $L(\mathcal{N}) = 0$; thus, $L(\mathcal{A}^H + \mathcal{N}) = L(\mathcal{A}^H) + L(\mathcal{N}) = L(\mathcal{A}^H)$, whose frontal slices are $\big(L(\mathcal{A})^{(i)}\big)^H$. Therefore, $\mathcal{A}^H + \mathcal{N}$ satisfies the same defining property as $\mathcal{A}^H$ in the transform domain, making it another valid conjugate transpose of $\mathcal{A}$.
Proposition 4.
The multiplication reversal property of the $*_{L,R}$ conjugate transpose holds: $\mathcal{A}^H *_{L,R} \mathcal{B}^H = (\mathcal{B} *_{L,R} \mathcal{A})^H$.
Proof. 
By definition and the fact that $L \circ R$ is the identity mapping, we have, for each frontal slice in the transform domain,
$$L(\mathcal{A}^H *_{L,R} \mathcal{B}^H)^{(i)} = \big(L(\mathcal{A})^{(i)}\big)^H \big(L(\mathcal{B})^{(i)}\big)^H = \big(L(\mathcal{B})^{(i)}\, L(\mathcal{A})^{(i)}\big)^H = \big(L(\mathcal{B} *_{L,R} \mathcal{A})^{(i)}\big)^H = L\big((\mathcal{B} *_{L,R} \mathcal{A})^H\big)^{(i)}.$$
The result follows from this equality of the frontal slices in the transform domain.    □
Definition 8
($*_{L,R}$ identity tensor). Let $L$ be the right invertible operator between $\mathbb{C}_{n_3}^{n \times n}$ and $\mathbb{C}_{p}^{n \times n}$. The identity tensor $\mathcal{I} \in \mathbb{F}_{n_3}^{n \times n}$ is defined such that its frontal slices in the transform domain are all identity matrices. Formally, for $i = 1, 2, \dots, p$,
$$L(\mathcal{I})(:, :, i) = I_n,$$
where $I_n$ denotes the $n \times n$ identity matrix.
It is clear that $\mathcal{A} *_{L,R} \mathcal{I} = \mathcal{I} *_{L,R} \mathcal{A} = \mathcal{A}$. It is also easy to see that $\mathcal{A} *_{L,R} (\mathcal{I} + \mathcal{N}) = (\mathcal{I} + \mathcal{N}) *_{L,R} \mathcal{A} = \mathcal{A}$ for all $\mathcal{N} \in \ker(L)$.
Definition 9
($*_{L,R}$ unitary tensor). Let $\mathcal{Q} \in \mathbb{C}_{n_3}^{n \times n}$. $\mathcal{Q}$ is said to be unitary if $\mathcal{Q}^H *_{L,R} \mathcal{Q} = \mathcal{Q} *_{L,R} \mathcal{Q}^H = \mathcal{I}$.
It is clear that if $\mathcal{Q}$ is unitary, then $\mathcal{Q} + \mathcal{N}$ is also unitary for all $\mathcal{N} \in \ker(L)$, according to the definition above.
Definition 10
(F-diagonal/F-upper/F-lower tensor). Let $L$ be the right invertible operator between $\mathbb{C}_{n_3}^{n_1 \times n_2}$ and $\mathbb{C}_{p}^{n_1 \times n_2}$, and let $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$. Then, $\mathcal{A}$ is called an F-diagonal/F-upper/F-lower tensor if all frontal slices $L(\mathcal{A})^{(i)}$, $i = 1, 2, \dots, p$, of $\mathcal{A}$ in the transform domain are diagonal/upper triangular/lower triangular matrices.
Having established the requisite algebraic framework, we are equipped to define a L , R product-based tensor singular value decomposition. This formulation closely parallels the L SVD introduced in Reference[20].
Theorem 2
($*_{L,R}$ SVD). Let $L$ be the right invertible operator defined by a matrix $M$ of size $p \times n_3$ between $\mathbb{F}_{n_3}^{n_1 \times n_2}$ and $\mathbb{F}_{p}^{n_1 \times n_2}$, and let $R$ be its inverse defined by a matrix $N$ of size $n_3 \times p$ between $\mathbb{F}_{p}^{n_1 \times n_2}$ and $\mathbb{F}_{n_3}^{n_1 \times n_2}$. Let $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$; then, we have the following $*_{L,R}$ SVD:
$$R(L(\mathcal{A})) = \mathcal{U} *_{L,R} \mathcal{S} *_{L,R} \mathcal{V}^H,$$
where $\mathcal{U} \in \mathbb{F}_{n_3}^{n_1 \times n_1}$ and $\mathcal{V} \in \mathbb{F}_{n_3}^{n_2 \times n_2}$ are unitary tensors and $\mathcal{S} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$ is an F-diagonal tensor. The main steps for computing the tensor $*_{L,R}$ SVD are summarised in Algorithm 2.
Definition 11.
Define the $*_{L,R}$ SVD error $\mathcal{E}$ of the $*_{L,R}$ SVD decomposition as $\mathcal{E} = \mathcal{A} - \mathcal{U} *_{L,R} \mathcal{S} *_{L,R} \mathcal{V}^H$, where $R(L(\mathcal{A})) = \mathcal{U} *_{L,R} \mathcal{S} *_{L,R} \mathcal{V}^H$.
When $L$ is an invertible linear transformation, the $*_{L,R}$ SVD error $\mathcal{E}$ is the zero tensor.
Algorithm 2 L , R SVD under the L , R product
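The pseudocode of Algorithm 2 is likewise not reproduced here; the following is a hedged numpy sketch of the natural slice-wise construction behind Theorem 2 (transform with $M$, take a matrix SVD of every frontal slice, and map the factors back with $N$); all names are our own:
```python
import numpy as np

def mode3(A, U):
    return np.einsum('ijs,ks->ijk', A, U)

def lr_svd(A, M, N):
    """Slice-wise SVD in the transform domain; returns U, S, V with
    R(L(A)) = U *_{L,R} S *_{L,R} V^H (note L(U) = U_hat, etc., since M @ N = I)."""
    A_hat = mode3(A, M)                                # shape (n1, n2, p)
    n1, n2, p = A_hat.shape
    r = min(n1, n2)
    U_hat = np.zeros((n1, n1, p))
    S_hat = np.zeros((n1, n2, p))
    V_hat = np.zeros((n2, n2, p))
    for i in range(p):
        u, s, vh = np.linalg.svd(A_hat[:, :, i])
        U_hat[:, :, i] = u
        S_hat[:r, :r, i] = np.diag(s)
        V_hat[:, :, i] = vh.conj().T
    # map the transform-domain factors back with the right inverse N
    return mode3(U_hat, N), mode3(S_hat, N), mode3(V_hat, N)

rng = np.random.default_rng(2)
n1, n2, n3, p = 5, 4, 8, 6
M = rng.standard_normal((p, n3))
N = M.T @ np.linalg.inv(M @ M.T)
U, S, V = lr_svd(rng.standard_normal((n1, n2, n3)), M, N)
print(U.shape, S.shape, V.shape)    # (5, 5, 8) (5, 4, 8) (4, 4, 8)
```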
Now, we can define the tensor multirank and tubal rank associated with the tensor L , R product and the L , R SVD as follows.
It is known that the singular values of a matrix have a decreasing-order property. Let $R(L(\mathcal{A})) = \mathcal{U} *_{L,R} \mathcal{S} *_{L,R} \mathcal{V}^H$ be the $*_{L,R}$ SVD of $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$. The entries on the diagonal of the first frontal slice $\mathcal{S}(:, :, 1)$ have the same decreasing property, i.e.,
$$\mathcal{S}(i, i, 1) \ge \mathcal{S}(j, j, 1) \ge 0, \quad 1 \le i < j \le n,$$
where $n = \min(n_1, n_2)$. This property holds since the matrix $N$ in the definition of the $*_{L,R}$ product gives
$$\mathcal{S}(i, i, 1) = N\, \mathrm{vec}(\hat{\mathcal{S}}(i, i, :)),$$
and the entries on the diagonal of $\hat{\mathcal{S}}(:, :, j)$ are the singular values of the corresponding transformed frontal slice.
Definition 12
($*_{L,R}$ tensor multirank and $*_{L,R}$ tubal rank). Let $L$ be the right invertible operator defined by a matrix $M$ of size $p \times n_3$ between $\mathbb{F}_{n_3}^{n_1 \times n_2}$ and $\mathbb{F}_{p}^{n_1 \times n_2}$, and let $R$ be its inverse defined by a matrix $N$ of size $n_3 \times p$ between $\mathbb{F}_{p}^{n_1 \times n_2}$ and $\mathbb{F}_{n_3}^{n_1 \times n_2}$. Let $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$; the $*_{L,R}$ tubal rank of $\mathcal{A}$ under the $*_{L,R}$ product is defined as the number of nonzero tubes of $\mathcal{S}$, where $\mathcal{S}$ is obtained from the $*_{L,R}$ SVD, i.e.,
$$\mathrm{rank}_{L,R}^{t}(\mathcal{A}) = \mathrm{card}\big(\{i \mid \mathcal{S}(i, i, :) \neq 0\}\big) = \mathrm{card}\big(\{i \mid \mathcal{S}(i, i, 1) \neq 0\}\big).$$
Definition 13
($*_{L,R}$ tensor average rank). For a tensor $\mathcal{A} \in \mathbb{F}^{n_1 \times n_2 \times n_3}$, the $*_{L,R}$ tensor average rank of $\mathcal{A}$, denoted $\mathrm{rank}_{L,R}^{a}(\mathcal{A})$, is defined as
$$\mathrm{rank}_{L,R}^{a}(\mathcal{A}) = \frac{1}{p} \sum_{i=1}^{p} \mathrm{rank}\big(L(\mathcal{A})^{(i)}\big),$$
where $L(\mathcal{A})^{(i)}$ represents the $i$-th frontal slice of $\mathcal{A}$ in the transform domain and $L$ is the right invertible linear transformation used in the definition of the $*_{L,R}$ product.
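With our own helper names and a randomly chosen right invertible transform $M$, the two rank notions of Definitions 12 and 13 can be computed slice-wise in the transform domain, as the following sketch illustrates; the test tensor below is built with CP rank at most 2, so both computed ranks are at most 2, consistent with Theorems 4 and 5 below:
```python
import numpy as np

def mode3(A, U):
    return np.einsum('ijs,ks->ijk', A, U)

def lr_tubal_rank(A, M, tol=1e-10):
    """Number of nonzero singular-value tubes in the transform domain (Definition 12)."""
    A_hat = mode3(A, M)
    p = A_hat.shape[2]
    sv = np.stack([np.linalg.svd(A_hat[:, :, i], compute_uv=False) for i in range(p)], axis=1)
    return int(np.sum(np.linalg.norm(sv, axis=1) > tol))

def lr_average_rank(A, M, tol=1e-10):
    """Average of the ranks of the transformed frontal slices (Definition 13)."""
    A_hat = mode3(A, M)
    p = A_hat.shape[2]
    return sum(np.linalg.matrix_rank(A_hat[:, :, i], tol=tol) for i in range(p)) / p

rng = np.random.default_rng(3)
n1, n2, n3, p, r = 6, 5, 8, 4, 2
M = rng.standard_normal((p, n3))
A = np.einsum('ik,jk,tk->ijt', rng.standard_normal((n1, r)),
              rng.standard_normal((n2, r)), rng.standard_normal((n3, r)))  # CP rank <= 2
print(lr_tubal_rank(A, M), lr_average_rank(A, M))   # both at most r = 2
```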
Theorem 3.
Let $M_1 \in \mathbb{F}^{n_3 \times n_3}$ be an invertible matrix, and let $M_{p \times n_3}$ be the matrix formed by retaining the first $p$ rows of $M_1$. Then, for any tensor $\mathcal{A}$,
$$\mathrm{rank}_{L,R,p_1}^{t}(\mathcal{A}) \le \mathrm{rank}_{L,R,p_2}^{t}(\mathcal{A})$$
for $p_1 \le p_2$, where $\mathrm{rank}_{L,R,p_i}^{t}(\mathcal{A})$ is the $*_{L,R}$ tubal rank of $\mathcal{A}$ associated with $M_{p_i \times n_3}$, $i = 1, 2$.
Note that a similar conclusion holds for the $*_{L,R}$ tensor average rank.
Theorem 4.
For any tensor $\mathcal{A} \in \mathbb{R}_{n_3}^{n_1 \times n_2}$, the relationship between the $*_{L,R}$ tensor average rank and the $*_{L,R}$ tubal rank of $\mathcal{A}$ satisfies
$$\mathrm{rank}_{L,R}^{a}(\mathcal{A}) \le \mathrm{rank}_{L,R}^{t}(\mathcal{A}).$$
Proof. 
Suppose that the $*_{L,R}$ SVD of $\mathcal{A}$ has $k$ nonzero singular values in $\mathcal{S}(:, :, 1)$, that is, $\mathrm{rank}_{L,R}^{t}(\mathcal{A}) = \mathrm{card}(\{i \mid \mathcal{S}(i, i, 1) \neq 0\}) = k$. By the decreasing-order property of the singular values of each transformed frontal slice, every slice $L(\mathcal{A})^{(i)}$ has at most $k$ nonzero singular values. Therefore, for each $i$, we have
$$\mathrm{rank}\big(L(\mathcal{A})^{(i)}\big) \le k = \mathrm{rank}_{L,R}^{t}(\mathcal{A}).$$
Summing this inequality over all $i$ and taking the average, we obtain
$$\mathrm{rank}_{L,R}^{a}(\mathcal{A}) = \frac{1}{p} \sum_{i=1}^{p} \mathrm{rank}\big(L(\mathcal{A})^{(i)}\big) \le \frac{1}{p} \sum_{i=1}^{p} \mathrm{rank}_{L,R}^{t}(\mathcal{A}) = \mathrm{rank}_{L,R}^{t}(\mathcal{A}).$$
Thus, $\mathrm{rank}_{L,R}^{a}(\mathcal{A}) \le \mathrm{rank}_{L,R}^{t}(\mathcal{A})$.    □
This theorem demonstrates that the $*_{L,R}$ tensor average rank is always less than or equal to the $*_{L,R}$ tubal rank. This relationship indicates that the $*_{L,R}$ tubal rank can be seen as an upper bound on the average rank of the frontal slices of the tensor. This result is significant in tensor decomposition and low-rank approximation problems, especially in the design and analysis of robust tensor decomposition algorithms.
Theorem 5.
The $*_{L,R}$ average rank of a tensor $\mathcal{A}$ is less than or equal to its CP rank:
$$\mathrm{rank}_{L,R}^{a}(\mathcal{A}) \le \mathrm{rank}_{\mathrm{CP}}(\mathcal{A}).$$
Proof. 
Assume that the CP decomposition of the tensor $\mathcal{A}$ is $\mathcal{A} = \sum_{r=1}^{R} \lambda_r\, a_r \circ b_r \circ c_r$, where $R$ is the CP rank and $\circ$ denotes the outer product. Consider the $i$-th frontal slice $L(\mathcal{A})^{(i)}$ of the tensor $\mathcal{A}$ in the transform domain. According to the CP decomposition, each frontal slice can be expressed as $L(\mathcal{A})^{(i)} = \sum_{r=1}^{R} \lambda_r\, (M c_r)_i\, a_r b_r^{T}$. Since $L(\mathcal{A})^{(i)}$ is a linear combination of $R$ rank-one matrices, the rank of $L(\mathcal{A})^{(i)}$ is at most $R$:
$$\mathrm{rank}\big(L(\mathcal{A})^{(i)}\big) \le R.$$
According to the definition of the $*_{L,R}$ tensor average rank, we have $\mathrm{rank}_{L,R}^{a}(\mathcal{A}) = \frac{1}{p} \sum_{i=1}^{p} \mathrm{rank}(L(\mathcal{A})^{(i)})$. Since each $L(\mathcal{A})^{(i)}$ has a rank of at most $R$, we obtain
$$\mathrm{rank}_{L,R}^{a}(\mathcal{A}) \le \frac{1}{p} \sum_{i=1}^{p} R = R.$$
Therefore, we have $\mathrm{rank}_{L,R}^{a}(\mathcal{A}) \le \mathrm{rank}_{\mathrm{CP}}(\mathcal{A})$. This completes the proof.    □
Recalling the inequality $\mathrm{rank}_{\mathrm{CP}}(\mathcal{A}) \le \mathrm{rank}_{\mathrm{Tucker}}(\mathcal{A})$, we have
Corollary 1.
The $*_{L,R}$ average rank of a tensor $\mathcal{A}$ is less than or equal to its Tucker rank:
$$\mathrm{rank}_{L,R}^{a}(\mathcal{A}) \le \mathrm{rank}_{\mathrm{Tucker}}(\mathcal{A}).$$
These results provide insights into the structure of tensors and are useful in various applications involving tensor decompositions and low-rank approximations.

4. Tensor Robust Principal Component Analysis with Novel Tensor Multiplication

In the field of image processing, the low-rank property of images is an important characteristic. However, the rank function is nonconvex and highly sensitive to parameter choices and noise, making the resulting optimisation problem challenging to solve. Because the rank function has such poor properties, researchers have proposed the use of the nuclear norm, which approximates the tensor rank. This convex relaxation of the tensor rank not only simplifies the optimisation problem but also improves its convergence properties and robustness. In the following, the L,R nuclear norm is defined as a measure that captures the low-rank structure of the tensor after the right invertible linear transformation pair $(L_{p \times n_3}, R_{n_3 \times p})$ is applied.

4.1. Tensor Nuclear Norm with New Tensor Multiplication

Definition 14.
Let $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$ and let $M_{p \times n_3}$ be a right invertible matrix defining the $*_{L,R}$ product. The nuclear norm of $\mathcal{A}$ under the $*_{L,R}$ product is then defined as follows:
$$\|\mathcal{A}\|_{L,R} = \sum_{i=1}^{p} \big\|L(\mathcal{A})^{(i)}\big\|_*.$$
By this definition, we have $\|\mathcal{A} + \mathcal{N}\|_{L,R} = \|\mathcal{A}\|_{L,R}$ for all $\mathcal{N} \in \ker(L)$.
Since both definitions are based on the frontal slices in the transform domain, combining them with the fact that the matrix nuclear norm is the convex envelope of the matrix rank within the unit ball of the spectral norm, the convex envelope of the tensor average rank $\mathrm{rank}_{L,R}^{a}(\mathcal{A})$ in the sense of the $*_{L,R}$ product is the tensor nuclear norm $\|\mathcal{A}\|_{L,R}$.
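A one-function numpy sketch of Definition 14 (the name lr_nuclear_norm is our own) simply sums the matrix nuclear norms of the transformed frontal slices:
```python
import numpy as np

def lr_nuclear_norm(A, M):
    """||A||_{L,R}: sum of the nuclear norms of the frontal slices of A x_3 M."""
    A_hat = np.einsum('ijs,ks->ijk', A, M)
    p = A_hat.shape[2]
    return sum(np.linalg.svd(A_hat[:, :, i], compute_uv=False).sum() for i in range(p))

A = np.random.rand(5, 4, 8)
M = np.random.rand(6, 8)
print(lr_nuclear_norm(A, M))
```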
The standard alternating direction method of multipliers (ADMM) is used in the process of solving the tensor robust principal component analysis with tensor nuclear norm problem. A crucial step in this process is the computation of the proximal operator of the tensor nuclear norm (TNN).
The optimisation subproblem for the TRPCA can be formulated as follows:
$$\min_{\mathcal{X} \in \mathbb{R}_{n_3}^{n_1 \times n_2}} \ \tau \|\mathcal{X}\|_{L,R} + \frac{1}{2} \|\mathcal{X} - \mathcal{Y}\|_F^2,$$
where $\|\cdot\|_{L,R}$ denotes the tensor nuclear norm based on the $*_{L,R}$ product, $\|\cdot\|_F$ denotes the Frobenius norm, and $\tau > 0$ is a regularisation parameter.
We provide an approximate solution for this optimisation problem [21], which can be interpreted through the proximal operator of the matrix nuclear norm. Let $R(L(\mathcal{Y})) = \mathcal{U} *_{L,R} \mathcal{S} *_{L,R} \mathcal{V}^H$ represent the $*_{L,R}$ singular value decomposition (SVD) of the tensor $\mathcal{Y} \in \mathbb{R}_{n_3}^{n_1 \times n_2}$. For each $\tau > 0$, we define the tensor singular value thresholding ($*_{L,R}$ SVT) operator as follows:
$$R\big(D_\tau(L(\mathcal{Y}))\big) = \mathcal{U} *_{L,R} \mathcal{S}_\tau *_{L,R} \mathcal{V}^H,$$
where
$$\mathcal{S}_\tau = R\big((\hat{\mathcal{S}} - \tau)_+, [3]\big),$$
and $R(\cdot, k)$ is the $R$ transformation of the $*_{L,R}$ product applied along the $k$-th dimension, after the soft-thresholding operation has been applied to the frontal slices of the tensor in the transform domain. Specifically, for each frontal slice $\hat{\mathcal{S}}(:, :, i) = L(\mathcal{S})(:, :, i)$, the soft-thresholding operation is given by
$$\big(L(\mathcal{S})(:, :, i) - \tau\big)_+ = \max\big(L(\mathcal{S})(:, :, i) - \tau,\, 0\big).$$
Note that $L(\mathcal{S})$ represents the singular value tensor obtained from the slice-wise SVD of $L(\mathcal{Y})$ in the transform domain. The notation $r_+$ denotes the positive part of $r$, i.e., $r_+ = \max(r, 0)$. This operator applies a soft-thresholding rule to the singular values of the frontal slices of $L(\mathcal{Y})$, effectively shrinking these values towards zero. The $*_{L,R}$ SVT operator is the proximity operator associated with the L,R TNN.
This formulation ensures that the solution $R(D_\tau(L(\mathcal{Y})))$ is a low-rank approximation of the original tensor $\mathcal{Y}$ while maintaining a balance between rank minimisation and the data fidelity term. The $*_{L,R}$ SVT operator effectively reduces the rank of the tensor by thresholding the singular values, making it a powerful tool in tensor completion and robust principal component analysis tasks.
Theorem 6.
Let $\mathcal{A} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$ and $\tau > 0$, and let $L$ be a right invertible linear transformation whose matrix representation $M_{p \times n_3}$ consists of the first $p$ rows of an orthogonal matrix. Then,
$$R\big(D_\tau(L(\mathcal{A}))\big) = \arg\min_{\mathcal{Y} \in \mathbb{F}^{n_1 \times n_2 \times n_3}} \ \tau \|\mathcal{Y}\|_{L,R} + \frac{1}{2} \big\|L(\mathcal{A} - \mathcal{Y})\big\|_F^2,$$
where $\|\cdot\|_{L,R}$ denotes the tensor nuclear norm based on the $*_{L,R}$ product, $\|\cdot\|_F$ denotes the Frobenius norm, and $\tau > 0$ is a regularisation parameter.
The closed-form expression for the L , R SVT operator D τ ( Y ) is a natural extension of the t-SVT operator[3].
Proof. 
By using the properties of the Frobenius norm and the tensor nuclear norm under orthogonal transformations,
$$\|\mathcal{A}\|_F = \|\bar{\mathcal{A}}\|_F$$
and
$$\|\mathcal{A}\|_* = \|\tilde{\mathcal{A}}\|_*,$$
where $\bar{\mathcal{A}}$ and $\tilde{\mathcal{A}}$ are the matrices and tensors obtained through the orthogonal transformations $L$ and $R$.
Since $L$ and $R$ are orthogonal transformations, they preserve the Frobenius norm and the nuclear norm, i.e.,
$$\|L(\mathcal{A})\|_F = \|\mathcal{A}\|_F \quad \text{and} \quad \|R(\mathcal{A})\|_F = \|\mathcal{A}\|_F,$$
and
$$\|L(\mathcal{A})\|_* = \|\mathcal{A}\|_* \quad \text{and} \quad \|R(\mathcal{A})\|_* = \|\mathcal{A}\|_*.$$
Therefore, the original problem can be written slice-wise in the transform domain as
$$\arg\min_{X} \ \tau \|X\|_* + \frac{1}{2} \|X - Y\|_F^2.$$
By Theorem 2.1 in Reference [7], the $i$-th frontal slice of $D_\tau(\mathcal{Y})$ solves the $i$-th subproblem of the above minimisation problem. Hence, $D_\tau(\mathcal{Y})$ solves the entire problem.    □
Theorem 7.
Let $\mathcal{X} \in \mathbb{F}_{n_3}^{n_1 \times n_2}$ and $\tau > 0$. Let $M_{p \times n_3}$ be the matrix corresponding to $L$, satisfying $\frac{\|(\mathcal{X} - \mathcal{Y}) \times_3 M\|_F^2}{\|\mathcal{X} - \mathcal{Y}\|_F^2} \approx 1$; then,
$$R\big(D_\tau(L(\mathcal{X}))\big) \approx \arg\min_{\mathcal{Y} \in \mathbb{F}_{n_3}^{n_1 \times n_2}} \ \tau \|\mathcal{Y}\|_{L,R} + \frac{1}{2} \|\mathcal{X} - \mathcal{Y}\|_F^2,$$
where $\|\cdot\|_{L,R}$ denotes the tensor nuclear norm based on the $*_{L,R}$ product, $\|\cdot\|_F$ denotes the Frobenius norm, and $\tau > 0$ is a regularisation parameter.
Proof. 
Let $M_{p \times n_3}$ be the matrix corresponding to the truncated orthogonal transformation $L$. The condition $\frac{\|(\mathcal{X} - \mathcal{Y}) \times_3 M\|_F^2}{\|\mathcal{X} - \mathcal{Y}\|_F^2} \approx 1$ implies that
$$\|L(\mathcal{X} - \mathcal{Y})\|_F^2 \approx \|\mathcal{X} - \mathcal{Y}\|_F^2.$$
By Theorem 6, we have
$$R\big(D_\tau(L(\mathcal{X}))\big) = \arg\min_{\mathcal{Y}} \ \tau \|\mathcal{Y}\|_{L,R} + \frac{1}{2} \big\|L(\mathcal{X} - \mathcal{Y})\big\|_F^2 \approx \arg\min_{\mathcal{Y}} \ \tau \|\mathcal{Y}\|_{L,R} + \frac{1}{2} \|\mathcal{X} - \mathcal{Y}\|_F^2.$$
   □
Algorithm 3  L , R Tensor Singular Value Thresholding ( L , R SVT)
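As Algorithm 3 is not reproduced in full here, the following hedged numpy sketch implements the slice-wise L,R SVT described above: transform with $M$, soft-threshold the singular values of each frontal slice by $\tau$, and map the result back with the right inverse $N$ (all names are ours):
```python
import numpy as np

def mode3(A, U):
    return np.einsum('ijs,ks->ijk', A, U)

def lr_svt(Y, M, N, tau):
    """R(D_tau(L(Y))): soft-threshold the singular values of each transformed slice."""
    Y_hat = mode3(Y, M)
    D_hat = np.zeros_like(Y_hat)
    for i in range(Y_hat.shape[2]):
        u, s, vh = np.linalg.svd(Y_hat[:, :, i], full_matrices=False)
        D_hat[:, :, i] = (u * np.maximum(s - tau, 0.0)) @ vh   # U diag((s - tau)_+) V^H
    return mode3(D_hat, N)                                     # back to the original domain
```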

4.2. ADMM Algorithm for Tensor Robust Principal Component Analysis

We consider the following optimisation problem
$$\min_{\mathcal{L}, \mathcal{E}} \ \|\mathcal{L}\|_{L,R} + \lambda \|\mathcal{E}\|_1 \quad \text{subject to} \quad \mathcal{L} + \mathcal{E} = \mathcal{X},$$
where $\mathcal{L}$ is the low-rank tensor, $\mathcal{E}$ is the sparse tensor, $\mathcal{X}$ is the observed tensor, $\|\mathcal{L}\|_{L,R}$ is the tensor nuclear norm (low-rank regularisation term), $\|\mathcal{E}\|_1$ is the tensor $\ell_1$ norm (sparsity regularisation term), and $\lambda$ is the parameter balancing low-rankness and sparsity.
To solve this problem via the alternating direction method of multipliers (ADMM), we introduce the augmented Lagrangian function as follows:
$$\mathcal{L}_\mu(\mathcal{L}, \mathcal{E}, \mathcal{Y}) = \|\mathcal{L}\|_{L,R} + \lambda \|\mathcal{E}\|_1 + \langle \mathcal{Y}, \mathcal{L} + \mathcal{E} - \mathcal{X} \rangle + \frac{\mu}{2} \|\mathcal{L} + \mathcal{E} - \mathcal{X}\|_F^2,$$
where Y is the Lagrange multiplier and μ is the penalty parameter.
The ADMM algorithm updates L , E , and Y iteratively.
Update $\mathcal{L}$:
$$\mathcal{L}^{k+1} = \arg\min_{\mathcal{L}} \ \|\mathcal{L}\|_{L,R} + \frac{\mu}{2} \Big\|\mathcal{L} + \mathcal{E}^k - \mathcal{X} + \frac{1}{\mu} \mathcal{Y}^k\Big\|_F^2$$
Update $\mathcal{E}$:
$$\mathcal{E}^{k+1} = \arg\min_{\mathcal{E}} \ \lambda \|\mathcal{E}\|_1 + \frac{\mu}{2} \Big\|\mathcal{L}^{k+1} + \mathcal{E} - \mathcal{X} + \frac{1}{\mu} \mathcal{Y}^k\Big\|_F^2$$
Update $\mathcal{Y}$:
$$\mathcal{Y}^{k+1} = \mathcal{Y}^k + \mu \big(\mathcal{L}^{k+1} + \mathcal{E}^{k+1} - \mathcal{X}\big)$$
Convergence conditions:
$$\|\mathcal{L}^{k+1} - \mathcal{L}^k\|_F < \epsilon, \quad \|\mathcal{E}^{k+1} - \mathcal{E}^k\|_F < \epsilon, \quad \|\mathcal{L}^{k+1} + \mathcal{E}^{k+1} - \mathcal{X}\|_F < \epsilon,$$
where ϵ is a small positive tolerance.
In the following, we present the solution of the subproblem in the ADMM.
Solution of Update L : Approximate solution of the subproblem to update the low-rank tensor L
$$\mathcal{L}^{k+1} = \arg\min_{\mathcal{L}} \ \|\mathcal{L}\|_{L,R} + \frac{\mu}{2} \Big\|\mathcal{L} + \mathcal{E}^k - \mathcal{X} + \frac{1}{\mu} \mathcal{Y}^k\Big\|_F^2$$
First, we set
$$\mathcal{Z} = \mathcal{X} - \mathcal{E}^k - \frac{1}{\mu} \mathcal{Y}^k;$$
then, the objective function can be rewritten as
$$\mathcal{L}^{k+1} = \arg\min_{\mathcal{L}} \ \|\mathcal{L}\|_{L,R} + \frac{\mu}{2} \|\mathcal{L} - \mathcal{Z}\|_F^2.$$
This problem can be solved (approximately) via the tensor singular value thresholding operation $R(D_\tau(L(\cdot)))$ defined above; here, we denote it t-SVT. The t-SVT operation satisfies
$$\text{t-SVT}(\mathcal{T}, \tau) \approx \arg\min_{\mathcal{L}} \ \|\mathcal{L}\|_{L,R} + \frac{1}{2\tau} \|\mathcal{L} - \mathcal{T}\|_F^2,$$
where $\tau$ is the threshold parameter. In our subproblem, $\mathcal{T} = \mathcal{Z}$ and $\tau = \frac{1}{\mu}$; thus,
$$\mathcal{L}^{k+1} = \text{t-SVT}\Big(\mathcal{Z}, \frac{1}{\mu}\Big).$$
Substituting back $\mathcal{Z}$, we obtain
$$\mathcal{L}^{k+1} = \text{t-SVT}\Big(\mathcal{X} - \mathcal{E}^k - \frac{1}{\mu} \mathcal{Y}^k,\ \frac{1}{\mu}\Big).$$
Solution of Update $\mathcal{E}$: the second subproblem can be solved via the soft-thresholding operator
$$\mathcal{E}^{k+1} = \operatorname{SoftShrink}\Big(\mathcal{X} - \mathcal{L}^{k+1} - \frac{1}{\mu} \mathcal{Y}^k,\ \frac{\lambda}{\mu}\Big),$$
where the soft-thresholding operator is defined as
$$\operatorname{SoftShrink}(x, \tau) = \operatorname{sign}(x) \cdot \max(|x| - \tau, 0).$$
This operation shrinks the entries of the tensor towards zero element-wise, promoting a sparse structure in $\mathcal{E}$.

4.3. Computational Complexity Analysis

In this section, we analyse the computational complexity of the proposed method, which uses the alternating direction method of multipliers (ADMM) to solve the tensor robust principal component analysis (TRPCA) problem. The initialisation step, involving the setup of the low-rank component $\mathcal{L}$, the sparse noise component $\mathcal{E}$, and the Lagrange multiplier $\mathcal{Y}$, has a complexity of $O(n_1 n_2 n_3)$, where $n_1$, $n_2$, and $n_3$ are the dimensions of the tensor. Each ADMM iteration consists of updating $\mathcal{L}^{k+1}$ by performing a singular value decomposition (SVD) on each of the $p$ slices of the transformed tensor, resulting in a complexity of $O(p \cdot \min(n_1^2 n_2, n_1 n_2^2))$. Updating $\mathcal{E}^{k+1}$ via soft thresholding has a complexity of $O(n_1 n_2 n_3)$, whereas updating $\mathcal{Y}^{k+1}$ and checking the convergence conditions both have a complexity of $O(n_1 n_2 n_3)$. Assuming $k$ iterations, the total complexity of the ADMM process is $O\big(k \cdot (n_1 n_2 n_3 + p \cdot \min(n_1^2 n_2, n_1 n_2^2))\big)$. From the complexity analysis, it is evident that the first dimension $p$ of the right invertible matrix has a linear relationship with the dominant term of the computational complexity.
Algorithm 4 Solve (5) via ADMM
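Algorithm 4 is likewise not reproduced in full here; the following hedged numpy sketch spells out the ADMM iteration of Section 4.2, restating the lr_svt helper so the block is self-contained. The stopping test follows the convergence conditions above, while the default value of $\mu$, the optional penalty increase factor rho, and the iteration cap are our own illustrative choices, not the authors':
```python
import numpy as np

def mode3(A, U):
    return np.einsum('ijs,ks->ijk', A, U)

def lr_svt(Y, M, N, tau):
    Y_hat = mode3(Y, M)
    D_hat = np.zeros_like(Y_hat)
    for i in range(Y_hat.shape[2]):
        u, s, vh = np.linalg.svd(Y_hat[:, :, i], full_matrices=False)
        D_hat[:, :, i] = (u * np.maximum(s - tau, 0.0)) @ vh
    return mode3(D_hat, N)

def soft_shrink(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def lr_trpca_admm(X, M, N, lam, mu=1e-2, rho=1.1, tol=1e-6, max_iter=200):
    """ADMM for min ||L||_{L,R} + lam * ||E||_1  s.t.  L + E = X (Section 4.2)."""
    L = np.zeros_like(X)
    E = np.zeros_like(X)
    Y = np.zeros_like(X)
    for _ in range(max_iter):
        L_new = lr_svt(X - E - Y / mu, M, N, 1.0 / mu)        # update L via t-SVT
        E_new = soft_shrink(X - L_new - Y / mu, lam / mu)     # update E via SoftShrink
        Y = Y + mu * (L_new + E_new - X)                      # dual (multiplier) update
        stop = max(np.linalg.norm(L_new - L), np.linalg.norm(E_new - E),
                   np.linalg.norm(L_new + E_new - X))
        L, E = L_new, E_new
        if stop < tol:                                        # convergence conditions
            break
        mu *= rho                                             # optional penalty increase
    return L, E
```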

5. Numerical Experiments

In this section, we apply the L,R TRPCA algorithm to both randomly generated synthetic data and real-world data for grayscale video denoising and motion detection. This allows us to evaluate the performance of the algorithm and demonstrate its advantages. All the experiments were conducted on a personal computer equipped with an Intel(R) Core(TM) i7-9750H 2.30 GHz processor and 8 GB of RAM, and the experiments were implemented in Python. We applied the L,R TRPCA algorithm to recover the original tensor from tensors corrupted by random noise. To simulate sparse noise interference in real-world applications, we added sparse noise to the original data tensor $\mathcal{X}$. Specifically, we first created a zero tensor sparse_noise with the same shape as $\mathcal{X}$. Next, we randomly selected 1% of the elements of $\mathcal{X}$ as noise points, ensuring that each position was chosen only once. For these selected positions, we assigned random values drawn from a uniform distribution on $[0, 1]$. Finally, we added the generated sparse noise tensor sparse_noise to the original data $\mathcal{X}$ to obtain the noisy observation.
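A short numpy sketch matching the noise model just described (variable names are our own): place uniform $[0, 1]$ values at a randomly chosen 1% of the positions, without repetition, and add the result to the original tensor:
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 100, 200))                    # original tensor with values in [0, 1]
sparse_noise = np.zeros_like(X)
n_noise = int(0.01 * X.size)                       # 1% of the entries
idx = rng.choice(X.size, size=n_noise, replace=False)   # each position chosen once
sparse_noise.ravel()[idx] = rng.random(n_noise)    # uniform [0, 1] noise values
X_noisy = X + sparse_noise                         # the observed (noisy) tensor
```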
We use the peak signal-to-noise ratio (PSNR) for each slice along the third dimension, defined as
$$\mathrm{PSNR}_k = 10 \log_{10} \frac{\|\mathcal{M}(:, :, k)\|_2^2}{\frac{1}{n_1 n_2} \|\mathcal{X}(:, :, k) - \hat{\mathcal{M}}(:, :, k)\|_F^2},$$
to evaluate the recovery performance of each slice. Here, $\mathcal{M}(:, :, k)$ represents the $k$-th slice along the third dimension of the original tensor data; $\hat{\mathcal{M}}(:, :, k)$ represents the $k$-th slice along the third dimension of the recovered tensor data after processing; $\mathcal{X}(:, :, k)$ represents the $k$-th slice along the third dimension of the observed tensor data, which may contain noise or other distortions; and $n_1, n_2$ are the sizes of the first and second dimensions of the tensor slices. The PSNR is a measure of the quality of the recovered data compared with the original data; higher PSNR values indicate better recovery performance.
The relative squared error (RSE) for each slice along the third dimension, used to measure the relative error between the recovered data and the original data, is defined as
$$\mathrm{RSE}_k = \frac{\|\mathcal{X}(:, :, k) - \hat{\mathcal{M}}(:, :, k)\|_F^2}{\|\mathcal{X}(:, :, k) - \bar{\mathcal{X}}(:, :, k)\|_F^2},$$
where $\bar{\mathcal{X}}(:, :, k)$ denotes the mean of the $k$-th slice along the third dimension of the observed tensor data, i.e., the average value of all the elements of that slice. The RSE measures the relative squared error between the recovered data and the original data; lower RSE values indicate better recovery performance. We choose the PSNR, the RSE, and the runtime as the evaluation metrics to assess the denoising performance of the various L,R TRPCA settings.
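A hedged numpy sketch of the per-slice PSNR and RSE as written in the two formulas above (function names are ours; the norm used for the signal term in the PSNR is read entry-wise from the formula as printed):
```python
import numpy as np

def psnr_slice(M, M_hat, X, k):
    """Per-slice PSNR: squared norm of the original slice over the per-pixel MSE."""
    signal = np.linalg.norm(M[:, :, k]) ** 2
    mse = np.mean((X[:, :, k] - M_hat[:, :, k]) ** 2)
    return 10.0 * np.log10(signal / mse)

def rse_slice(M_hat, X, k):
    """Per-slice RSE: squared error relative to the slice centred at its mean."""
    num = np.linalg.norm(X[:, :, k] - M_hat[:, :, k]) ** 2
    den = np.linalg.norm(X[:, :, k] - X[:, :, k].mean()) ** 2
    return num / den
```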
The right invertible matrix is obtained by removing some rows from a $200 \times 200$ discrete cosine transform (DCT) matrix. The size parameter denotes the number of rows retained from the DCT matrix. Specifically, we set the size $p$ to 20, 40, 60, 100, and 200. When the size is 200, the method corresponds to the classical TRPCA method based on the c-product [20]. The classical c-product-based TRPCA method outperforms robust principal component analysis (RPCA) and the sum-of-nuclear-norms (SNN) model, providing better recovery accuracy and noise reduction, as reported in Reference [15]. Therefore, in our comparative experiments, we compare only the different size cases. On the basis of the related literature and our experimental calculations, we set $\lambda = 30 / \sqrt{\max(m, n) \times l}$, where $m$, $n$, and $l$ denote the tensor dimensions.

5.1. Synthetic Data

In this section, we evaluate the performance of the proposed L,R nuclear norm-based TRPCA method using synthetic data. The synthetic data are generated as a random tensor $\mathcal{X}$ of dimensions $100 \times 100 \times 200$, normalised to the range $[0, 1]$. Sparse noise is added to the original tensor, affecting approximately 1% of the elements. The noisy tensor is then used as the input to the TRPCA algorithm. For simplicity, we use the condition $\frac{\|\mathcal{X} \times_3 M\|_F^2}{\|\mathcal{X}\|_F^2} \approx 1$ to approximate the analysis of $\frac{\|(\mathcal{X} - \mathcal{Y}) \times_3 M\|_F^2}{\|\mathcal{X} - \mathcal{Y}\|_F^2}$.
For this synthetic tensor $\mathcal{X}$, we compute the ratio $\frac{\|\mathcal{X} \times_3 M\|_F^2}{\|\mathcal{X}\|_F^2}$ for different sizes of the right invertible matrix $M$.
Table 1 shows the computed ratios for different sizes of the right invertible matrix $M$. These ratios indicate that, for artificial random data, the ratio $\frac{\|\mathcal{X} \times_3 M\|_F^2}{\|\mathcal{X}\|_F^2}$ approaches 1 as the size of $M$ increases.
The experiment involves varying the size of the right invertible linear transformation matrix, denoted as p, from 20 to 200 in steps. For each size, the TRPCA algorithm is applied to the noisy tensor, and the results are evaluated via the peak signal-to-noise ratio (PSNR) and relative squared error (RSE) metrics. The PSNR and RSE values are calculated for all frames of the tensor.
The results show that the proposed method effectively recovers the low-rank component of the tensor while maintaining computational efficiency and stability. The PSNR values increase as the size of the right invertible matrix increases, indicating better recovery performance. The RSE values also decrease, confirming the improved accuracy of the low-rank approximation.
Table 2 summarises the runtime of the TRPCA algorithm for different sizes of the right invertible matrix. As the size increases, the runtime also increases, but the improvement in PSNR and reduction in RSE justify the additional computational cost.
Figure 1 and Figure 2 provide visualisations of the PSNR and RSE values for different sizes of the right invertible matrix. For synthetic data, the tensor recovery performance of the algorithm with smaller right inverse matrices has a noticeable gap compared with that when the size is 200.

5.2. Video Denoising and Anomaly Detection

In these experiments, we used the Test001 subset from the UCSD_Anomaly_Dataset directory. The UCSD anomaly detection dataset is a widely used benchmark for evaluating anomaly detection algorithms in video surveillance. Test001 includes 200 frames from a video sequence, which we used to construct a video tensor. For more information about the dataset, please refer to UCSD Anomaly Detection Dataset.
The video data are represented as a tensor of dimensions $158 \times 238 \times 200$. We compute the ratio $\frac{\|\mathcal{X} \times_3 M\|_F^2}{\|\mathcal{X}\|_F^2}$ for different sizes of the right invertible matrix $M$.
Table 3 shows the computed ratios for different sizes of the right invertible matrix $M$. These ratios indicate that, for real data, the ratio $\frac{\|\mathcal{X} \times_3 M\|_F^2}{\|\mathcal{X}\|_F^2}$ approaches 1 more closely than it does for synthetic data. This confirms that real data are more consistent with the condition $\frac{\|\mathcal{X} \times_3 M\|_F^2}{\|\mathcal{X}\|_F^2} \approx 1$.
Figure 3 shows the PSNR values for different sizes across various frames. Figure 4 shows the RSE values for different sizes across various frames. Combining the information from the previous text and the new details about Figure 3 and Figure 4, we can analyse the experimental results as follows.
We can deduce that for most frames, the PSNR values remain relatively close across different sizes, suggesting consistent recovery quality. Interestingly, in certain specific frames, smaller sizes (such as 20 and 40) manage to achieve slightly higher PSNR values than larger sizes (such as 100 and 200), suggesting a possible advantage in terms of recovery performance.
Similarly, Figure 4 shows the relative squared error (RSE) values for different sizes across various frames. Upon examining this figure, we notice that for most frames, the RSE values are relatively close across different sizes, further supporting the notion of consistent recovery quality. Once again, in some specific frames, smaller sizes (e.g., 20 and 40) display slightly lower RSE values than larger sizes (e.g., 200), reinforcing the trend observed in the PSNR analysis.
In Table 4, we present the runtime analysis for the L , R TRPCA method applied to video sequences with different DCT truncation sizes. The DCT truncation sizes considered are 20, 40, 60, 100, and 200. The runtime for each size is measured in seconds. As shown in Table 4, the runtime increases significantly with increasing DCT truncation size. This trend indicates that larger DCT truncation sizes, although providing better recovery performance, come at the cost of increased computational time. Therefore, a trade-off between recovery quality and computational efficiency must be considered when selecting the appropriate DCT truncation size for practical applications.
On the basis of the numerical results, several conclusions can be drawn regarding the impact of varying truncation levels in the definition of the L , R TRPCA algorithm using different-sized right invertible matrices derived from the discrete cosine transform (DCT).
First, in Figure 5, the top row presents the original images from frames 40, 70, and 120, whereas the second row shows the corresponding noisy versions. The third row shows the recovery results achieved via the L,R TRPCA method with a right invertible matrix constructed from a truncation of the DCT matrix (size 20), and the bottom row shows the recovery results when the right invertible matrix is formed by the top 200 rows of the DCT matrix. In Figure 6, the top row again features the original images from the same set of frames, followed by the noisy versions; the remaining rows show the residual results after denoising via the L,R TRPCA method with right invertible matrices constructed from the different truncations of the DCT matrix. Notably, the images recovered with more DCT coefficients are significantly closer to the original images, suggesting that incorporating more DCT coefficients enhances the recovery process, albeit at the cost of increased computational complexity. These figures also highlight the varying recovery efficacy across different frames within the video sequence. For example, frame 120, where a child quickly rides a bicycle into the scene, yields less satisfactory recovery results than slower-motion scenes such as frames 40 and 70. This observation highlights the impact of anomalous situations on video recovery, particularly how fast movements generally compromise recovery accuracy. Although the recovery results with smaller sizes are not as good as those with larger sizes, the method can still detect anomalous situations effectively.
Nonetheless, adopting larger TRPCA truncation sizes enables superior denoising and recovery outcomes even in regions with rapid motion. Consequently, adjusting the number of DCT coefficients employed in the L , R TRPCA method offers a trade-off between recovery quality and computational efficiency. Specifically, smaller sizes yield higher computational efficiency for frames with slow image transformations, whereas larger sizes enhance recovery quality in the presence of fast-moving objects because of the accelerated pixel variations at the same location, which facilitates dynamic detection.
To optimise both recovery quality and computational efficiency in video sequence denoising and recovery tasks, it is advisable to segment the video according to its motion intensity. Smaller transformation sizes can be applied to sections with slow motion to conserve computation time, whereas larger sizes can be utilised in parts with rapid motion to ensure better precision. This adaptive strategy strikes a balance between the two objectives, ensuring optimal performance tailored to specific application demands.
In light of these findings, it becomes evident that the L,R TRPCA method offers a balance between recovery quality and computational efficiency. Smaller sizes tend to provide competitive recovery performance in certain situations, whereas larger sizes may offer advantages in others. By adapting the size of the right invertible matrix constructed from the DCT according to the motion intensity in the video sequence, users can strike a balance between speed and accuracy, optimising the overall performance of the algorithm for their specific needs. This adaptability makes the L,R TRPCA method a practical choice for real-world applications, particularly when factors such as dynamic detection and computational efficiency are considered.
In summary, the L , R TRPCA method effectively balances recovery quality and computational efficiency, rendering it a practical choice for real-world scenarios. When prioritising anomaly detection, smaller sizes should be considered because of their enhanced computational efficiency. Thus, careful consideration must be given to balancing recovery quality and computational efficiency in practical applications to attain optimal performance.

6. Conclusion

This paper introduces a new method for tensor robust principal component analysis (TRPCA) by extending traditional tensor product definitions via right-invertible linear transformations. This extension addresses the limitations of existing tensor product algorithms in terms of the flexibility of linear transformations and significantly improves computational efficiency, reducing both runtime and storage requirements. As a result, the method is particularly well suited for handling large-scale multilinear datasets.
A new class of tensor products for third-order tensors is introduced via right-invertible linear transformations. This generalisation of existing tensor product definitions provides a more flexible framework for tensor operations. A novel TRPCA model is proposed that leverages these new tensor products to increase computational efficiency and robustness. The use of the tensor nuclear norm in this model offers a more efficient and robust solution for tensor data.
Theoretical analyses are provided for the exact recovery of low-rank and sparse components via the new tensor nuclear norm. Under certain conditions, the solution to the convex TRPCA model can perfectly recover the underlying low-rank and sparse components. The recovery guarantees of traditional robust principal component analysis (RPCA) are special cases of this broader framework.
Extensive numerical experiments were conducted to validate the efficiency and effectiveness of the proposed algorithms in various applications, including image recovery and background modelling. The results demonstrate that the L,R TRPCA method outperforms the traditional TRPCA based on the c-product, a state-of-the-art method, in terms of efficiency while maintaining a similar level of quality. The method is highly efficient and scalable, making it suitable for processing large volumes of data in applications such as hyperspectral imaging and video processing.
The reduced computational requirements and running time make the method ideal for use in embedded systems, mobile devices, and remote sensing applications where resources are limited. The method’s efficiency also ensures that it can handle real-time data processing, making it valuable for applications such as surveillance systems, financial market analysis, and healthcare monitoring. The potential future work includes extending the proposed tensor products and TRPCA model to higher-order tensors and further optimising the method’s computational efficiency and scalability for even larger datasets.
Potential applications of the L , R TRPCA method in new domains, such as medical imaging, climate modelling, and social network analysis, are explored. In conclusion, the L , R TRPCA method provides a robust and efficient solution for tensor data analysis, maintaining high recovery quality while significantly reducing running time. This makes it a valuable tool for a wide range of applications, especially in big data scenarios and resource-limited environments.

Author Contributions

Conceptualization, H.Z. and J.F.; methodology, H.Z.; software, H.Z.; validation, H.Z. and W.L.; formal analysis, H.Z.; investigation, H.Z.; resources, J.F.; data curation, H.Z.; writing—original draft preparation, H.Z.; writing—review and editing, H.Z.; visualization, H.Z.; supervision, W.L.; project administration, J.F.; funding acquisition, J.F. All authors have read and agreed to the published version of the manuscript.

Funding

Feng Jun’s work was supported in part by the Natural Science Foundation of Sichuan Province (Grant Nos. 2023NSFSC0020 and 2024NSFSC0083)

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors wish to thank the reviewers for suggestions and comments that helped to improve the manuscript.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Candès, E.J.; Li, X.D.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM 2011, 58, 1–37.
  2. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184.
  3. Lu, C.; Feng, J.; Chen, Y.; Liu, W.; Lin, Z.; Yan, S. Tensor robust principal component analysis: exact recovery of corrupted low-rank tensors via convex optimization. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, 5249–5257.
  4. Netrapalli, P.; Niranjan, U.N.; Sanghavi, S.; Anandkumar, A.; Jain, P. Non-convex robust PCA. In Advances in Neural Information Processing Systems, 2014, 27, 5443–5457.
  5. Lu, C.; Peng, X.; Wei, Y. Low-rank tensor completion with a new tensor nuclear norm induced by invertible linear transforms. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, 5989–5997.
  6. Geng, X.; Guo, Q.; Hui, S.; Yang, M.; Zhang, C. Tensor robust PCA with nonconvex and nonlocal regularization. Comput. Vis. Image Underst. 2024, 243, 104007.
  7. Cai, J.F.; Candès, E.J.; Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 2010, 20, 1956–1982.
  8. Fazel, M. Matrix Rank Minimization with Applications. PhD thesis, Stanford University, Stanford, CA, 2002.
  9. Mu, C.; Huang, B.; Wright, J.; Goldfarb, D. Square deal: lower bounds and improved relaxations for tensor recovery. In Proceedings of the 31st International Conference on Machine Learning (ICML), 2014, 73–81.
  10. Wright, J.; Ganesh, A.; Rao, S.; Peng, Y.; Ma, Y. Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In Advances in Neural Information Processing Systems, 2009, 22.
  11. Cai, H.; Liu, J.; Yin, W. Learned robust PCA: a scalable deep unfolding approach for high-dimensional outlier detection. In Advances in Neural Information Processing Systems (NeurIPS), 2021, 16977–16989.
  12. Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500.
  13. Kilmer, M.E.; Braman, K.; Hao, N.; Hoover, R.C. Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 2013, 34, 148–172.
  14. Zhang, A.; Xia, D. Tensor SVD: statistical and computational limits. IEEE Trans. Inf. Theory 2018, 64, 7311–7338.
  15. Lu, C.; Feng, J.; Chen, Y.; Liu, W.; Lin, Z.; Yan, S. Tensor robust principal component analysis with a new tensor nuclear norm. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 925–938.
  16. Xu, Z.; Yang, J.H.; Wang, L.C.; Wang, F.; Yan, H.X. Tensor robust principal component analysis with total generalized variation for high-dimensional data recovery. Appl. Math. Comput. 2024, 483, 128980.
  17. Tang, K.; Fan, Y.; Song, Y. Improvement of robust tensor principal component analysis based on generalized nonconvex approach. Appl. Intell. 2024, 54, 7377–7396.
  18. Kilmer, M.E.; Martin, C.D. Factorization strategies for third-order tensors. Linear Algebra Appl. 2011, 435, 641–658.
  19. Keegan, K.; Newman, E. Projected tensor-tensor products for efficient computation of optimal multiway data representations. arXiv preprint arXiv:2409.19402, 2024.
  20. Kernfeld, E.; Kilmer, M.; Aeron, S. Tensor-tensor products with invertible linear transforms. Linear Algebra Appl. 2015, 481, 545–570.
  21. Lu, C.; Zhu, C.; Xu, C.; Yan, S.; Lin, Z. Generalized singular value thresholding. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI'15), 2015, 1805–1811.
Figure 1. PSNR Comparison across Different Sizes
Figure 2. RSE Comparison across Different Sizes
Figure 3. PSNR Comparison across Different Sizes
Figure 4. RSE Comparison across Different Sizes
Figure 5. Comparison of Denoised Images with L,R TRPCA: Sizes = 20, 200 (c-TRPCA); frames = 40, 70, 120
Figure 6. Comparison of Residuals After Denoising with L,R TRPCA: Sizes = 20, 200 (c-TRPCA); frames = 40, 70, 120
Table 1. Ratio of Frobenius Norms for Different Sizes
Size  | 20     | 40     | 60     | 100    | 200 (c-TRPCA)
Ratio | 0.9550 | 0.9639 | 0.9695 | 0.9788 | 1.0000
Table 2. Comparison of runtime for different sizes
Size               | 20    | 40    | 60    | 100   | 200 (c-TRPCA)
Run time (seconds) | 128.2 | 150.9 | 258.8 | 314.2 | 694.7
Table 3. Ratio of Frobenius Norms for Different Sizes
Size  | 20     | 40     | 60     | 100    | 200
Ratio | 0.9824 | 0.9923 | 0.9957 | 0.9984 | 1.0000
Table 4. Comparison of runtime for different sizes
Size               | 20     | 40     | 60     | 100     | 200 (c-product)
Run time (seconds) | 402.63 | 550.03 | 664.75 | 1254.44 | 2026.47