Preprint (Essay). This version is not peer-reviewed.

A New Proof of the Eckart–Young–Mirsky Theorem

Submitted: 16 February 2025
Posted: 17 February 2025


Abstract
We apologize for uploading this incomplete draft due to time limitations. In this paper, we give a new proof of the Eckart–Young–Mirsky theorem, which is crucial in machine learning, image and data processing, and related fields.

1. Introduction

The Eckart–Young–Mirsky theorem is a fundamental result in matrix approximation, stating that for a given matrix A and rank k, the best rank-k approximation in the Frobenius norm (or any unitarily invariant norm) is obtained by truncating the Singular Value Decomposition (SVD) of A [1,3]. Formally, if
$$A = U \Sigma V^T$$
is the SVD of A and $\Sigma_k$ is obtained from $\Sigma$ by keeping only the k largest singular values (and setting the rest to zero), then
$$A_k = U \Sigma_k V^T$$
is the unique minimizer of $\|A - X\|_F$ over all rank-k matrices X [2,4].
This theorem underpins numerous applications, from image compression to principal component analysis, yet standard proofs often rely on variational arguments or operator norm inequalities that can obscure geometric intuition [5]. In this paper, we present a more elementary proof that uses only basic linear algebra.
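As a concrete illustration (not part of the proof), the following Python sketch builds the truncated SVD of a small random matrix and checks both the error formula and that a random rank-k competitor does no better; the matrix size, the rank k, and the random seed are arbitrary choices made only for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))    # arbitrary example matrix
k = 3                              # target rank, chosen only for illustration

# Full SVD: A = U @ diag(s) @ Vt, with singular values s in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD A_k: keep only the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius error equals the square root of the sum of the discarded sigma_i^2
err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))    # True

# A random rank-k matrix never beats the truncated SVD
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
print(np.linalg.norm(A - B, 'fro') >= err)             # True
```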

2. Proof

In this section, we give a short, elementary proof of the Eckart–Young–Mirsky theorem.
Let A be a real matrix with $\operatorname{rank}(A) = r$, and let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ be its non-zero singular values in descending order. The SVD factors A into
$$A = U \Sigma V^t = U \begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix} V^t$$
where U and V are orthogonal matrices and $\Lambda = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ is an $r \times r$ diagonal matrix.
Let $0 < k < r$ be an integer. Define
$$A_k = U \Sigma_k V^t = U \begin{pmatrix} \Lambda_k & 0 \\ 0 & 0 \end{pmatrix} V^t$$
with $\Lambda_k = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$ an $r \times r$ diagonal matrix.
The Eckart–Young–Mirsky theorem states that
$$\|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2} = \min_{\operatorname{rank}(X) = k} \|A - X\|_F \tag{1}$$
where $\|\cdot\|_F$ is the Frobenius norm, defined for any real matrix $A = (a_{ij})_{m \times n}$ by
$$\|A\|_F = \sqrt{\operatorname{trace}(A^t A)} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}. \tag{2}$$
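The two expressions in (2) agree; a minimal numerical check (illustrative only, with an arbitrary random matrix) is:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 7))    # arbitrary example matrix

fro_trace = np.sqrt(np.trace(A.T @ A))    # sqrt(trace(A^t A))
fro_entries = np.sqrt(np.sum(A ** 2))     # sqrt of the sum of squared entries
fro_builtin = np.linalg.norm(A, 'fro')    # NumPy's built-in Frobenius norm

print(np.allclose([fro_trace, fro_entries], fro_builtin))   # True
```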
In the following, we relax the condition $\operatorname{rank}(X) = k$ to $\operatorname{rank}(X) \le k$ and prove that
$$\min_{\operatorname{rank}(X) \le k} \|A - X\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}. \tag{3}$$
Since $\operatorname{rank}(A_k) = k$ and $\|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}$, equation (3) immediately implies (1).
Let
$$X = U Y V^t = U \begin{pmatrix} M & * \\ * & * \end{pmatrix} V^t$$
with M an $r \times r$ matrix satisfying $\operatorname{rank}(M) \le k$. Every X with $\operatorname{rank}(X) \le k$ can be written this way: take $Y = U^t X V$ and note that its top-left $r \times r$ block M satisfies $\operatorname{rank}(M) \le \operatorname{rank}(Y) = \operatorname{rank}(X) \le k$. Since the Frobenius norm is invariant under multiplication by orthogonal matrices (an immediate consequence of (2)), we have
$$\|A - X\|_F = \|U(\Sigma - Y)V^t\|_F = \|\Sigma - Y\|_F \ge \|\Lambda - M\|_F,$$
where the last inequality holds because $\Lambda - M$ is the top-left $r \times r$ block of $\Sigma - Y$.
Therefore, to show (3), it suffices to prove
$$\|\Lambda - M\|_F \ge \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}, \tag{4}$$
since $X = A_k$ attains this value.
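The two facts used in this reduction, orthogonal invariance of the Frobenius norm and the sub-block lower bound, can be sanity-checked numerically. The sketch below is illustrative only; the orthogonal factors are generated from QR decompositions of random matrices and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 6, 5, 4                  # arbitrary dimensions for the check

# Random orthogonal matrices obtained from QR decompositions
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

B = rng.standard_normal((m, n))

# Orthogonal invariance: ||U B V^t||_F == ||B||_F
print(np.isclose(np.linalg.norm(U @ B @ V.T, 'fro'),
                 np.linalg.norm(B, 'fro')))                           # True

# Sub-block bound: the top-left r x r block has Frobenius norm at most ||B||_F
print(np.linalg.norm(B[:r, :r], 'fro') <= np.linalg.norm(B, 'fro'))   # True
```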
Since $\operatorname{rank}(M) \le k$, we may fix a k-dimensional subspace $W \subseteq \mathbb{R}^r$ containing the column vectors of M. Choose an orthonormal basis $v_1, v_2, \ldots, v_p, v_{p+1}, \ldots, v_r$ of $\mathbb{R}^r$ such that $v_{p+1}, \ldots, v_r$ span W, where $p + k = r$. Write
$$\Lambda = (\sigma_1 e_1, \sigma_2 e_2, \ldots, \sigma_r e_r) \quad \text{and} \quad M = (w_1, w_2, \ldots, w_r),$$
where the $\sigma_i e_i$ and the $w_i$ are the column vectors of $\Lambda$ and M, respectively. We have
$$\|\Lambda - M\|_F^2 = \sum_{i=1}^{r} \|\sigma_i e_i - w_i\|^2. \tag{5}$$
To minimize (5) for this fixed W, each $w_i$ should be the orthogonal projection of $\sigma_i e_i$ onto W, in which case
$$\sigma_i e_i - w_i = \sum_{j=1}^{p} \sigma_i \langle e_i, v_j \rangle v_j,$$
where $\langle \cdot, \cdot \rangle$ is the standard inner product, so that $\|\sigma_i e_i - w_i\|^2 = \sigma_i^2 \sum_{j=1}^{p} \langle e_i, v_j \rangle^2$. Then, over all M whose column vectors lie in W,
$$\min \|\Lambda - M\|_F^2 = \sum_{i=1}^{r} \sum_{j=1}^{p} \langle e_i, v_j \rangle^2 \, \sigma_i^2. \tag{6}$$
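This projection step, and formula (6), can also be verified numerically. In the sketch below the subspace W is generated at random via a QR decomposition, purely to illustrate the claim; it is not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
r, k = 5, 2
p = r - k

# Lambda = diag(sigma_1, ..., sigma_r) with sigma_1 >= ... >= sigma_r > 0
sigma = np.sort(rng.uniform(0.5, 3.0, size=r))[::-1]
Lam = np.diag(sigma)

# Random orthonormal basis of R^r: the last k columns span W, the first p span its complement
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
Wb, Vp = Q[:, p:], Q[:, :p]

# M_proj: every column of Lambda projected orthogonally onto W
P = Wb @ Wb.T                              # orthogonal projector onto W
M_proj = P @ Lam
best = np.linalg.norm(Lam - M_proj, 'fro') ** 2

# Formula (6): sum over i, j of <e_i, v_j>^2 * sigma_i^2
rhs = np.sum((Vp ** 2) * (sigma ** 2)[:, None])
print(np.isclose(best, rhs))               # True

# No other M with columns in W does better than the projection
trials = [np.linalg.norm(Lam - P @ rng.standard_normal((r, r)), 'fro') ** 2
          for _ in range(100)]
print(min(trials) >= best - 1e-9)          # True
```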
The coefficients of the $\sigma_i^2$ in (6) satisfy
$$0 \le \sum_{j=1}^{p} \langle e_i, v_j \rangle^2 \le \langle e_i, e_i \rangle = 1 \quad \text{and} \quad \sum_{i=1}^{r} \sum_{j=1}^{p} \langle e_i, v_j \rangle^2 = \sum_{j=1}^{p} \sum_{i=1}^{r} \langle e_i, v_j \rangle^2 = \sum_{j=1}^{p} \|v_j\|^2 = p.$$
Since $\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_r^2 > 0$ are in descending order and their coefficients all lie in $[0, 1]$ and sum to p, the right-hand side of (6) is minimized when the coefficients are concentrated on the p smallest values $\sigma_{k+1}^2, \ldots, \sigma_r^2$. Therefore,
$$\sum_{i=1}^{r} \sum_{j=1}^{p} \langle e_i, v_j \rangle^2 \, \sigma_i^2 \ge \sum_{i=k+1}^{r} \sigma_i^2. \tag{7}$$
Combining (6) and (7), we get (4), which concludes the proof.
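As a final numerical sanity check (again illustrative only), for a random orthonormal basis obtained from a QR decomposition, the coefficients $c_i = \sum_j \langle e_i, v_j \rangle^2$ indeed lie in $[0, 1]$, sum to p, and satisfy the bound (7):

```python
import numpy as np

rng = np.random.default_rng(3)
r, k = 6, 2
p = r - k

# sigma_1^2 >= ... >= sigma_r^2 > 0, chosen at random for the check
sigma2 = np.sort(rng.uniform(0.1, 5.0, size=r))[::-1]

# Random orthonormal basis of R^r; v_1, ..., v_p are its first p columns
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
Vp = Q[:, :p]

# c_i = sum_j <e_i, v_j>^2 is the squared norm of the i-th row of Vp
c = np.sum(Vp ** 2, axis=1)

print(np.all((c >= 0) & (c <= 1)))                 # coefficients lie in [0, 1]
print(np.isclose(c.sum(), p))                      # coefficients sum to p
print(c @ sigma2 >= np.sum(sigma2[k:]) - 1e-12)    # the bound (7) holds
```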

3. Conclusion

This paper offers an elementary and self-contained proof of the Eckart–Young–Mirsky theorem, a result that is essential in many fields, such as machine learning, image processing, and data science. By deriving the best rank-k approximation through a clear application of basic linear algebra, the paper contributes to a deeper understanding of low-rank matrix approximation. This simplified proof makes the theorem more accessible to readers familiar with basic matrix theory and reinforces its crucial role in real-world applications such as dimensionality reduction, data compression, and statistical analysis.

References

  1. N. Kishore Kumar and J. Schneider, Literature survey on low rank approximation of matrices, Linear and Multilinear Algebra, vol. 65, no. 11, pp. 2212-2244, 2017.
  2. L. N. Trefethen and D. Bau, Numerical Linear Algebra, SIAM, 1997.
  3. G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 2013.
  4. G. W. Stewart, Matrix Algorithms: Volume I: Basic Decompositions, SIAM, 1998.
  5. R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1985.