Preprint (Essay). This version is not peer-reviewed.

A New Proof of the Eckart–Young–Mirsky Theorem

Submitted: 16 February 2025
Posted: 17 February 2025


Abstract
We apologize for uploading this incomplete draft due to time limitations. In this paper, we give a new proof of the Eckart–Young–Mirsky theorem, which is crucial in machine learning, image and data processing, and related fields.

1. Introduction

The Eckart–Young–Mirsky theorem is a fundamental result in matrix approximation, stating that for a given matrix A and rank k, the best rank-k approximation in the Frobenius norm (or any unitarily invariant norm) is obtained by truncating the Singular Value Decomposition (SVD) of A [1,3]. Formally, if
$$A = U \Sigma V^T$$
is the SVD of A and $\Sigma_k$ is obtained from $\Sigma$ by keeping only the k largest singular values (and setting the rest to zero), then
$$A_k = U \Sigma_k V^T$$
is the unique minimizer of $\|A - X\|_F$ over all rank-k matrices X [2,4].
This theorem underpins numerous applications, from image compression to principal component analysis, yet standard proofs often rely on variational arguments or operator norm inequalities that can obscure geometric intuition [5]. In this paper, we present a more elementary proof that uses only basic linear algebra.
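As a concrete illustration (not part of the proof), the following Python sketch builds the truncated SVD of a small random matrix and checks both the error formula and that a random rank-k competitor does no better; the matrix size, the rank k, and the random seed are arbitrary choices made only for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))    # arbitrary example matrix
k = 3                              # target rank, chosen only for illustration

# Full SVD: A = U @ diag(s) @ Vt, with singular values s in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD A_k: keep only the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius error equals the square root of the sum of the discarded sigma_i^2
err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))    # True

# A random rank-k matrix never beats the truncated SVD
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
print(np.linalg.norm(A - B, 'fro') >= err)             # True
```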

2. Proof

In this section, we give a short, elementary proof of the Eckart–Young–Mirsky theorem.
Let A be a real matrix with $\operatorname{rank}(A) = r$, and let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ be its non-zero singular values in descending order. The SVD factors A into
$$A = U \Sigma V^t = U \begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix} V^t$$
where U and V are orthogonal matrices and $\Lambda = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ is an $r \times r$ diagonal matrix.
Let $0 < k < r$ be an integer. Define
$$A_k = U \Sigma_k V^t = U \begin{pmatrix} \Lambda_k & 0 \\ 0 & 0 \end{pmatrix} V^t$$
with $\Lambda_k = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$ an $r \times r$ diagonal matrix.
The Eckart–Young–Mirsky theorem states that
$$\|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2} = \min_{\operatorname{rank}(X) = k} \|A - X\|_F \tag{1}$$
where $\|\cdot\|_F$ is the Frobenius norm, defined for any real matrix $A = (a_{ij})_{m \times n}$ by
$$\|A\|_F = \sqrt{\operatorname{trace}(A^t A)} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}. \tag{2}$$
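The two expressions in (2) agree; a minimal numerical check (illustrative only, with an arbitrary random matrix) is:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 7))    # arbitrary example matrix

fro_trace = np.sqrt(np.trace(A.T @ A))    # sqrt(trace(A^t A))
fro_entries = np.sqrt(np.sum(A ** 2))     # sqrt of the sum of squared entries
fro_builtin = np.linalg.norm(A, 'fro')    # NumPy's built-in Frobenius norm

print(np.allclose([fro_trace, fro_entries], fro_builtin))   # True
```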
In the following, we relax the condition $\operatorname{rank}(X) = k$ to $\operatorname{rank}(X) \le k$ and prove that
$$\min_{\operatorname{rank}(X) \le k} \|A - X\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}. \tag{3}$$
Since $\operatorname{rank}(A_k) = k$ and $\|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}$, equation (3) immediately implies (1).
Let
$$X = U Y V^t = U \begin{pmatrix} M & * \\ * & * \end{pmatrix} V^t$$
with M an $r \times r$ matrix satisfying $\operatorname{rank}(M) \le k$. Every X with $\operatorname{rank}(X) \le k$ can be written this way: take $Y = U^t X V$ and note that its top-left $r \times r$ block M satisfies $\operatorname{rank}(M) \le \operatorname{rank}(Y) = \operatorname{rank}(X) \le k$. Since the Frobenius norm is invariant under multiplication by orthogonal matrices (an immediate consequence of (2)), we have
$$\|A - X\|_F = \|U(\Sigma - Y)V^t\|_F = \|\Sigma - Y\|_F \ge \|\Lambda - M\|_F,$$
where the last inequality holds because $\Lambda - M$ is the top-left $r \times r$ block of $\Sigma - Y$.
Therefore, to show (3), it suffices to prove
$$\|\Lambda - M\|_F \ge \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}, \tag{4}$$
since $X = A_k$ attains this value.
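The two facts used in this reduction, orthogonal invariance of the Frobenius norm and the sub-block lower bound, can be sanity-checked numerically. The sketch below is illustrative only; the orthogonal factors are generated from QR decompositions of random matrices and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 6, 5, 4                  # arbitrary dimensions for the check

# Random orthogonal matrices obtained from QR decompositions
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

B = rng.standard_normal((m, n))

# Orthogonal invariance: ||U B V^t||_F == ||B||_F
print(np.isclose(np.linalg.norm(U @ B @ V.T, 'fro'),
                 np.linalg.norm(B, 'fro')))                           # True

# Sub-block bound: the top-left r x r block has Frobenius norm at most ||B||_F
print(np.linalg.norm(B[:r, :r], 'fro') <= np.linalg.norm(B, 'fro'))   # True
```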
Since $\operatorname{rank}(M) \le k$, we may fix a k-dimensional subspace $W \subseteq \mathbb{R}^r$ containing the column vectors of M. Choose an orthonormal basis $v_1, v_2, \ldots, v_p, v_{p+1}, \ldots, v_r$ of $\mathbb{R}^r$ such that $v_{p+1}, \ldots, v_r$ span W, where $p + k = r$. Write
$$\Lambda = (\sigma_1 e_1, \sigma_2 e_2, \ldots, \sigma_r e_r) \quad \text{and} \quad M = (w_1, w_2, \ldots, w_r),$$
where the $\sigma_i e_i$ and the $w_i$ are the column vectors of $\Lambda$ and M, respectively. We have
$$\|\Lambda - M\|_F^2 = \sum_{i=1}^{r} \|\sigma_i e_i - w_i\|^2. \tag{5}$$
To minimize (5) for this fixed W, each $w_i$ should be the orthogonal projection of $\sigma_i e_i$ onto W, in which case
$$\sigma_i e_i - w_i = \sum_{j=1}^{p} \sigma_i \langle e_i, v_j \rangle v_j,$$
where $\langle \cdot, \cdot \rangle$ is the standard inner product, so that $\|\sigma_i e_i - w_i\|^2 = \sigma_i^2 \sum_{j=1}^{p} \langle e_i, v_j \rangle^2$. Then, over all M whose column vectors lie in W,
$$\min \|\Lambda - M\|_F^2 = \sum_{i=1}^{r} \sum_{j=1}^{p} \langle e_i, v_j \rangle^2 \, \sigma_i^2. \tag{6}$$
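This projection step, and formula (6), can also be verified numerically. In the sketch below the subspace W is generated at random via a QR decomposition, purely to illustrate the claim; it is not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
r, k = 5, 2
p = r - k

# Lambda = diag(sigma_1, ..., sigma_r) with sigma_1 >= ... >= sigma_r > 0
sigma = np.sort(rng.uniform(0.5, 3.0, size=r))[::-1]
Lam = np.diag(sigma)

# Random orthonormal basis of R^r: the last k columns span W, the first p span its complement
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
Wb, Vp = Q[:, p:], Q[:, :p]

# M_proj: every column of Lambda projected orthogonally onto W
P = Wb @ Wb.T                              # orthogonal projector onto W
M_proj = P @ Lam
best = np.linalg.norm(Lam - M_proj, 'fro') ** 2

# Formula (6): sum over i, j of <e_i, v_j>^2 * sigma_i^2
rhs = np.sum((Vp ** 2) * (sigma ** 2)[:, None])
print(np.isclose(best, rhs))               # True

# No other M with columns in W does better than the projection
trials = [np.linalg.norm(Lam - P @ rng.standard_normal((r, r)), 'fro') ** 2
          for _ in range(100)]
print(min(trials) >= best - 1e-9)          # True
```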
The coefficients of the $\sigma_i^2$ in (6) satisfy
$$0 \le \sum_{j=1}^{p} \langle e_i, v_j \rangle^2 \le \langle e_i, e_i \rangle = 1 \quad \text{and} \quad \sum_{i=1}^{r} \sum_{j=1}^{p} \langle e_i, v_j \rangle^2 = \sum_{j=1}^{p} \sum_{i=1}^{r} \langle e_i, v_j \rangle^2 = \sum_{j=1}^{p} \|v_j\|^2 = p.$$
Since $\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_r^2 > 0$ are in descending order and their coefficients all lie in $[0, 1]$ and sum to p, the right-hand side of (6) is minimized when the coefficients are concentrated on the p smallest values $\sigma_{k+1}^2, \ldots, \sigma_r^2$. Therefore,
$$\sum_{i=1}^{r} \sum_{j=1}^{p} \langle e_i, v_j \rangle^2 \, \sigma_i^2 \ge \sum_{i=k+1}^{r} \sigma_i^2. \tag{7}$$
Combining (6) and (7), we get (4), which concludes the proof.
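As a final numerical sanity check (again illustrative only), for a random orthonormal basis obtained from a QR decomposition, the coefficients $c_i = \sum_j \langle e_i, v_j \rangle^2$ indeed lie in $[0, 1]$, sum to p, and satisfy the bound (7):

```python
import numpy as np

rng = np.random.default_rng(3)
r, k = 6, 2
p = r - k

# sigma_1^2 >= ... >= sigma_r^2 > 0, chosen at random for the check
sigma2 = np.sort(rng.uniform(0.1, 5.0, size=r))[::-1]

# Random orthonormal basis of R^r; v_1, ..., v_p are its first p columns
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
Vp = Q[:, :p]

# c_i = sum_j <e_i, v_j>^2 is the squared norm of the i-th row of Vp
c = np.sum(Vp ** 2, axis=1)

print(np.all((c >= 0) & (c <= 1)))                 # coefficients lie in [0, 1]
print(np.isclose(c.sum(), p))                      # coefficients sum to p
print(c @ sigma2 >= np.sum(sigma2[k:]) - 1e-12)    # the bound (7) holds
```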

3. Conclusion

This paper offers an elementary and self-contained proof of the Eckart–Young–Mirsky theorem, a result that is essential in many fields, such as machine learning, image processing, and data science. By deriving the best rank-k approximation through a clear application of basic linear algebra, the paper contributes to a deeper understanding of low-rank matrix approximation. This simplified proof makes the theorem more accessible to readers familiar with basic matrix theory and reinforces its crucial role in real-world applications such as dimensionality reduction, data compression, and statistical analysis.

References

  1. N. Kishore Kumar and J. Schneider, Literature survey on low rank approximation of matrices, Linear and Multilinear Algebra, vol. 65, no. 11, pp. 2212-2244, 2017.
  2. L. N. Trefethen and D. Bau, Numerical Linear Algebra, SIAM, 1997.
  3. G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 2013.
  4. G. W. Stewart, Matrix Algorithms: Volume I: Basic Decompositions, SIAM, 1998.
  5. R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1985.