On the minimum polynomial and applications

In this note we study the computation of the minimum polynomial of a matrix $A$ and how it can be used for the computation of the matrix $A^n$. We also describe the form of the elements of the matrix $A^{-n}$ and we will see that it is closely related to the computation of the Drazin generalized inverse of $A$. Next we study the computation of the exponential matrix, and finally we give a simple proof of the Leverrier-Faddeev algorithm for the computation of the characteristic polynomial.


Introduction
We are interested here in the computation of the minimum polynomial of a matrix $A_{m\times m}$ and its applications. In particular, we are interested in the computation of the matrix $A^n$ for every $n \in \mathbb{Z}$ (if $A$ is invertible) and in the computation of the exponential matrix $e^{tA}$. It is well known how to compute them in the case where the matrix $A$ is diagonalizable, but here we are interested in a general method. We will see that the Cayley-Hamilton theorem and the notion of the generalized Vandermonde determinant are crucial in this direction. In order to compute the matrices $A^n$ and $e^{tA}$ we can use either the characteristic polynomial or the minimum polynomial. In the case where the degree of the minimum polynomial is much smaller than that of the characteristic polynomial, we will see that using the minimum polynomial one can compute the desired matrices with much less computational effort.
Moreover, we will prove that if the matrix $A$ is invertible and we have computed the matrix $A^n$ for $n \in \mathbb{N}$, then the matrix $A^{-n}$ can be produced directly from the elements of the matrix $A^n$. If we denote these elements by $a_{ij}(n)$, then we shall prove that the elements of $A^{-n}$ are the $a_{ij}(-n)$. Moreover, we will see that even in the case where the matrix is not invertible the elements $a_{ij}(-1)$ are well defined. So, what is the role of the matrix $D$ that contains the elements $a_{ij}(-1)$? We shall see that $D$ is the Drazin inverse (a notion of generalized inverse of a matrix). Therefore, one way to compute the Drazin generalized inverse of the matrix $A$ is first to compute the matrix $A^n$ and then substitute $n$ by $-1$. Of course, if the matrix $A$ is invertible then we arrive at the matrix $A^{-1}$.
Finally, we will study the Leverrier-Faddeev algorithm for the computation of the characteristic polynomial, giving a simple proof of the construction.

Minimum Polynomial
In this section we restate the definition of the minimum polynomial and some well known results about it. By $\mathcal{P}_A$ we denote the set of all polynomials that vanish at the matrix $A$. Obviously $\mathcal{P}_A$ is nonempty because it contains the characteristic polynomial $\delta_A(k)$ of the matrix $A$. Let $q(k) = k^r + b_{r-1}k^{r-1} + \cdots + b_0$ be the polynomial of least degree that belongs to $\mathcal{P}_A$, that is, $q(A) = 0$ and there is no polynomial of degree less than that of $q(k)$ that vanishes at $A$. One can prove that this polynomial is unique, and we call it the minimum polynomial.
Proposition 1 If $p \in \mathcal{P}_A$ then $p$ is divisible by the minimum polynomial $q(k)$. The minimum polynomial $q(k)$ is unique among the monic polynomials.
Proposition 2 The characteristic polynomial $\delta(k)$ of a matrix $A_{n\times n}$ and the minimum polynomial $q$ have the same roots. Moreover, if the characteristic polynomial has $n$ distinct roots then $\delta(k) = q(k)$.
Theorem 3 The matrix $A_{n\times n}$ is diagonalizable iff the minimum polynomial has the form
$$q(k) = (k - k_1)(k - k_2)\cdots(k - k_m),$$
where $k_1, \cdots, k_m$ are the $m \le n$ distinct eigenvalues of the matrix.
We will now describe a method to calculate the minimum polynomial directly, without first computing the characteristic polynomial.
Let the minimum polynomial have the form $q(k) = k^r + a_{r-1}k^{r-1} + \cdots + a_0$ with $r \le n$. Then
$$a_{r-1}A^{r-1} + \cdots + a_1 A + a_0 I = -A^r.$$
From the above matrix equality we can construct a linear system for the unknowns $a_0, \cdots, a_{r-1}$, and this linear system has a unique solution since the minimum polynomial is unique. What is the matrix of this linear system? The left hand side is a matrix, and each of its elements is equal to the corresponding element of the right hand side, i.e. to the corresponding element of the matrix $-A^r$. That is, we have $n^2$ equations with $r$ unknowns. The matrix of the above system is a matrix $C_{n^2\times r}$. In order to construct this matrix we put the elements of the identity matrix (in any order) in the first column of $C$, then in the second column we put the elements of the matrix $A$ (in the same order as before), and finally the elements of the matrix $A^{r-1}$ in the $r$-th column of $C$. The row echelon form of this matrix will have $r$ leading ones in the first $r$ columns, side by side. If we now construct the augmented matrix $(C|X)$, where $X$ is the column that contains the elements of the matrix $-A^r$, then the row echelon form will again have $r$ (and not $r+1$) leading ones side by side in the first $r$ columns. If it had $r+1$ leading ones then the above system would have no solution, and that contradicts the fact that there exists a unique polynomial satisfying the above matrix equality.
Consider instead the homogeneous system $a_0 I + a_1 A + \cdots + a_r A^r = 0$. This system has infinitely many solutions with one free parameter, and among them are the coefficients of the minimum polynomial: choosing $a_r = 1$ we get exactly those coefficients. Similarly, the system $a_0 I + a_1 A + \cdots + a_n A^n = 0$ has infinitely many solutions; choosing $a_r = 1$ and $a_{r+1} = \cdots = a_n = 0$ we again get the coefficients of the minimum polynomial. Here $r$ is the number of leading ones located side by side in the first $r$ columns, the next column having no leading one. Therefore, if $A_{n\times n}$ is a given matrix, we construct the matrix $B_{n^2\times(n+1)}$ which contains in its columns the elements of the matrices $I, A, \cdots, A^n$, in any order but the same for every column. We compute the reduced row echelon form $\hat B$. The number $r$ of leading ones located side by side in the first columns gives us the degree of the minimum polynomial. That is, in order to find the degree of the minimum polynomial one has to find the first column that has no leading one, i.e. find the minimum $r = 1, 2, \cdots$ such that $\hat B_{r+1,r+1} = 0$. Then the degree of the minimum polynomial equals $r$, we set $a_r = 1$ and $a_{r+1} = \cdots = a_n = 0$, and the remaining coefficients $a_0, \cdots, a_{r-1}$ are read off from the $(r+1)$-th column of $\hat B$.

Example 4 For a concrete matrix $A$ one forms the matrix $B$, computes its reduced row echelon form $\hat B$, and reads off the minimum polynomial as described above.

The matrix $A^n$ for $n \in \mathbb{Z}$

Let us first recall the Cayley-Hamilton theorem, which is crucial in our approach.
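The procedure above can be sketched numerically. The following is a minimal sketch, not from the original text: instead of literal row reduction it detects the first power of $A$ that depends linearly on the previous ones via a least-squares residual, with an assumed numerical tolerance `tol`.

```python
import numpy as np

def minimum_polynomial(A, tol=1e-9):
    """Return [a_0, ..., a_{r-1}, 1], the coefficients of the monic minimum
    polynomial, by stacking vec(I), vec(A), vec(A^2), ... as columns of the
    matrix B described above and locating the first power of A that is a
    linear combination of the previous ones."""
    n = A.shape[0]
    powers = [np.eye(n)]
    for _ in range(n):
        powers.append(powers[-1] @ A)
    cols = np.column_stack([P.flatten() for P in powers])   # n^2 x (n+1)
    for r in range(1, n + 1):
        C, x = cols[:, :r], cols[:, r]
        a = np.linalg.lstsq(C, -x, rcond=None)[0]           # solve C a = -vec(A^r)
        if np.linalg.norm(C @ a + x) < tol:                 # A^r + a_{r-1}A^{r-1} + ... + a_0 I = 0
            return np.append(a, 1.0)

# A diagonalizable matrix with a repeated eigenvalue: the minimum polynomial
# (k - 2)(k - 3) has lower degree than the characteristic (k - 2)^2 (k - 3).
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
coeffs = minimum_polynomial(A)    # [6.0, -5.0, 1.0], i.e. k^2 - 5k + 6
```

The rank test by residual plays the role of finding the first column of $\hat B$ without a leading one.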

Theorem 5 (Cayley-Hamilton) For every matrix $A_{m\times m}$ with characteristic polynomial $\delta_A(k) = \det(kI - A)$ it holds that $\delta_A(A) = 0$.

Remark 6 By the fundamental theorem of algebra it is well known that the sum of the multiplicities of the roots of a polynomial of degree $m$ equals $m$. Therefore any polynomial of degree at most $m$ for which the sum of the multiplicities of its roots is greater than $m$ has to be the zero polynomial, i.e. all its coefficients are zero.
Let $p(k)$ be a polynomial of degree greater than or equal to $m$, and let $A_{m\times m}$ be a matrix. If we want to compute the matrix $p(A)$, we can divide $p(k)$ by $\delta_A(k)$, arriving at $p(k) = \delta_A(k)\pi(k) + v(k)$ for every $k \in \mathbb{R}$. Therefore the polynomial $\lambda(k) = p(k) - \delta_A(k)\pi(k) - v(k)$ has all its coefficients equal to zero, because it vanishes for all $k \in \mathbb{R}$. So we see that $p(A) = \delta_A(A)\pi(A) + v(A)$. In particular, if we want to compute the $n$-th power of a matrix $A_{m\times m}$ where $n > m$, we divide (theoretically) the polynomial $k^n$ by the characteristic polynomial, arriving at $k^n = \delta_A(k)\pi(k) + v(k)$. Thus we have to compute the coefficients of the unknown polynomial $v(k)$. For this we can use the eigenvalues $z_1, \cdots, z_l$ of the matrix $A$ (of multiplicities $r_1, \cdots, r_l$), which satisfy the equation $z_j^n = v(z_j)$. In the case where some eigenvalue has multiplicity greater than one, we construct more equations by differentiating the equation $k^n = \delta_A(k)\pi(k) + v(k)$ and then substituting the eigenvalue. We end up with the following linear system of equations:
$$v^{(i)}(z_j) = (k^n)^{(i)}\big|_{k=z_j}, \qquad i = 0, \cdots, r_j - 1, \quad j = 1, \cdots, l,$$
where by $v^{(i)}$ and $(k^n)^{(i)}$ we mean the $i$-th derivative of the function. This system always has a unique solution because its determinant is a generalized Vandermonde determinant (see [6]). To see that, suppose there exist two solutions of the system; their difference satisfies the homogeneous system. Note that $v(k)$ is a polynomial of degree at most $m-1$, so the difference is a polynomial of degree at most $m-1$; but it has the $z_j$ as roots with multiplicities summing to $m$, so it vanishes identically. Hence the square homogeneous system has only the trivial solution, and the system has exactly one solution.
Once we have computed the polynomial $v(k)$, then substituting $A$ instead of $k$ in the relation $k^n = \delta_A(k)\pi(k) + v(k)$ it follows that $A^n = v(A)$, because by the Cayley-Hamilton theorem we have $\delta_A(A) = 0$. If the matrix $A$ is real then one can easily prove that the matrix $v(A) = A^n$ is real.
Remark 7 This holds even in the case where $n \le m$. The unique solution is then $a_n(n) = 1$ and $a_i(n) = 0$ for $i = 0, 1, \cdots, n-1, n+1, \cdots, m-1$, that is, $v(k) = k^n$, and therefore it also holds that $v(A) = A^n$ when $n \le m$.
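For distinct eigenvalues the construction above amounts to solving an ordinary Vandermonde system. The following is an illustrative sketch (the function name and test matrix are ours, not from the text); repeated eigenvalues would add derivative conditions as described above.

```python
import numpy as np

def power_via_cayley_hamilton(A, n):
    """Compute A^n as v(A), where v is the degree m-1 polynomial that
    interpolates k -> k^n at the eigenvalues (distinct-eigenvalue case)."""
    k = np.linalg.eigvals(A)
    V = np.vander(k, increasing=True)       # row i: 1, k_i, ..., k_i^{m-1}
    v = np.linalg.solve(V, k ** n)          # coefficients of v, low degree first
    return sum(c * np.linalg.matrix_power(A, j) for j, c in enumerate(v))

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                  # eigenvalues 1 and 3
A5 = power_via_cayley_hamilton(A, 5).real   # [[122, 121], [121, 122]]
```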

Remark 8 We can use the minimum polynomial instead of the characteristic polynomial in order to compute the $n$-th power of the matrix. Indeed, denoting by $m_A(k)$ the minimum polynomial, we can write $k^n = m_A(k)l(k) + \hat v(k)$. In order to compute the polynomial $\hat v(k)$ we can use the roots of the minimum polynomial and construct a linear system as above. This system also admits a unique solution, and therefore we obtain the matrix $A^n$, which equals the matrix $\hat v(A)$. In this approach the linear system that we construct in order to compute the coefficients of the polynomial $\hat v(k)$ has $l$ equations when the degree of the minimum polynomial is $l$. Comparing with the system using the characteristic polynomial, we see that the computational effort is much less in the case where $l$ is much smaller than $m$. Note also that we can use any polynomial that vanishes at the matrix $A$ in order to compute the matrix $A^n$.
Let the elements of $A^n$ be $a_{ij}(n)$. We shall prove that when $A$ is invertible, the elements of $A^{-1}$ are the $a_{ij}(-1)$ and, more generally, the elements of $A^{-n}$ are the $a_{ij}(-n)$.
Theorem 9 Let $\delta_A(k) = k^m + a_{m-1}k^{m-1} + \cdots + a_0$ be the characteristic polynomial of $A_{m\times m}$. By the Cayley-Hamilton theorem we know that $\delta_A(A) = 0$. If the matrix $A$ is invertible then $a_0 \ne 0$, and from the equality $\delta_A(A) = 0$ it follows that
$$A^{-1} = -\frac{1}{a_0}\left(A^{m-1} + a_{m-1}A^{m-2} + \cdots + a_1 I\right).$$
We can compute the matrix $A^n$ by using the Cayley-Hamilton theorem as we have described. To do this, we consider a polynomial $v(k) = b_{m-1}k^{m-1} + \cdots + b_0$ and we solve the system
$$v^{(j)}(k_i) = n(n-1)\cdots(n-j+1)\,k_i^{\,n-j}, \qquad j = 0, \cdots, r_i - 1, \quad i = 1, \cdots, l, \qquad (1)$$
when $A$ has eigenvalues $k_1, \cdots, k_l$ of multiplicities $r_1, \cdots, r_l$. We will prove that if we solve the above system for $n = -1$ we get the inverse of $A$. This linear system has a unique solution for every $n \in \mathbb{Z}$ because its determinant is a generalized Vandermonde determinant. In the case $n = -1$ we will see that this unique solution is $b_k = -a_{k+1}/a_0$ for $k = 0, \cdots, m-1$ (with $a_m = 1$). Indeed, the first equation of this linear system is $b_{m-1}k_1^{m-1} + \cdots + b_1 k_1 + b_0 = k_1^{-1}$. Substituting the $b_k$ and multiplying by $-a_0 k_1$ we arrive at $\delta_A(k_1) = 0$, which holds because $k_1$ is an eigenvalue.
The second equation (present when $k_1$ has multiplicity at least two) is $v'(k_1) = -k_1^{-2}$. We write the left hand side using the identity $v(k) = \dfrac{1}{k} - \dfrac{\delta_A(k)}{a_0 k}$, which holds for the above choice of the $b_k$ since $\delta_A(k) = k\left(k^{m-1} + a_{m-1}k^{m-2} + \cdots + a_1\right) + a_0$. Differentiating,
$$v'(k) = -\frac{1}{k^2} - \frac{\delta_A'(k)\,k - \delta_A(k)}{a_0 k^2},$$
and since an eigenvalue of multiplicity at least two satisfies $\delta_A(k_1) = \delta_A'(k_1) = 0$, the second equation is also satisfied; the same argument applies to the higher derivatives. We have proved that solving the linear system (1) for $n = -1$ yields $v(A) = A^{-1}$, i.e. the elements of $A^{-1}$ are the $a_{ij}(-1)$. Suppose now that we want to compute the matrix $A^{-k}$ for $k \in \mathbb{N}$. Set $B = A^k$ and compute the powers of $B$, which are such that $B^n = A^{kn}$ with elements $a_{ij}(kn)$. Therefore the elements of the matrix $A^{-k} = B^{-1}$ are the $a_{ij}(-k)$, when the elements of $A^k$ are the $a_{ij}(k)$.
Remark 10 (Generalized inverse of Drazin) As we can see, even in the case where the matrix $A$ is not invertible, the matrix $B$ with elements $a_{ij}(-1)$ always exists. In this case the matrix $B$ coincides with the Drazin generalized inverse (see the proof of Theorem 9 and Theorem 7.5.2 of [4]). Therefore, one way to compute the Drazin generalized inverse of the matrix $A$ is to calculate the matrix $A^n$ and then take the matrix $B$ with elements the $a_{ij}(-1)$.
Example 11 We will compute the $n$-th power of the matrix
$$P = \begin{pmatrix} 3 & 0 \\ 1 & 1 \end{pmatrix}$$
by using the Cayley-Hamilton theorem. We compute the eigenvalues of the matrix, which are $k_1 = 1$ and $k_2 = 3$. Next we write $k^n = \delta_P(k)\pi(k) + v(k)$ with $v(k) = ak + b$. By the Cayley-Hamilton theorem we have that $P^n = v(P)$. We have to compute the coefficients of the polynomial $v(k)$. In order to do this we construct the following linear system:
$$a + b = 1, \qquad 3a + b = 3^n,$$
which has the solution $a = \frac{3^n - 1}{2}$ and $b = \frac{3}{2}\left(1 - 3^{n-1}\right)$. Thus
$$P^n = aP + bI = \begin{pmatrix} 3^n & 0 \\ \frac{3^n - 1}{2} & 1 \end{pmatrix}.$$
By using Theorem 9 we can now compute the inverse of the matrix $P$, which is
$$P^{-1} = \begin{pmatrix} \frac{1}{3} & 0 \\ -\frac{1}{3} & 1 \end{pmatrix}.$$
Example 12 We will compute the $n$-th power of a matrix $A_{3\times 3}$ whose eigenvalues are $k_1 = 3$ (double) and $k_2 = -5$.
Dividing $k^n$ by the characteristic polynomial we obtain $k^n = t(k)\delta_P(k) + v(k)$, where $v(k) = b_2k^2 + b_1k + b_0$. In order to compute the unknown coefficients of $v(k)$ we have to solve the following linear system, the double eigenvalue contributing an equation obtained by differentiation:
$$9b_2 + 3b_1 + b_0 = 3^n, \qquad 6b_2 + b_1 = n\,3^{n-1}, \qquad 25b_2 - 5b_1 + b_0 = (-5)^n.$$
Solving it, the matrix $A^n = b_2A^2 + b_1A + b_0I$ follows.

Using Theorem 9 (substituting $n = -1$) we can immediately compute the inverse of the matrix $A$. Similarly, we obtain the matrix $A^{-n}$ for every $n \in \mathbb{N}$.
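The closed form of Example 11 can be checked numerically, including the substitution $n = -1$ of Theorem 9. A sketch (the helper name is ours):

```python
import numpy as np

P = np.array([[3.0, 0.0],
              [1.0, 1.0]])

def P_power(n):
    """P^n = a P + b I with the coefficients found in Example 11;
    the same formula is valid for negative integers n as well."""
    a = (3.0 ** n - 1.0) / 2.0
    b = (3.0 - 3.0 ** n) / 2.0      # equivalently (3/2)(1 - 3^{n-1})
    return a * P + b * np.eye(2)

P4 = P_power(4)       # agrees with repeated multiplication
Pinv = P_power(-1)    # agrees with the inverse of P
```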

The exponential matrix
In this section we will study functions of a matrix and in particular the exponential matrix. We will give three definitions of a function of a matrix: the first via diagonalization of the matrix, the second via the Taylor series of the function, and the third via the Cayley-Hamilton theorem. We shall prove that all these definitions are equivalent. More about this topic can be found in [7].
• Let $A_{n\times n}$ be a diagonalizable matrix, that is, there exists an invertible matrix $P$ such that $A = PDP^{-1}$, where the diagonal matrix $D$ contains the eigenvalues. Let $f : \mathbb{C} \to \mathbb{C}$ be a function which is well posed at the eigenvalues of the matrix $A$. Then by $f(A)$ we mean the matrix
$$f(A) = P\,\mathrm{diag}\big(f(k_1), \cdots, f(k_n)\big)\,P^{-1},$$
where $k_1, \cdots, k_n$ are the eigenvalues of the matrix $A$, not necessarily distinct.
• Let $f : \mathbb{R} \to \mathbb{R}$ be a function all of whose derivatives are well defined on all of $\mathbb{R}$ and absolutely bounded on every bounded interval (for example the function $e^x$). We define the matrix $f(A)$ as
$$f(A) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!}A^k.$$
This matrix is well defined because the infinite series converges. Indeed, defining $\Delta = \max_{i,j}|A_{ij}|$, we have that $|(A^k)_{ij}| \le n^{k-1}\Delta^k$. We prove this by induction. For $k = 1$ it is obvious that $|(A^1)_{ij}| \le n^0\Delta$. We assume that it holds for some $k$, that is $|(A^k)_{ij}| \le n^{k-1}\Delta^k$, and we prove it for $k+1$:
$$|(A^{k+1})_{ij}| = \Big|\sum_{l=1}^{n}(A^k)_{il}A_{lj}\Big| \le n \cdot n^{k-1}\Delta^k \cdot \Delta = n^k\Delta^{k+1}.$$
It follows that $\sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!}(A^k)_{ij}$ converges absolutely, and therefore the matrix $f(A)$ is well defined.
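The bound used in the convergence argument can be spot-checked numerically. A sketch with an arbitrary test matrix of our choosing:

```python
import numpy as np

# Spot-check the bound |(A^k)_{ij}| <= n^{k-1} * Delta^k, Delta = max |A_{ij}|,
# which guarantees absolute convergence of sum_k f^{(k)}(0)/k! * (A^k)_{ij}.
A = np.array([[0.3, -0.7, 0.2],
              [0.5,  0.1, -0.4],
              [-0.6, 0.2, 0.8]])
n = A.shape[0]
Delta = np.max(np.abs(A))
bound_holds = all(
    np.max(np.abs(np.linalg.matrix_power(A, k))) <= n ** (k - 1) * Delta ** k + 1e-12
    for k in range(1, 9)
)
```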
• We will now describe the definition of the matrix $f(A)$ via the Cayley-Hamilton theorem.
Let $A_{n\times n}$ be a matrix with characteristic polynomial $\delta_A(k)$.
In the case where the function $f : \mathbb{C} \to \mathbb{C}$ is not a polynomial, we can approximate it by a polynomial $p(k)$ and then set $f(A) := p(A)$. We will describe next how this can be done.

Definition 14 Let $A_{n\times n}$ be a matrix with eigenvalues $k_1, \cdots, k_r$ of multiplicities $l_1, \cdots, l_r$ respectively. We say that a function $\phi : U \subseteq \mathbb{R} \to \mathbb{R}$ is well posed at the eigenvalues of the matrix $A$ when the derivatives $\phi^{(d)}(k_i)$ exist for $0 \le d \le l_i - 1$. Let the minimum polynomial have roots $k_1, \cdots, k_r$ of multiplicities $l'_1, \cdots, l'_r$ respectively. We say that the function $\phi$ is well posed at the roots of the minimum polynomial when the derivatives $\phi^{(d)}(k_i)$ exist for $0 \le d \le l'_i - 1$.

Let $f$ be a function which is well posed at the eigenvalues of the matrix $A$, which are $k_1, \cdots, k_m$ of multiplicities $r_1, \cdots, r_m$ respectively. By $\mathcal{P}_{\sigma(A)}$ we denote the set of all polynomials $p(k)$ which have the property that $p^{(l_i)}(k_i) = f^{(l_i)}(k_i)$ for every $0 \le l_i \le r_i - 1$, $l_i \in \mathbb{N}$, and for every eigenvalue $k_i$ of the matrix $A$. Note that the set $\mathcal{P}_{\sigma(A)}$ is nonempty (Hermite interpolation); moreover it contains infinitely many polynomials, because it contains polynomials that agree with $f$ at further points besides the eigenvalues of $A$. Here by $f^{(l_i)}$ we mean the $l_i$-th derivative of the function.
The polynomials that belong to this set have a common property. Dividing any polynomial $p(k) \in \mathcal{P}_{\sigma(A)}$ by the characteristic polynomial of $A$ we get $p(k) = \delta_A(k)\pi(k) + v(k)$, where $v(k)$ is a polynomial of degree at most $n-1$ and is independent of $p(k) \in \mathcal{P}_{\sigma(A)}$, which is crucial from our point of view. Setting $k = A$ and using the fact that $\delta_A(A) = 0$ we arrive at $p(A) = v(A)$.
We can compute the polynomial $v(k)$ by constructing $n$ equations using the eigenvalues (and their multiplicities) of $A$. For example, substituting $k$ by the eigenvalue $k_1$ we obtain the equation $v(k_1) = f(k_1)$. If this eigenvalue has multiplicity two or more, we construct more equations by differentiation, so we also have $v'(k_1) = f'(k_1), \cdots$, supposing that the derivatives of $f$ are well defined at the eigenvalues of $A$. Therefore we construct the following linear system for the unknown polynomial $v$:
$$v^{(i)}(k_j) = f^{(i)}(k_j), \qquad i = 0, \cdots, r_j - 1, \quad j = 1, \cdots, m,$$
where by $v^{(i)}$ and $f^{(i)}$ we mean the $i$-th derivative of the function; for $i = 0$ we mean the function itself. The above linear system has a unique solution because the determinant of the corresponding matrix is a generalized Vandermonde determinant. Solving the above system we compute the polynomial $v(k)$ and we define $f(A) := v(A)$. Note that the matrix $v(A)$ depends only on the function $f$ and the eigenvalues of $A$. Moreover, if the matrix $A$ is a real matrix with eigenvalues $k_1, \cdots, k_r$ of multiplicities $l_1, \cdots, l_r$ and $f : \mathbb{C} \to \mathbb{C}$ is such that $f^{(q_i)}(\overline{k_i}) = \overline{f^{(q_i)}(k_i)}$ (as for example the function $f(z) = e^z$) for every $q_i \in \mathbb{N}$ such that $0 \le q_i \le l_i - 1$ and for every $i = 1, \cdots, r$, then the matrix $f(A)$ is real.

Example 15 Let the matrix
$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$
We will compute the exponential matrix $e^{tA}$ for $t \in \mathbb{R}$. The eigenvalues of the matrix are $k_1 = i$ and $k_2 = -i$. We will compute the polynomial $v(k) = ak + b$ which is such that $v(i) = e^{it}$ and $v(-i) = e^{-it}$. Then we can write
$$a = \frac{e^{it} - e^{-it}}{2i} = \sin t, \qquad b = \frac{e^{it} + e^{-it}}{2} = \cos t.$$
So the exponential matrix is the matrix
$$e^{tA} = aA + bI = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}.$$
We will now prove that all the above definitions are equivalent, that is, that the matrices $f(A)$ defined via diagonalization, via the Taylor series and via the Cayley-Hamilton theorem coincide. Suppose first that the matrix $A_{n\times n}$ is diagonalizable with eigenvalues $k_1, \cdots, k_n$, not necessarily distinct.
Let $f(x)$ be a function which can be expanded in a Taylor series around zero that converges absolutely on every closed interval of the form $[-M, M]$, that is,
$$f(x) = \sum_{l=0}^{\infty}\frac{f^{(l)}(0)}{l!}x^l.$$
Because the matrix is diagonalizable it holds that
$$f(A) = \sum_{l=0}^{\infty}\frac{f^{(l)}(0)}{l!}A^l = P\,\mathrm{diag}\big(f(k_1), \cdots, f(k_n)\big)P^{-1}. \qquad (3)$$
On the other hand, in order to compute the matrix $A^l$ by using the Cayley-Hamilton theorem we should compute the unique solution of the system
$$a_{n-1}(l)k_1^{n-1} + \cdots + a_0(l) = k_1^l, \quad \cdots, \quad a_{n-1}(l)k_n^{n-1} + \cdots + a_0(l) = k_n^l. \qquad (4)$$
Therefore the form of $A^l$ is $A^l = a_{n-1}(l)A^{n-1} + \cdots + a_0(l)I_{n\times n}$, for $l = 0, 1, \cdots$.

Remark 16 Note that when $l < n$ the unique solution of the system is $a_l(l) = 1$, with all the other coefficients equal to zero.
Substituting into (3) we conclude that the matrix $f(A)$, via diagonalization or via the Taylor series, is
$$f(A) = \sum_{l=0}^{\infty}\frac{f^{(l)}(0)}{l!}\big(a_{n-1}(l)A^{n-1} + \cdots + a_0(l)I\big) = b_{n-1}A^{n-1} + \cdots + b_0I, \qquad (5)$$
where $b_i = \sum_{l=0}^{\infty}\frac{f^{(l)}(0)}{l!}a_i(l)$. We are allowed to rearrange the terms because we have proved absolute convergence.
In order to prove that we obtain the same form of the matrix $f(A)$ via the Cayley-Hamilton theorem, we multiply every equation of the system (4) by $\frac{f^{(l)}(0)}{l!}$. Summing every equation for $l = 0$ to $m$ we arrive at
$$b_{n-1}^m k_i^{n-1} + \cdots + b_0^m = \sum_{l=0}^{m}\frac{f^{(l)}(0)}{l!}k_i^l, \qquad i = 1, \cdots, n, \qquad (6)$$
where $b_i^m = \sum_{l=0}^{m}\frac{f^{(l)}(0)}{l!}a_i(l)$. The coefficients $b_i^m$ are obviously finite for every $m$. This linear system has a unique solution because the determinant of the matrix of the system is a generalized Vandermonde determinant (see [6]). Moreover, the matrix of the system is invertible and independent of $m$. Therefore it follows that $(b_0^m, \cdots, b_{n-1}^m)^T = K^{-1}F_m$, where $K$ is the matrix of this linear system and $F_m$ the right hand side of the system.
The limits $b_i = \lim_{m\to\infty} b_i^m = \sum_{l=0}^{\infty}\frac{f^{(l)}(0)}{l!}a_i(l)$ are finite as $m \to \infty$. Setting $m = \infty$ in the system (6) we compute the unique solution of the system, that is $b_0, \cdots, b_{n-1}$.
Therefore, via the Cayley-Hamilton theorem, the matrix $f(A)$ has the form $f(A) = b_{n-1}A^{n-1} + \cdots + b_0I$. Recalling relation (5), we observe that the above system leads us to the matrix $f_D(A)$, the matrix defined via diagonalization.
In the case where some eigenvalues have multiplicity two or more, the system (4) will contain equalities obtained via differentiation. Let the eigenvalue $k_i$ be of multiplicity $r$. Then the system (4) contains the following equations:
$$\begin{aligned} a_{n-1}(l)k_i^{n-1} + \cdots + a_0(l) &= k_i^l \\ a_{n-1}(l)(n-1)k_i^{n-2} + \cdots + a_1(l) &= lk_i^{l-1} \\ &\ \ \vdots \\ a_{n-1}(l)(n-1)\cdots(n-r+1)k_i^{n-r} + \cdots + (r-1)!\,a_{r-1}(l) &= l(l-1)\cdots(l-r+2)k_i^{l-r+1}. \end{aligned}$$
We multiply these equations by $\frac{f^{(l)}(0)}{l!}$ and then sum from $l = 0$ to $\infty$. Therefore the system (6) will contain the corresponding equations, and we observe that the right hand sides of these equations equal $f(k_i), f'(k_i), \cdots, f^{(r-1)}(k_i)$.
We will now explain why we can use the minimum polynomial instead of the characteristic polynomial.
Let the minimum polynomial be of the form $m_A(k) = (k - k_1)^{l'_1}\cdots(k - k_r)^{l'_r}$. The polynomials that belong to the set $\mathcal{P}_{\sigma(A)}$ have the following property: if we divide any of them by the minimum polynomial then $p(k) = m_A(k)\pi(k) + \hat v(k)$, and the polynomial $\hat v(k)$ is independent of $p(k)$.
Remark 17 Denote by $v(k)$ the polynomial such that $p(k) = \delta_A(k)\pi(k) + v(k)$ and by $\hat v(k)$ the polynomial such that $p(k) = m_A(k)\hat\pi(k) + \hat v(k)$. Since $\delta_A(A) = m_A(A) = 0$, both give $v(A) = \hat v(A) = p(A)$, so either polynomial defines the same matrix $f(A)$.

Let us summarize the steps that we have to follow in order to compute the matrices $A^n$ and $e^{tA}$ given a matrix $A_{m\times m}$.

(i) We compute the eigenvalues of the matrix $A$.
(ii) We compute the characteristic polynomial (or, even better, the minimum polynomial) in the form $(k - k_1)^{r_1}(k - k_2)^{r_2}\cdots(k - k_l)^{r_l}$, where $k_1, \cdots, k_l$ are the eigenvalues of the matrix $A$.
(iii) We consider a polynomial $v(k) = a_{m-1}k^{m-1} + \cdots + a_1k + a_0$ in the case where we want to use the characteristic polynomial. If we have computed the minimum polynomial and $r_1 + \cdots + r_l = q \le m$, then $v(k) = a_{q-1}k^{q-1} + \cdots + a_0$, where $r_1, \cdots, r_l$ are the multiplicities of the eigenvalues with respect to the minimum polynomial. We construct the linear system
$$v^{(j)}(k_i) = f^{(j)}(k_i), \qquad j = 0, \cdots, r_i - 1, \quad i = 1, \cdots, l,$$
with $f(x) = x^n$.

(iv) We solve for the unknown coefficients $a_0, \cdots, a_{m-1}$. Then $A^n = v(A) = a_{m-1}A^{m-1} + \cdots + a_0I$.

In order to compute the exponential matrix $e^{tA}$ we just set $f(x) = e^{tx}$ and follow the same procedure. All of the above remains true even in the case where the matrix $A$ is complex.
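The steps above can be sketched for eigenvalues with multiplicities. This is an illustrative implementation for $f(x) = x^n$ (the function names are ours, and the eigenvalues and multiplicities are passed in rather than computed):

```python
import numpy as np
from math import factorial

def solve_for_v(eigs, mults, f_derivs):
    """Solve the confluent system v^{(j)}(k_i) = f^{(j)}(k_i),
    j = 0..r_i - 1, for the coefficients a_0, ..., a_{m-1} of v."""
    m = sum(mults)
    rows, rhs = [], []
    for k, r in zip(eigs, mults):
        for j in range(r):
            # j-th derivative of k^p is p(p-1)...(p-j+1) k^{p-j}
            rows.append([factorial(p) // factorial(p - j) * k ** (p - j)
                         if p >= j else 0.0 for p in range(m)])
            rhs.append(f_derivs(k, j))
    return np.linalg.solve(np.array(rows), np.array(rhs))

def power_via_min_poly(A, n, eigs, mults):
    """A^n = v(A), where v solves the system for f(x) = x^n."""
    f = lambda k, j: factorial(n) // factorial(n - j) * k ** (n - j) if n >= j else 0.0
    a = solve_for_v(eigs, mults, f)
    return sum(c * np.linalg.matrix_power(A, p) for p, c in enumerate(a))

A = np.array([[3.0, 1.0],
              [0.0, 3.0]])                  # eigenvalue 3, multiplicity 2
A4 = power_via_min_poly(A, 4, [3.0], [2])   # [[81, 108], [0, 81]]
```

For $e^{tA}$ one would pass derivatives of $e^{tx}$ instead of those of $x^n$.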
Example 18 Let $A$ be a $3\times 3$ matrix. We will compute the matrices $A^n$ and $e^{tA}$ using the minimum polynomial of $A$.
In order to compute the minimum polynomial we construct the matrix $B_{9\times 4}$, in whose first column we place the elements of the matrix $I$, in the second column the elements of the matrix $A$, and so on.
Computing the reduced row echelon form $\hat B$ of the matrix $B$, we see that the number of leading ones is three, so the degree of the minimum polynomial is also three; that is, the degree of the minimum polynomial equals the degree of the characteristic polynomial. We find the coefficients of the minimum polynomial in the column after the last leading one. The minimum polynomial is
$$m_A(k) = (k-1)(k-0.6)^2.$$
Note that the matrix is not diagonalizable. Set now $f(x) = x^n$ and $g(x) = e^{tx}$. For $v(k) = ak^2 + bk + c$ we construct the following linear system:
$$a + b + c = f(1), \qquad 0.36a + 0.6b + c = f(0.6), \qquad 1.2a + b = f'(0.6).$$
For the matrix $A^n$ (i.e. $f(x) = x^n$) it follows that
$$a = 6.25 - 6.25 \cdot 0.6^n - 2.5\,n \cdot 0.6^{n-1}, \qquad b = -7.5 + 7.5 \cdot 0.6^n + 4n \cdot 0.6^{n-1}, \qquad c = 2.25 - 1.25 \cdot 0.6^n - 1.5\,n \cdot 0.6^{n-1},$$
and therefore $A^n = aA^2 + bA + cI$. In order to check the validity of the above one can use induction. Moreover, substituting $n$ by $-1$ we obtain the inverse of the matrix $A$ (see Theorem 9). For the exponential matrix (i.e. $g(x) = e^{tx}$) we have
$$a = 6.25e^t - 6.25e^{0.6t} - 2.5te^{0.6t}, \qquad b = -7.5e^t + 7.5e^{0.6t} + 4te^{0.6t}, \qquad c = 2.25e^t - 1.25e^{0.6t} - 1.5te^{0.6t}.$$
Therefore $e^{tA} = aA^2 + bA + cI$. In order to check the validity of the above we can use the fact that the exponential matrix is the unique matrix $P(t)$ that satisfies the differential equation $P'(t) = AP(t)$, $P(0) = I$.
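The specific $3\times 3$ matrix of Example 18 is not reproduced above, but the identity $A^n = aA^2 + bA + cI$ depends only on the minimum polynomial $(k-1)(k-0.6)^2$, so as a sketch it can be checked on any matrix annihilated by that polynomial, e.g. the Jordan-form matrix below (our choice, not the matrix of the example):

```python
import numpy as np

# A matrix with minimum polynomial (k - 1)(k - 0.6)^2.
J = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.6, 1.0],
              [0.0, 0.0, 0.6]])

def coeffs(n):
    """The coefficients a, b, c of Example 18 for f(x) = x^n."""
    a = 6.25 - 6.25 * 0.6 ** n - 2.5 * n * 0.6 ** (n - 1)
    b = -7.5 + 7.5 * 0.6 ** n + 4.0 * n * 0.6 ** (n - 1)
    c = 2.25 - 1.25 * 0.6 ** n - 1.5 * n * 0.6 ** (n - 1)
    return a, b, c

a, b, c = coeffs(6)
J6 = a * J @ J + b * J + c * np.eye(3)   # should equal J^6
```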

Leverrier-Faddeev Algorithm
In this section we will study the Leverrier-Faddeev algorithm for the computation of the characteristic polynomial of a matrix $A$. The construction of this algorithm that we present comes from the paper [8], in a simpler form. Let $A_{m\times m}$ be a matrix and $\delta_A(k) = k^m + a_1k^{m-1} + \cdots + a_m$ its characteristic polynomial. Let $N_1, \cdots, N_m$ be the matrices (which we are going to compute below) such that
$$(kI - A)N(k) = I\,\delta_A(k), \qquad (9)$$
where $N(k) = N_1k^{m-1} + \cdots + N_m$. Comparing the coefficients of the powers of $k$ in (9) gives $N_1 = I$ and $N_{j+1} = AN_j + a_jI$ for $j = 1, \cdots, m-1$, together with $AN_m + a_mI = 0$. We consider $\delta_A(k) = \prod_{i=1}^{m}(k - k_i)$, where $k_i$, $i = 1, \cdots, m$, are the eigenvalues of the matrix $A$. Moreover, since $N(k)$ is the adjugate of $kI - A$, it holds that $\delta_A'(k) = \operatorname{tr}N(k)$, and comparing coefficients yields
$$a_j = -\frac{1}{j}\operatorname{tr}(AN_j), \qquad j = 1, \cdots, m.$$

Remark 19 (Computing the minimum and characteristic polynomial) If we want to compute both the minimum and the characteristic polynomial we can use the Leverrier-Faddeev algorithm. We construct the matrix $B$ as we have described (in order to compute the minimum polynomial) and then the reduced row echelon form $\hat B$. At this stage we can compute the minimum polynomial of degree $r$, but using the Leverrier-Faddeev algorithm we can also compute the characteristic polynomial: using the algorithm we find the coefficients $a_{r+1}, \cdots, a_{m-1}$ of the characteristic polynomial $\delta_A(k) = k^m + a_{m-1}k^{m-1} + \cdots + a_0$, and then, by using the reduced row echelon form of $B$, i.e. the matrix $\hat B$, we compute the rest of the coefficients.
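The recursion can be sketched directly. A minimal implementation of the standard Faddeev-LeVerrier iteration (the test matrix is illustrative):

```python
import numpy as np

def faddeev_leverrier(A):
    """Return [1, a_1, ..., a_m] with delta_A(k) = k^m + a_1 k^{m-1} + ... + a_m,
    together with the matrices N_1, N_2, ... of the text, using
    N_1 = I, a_j = -tr(A N_j)/j, N_{j+1} = A N_j + a_j I."""
    m = A.shape[0]
    coeffs = [1.0]
    N = np.eye(m)
    Ns = [N]
    for j in range(1, m + 1):
        AN = A @ N
        a_j = -np.trace(AN) / j
        coeffs.append(a_j)
        N = AN + a_j * np.eye(m)
        Ns.append(N)
    return coeffs, Ns        # the final N is the zero matrix (Cayley-Hamilton)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
coeffs, Ns = faddeev_leverrier(A)   # k^2 - 4k + 3 = (k - 1)(k - 3)
```

The vanishing of the final iterate is exactly the relation $AN_m + a_mI = 0$ obtained from (9).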